New Contributor Onboarding Guide
This document provides the information needed for anyone who would like to contribute to the dlt-meta project. Your contributions are vital to its success, whether you’re fixing bugs, improving documentation, or adding new features.
Steps
Step 0 - Read the documentation
Refer to the documentation wiki page here, which will guide you to the different DLT-META resources such as the documentation, GitHub repo, presentations, etc. Read the getting started link here to understand the prerequisites, setup steps, and configuration details.
Prerequisites
- Install the Databricks CLI on your local machine
- Authenticate your local machine to a Databricks workspace
- Python 3.8.0+
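The Python version requirement can be checked directly from the interpreter. This is a minimal sketch for that one prerequisite only; the Databricks CLI install and workspace authentication still need to be verified separately (for example with the CLI itself):

```python
import sys

# dlt-meta requires Python 3.8.0 or newer
assert sys.version_info >= (3, 8), (
    f"Python 3.8.0+ required, found {sys.version.split()[0]}"
)
print("Python version OK:", sys.version.split()[0])
```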
Step 1 - Fork the Repository
If you are not able to fork this repo because the repository is outside of your enterprise Databricks (EMU) account, follow Step 3 or fork using a personal GitHub account.
Step 2 - Clone the Repository Locally
- Run the command "git clone https://github.com/databrickslabs/dlt-meta.git"; it will create a folder named "dlt-meta"
Step 3 - Set Up the Development Environment
- cd dlt-meta
- Create python virtual environment
- python -m venv .venv or python3 -m venv .venv
- Activate python virtual environment
- source .venv/bin/activate
- Install databricks sdk
- pip install databricks-sdk
- Install a code editor such as VS Code or any other
- Import the project into VS Code: File > Open Folder > select the dlt-meta folder from your system
- Install setuptools and wheel if not already installed
- pip install setuptools wheel
- Install the project dependencies specified in setup.py
- pip install -e .
- Build the project
- python setup.py sdist bdist_wheel
- Install additional dependencies
- pip install pyspark
- pip install delta-spark
- pip install pytest
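After the installs above, a quick sanity check can confirm that the development dependencies are importable from the virtual environment. This is a sketch, not part of the project; note that some module names differ from their pip package names (e.g. the delta-spark package is imported as `delta`):

```python
import importlib.util


def can_import(name: str) -> bool:
    """Return True if the named module can be found without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a dotted name's parent package is not installed
        return False


# Modules installed during Step 3 (module names, not pip package names)
required = ["databricks.sdk", "pyspark", "delta", "pytest"]
missing = [m for m in required if not can_import(m)]

if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All development dependencies are importable.")
```

If anything is listed as missing, re-run the corresponding pip install inside the activated virtual environment.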
Step 4 - Running Unit and Integration Tests
Unit tests are in the tests folder
- To run the test cases, use the pytest command in the terminal
- To run all tests, run: pytest
- To run a specific test: pytest -k "test_case_name"
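A unit test here is an ordinary pytest function: a file under the tests folder whose name starts with test_, containing plain assert statements. The following is a minimal illustrative example; the helper function and test name are hypothetical and not taken from the dlt-meta test suite:

```python
# tests/test_example.py -- hypothetical example, not an actual dlt-meta test


def normalize_layer_name(name: str) -> str:
    """Toy helper used only to illustrate the shape of a unit test."""
    return name.strip().lower()


def test_normalize_layer_name():
    # pytest discovers functions prefixed with "test_" and runs their asserts
    assert normalize_layer_name("  Bronze ") == "bronze"
```

Running pytest -k "test_normalize_layer_name" would select just this test.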
Integration tests are in the integration_tests folder
- To run the integration tests, run the file run_integration_tests.py with the mandatory arguments, as below
- e.g. run_integration_tests.py --uc_catalog_name datta_demo --cloud_provider_name aws --dbr_version 14.3 --source cloudfiles --dbfs_path "dbfs:/tmp/DLT-META/" --profile DEFAULT
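A runner like this would typically parse those flags with argparse. The sketch below shows how the example invocation above maps to parsed values; the flag names are taken from the example, but the types, defaults, and choices are assumptions rather than the real implementation in run_integration_tests.py:

```python
import argparse

# Hypothetical sketch of the argument parsing in run_integration_tests.py;
# the actual script may define these flags differently.
parser = argparse.ArgumentParser(description="Run dlt-meta integration tests")
parser.add_argument("--uc_catalog_name", required=True)
parser.add_argument("--cloud_provider_name", required=True,
                    choices=["aws", "azure", "gcp"])  # assumed choices
parser.add_argument("--dbr_version", required=True)
parser.add_argument("--source", required=True)
parser.add_argument("--dbfs_path", required=True)
parser.add_argument("--profile", default="DEFAULT")

# The example invocation from the docs, expressed as an argv list
args = parser.parse_args([
    "--uc_catalog_name", "datta_demo",
    "--cloud_provider_name", "aws",
    "--dbr_version", "14.3",
    "--source", "cloudfiles",
    "--dbfs_path", "dbfs:/tmp/DLT-META/",
    "--profile", "DEFAULT",
])
print(args.uc_catalog_name, args.cloud_provider_name, args.dbr_version)
```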
Step 5 - Find Beginner-Friendly Issues
Refer to the open issues [here](https://github.com/databrickslabs/dlt-meta/issues).
Step 6 - Work on the Issue
- Go to the repository’s page
- Click the Fork button
- A forked copy will be added to your GitHub repositories list
- A small text below the repository name will confirm that it’s a fork
You can also fork a repository on GitHub Desktop:
- Click Clone Repository in the File menu
- Select the local directory to clone the repository into
- Click Continue
- Comment the proposed design sketch on the issue, covering the problem statement, goals and objectives, the proposed sketch, scope, implementation plan, risk and mitigation strategies, and the testing and validation plan.
- Write the code.
- Write the unit tests and validate them.
- Run and validate the integration test cases.
- Upon successful completion of testing, commit the changes to your git repository
Step 7 - Submit a PR
Once you have successfully made the changes, commit them to your repository and create a pull request. For more details, refer to the [documentation](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request-from-a-fork).
Step 8 - Celebrate Your Contribution
Congratulations.