New Contributor Onboarding Guide

This document provides the information you need to contribute to the dlt-meta project. Your contributions are vital to its success, whether you’re fixing bugs, improving documentation, or adding new features.

Steps

Step 0 - Read the documentation

Refer to the documentation wiki page here, which will guide you to the different DLT-META resources such as the documentation, the GitHub repo, and presentations. Read the getting-started link here to understand the prerequisites, setup steps, and configuration details.

Prerequisite

  • Install the Databricks CLI on your local machine
  • Authenticate your machine to a Databricks workspace
  • Python 3.8.0+
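
One common way to satisfy the authentication prerequisite is a profile in ~/.databrickscfg. The fragment below is only an illustration; the host URL and token are placeholders, not real values:

```ini
; Example Databricks CLI profile in ~/.databrickscfg (placeholder values).
; Replace host and token with your own workspace URL and personal access token.
[DEFAULT]
host  = https://your-workspace.cloud.databricks.com
token = dapiXXXXXXXXXXXXXXXXXXXXXXXXXXXX
```

A profile named DEFAULT matches the --profile DEFAULT argument used by the integration tests later in this guide.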

Step 1 - Fork the Repository

If you are unable to fork this repo because the repository is outside of your enterprise (EMU) organization, follow Step 3 or fork using a personal GitHub account.

Step 2 - Clone the Repository Locally

  1. Run the command “git clone https://github.com/databrickslabs/dlt-meta.git”; it creates a folder named “dlt-meta”

Step 3 - Set Up the Development Environment

  1. cd dlt-meta
  2. Create a Python virtual environment
    • python -m venv .venv or python3 -m venv .venv
  3. Activate the virtual environment
    • source .venv/bin/activate
  4. Install the Databricks SDK
    • pip install databricks-sdk
  5. Install a code editor such as VS Code.
  6. Open the project in VS Code: File > Open Folder > select the dlt-meta folder created above
  7. Install setuptools and wheel if they are not already installed
    • pip install setuptools wheel
  8. Install the project dependencies specified in setup.py
    • pip install -e .
  9. Build the project
    • python setup.py sdist bdist_wheel
  10. Install additional dependencies
    • pip install pyspark
    • pip install delta-spark
    • pip install pytest
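
As a quick sanity check after the steps above, a short Python snippet can confirm that the dependencies are importable. The helper function below is a sketch invented for this guide, not part of dlt-meta; note that the delta-spark package is imported under the module name delta.

```python
# Sanity-check sketch: confirm the dev dependencies from Step 3 are importable.
# missing_deps is a hypothetical helper, not part of the dlt-meta codebase.
import importlib.util

def missing_deps(modules):
    """Return the subset of module names that cannot be imported."""
    return [m for m in modules if importlib.util.find_spec(m) is None]

if __name__ == "__main__":
    # The delta-spark package is imported as "delta".
    missing = missing_deps(["pyspark", "delta", "pytest"])
    print("All dependencies found" if not missing else f"Missing: {missing}")
```

Running this inside the activated .venv should report no missing modules once the installs above have completed.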

Step 4 - Running Unit and Integration Tests

  • Unit tests are in the tests folder

    • To run the test cases, use the pytest command in the terminal
    • To run all tests, run pytest
    • To run a specific test, run pytest -k “test_case_name”
  • Integration tests are in the integration_tests folder

    • To run the integration tests, run run_integration_tests.py with the required arguments, as in the example below
    • e.g. run_integration_tests.py --uc_catalog_name datta_demo --cloud_provider_name aws --dbr_version 14.3 --source cloudfiles --dbfs_path “dbfs:/tmp/DLT-META/” --profile DEFAULT
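
To illustrate the pytest conventions above, here is a minimal, self-contained test in the style used under the tests folder. The function under test is invented for this example and is not part of dlt-meta.

```python
# Hypothetical pytest-style unit test; normalize_table_name is invented
# for illustration and is not part of the dlt-meta codebase.
def normalize_table_name(name: str) -> str:
    """Normalize a table name: trim whitespace, lowercase, replace spaces."""
    return name.strip().lower().replace(" ", "_")

def test_normalize_table_name():
    assert normalize_table_name("  Customer Orders ") == "customer_orders"

def test_normalize_table_name_idempotent():
    assert normalize_table_name("customer_orders") == "customer_orders"
```

Saved as, say, tests/test_example.py, pytest -k “test_normalize_table_name” would select only these tests, matching the -k usage described above.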

Step 5 - Find Beginner-Friendly Issues

Refer to the open issues [here](https://github.com/databrickslabs/dlt-meta/issues) and pick one that looks beginner-friendly.

Step 7 - Work on the Issue

  1. Fork the repository, if you have not already done so in Step 1. Here’s how to fork a repository on GitHub:

    • Go to the repository’s page
    • Click the Fork button
    • A forked copy will be added to your GitHub repositories list
    • A small text below the repository name will confirm that it’s a fork

    You can also fork a repository in GitHub Desktop:

    • Click Clone Repository in the File menu
    • Select the local directory to clone the repository into
    • Click Continue
  2. Comment your proposed design on the issue, covering the problem statement, goals and objectives, proposed sketch, scope, implementation plan, risks and mitigation strategies, and the testing and validation plan.
  3. Write the code.
  4. Write the unit tests and validate them.
  5. Run and validate the integration test cases.
  6. Once testing completes successfully, commit the changes to your Git repository.
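
The commit workflow above can be sketched as follows. The branch name, file name, and commit message are hypothetical, and the sketch runs in a throwaway repository so it never touches your real clone:

```shell
# Sketch of the local Git workflow for this step; branch name, file name,
# and commit message are hypothetical. Runs in a throwaway repo for safety.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git checkout -q -b fix/issue-123             # topic branch for the issue
echo "# fix goes here" > example.py          # ... write the code and tests ...
git add example.py
git -c user.name="You" -c user.email="you@example.com" \
    commit -q -m "Fix issue #123: short description"
git log --oneline                            # confirm the commit landed
```

In your real dlt-meta clone you would skip the mktemp/git init lines and branch directly off your fork’s default branch.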

Step 8 - Submit a PR

Once you have made and committed your changes, push them to your fork and create a pull request. For more details, refer to the [documentation](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request-from-a-fork). 

Step 9 - Celebrate your Contribution

Congratulations, and thank you for contributing to dlt-meta!