Run a CI/CD workflow with a Databricks Asset Bundle and GitHub Actions
This article describes how to run a CI/CD (continuous integration/continuous deployment) workflow in GitHub with GitHub Actions and a Databricks Asset Bundle. See What are Databricks Asset Bundles?

You can use GitHub Actions along with Databricks CLI bundle commands to automate, customize, and run your CI/CD workflows from within your GitHub repositories.
You can add GitHub Actions YAML files such as the following to your repo’s .github/workflows directory. The following example GitHub Actions YAML file validates, deploys, and runs the specified job in the bundle within a pre-production target named “qa” as defined within a bundle configuration file. This example GitHub Actions YAML file relies on the following:

- A bundle configuration file at the root of the repository, which is explicitly declared through the GitHub Actions YAML file’s setting working-directory: . (This setting can be omitted if the bundle configuration file is already at the root of the repository.) This bundle configuration file defines a Databricks workflow named my-job and a target named qa. See Databricks Asset Bundle configurations. A minimal sketch of such a configuration file appears after this list.
- A GitHub secret named SP_TOKEN, representing the Databricks access token for a Databricks service principal that is associated with the Databricks workspace to which this bundle is being deployed and run. See Encrypted secrets.
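For reference, a minimal databricks.yml at the root of the repository that would satisfy these assumptions might look like the following sketch. Only the my-job and qa names come from this example; the bundle name, notebook path, and workspace host URL are hypothetical placeholders, and cluster settings are omitted for brevity.

```yaml
# databricks.yml (sketch): defines a workflow named "my-job" and a target named "qa".
# The bundle name, notebook path, and workspace host below are hypothetical placeholders.
bundle:
  name: my-bundle

resources:
  jobs:
    my-job:
      name: my-job
      tasks:
        - task_key: my-task
          notebook_task:
            notebook_path: ./src/my_notebook.py
          # Cluster settings are omitted for brevity.

targets:
  qa:
    workspace:
      host: https://my-qa-workspace.cloud.databricks.com
```

With a configuration like this in place, the following workflow file validates, deploys, and runs the bundle against the qa target.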
```yaml
# This workflow validates, deploys, and runs the specified bundle
# within a pre-production target named "qa".
name: "QA deployment"

# Ensure that only a single job or workflow using the same concurrency group
# runs at a time.
concurrency: 1

# Trigger this workflow whenever a pull request is opened against the repo's
# main branch or an existing pull request's head branch is updated.
on:
  pull_request:
    types:
      - opened
      - synchronize
    branches:
      - main

jobs:
  # Used by the "pipeline_update" job to deploy the bundle.
  # Bundle validation is automatically performed as part of this deployment.
  # If validation fails, this workflow fails.
  deploy:
    name: "Deploy bundle"
    runs-on: ubuntu-latest

    steps:
      # Check out this repo, so that this workflow can access it.
      - uses: actions/checkout@v3

      # Download the Databricks CLI.
      # See https://github.com/databricks/setup-cli
      - uses: databricks/setup-cli@main

      # Deploy the bundle to the "qa" target as defined
      # in the bundle's settings file.
      - run: databricks bundle deploy
        working-directory: .
        env:
          DATABRICKS_TOKEN: ${{ secrets.SP_TOKEN }}
          DATABRICKS_BUNDLE_ENV: qa

  # Validate, deploy, and then run the bundle.
  pipeline_update:
    name: "Run pipeline update"
    runs-on: ubuntu-latest

    # Run the "deploy" job first.
    needs:
      - deploy

    steps:
      # Check out this repo, so that this workflow can access it.
      - uses: actions/checkout@v3

      # Use the downloaded Databricks CLI.
      - uses: databricks/setup-cli@main

      # Run the Databricks workflow named "my-job" as defined in the
      # bundle that was just deployed.
      - run: databricks bundle run my-job --refresh-all
        working-directory: .
        env:
          DATABRICKS_TOKEN: ${{ secrets.SP_TOKEN }}
          DATABRICKS_BUNDLE_ENV: qa
```
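The workflow above selects the qa target through the DATABRICKS_BUNDLE_ENV environment variable. As an alternative sketch, the target can also be passed to the Databricks CLI directly with the -t (or --target) flag, in which case the environment variable is not needed; for example, the deploy step could be written as:

```yaml
      # Alternative sketch: select the "qa" target with the CLI's -t flag
      # instead of the DATABRICKS_BUNDLE_ENV environment variable.
      - run: databricks bundle deploy -t qa
        working-directory: .
        env:
          DATABRICKS_TOKEN: ${{ secrets.SP_TOKEN }}
```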
The following GitHub Actions YAML file can exist in the same repo as the preceding file. This file validates, deploys, and runs the specified bundle within a production target named “prod” as defined within a bundle configuration file. This example GitHub Actions YAML file relies on the following:

- A bundle configuration file at the root of the repository, which is explicitly declared through the GitHub Actions YAML file’s setting working-directory: . (This setting can be omitted if the bundle configuration file is already at the root of the repository.) This bundle configuration file defines a Databricks workflow named my-job and a target named prod. See Databricks Asset Bundle configurations. A sketch of the prod target appears after this list.
- A GitHub secret named SP_TOKEN, representing the Databricks access token for a Databricks service principal that is associated with the Databricks workspace to which this bundle is being deployed and run. See Encrypted secrets.
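Continuing the earlier databricks.yml sketch, the prod target could be declared alongside qa. The host URL is again a hypothetical placeholder, and the optional mode setting is shown only as an illustration:

```yaml
# Addition to the databricks.yml sketch above.
targets:
  prod:
    # Optional: marks this target as a production deployment.
    mode: production
    workspace:
      host: https://my-prod-workspace.cloud.databricks.com
```

The following workflow file then deploys and runs the bundle against the prod target whenever changes reach the main branch.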
```yaml
# This workflow validates, deploys, and runs the specified bundle
# within a production target named "prod".
name: "Production deployment"

# Ensure that only a single job or workflow using the same concurrency group
# runs at a time.
concurrency: 1

# Trigger this workflow whenever code is pushed to the repo's
# main branch, for example after a pull request is merged.
on:
  push:
    branches:
      - main

jobs:
  deploy:
    name: "Deploy bundle"
    runs-on: ubuntu-latest

    steps:
      # Check out this repo, so that this workflow can access it.
      - uses: actions/checkout@v3

      # Download the Databricks CLI.
      # See https://github.com/databricks/setup-cli
      - uses: databricks/setup-cli@main

      # Deploy the bundle to the "prod" target as defined
      # in the bundle's settings file.
      - run: databricks bundle deploy
        working-directory: .
        env:
          DATABRICKS_TOKEN: ${{ secrets.SP_TOKEN }}
          DATABRICKS_BUNDLE_ENV: prod

  # Validate, deploy, and then run the bundle.
  pipeline_update:
    name: "Run pipeline update"
    runs-on: ubuntu-latest

    # Run the "deploy" job first.
    needs:
      - deploy

    steps:
      # Check out this repo, so that this workflow can access it.
      - uses: actions/checkout@v3

      # Use the downloaded Databricks CLI.
      - uses: databricks/setup-cli@main

      # Run the Databricks workflow named "my-job" as defined in the
      # bundle that was just deployed.
      - run: databricks bundle run my-job --refresh-all
        working-directory: .
        env:
          DATABRICKS_TOKEN: ${{ secrets.SP_TOKEN }}
          DATABRICKS_BUNDLE_ENV: prod
```