AI Tools & Skills

DQX ships Agent Skills that teach AI assistants how to use the library correctly. They're small, focused Markdown files with YAML frontmatter — the open format supported by Databricks Genie Code, Claude Code, and any other tool that follows the standard.

The skills live in the skills/ directory of the DQX repo.

What's included

| Skill | What it covers | Canonical docs |
| --- | --- | --- |
| dqx-define-checks | Creating quality rules: DQRowRule, DQDatasetRule, DQForEachColRule, YAML / dict metadata form | Quality Checks Definition |
| dqx-apply-checks | Validating a DataFrame or table against a set of rules | Applying Quality Checks |
| dqx-end-to-end | Read → check → write in one call with apply_checks_and_save_in_table | End-to-end apply |
| dqx-profile-and-generate | Profiling and generating rule candidates with DQProfiler / DQGenerator | Data Profiling |
| dqx-storage | Loading / saving checks across file, workspace, volume, table, installation, and Lakebase backends | Loading and Storing Quality Checks |

Each skill is a folder containing a single SKILL.md with the standard name + description frontmatter. Tools auto-load skills based on the description; users can also invoke them explicitly (@dqx-define-checks, slash commands, etc.) depending on the tool.
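
To make the folder layout concrete, here is a minimal sketch that walks a local skills/ directory and reads the name and description fields from each SKILL.md. It assumes the frontmatter is a flat block of `key: value` lines between two `---` delimiters and uses a deliberately simple parser instead of a YAML library; the sample text is hypothetical and is not copied from the DQX repo.

```python
from pathlib import Path


def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse flat `key: value` frontmatter delimited by '---' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        # Skip indented continuation lines and comments; keep top-level keys only.
        if sep and key.strip() and not key.startswith((" ", "\t", "#")):
            fields[key.strip()] = value.strip().strip("'\"")
    return {}  # closing delimiter never found: treat as no frontmatter


def list_skills(skills_dir: str) -> dict[str, str]:
    """Map skill name to description for every <folder>/SKILL.md found."""
    found: dict[str, str] = {}
    for skill_file in sorted(Path(skills_dir).glob("*/SKILL.md")):
        meta = parse_frontmatter(skill_file.read_text(encoding="utf-8"))
        if "name" in meta and "description" in meta:
            found[meta["name"]] = meta["description"]
    return found


# Hypothetical SKILL.md content, for illustration only.
SAMPLE = """---
name: dqx-define-checks
description: Create DQX quality rules in code or metadata form.
---
# Body of the skill goes here
"""

sample_meta = parse_frontmatter(SAMPLE)
```

Running `list_skills("skills")` from the root of a local clone is a quick way to check that every folder carries both fields before importing the directory into a workspace.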

Installing DQX skills

Clone the DQX repository in your workspace using a Git folder, then copy the skills/ folder into a workspace-level or user-level skills directory:

# Option A — workspace-wide (all Genie Code users see these)
databricks workspace import-dir skills /Workspace/.assistant/skills

# Option B — current user only — substitute your workspace email below
databricks workspace import-dir skills /Users/<your-email>/.assistant/skills

Genie Code picks up skills from these directories automatically. Use the following prompt with Agent mode to confirm:

"List the DQX skills you can use."

A skill fires when its description matches your request. To invoke a specific skill manually, prefix its name with @ (e.g. @dqx-define-checks add a uniqueness check on order_id).

See the Databricks Genie Code Documentation for more details.

Using DQX skills

After installing DQX skills, you can either let your tool use skills automatically or invoke them by name. Typical prompts:

  • Add a DQX uniqueness check on (order_id, line_item_id) to my pipeline.
  • Split my bronze table into valid and quarantine outputs using these rules: …
  • Profile catalog.schema.orders and suggest quality checks.
  • Load DQX checks from a Delta table and apply them to a streaming DataFrame.

Agents will load the relevant skill into context, follow its patterns, and link back to the canonical documentation for anything outside the skill's scope.
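
As an illustration of what an agent might produce for the first prompt, the sketch below builds a composite-key uniqueness check in dict metadata form. It assumes the check function is named `is_unique` and takes a `columns` argument, and that each entry has `criticality` and `check` keys; confirm these against the Quality Checks Definition page before relying on them. The helper `uniqueness_check` is hypothetical and is not part of DQX.

```python
def uniqueness_check(columns: list[str], criticality: str = "error") -> dict:
    """Build one check entry in the assumed DQX dict metadata form.

    Hypothetical helper: the key names and the `is_unique` function name
    are assumptions to verify against the DQX documentation.
    """
    if not columns:
        raise ValueError("at least one column is required")
    return {
        "criticality": criticality,
        "check": {
            "function": "is_unique",  # assumed DQX check function name
            "arguments": {"columns": list(columns)},
        },
    }


# One rule covering the composite key from the example prompt.
checks = [uniqueness_check(["order_id", "line_item_id"])]
```

A list like `checks` would then be handed to the DQX engine as described under Applying Quality Checks, or serialized to YAML and stored with one of the backends covered by the dqx-storage skill.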

LLM-friendly documentation

Beyond the skills, the entire DQX documentation site is published in an LLM-friendly Markdown format, so AI assistants can read the docs directly. This is useful when you want an agent to ground its answers in the full documentation rather than only the installed skills.

| Resource | URL | Use it for |
| --- | --- | --- |
| Index (llms.txt) | /llms.txt | A curated, sectioned list of every page with a one-line summary and a link to its Markdown version. Point an agent here to discover what's available. |
| Full corpus (llms-full.txt) | /llms-full.txt | The entire documentation concatenated as Markdown, for ingesting everything in a single fetch. |
| Single page | append .md to any docs URL | Fetch one page as Markdown, e.g. /docs/reference/quality_checks.md. |

These files are generated automatically when the docs are built (via the @signalwire/docusaurus-plugin-llms-txt plugin), so they always reflect the current documentation.
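
The three URL patterns can be derived mechanically, as in the following sketch. The base URL is a placeholder (an assumption, not the real docs host), and dropping a trailing slash, query string, and fragment before appending .md is a guess at sensible behavior, not something the plugin is documented here to require.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

# Placeholder base URL: substitute the real docs site root.
DOCS_BASE = "https://example.invalid/"


def llms_index_url(base: str = DOCS_BASE) -> str:
    """URL of the curated index of pages."""
    return urljoin(base, "llms.txt")


def llms_full_url(base: str = DOCS_BASE) -> str:
    """URL of the full documentation corpus in one file."""
    return urljoin(base, "llms-full.txt")


def markdown_url(page_url: str) -> str:
    """Turn a docs page URL into its Markdown variant by appending .md."""
    parts = urlsplit(page_url)
    path = parts.path.rstrip("/")
    if not path:
        raise ValueError("page URL has no path to append .md to")
    if not path.endswith(".md"):
        path += ".md"
    # Query string and fragment are dropped; they do not apply to the raw file.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


page_md = markdown_url(urljoin(DOCS_BASE, "docs/reference/quality_checks/#top"))
```

An agent with web access can fetch the index first, pick the relevant page from it, and then request only that page's Markdown instead of the full corpus.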

Skills vs. llms.txt

Use skills to teach an agent how to call DQX APIs correctly — they are loaded on every matching request, so they are kept short and token-cheap. Use llms.txt / .md pages when you want the agent to read the full documentation for a topic the skills intentionally keep brief.

Extending DQX skills

DQX's agent skills are scoped to DQX's public APIs. Follow these guidelines when extending them:

  • Skills should stay short; the full SKILL.md is loaded every time the skill fires, so every line costs tokens on every invocation.
  • Prefer linking to /docs/... over duplicating content; the skill's job is to tell the model when and how to use the API, not to reprint the reference.
  • Always import from databricks.labs.dqx.* — never guess module paths.
  • Point to the canonical documentation for any topic outside the skill's core responsibility.
  • Changes to the public DQX API should be reflected in the matching skill in the same PR. See Contributing for the full workflow.
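
Two of these guidelines lend themselves to an automated check. The sketch below flags a SKILL.md that exceeds a line budget or contains an import that mentions dqx without the databricks.labs.dqx prefix. The 200-line budget and the function name `lint_skill` are assumptions chosen for illustration; the DQX repo may enforce different limits or none at all.

```python
import re

MAX_LINES = 200  # hypothetical budget; tune to your own token constraints
IMPORT_RE = re.compile(r"^\s*(?:from|import)\s+([\w.]+)", re.MULTILINE)
DQX_ROOT = "databricks.labs.dqx"


def lint_skill(text: str, max_lines: int = MAX_LINES) -> list[str]:
    """Return a list of guideline violations found in a SKILL.md body."""
    problems: list[str] = []
    line_count = len(text.splitlines())
    if line_count > max_lines:
        problems.append(f"skill is {line_count} lines; budget is {max_lines}")
    for module in IMPORT_RE.findall(text):
        mentions_dqx = "dqx" in module.split(".")
        canonical = module == DQX_ROOT or module.startswith(DQX_ROOT + ".")
        if mentions_dqx and not canonical:
            problems.append(f"suspicious DQX import path: {module}")
    return problems


# Illustrative snippets as they might appear inside a skill's code examples.
GOOD = "from databricks.labs.dqx.engine import DQEngine\nimport pyspark\n"
BAD = "from dqx.engine import DQEngine\n"

good_report = lint_skill(GOOD)
bad_report = lint_skill(BAD)
```

A check like this could run in CI next to the docs build, so that a skill edited in the same PR as an API change is validated before merge.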

Source