Q. What is Delta Operational Metrics Store?
Delta Operational Metrics Store (DeltaOMS) is a solution/framework for the automated collection and tracking of Delta commit logs (and, in the future, other operational metrics) from Delta Lake. It builds a centralized repository of Delta Lake operational statistics and simplifies analysis across the entire data lake.
The solution can be easily enabled and configured to start capturing operational metrics into a centralized repository on the data lake. Once the data is collated, it unlocks operational insights, dashboards that trace operations across the data lake through a single pane of glass, and other analytical use cases.
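For context, the raw material DeltaOMS collects is each Delta table's commit history. The sketch below uses standard Delta Lake APIs (not DeltaOMS itself) to show what that per-table commit log looks like before DeltaOMS centralizes it; it assumes a notebook/session where `spark` is available, and `sales.orders` is a placeholder table name.

```scala
// Sketch: inspect a single Delta table's commit log with standard Delta Lake APIs.
// DeltaOMS automates collecting this kind of information across many tables/paths;
// "sales.orders" is a placeholder table name, not part of DeltaOMS.
import io.delta.tables.DeltaTable

val history = DeltaTable.forName(spark, "sales.orders")
  .history()                                              // one row per commit
  .select("version", "timestamp", "operation", "operationMetrics")

history.show(5, truncate = false)
```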
Q. What are the benefits of using DeltaOMS?
Tracking and analyzing Delta Lake operational metrics across multiple database objects normally requires building a custom solution on the Delta Lakehouse. DeltaOMS automates the collection of operational logs from multiple Delta Lake objects, collates them into a central repository on the lakehouse, enables more holistic analysis, and presents the results through a single-pane-of-glass dashboard for typical operational analytics. This simplifies the process for users looking to gain insights into their Delta Lakehouse table operations.
Q. What typical operational insights would I get from the solution?
The DeltaOMS centralized repository provides interfaces for custom analysis of Delta Lake operational metrics using tools like Apache Spark and Databricks SQL. For example, it can answer questions about which operations ran against which tables, how frequently commits occur, and how much data those commits touched; a query sketch is shown below.
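A minimal sketch of such an analysis, assuming the OMS repository exposes a commit-level table. The table name (`oms.commitinfosnapshots`) and column names (`commitTs`, `operation`) are assumptions for illustration; adjust them to match your DeltaOMS deployment.

```scala
// Sketch: commits per day broken down by operation type, run against an assumed
// DeltaOMS table name (oms.commitinfosnapshots) and column names (commitTs, operation).
val dailyOps = spark.sql("""
  SELECT date_trunc('DAY', commitTs) AS commit_date,
         operation,
         count(*)                    AS num_commits
  FROM   oms.commitinfosnapshots
  GROUP  BY date_trunc('DAY', commitTs), operation
  ORDER  BY commit_date DESC, num_commits DESC
""")

dailyOps.show(20, truncate = false)
```

The same SQL can be run directly from Databricks SQL, which is one way to build the single-pane-of-glass dashboards mentioned above.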
Q. Who should use this feature?
Data Engineering teams, Data Lake Admins, and Operational Analysts can manage and use this feature to gain operational insights into their Delta Lake.
Q. Can I run this solution in a non-Databricks environment?
This project is distributed under the Databricks license and cannot be used outside of a Databricks environment.
Q. How will I be charged?
This solution is fully deployed in the user's Databricks or Spark environment, and the framework's jobs run in that execution environment. Depending on the configuration chosen by the user (for example, the update frequency of the audit logs, the number of databases/Delta paths enabled, the number of transactions ingested, etc.), the cost of the automated jobs and the associated storage cost will vary.
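As a rough illustration only, the compute cost of the ingestion job scales with how often it runs and how long each run takes. Every number in the sketch below is a hypothetical placeholder; no actual Databricks pricing is implied.

```scala
// Back-of-the-envelope sketch of how schedule frequency drives job compute cost.
// Every value below is a hypothetical placeholder, not a Databricks rate.
val runsPerDay      = 24      // e.g. hourly ingestion of the commit/audit logs
val minutesPerRun   = 3.0     // incremental ingestion time per run
val clusterDbuPerHr = 10.0    // DBUs consumed by the job cluster per hour (placeholder)
val dollarsPerDbu   = 0.15    // contract rate (placeholder)

val estimatedDailyCost = runsPerDay * (minutesPerRun / 60.0) * clusterDbuPerHr * dollarsPerDbu
println(f"Estimated daily compute cost: $$${estimatedDailyCost}%.2f")
```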
We ran a few simple ingestion benchmarks on an AWS-based Databricks cluster:
| | Xtra Small | Small | Medium | Large |
|---|---|---|---|---|
| Initial Txns | 100000 | 87000 | 76400 | 27500 |
| Avg Txn Size | ~1 KB | ~500 KB | ~1 MB | ~2.5 MB |
| Approx Total Txn Size | ~100 MB | ~44 GB | ~76 GB | ~70 GB |
| Cluster Config (Workers / Driver / DB Runtime) | Workers: (5) i3.2xl, 305 GB Mem, 40 Cores<br>Driver: i3.xl, 61 GB Mem, 8 Cores<br>DB Runtime: 11.2 | Workers: (5) i3.4xl, 610 GB Mem, 80 Cores<br>Driver: i3.2xl, 61 GB Mem, 8 Cores<br>DB Runtime: 11.2 | Workers: (5) i3.4xl, 610 GB Mem, 80 Cores<br>Driver: i3.2xl, 61 GB Mem, 8 Cores<br>DB Runtime: 11.2 | Workers: (5) i3.4xl, 610 GB Mem, 80 Cores<br>Driver: i3.2xl, 61 GB Mem, 8 Cores<br>DB Runtime: 11.2 |
| Initial Raw Ingestion Time | ~15 mins | ~50 mins | ~60 mins | ~40 mins |
| Incremental Additional Txns | 1000 | 1000 | 1000 | 1000 |
| Incremental Raw Ingestion Time | ~1 min | ~2 mins | ~3 mins | ~3 mins |