Data Model v2: Key Changes and Benefits¶
14 March 2025
Comotion has rolled out an improved version of the ComoDash data model, packed with enhancements that make it faster and easier to extract insights from your source data! Let's dive into the main changes and what they mean for you.
Key Benefits:
- Faster Data Uploads: Instant data availability after upload.
- Enhanced Data Validation: Ensures data integrity with schema validation and check sums.
- Improved ETL Processes: Simplified migration and better error handling.
- Greater Schema Control: Manage lake table schema directly from source data.
- Increased Security: No API keys needed; verification is based on your Dash user setup.
- Backward Compatibility: Supports legacy ETL processes with minimal changes.
V1 Lake Deprecation
Note that the v1 data lake will be deprecated on 1 December 2025.
Please reach out to Comotion via e-mail at dash@comotion.co.za so that we may assist you during this transition.
Key Changes¶
Improved Uploads¶
Uploads are now instant and verified.
- Instant: Successfully uploaded data can be queried immediately, instead of waiting for the next day's refresh!
- Verified: Source data is now validated on upload to the lake table. If any critical check fails, the upload is rejected rather than breaking the existing lake table. These checks include schema validation and check sums (a minimal pre-flight check is sketched below).
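Dash runs these checks server-side during the load, but you can catch schema mismatches before uploading at all. Here is a minimal pre-flight sketch using pyarrow, assuming a prepared batch file named claims_batch.parquet and an example target schema; substitute the actual schema of your lake table.

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Example target schema -- an assumption for illustration; replace with
# the actual schema of your lake table.
expected_schema = pa.schema([
    ("policy_id", pa.string()),
    ("premium", pa.float64()),
    ("start_date", pa.timestamp("ms")),
])

# Read only the schema (not the data) from the prepared batch file.
batch_schema = pq.read_schema("claims_batch.parquet")

# Mirrors the kind of check the lake performs on upload: a mismatch
# here means the upload would be rejected.
if not batch_schema.equals(expected_schema):
    raise ValueError(f"Schema mismatch; fix before uploading:\n{batch_schema}")
```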
Data Migration¶
A data migration will be required to access the new data model. The python SDK makes migrating to the new data model straightforward with two types of migrations (a sketch of both steps follows the list):
- Flash Schema Migration: This step copies a single row of data from each lake table to the new lake, letting you adjust and test your ETL processes without affecting your current setup.
- Full Migration: After verifying your ETL processes, this step copies all data from the old lake to the new one, ensuring a smooth transition.
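As a rough illustration of how the two steps fit into a workflow, here is a hedged sketch. The function name and arguments below are hypothetical placeholders, not the SDK's actual interface; follow the Migrate to Data Model v2 guide for the real calls.

```python
from comotion import dash  # assumes the comotion-sdk package is installed

# The calls below are hypothetical placeholders -- the real migration
# interface is documented in the "Migrate to Data Model v2" guide.

# Step 1: flash schema migration -- copies a single row from each lake
# table so you can adjust and test your ETL against the new lake.
dash.start_migration(migration_type="flash_schema")  # hypothetical

# ...adjust and verify your ETL processes against the new lake...

# Step 2: full migration -- copies all remaining data across once your
# ETL has been verified.
dash.start_migration(migration_type="full")  # hypothetical
```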
Find out more in the python SDK guide, Migrate to Data Model v2.
ETL Updates¶
You will need to update your ETL to accommodate the new data model. For those uploading with the python SDK, the transition should be relatively seamless.
If you are uploading directly with the API, refer to the API documentation for details on the new load mechanism.
Schema control with Parquet instead of CSV¶
The new data model requires data uploads in Parquet format. By defining the schema within the Parquet file, you can manage the lake table schema directly from your source data!
Note that the python SDK handles this switch automatically; for direct API calls, Parquet files will need to be generated in the ETL process.
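For example, here is a minimal sketch of generating a schema-controlled Parquet file with pandas and pyarrow; the column names and types are illustrative only.

```python
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# Illustrative source data -- column names and types are examples only.
df = pd.DataFrame({
    "policy_id": ["P-1001", "P-1002"],
    "premium": [1250.00, 980.50],
    "start_date": pd.to_datetime(["2025-01-01", "2025-02-15"]),
})

# Declaring the schema explicitly is what gives you direct control over
# the lake table schema when this file is uploaded.
schema = pa.schema([
    ("policy_id", pa.string()),
    ("premium", pa.float64()),
    ("start_date", pa.timestamp("ms")),
])

table = pa.Table.from_pandas(df, schema=schema, preserve_index=False)
pq.write_table(table, "claims_batch.parquet")
```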
Benefits for Users¶
Improved Performance¶
The new data model handles larger and more complex datasets more efficiently, leading to faster query responses and data uploads.
No more nightly runs¶
Your uploads are persisted to the lake almost immediately, greatly reducing the time required to iron out the kinks in your ETL process.
Enhanced Data Integrity¶
Features like the check sum ensure your data uploads are accurate and complete, reducing the risk of discrepancies. Your uploads are validated during the load process, so an invalid load fails rather than being persisted to the lake table incorrectly.
Greater Schema Control¶
Your lake table schemas are now much more manageable. You can specify the lake table schema using Parquet functionality in the upload, and the lake schema can now also be changed after the upload.
Improved ETL Processes¶
The migration tools and new classes in the SDK make it easier to transition to the new data model and manage your ETL processes. Receive immediate feedback on upload errors instead of waiting to identify issues after the lake refresh.
Security simplified and improved¶
No need to request an API key for uploads. Your verification is automatically done based on your Dash user setup, making your ETL code cleaner and more secure.
Backward Compatibility¶
We understand that transitioning to a new data model can be challenging. To ease this process, Data Model v2 offers backward compatibility features:
- Legacy Support: Existing ETL processes and scripts designed for Data Model v1 will continue to function with minimal changes if using the python SDK.
- Dual Mode Operation: You can run both Data Model v1 and v2 in parallel during the transition period, allowing you to validate the new model without disrupting your current operations.
- Comprehensive Documentation and Support: Detailed guides and examples are provided to help you adapt your existing workflows to the new data model. Reach out to us at dash@comotion.co.za for assistance.
Summary of Changes and Next Steps¶
| | Data Model v1 | Data Model v2 |
|---|---|---|
| Push to Lake Table | Nightly refresh required | Instant, pre-verified uploads |
| Upload Format | CSV upload | Parquet upload |
| SDK Functions | Upload from local CSV | Upload from various sources supported by pandas; uses multi-threading to significantly improve performance |
| Data Integrity | Higher risk of discrepancies | Check sums and schema validation on upload |
| Schema Control | Limited; schema can break if invalid source data is uploaded | Greater control from source with Parquet; edit schema after upload |
| Security | API key required | Simplified; based on Dash user setup |
In a nutshell, migrating to Data Model v2 brings significant improvements to data management and ETL processes. By leveraging these new features, you can expect better performance, enhanced data integrity, and a more streamlined workflow.
Set up a call with one of our skilled analysts by sending an e-mail to dash@comotion.co.za. They will be happy to answer any questions and assist with implementation. Or, get going yourself easily with the python SDK: Migrate to Data Model v2.
Check Sums¶
Check sums allow you to verify the completeness of lake data in your ETL processes. Build up check sums while preparing data for the Dash lake, and include them in your upload transaction to ensure only complete data is persisted to the data lake.
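As a sketch of the idea: the field names below, and the exact way check sums attach to an upload transaction, are illustrative; see the SDK and API documentation for the supported form.

```python
import pandas as pd

# Build simple check sums while preparing a batch: a row count plus the
# total of a numeric column. The field names here are illustrative, and
# how check sums attach to an upload transaction is defined by the SDK/API.
df = pd.read_parquet("claims_batch.parquet")

check_sums = {
    "row_count": int(len(df)),
    "premium_total": round(float(df["premium"].sum()), 2),
}
print(check_sums)  # e.g. {'row_count': 2, 'premium_total': 2230.5}
```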