
Migrate to Data Model v2

What is Data Model v2?

The new data model introduces significant enhancements designed to streamline your ETL processes and accelerate your journey to insights.

Key Benefits:

  • Faster Data Uploads: Instant data availability after upload.
  • Enhanced Data Validation: Ensures data integrity with schema validation and checksums.
  • Enhanced Flexibility: More robust and adaptable lake table schemas.
  • Optimized Storage: Data is stored as Parquet files instead of CSV, allowing schema specification from the source and significantly faster uploads.
  • Improved ETL Processes: Simplified migration and better error handling.
  • Greater Schema Control: Manage lake table schema directly from source data.
  • Increased Security: No API keys required; authentication is based on your user setup.
  • Backward Compatibility: Supports legacy ETL processes with minimal changes.

This guide will help you manage your migration to the new model efficiently using our Python SDK. Basic knowledge of Python and package installation via pip is required.

For any questions or support during the migration process, please contact us at dash@comotion.co.za.

V1 Lake Deprecation

Note that the v1 data lake will be deprecated on 1 December 2025, after which you will automatically be upgraded. This may result in breaking changes to some ETL jobs and insights tables.

Please reach out to Comotion via e-mail at dash@comotion.co.za so that we may assist you during this transition.


Step 1: Get Set Up

If you haven't installed the Comotion SDK yet, run the following in your terminal:

pip install comotion-sdk

If you have already installed the SDK, remember to upgrade the package to get the latest capabilities:

pip install --upgrade comotion-sdk
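
To confirm which version is installed, you can check with pip:

pip show comotion-sdk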

Authenticating for the migration

Once the SDK is installed, you will need to store an authentication token in your local environment to interact with the Comotion API.

Run the following in your terminal if you are receiving authentication error messages:

comotion authenticate

You will then be prompted to provide your Dash org name and log in to Dash in your browser. Once complete, you will be ready to migrate.
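
As a quick check that authentication succeeded (a minimal sketch, assuming the token stored by comotion authenticate is picked up automatically), you can construct the objects used throughout this guide:

from comotion.dash import DashConfig
from comotion.auth import Auth

# Uses the token stored by `comotion authenticate`
auth = Auth(orgname = 'my_orgname')  # Replace with your Dash org name
config = DashConfig(auth = auth)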


Step 2: Run a Flash Schema migration

First, run a flash schema migration. This will copy one row from each table in the v1 lake schema orgname-lake to the v2 lake schema orgname_lake_v2, so you can test your ETL against the v2 schema before moving all of your data.

from comotion.dash import DashConfig, Migration
from comotion.auth import Auth

auth = Auth(orgname = 'my_orgname')
config = DashConfig(auth = auth)

migration = Migration(config = config)
migration.start(migration_type = 'FLASH_SCHEMA')

Alternatively, run the equivalent command in your terminal:

comotion -o org_name dash start-migration --flash-schema
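
You can check on the flash migration the same way as the full migration in Step 4 (a minimal sketch, assuming the migration object from the script above is still in scope):

# Check the state of the migration at any time
print(migration.status())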

Step 3: Test and Update ETL

You can now test your ETL (upload) process in the new lake until you are ready to do the full migration. Here we highlight some changes you may need to make if you are using the Python SDK to upload to Dash. See the Load Data section for more information if you are using the API directly to upload to the lake.

The standard way to upload to Dash in the v1 lake was the read_and_upload_file_to_dash function. This function will still work for the v2 lake, but we have added an argument called data_model_version. Compare the upload script for Data Model v1 below with the same script re-purposed for Data Model v2.

Data Model v1:

from comotion.dash import read_and_upload_file_to_dash
from datetime import datetime

# In this example, we use the modify_lambda argument to add a column to the end of the lake table to mark the timestamp when the upload was started
def add_upload_timestamp(df):
    upload_timestamp = datetime.now()
    df['upload_timestamp'] = upload_timestamp.strftime('%Y-%m-%d %H:%M:%S')

upload_response = read_and_upload_file_to_dash(
    file = 'path/to/file.csv',
    dash_table = 'my_lake_table_name',
    dash_orgname = 'my_dash_orgname',
    dash_api_key = 'my_api_key',
    modify_lambda = add_upload_timestamp # Function defined above
)

Data Model v2:

from comotion.dash import read_and_upload_file_to_dash
from datetime import datetime

# In this example, we use the modify_lambda argument to add a column to the end of the lake table to mark the timestamp when the upload was started
def add_upload_timestamp(df):
    upload_timestamp = datetime.now()
    df['upload_timestamp'] = upload_timestamp.strftime('%Y-%m-%d %H:%M:%S')
    df['data_import_batch'] = upload_timestamp.strftime('%Y-%m-%d') # First change: Add the data import batch column.

upload_response = read_and_upload_file_to_dash(
    file = 'path/to/file.csv',
    dash_table = 'my_lake_table_name',
    dash_orgname = 'my_dash_orgname',
    # API key is no longer required for the upload
    modify_lambda = add_upload_timestamp, # Function defined above
    data_model_version = 'v2' # Add this parameter to force an attempted upload to the new lake
)

With very minor changes, we can now upload to the new lake! Below we break down the key changes and why they are required.

Add the data_import_batch

This is the most important breaking change! The data_import_batch column, which was generated automatically on upload to the v1 lake, is no longer generated in the v2 lake; it has been replaced by dash$load_id. You are likely using the data_import_batch column in your SQL queries, so you should add this field using the modify_lambda parameter as demonstrated above. Alternatively, Comotion can help you switch your SQL queries over to the dash$load_id column.
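
If you only need to recreate the batch column, a minimal modify_lambda (a sketch based on the example above; the function name is illustrative) can be as small as:

from datetime import datetime

def add_data_import_batch(df):
    # Recreate the v1 data_import_batch column at upload time
    df['data_import_batch'] = datetime.now().strftime('%Y-%m-%d')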

API key is no longer required

Authentication for uploads to the v2 data lake uses the Auth class in the Python SDK instead of the API key you were provided for uploads to the v1 lake.

Specify the data_model_version

You can populate the data_model_version argument to specify which version of the lake you are uploading to. The intention is that this argument is only needed for testing. After the full migration is complete you do not need to specify it, and you will no longer be able to upload data to the v1 data lake.
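
For example, once the full migration is complete, the v2 upload script above simply drops the argument (a sketch reusing the same illustrative file and table names):

upload_response = read_and_upload_file_to_dash(
    file = 'path/to/file.csv',
    dash_table = 'my_lake_table_name',
    dash_orgname = 'my_dash_orgname',
    modify_lambda = add_upload_timestamp
    # No data_model_version needed once the v2 lake is the default
)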

What is in the upload_response?

Previously, the upload_response would have returned the API response from the upload. Now, it returns a DashBulkUploader object, the class designed for uploading to the v2 data lake. For more information, refer to the SDK documentation.

read_and_upload_file_to_dash Deprecation

Comotion will no longer support the read_and_upload_file_to_dash function.

Once you are operating in the new lake, we recommend moving to the new Load and DashBulkUploader classes for uploads. Reach out to us for assistance, or check out the Comotion SDK documentation overview for more information.


Step 4: Run the Full Migration

Running the full migration is also simple with the SDK:

from comotion.dash import DashConfig, Migration
from comotion.auth import Auth

auth = Auth(orgname = 'my_orgname')
config = DashConfig(auth = auth)

migration = Migration(config = config)

# Start the full migration
migration.start(migration_type = 'FULL_MIGRATION', clear_out_new_lake = True)

# Run this at any time to check on the state of the migration.
print(migration.status())

Alternatively, run the equivalent command in your terminal:

comotion -o org_name dash start-migration -c --full-migration

The -c flag specifies that the new lake should be cleared out. The migration will fail if this flag is not set and data already exists in the v2 lake.

You can run comotion -o org_name migration-status at any time to check the status of the migration.
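
If you would rather wait on the migration programmatically, a minimal polling sketch is below. The completion check is an assumption: the exact value returned by migration.status() is not documented here, so adapt the condition to the real response.

import time

# Poll the migration status once a minute until it reports completion.
# 'COMPLETE' is an assumed marker; inspect a real status() response to confirm.
while True:
    status = migration.status()
    print(status)
    if 'COMPLETE' in str(status).upper():
        break
    time.sleep(60)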

The duration of your migration depends on how much data is in the v1 data lake. Your migration should complete well within 24 hours.