New Data Pipeline Lineage with Mermaid Diagrams in CF.Cumulus

  • Writer: Matt Collins
  • Aug 8
  • 3 min read

Reduce Complexity of your Data Pipelines through Visualisations!

Communicating Data Platform activities such as pipelines to business users can be complicated, especially for large organisations that rely on multiple processes running at high frequency. Tools like Purview and Unity Catalog help visualise your data and give insight into it through metadata, but what about at an orchestration level?


We've extended some work done back in the ProcFwk days, using Mermaid diagrams to visualise the Orchestration pipelines, highlighting steps in your data engineering processes. This has been modernised to fit CF.Cumulus and explicitly capture pipeline names, along with the status of the latest/current executions.


Once again, we've used T-SQL to query the underlying metadata and dynamically build the Mermaid script that renders our pictorial display of the platform. As we improve the user experience for support customers of CF.Cumulus, we're integrating this Mermaid diagram into a front-end interface where users can track executions and understand failures and their downstream impacts.
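
To make the idea of "dynamically building the Mermaid script" a little more concrete, here is a minimal Python sketch of the same pattern. It is not the T-SQL inside CF.Cumulus: the pipeline names, the dependency list and the helper functions are all made up for illustration, and we are assuming the metadata has already been read into simple lists.

```python
from typing import Iterable


def node_id(name: str) -> str:
    """Turn a pipeline name into an identifier Mermaid accepts (hypothetical helper)."""
    return "".join(ch if ch.isalnum() else "_" for ch in name)


def build_mermaid(pipelines: Iterable[str],
                  dependencies: Iterable[tuple[str, str]]) -> str:
    """Build a left-to-right Mermaid flowchart from names and (upstream, downstream) pairs."""
    lines = ["flowchart LR"]
    for name in pipelines:
        # Mermaid uses #quot; for a double quote inside a quoted label.
        label = name.replace('"', "#quot;")
        lines.append(f'    {node_id(name)}["{label}"]')
    for upstream, downstream in dependencies:
        lines.append(f"    {node_id(upstream)} --> {node_id(downstream)}")
    return "\n".join(lines)


# Illustrative metadata only; these are not real CF.Cumulus pipeline names.
pipelines = ["Ingest Product", "Ingest SalesOrderHeader", "Transform DimProduct"]
dependencies = [("Ingest Product", "Transform DimProduct")]
print(build_mermaid(pipelines, dependencies))
```

The output is plain text, which is what makes this approach portable: anything that can render Mermaid can display the lineage.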


A quick example of data lineage

Consider our demo environment: we've got some CF.Cumulus pipelines that ingest data from AdventureWorks and re-model it into business-facing tables for reporting. Querying the underlying control schema tables using our new [control].[GetOrchestrationLineage] stored procedure allows us to display what executes, when, and any dependency chains we've set up.

EXEC [control].[GetOrchestrationLineage]
	@BatchName = 'Daily';

Lineage diagram

This gives us an understanding of the processes run, and the datasets they relate to.


Now we can toggle the @UseStatusColours parameter to help us track the current/latest status of the pipeline executions.

EXEC [control].[GetOrchestrationLineage]
	@BatchName = 'Daily',
	@UseStatusColours = 1;

Lineage diagram with status colours

This gives us end-to-end visibility of platform executions as a whole and lets us trace failures, which helps streamline troubleshooting efforts.
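
For anyone curious how status colours can be expressed in Mermaid, one option is the classDef and class directives. The sketch below is an assumption about how this could be done, not a copy of what @UseStatusColours generates: the status names and hex colours are hypothetical.

```python
# Hypothetical status names and colours; CF.Cumulus may use different ones.
STATUS_STYLES = {
    "Success": "fill:#c8e6c9,stroke:#2e7d32",
    "Failed": "fill:#ffcdd2,stroke:#c62828",
    "Blocked": "fill:#ffe0b2,stroke:#ef6c00",
    "Running": "fill:#bbdefb,stroke:#1565c0",
}


def status_lines(statuses: dict[str, str]) -> list[str]:
    """Return Mermaid lines that colour each node by its latest execution status.

    `statuses` maps a Mermaid node id to a status name. Nodes with a status
    we have no style for are left with the default appearance.
    """
    used = sorted({s for s in statuses.values() if s in STATUS_STYLES})
    # One classDef per status that actually appears in the diagram.
    lines = [f"    classDef {s} {STATUS_STYLES[s]};" for s in used]
    for node, status in statuses.items():
        if status in STATUS_STYLES:
            lines.append(f"    class {node} {status};")
    return lines


for line in status_lines({"Ingest_Product": "Success",
                          "Transform_DimProduct": "Failed"}):
    print(line)
```

These lines would be appended to the end of a flowchart definition, after the nodes and edges.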


Our demo example is pretty straightforward, with only 9 pipelines being run. A real-world example is likely to contain many more activities. To keep larger diagrams readable, the feature includes some filtering to help you focus on different use cases.


Filtering on unsuccessful states

Lineage diagram filtered on unsuccessful states

EXEC [control].[GetOrchestrationLineage]
	@BatchName = 'Daily',
	@UseStatusColours = 1,
	@1_FilterFailedAndBlocked = 1;
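
Conceptually, a filter like this trims the graph down to the nodes that need attention. The Python sketch below shows one way that could work, keeping only nodes whose status is Failed or Blocked and the edges between them. The status names and the exact rule are our assumptions for illustration; the stored procedure's own logic may differ.

```python
# Hypothetical status names treated as "unsuccessful".
UNSUCCESSFUL = {"Failed", "Blocked"}


def filter_unsuccessful(statuses: dict[str, str],
                        dependencies: list[tuple[str, str]]):
    """Keep unsuccessful nodes and only the edges that connect two of them."""
    keep = {node for node, status in statuses.items() if status in UNSUCCESSFUL}
    edges = [(up, down) for up, down in dependencies
             if up in keep and down in keep]
    return keep, edges


# Illustrative data: B failed, so its dependant C was blocked.
statuses = {"A": "Success", "B": "Failed", "C": "Blocked", "D": "Success"}
dependencies = [("A", "B"), ("B", "C"), ("A", "D")]
nodes, edges = filter_unsuccessful(statuses, dependencies)
print(sorted(nodes), edges)
```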

Filtering on specific datasets

Lineage diagram for specific datasets

EXEC [control].[GetOrchestrationLineage]
	@BatchName = 'Daily',
	@UseStatusColours = 1,
	@2_FilterDataset = 1,
	@2_DatasetDetails = 
	'[
		{"name": "Product"},
		{"name": "SalesOrderHeader"}
	]';
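
If you are calling the procedure from code rather than typing the JSON by hand, it can be safer to generate the details payload than to concatenate strings. Here is a small Python sketch that produces the same shape as the example above; the helper name is hypothetical and how you pass the value to SQL Server is left to your client library.

```python
import json


def dataset_details(names: list[str]) -> str:
    """Build the JSON array of {"name": ...} objects used by the dataset filter."""
    return json.dumps([{"name": name} for name in names])


payload = dataset_details(["Product", "SalesOrderHeader"])
print(payload)

# Round-trip check: the payload parses back to the names we started with.
assert [item["name"] for item in json.loads(payload)] == ["Product", "SalesOrderHeader"]
```

Passing the resulting string as a bound parameter, rather than pasting it into the SQL text, avoids having to escape quotes in dataset names.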

Combining filters

Lineage diagram with combined filters applied

Other filters

By Data Source:

EXEC [control].[GetOrchestrationLineage]
	@BatchName = 'Daily',
	@UseStatusColours = 1,
	@3_FilterDataSource = 1,
	@3_DataSourceDetails = '[{"name": "AdventureWorksDemo"}]';

By Data Source Type:

EXEC [control].[GetOrchestrationLineage]
	@BatchName = 'Daily',
	@UseStatusColours = 1,
	@3_FilterDataSourceByType = 1,
	@3_DataSourceDetails = '[{"name": "Azure SQL Database"}]';

Our demo pipelines all come from the same AdventureWorks Azure SQL DB, so we won't see any difference in the results here.


What's next

We've added the [control].[GetOrchestrationLineage] stored procedure to the develop branch of our open-source repo, and it is scheduled to be included in a future release.


There are a few ways to consume Mermaid diagrams. If you only need a snapshot of the lineage, you can run the stored procedure manually and save the output to a Markdown-friendly interface for consumers to use, such as Azure DevOps Wikis.
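
As a rough sketch of the snapshot approach, the Python below wraps a Mermaid script in the markup a wiki page needs. We are assuming the script has already been copied out of the stored procedure's result; the function name and page title are hypothetical. Azure DevOps Wikis use a "::: mermaid" block, while many other Markdown renderers expect a fenced block labelled mermaid, so the sketch supports both. It is worth checking which Mermaid diagram types your wiki supports before relying on it.

```python
def to_wiki_markdown(mermaid_script: str, title: str,
                     style: str = "azure-devops") -> str:
    """Wrap a Mermaid script in page markup for the chosen renderer."""
    if style == "azure-devops":
        opener, closer = "::: mermaid", ":::"
    else:
        # Build the three-backtick fence programmatically so this example
        # can itself be shown inside a fenced code block.
        fence = "`" * 3
        opener, closer = fence + "mermaid", fence
    return f"# {title}\n\n{opener}\n{mermaid_script.strip()}\n{closer}\n"


# Illustrative script; in practice this would be the procedure's output.
script = "flowchart LR\n    A --> B"
page = to_wiki_markdown(script, "Daily batch lineage")
print(page)
```

The returned text can then be pasted into a wiki page or written to a .md file in the wiki's repository.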


Now that the diagram captures point-in-time information like the latest execution statuses, we are looking to take this one step further so that users can access this information on demand.


What might a workflow for this look like? Let's use a simple Mermaid visual to sketch out some ideas!


Process to capture updating the lineage diagram

There is also a custom Power BI visual named "Markdown & Mermaid Visual", but unfortunately it does not allow the markdown to be parameterised (e.g. loaded from a text file stored in Azure Blob Storage).


What other features would you like to see in these orchestration lineage visualisations? Let us know in the comments below.


Thanks for reading!