
Fabric - Materialized Lake Views in your pipelines

  • Writer: Jon Lunn
  • 5 min read

Since the announcement of Materialised Lake Views for Microsoft Fabric, I've finally had some time to have a good look at them: what they are, where they could fit in our ETL process and, as a bonus, how I've used them in a project recently. Don't want to read? Check out the YouTube version:

What are Materialised Lake Views?

So first off, what are they? They are basically a table object that is based on a query. (Yes, I know they are called 'Views'… more on that later.) Like a view, an MLV is defined by a SQL query, but it doesn't just sit over the tables and run that SQL every time you query it. Instead, it takes the SQL query that defines the Materialised Lake View, runs it, and stores the result data in a Delta table. So when you query the view, you get the data from that object, not from the underlying tables. Neat! You save a bit of compute and query time!


Why do we want to do stuff this way? Well, the aim of Materialised Lake Views, at least according to the MS Learn page, is to do the following:


  • Help the implementation of a medallion architecture to enhance data management

  • Developers can concentrate on generating insights derived from data


That seems like a fine goal, but how does it achieve it? In the lakehouse, rather than having a query stored in a notebook that gets run time and time again, we can push some of that work back to the lakehouse as an object that can just be queried, with no notebook or cluster start-up time. So, for example, these Materialised Lake Views could be used in the following ways:


  • In a Cleaned layer of a lakehouse, conforming names or data types

  • In an Enrichment layer of the lakehouse, to bring tables together or split out columns, etc.

  • In a Metrics layer, flattening out data from the lakehouse Dims and Facts into a more report-focused object


The good thing about them is that the data source can be tables in the lakehouse, but also other Materialised Lake Views. That feeds into another benefit: it lets you build declarative pipelines without having to build an actual regular Data Pipeline. If you are familiar with the Azure Databricks feature Delta Live Tables (DLT), it is sort of like that.


That seems to fit the goals: you have an object that can just be queried by report developers to get insights, and it helps on the management side too, since you don't have to worry about clusters, capacity issues and logic in notebooks. What I would not use them for is slowly changing dimensions, or anything with complex logic in it. Keep them as simple as possible; if you need to do something complex, like querying data across lakehouses or staging data in CTEs, keep that stuff in a notebook. Materialised Lake Views do support CTEs, but at the time of writing, queries with CTEs can break. So they are quite basic in what they can do: it is just selects. Anything like a type 2 dimension is going to be a struggle.


Let's take a look at creating one…

This is the basic structure of an MLV:
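A minimal sketch from a notebook might look something like this (the table and column names are made up, and the exact clause names are worth double-checking against the MS Learn page):

# Sketch only: bronze_sales and its columns are made-up names for illustration.
# "spark" is the SparkSession that a Fabric notebook provides out of the box.
spark.sql("""
CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS silver_sales
PARTITIONED BY (sale_year)
AS
SELECT
    CAST(SaleID AS BIGINT)         AS sale_id,
    TRIM(CustomerName)             AS customer_name,
    CAST(SaleDate AS DATE)         AS sale_date,
    YEAR(CAST(SaleDate AS DATE))   AS sale_year,
    CAST(Amount AS DECIMAL(18, 2)) AS amount
FROM bronze_sales
""")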


Like a view, you write the query that will get the data, and like a table you can set how the Delta object partitions the data. As with normal Delta tables, you can use Alter, Drop and Show on Materialised Lake Views to manage their lifecycle. You run the code and it creates the object, and even though they are called Materialised Lake Views, you'll find them under the Tables section.
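As a rough sketch of the lifecycle side (the exact statement forms here are my assumption, so verify them against the docs):

# Assumed statement syntax - check the MS Learn docs for the precise forms.
spark.sql("SHOW MATERIALIZED LAKE VIEWS").show()                 # list the MLVs in the lakehouse
spark.sql("DROP MATERIALIZED LAKE VIEW IF EXISTS silver_sales")  # remove one when no longer needed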

This can now be queried using a notebook or the SQL endpoint.
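For example, sticking with the made-up silver_sales object from above:

# Query the MLV like any other table - you read whatever was stored at the last refresh.
spark.sql("SELECT sale_year, SUM(amount) AS total_sales FROM silver_sales GROUP BY sale_year").show()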

Armed with your new MLV, and having found a great place to implement it in your lakehouse, then what? Is that it?

No, you still need to manage them. But why? They're just views, aren't they? In name only: they are tables, and when you query an MLV it doesn't go and get the latest data from the underlying tables, you have to refresh it. This can be done in one of two ways: using a Spark SQL command, or the Materialised View interface in Fabric.

 

In Spark SQL it's easy enough: refreshing is a single statement. It isn't directly supported from PySpark, so there you have to wrap it in a spark.sql() statement.

 

 

You can add FULL at the end of that statement to fully refresh the view; otherwise the engine detects changes in the underlying base tables and does either a full or partial (incremental) update.
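Roughly, using the made-up example from earlier (the statement form is worth checking against the docs):

# Let the engine decide between a full or partial (incremental) refresh
spark.sql("REFRESH MATERIALIZED LAKE VIEW silver_sales")
# Or force a complete rebuild of the stored data
spark.sql("REFRESH MATERIALIZED LAKE VIEW silver_sales FULL")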

 

The other option is to use the Materialised View interface in Fabric, which allows you to schedule their refresh.

 

 

The interface shows the sources and the MLV(s) they feed into.


Now, an important point: when you schedule the reprocessing, it will refresh all of the MLVs in that lakehouse. MLVs are scoped at the lakehouse level only! The other important point is that there is no way to process items separately, so no distinct line of processing for selected items. They are smart enough to skip items that require no processing, however, so there is that benefit. Just to note, I can't see any way of doing error handling or alerting. What happens if a source object is deleted, or the MLV job errors due to data issues?

This scheduled job is its own thing; it doesn't integrate with the normal Fabric Pipeline tasks directly. However, you could add a Fabric API request to fire off the MLV job, then keep polling it with another task to see when it has finished, and maybe refresh a dependent semantic model afterwards. It might be a good case for a Fabric User Data Function.
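As a very rough sketch of that idea (the job type name, the IDs and the token handling below are placeholders and assumptions on my part, not a confirmed contract, so check the Fabric job scheduler API docs):

import time
import requests
# All IDs, the job type name and the token below are placeholders/assumptions.
BASE = "https://api.fabric.microsoft.com/v1"
WORKSPACE_ID = "<workspace-guid>"
ITEM_ID = "<mlv-job-item-guid>"
JOB_TYPE = "RefreshMaterializedLakeViews"  # assumed name - verify in the docs
headers = {"Authorization": "Bearer <token>"}
# Fire off the MLV job on demand
run = requests.post(
    f"{BASE}/workspaces/{WORKSPACE_ID}/items/{ITEM_ID}/jobs/instances",
    params={"jobType": JOB_TYPE},
    headers=headers,
)
run.raise_for_status()
job_url = run.headers["Location"]  # the accepted request points back at the job instance URL
# Poll until the job finishes, then refresh a dependent semantic model if needed
while True:
    status = requests.get(job_url, headers=headers).json().get("status")
    print(f"MLV job status: {status}")
    if status in ("Completed", "Failed", "Cancelled"):
        break
    time.sleep(30)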


 

What is the use of Materialised Lake Views?


There is something to like about them, but MLVs seem a bit disconnected from your normal ETL pipelines. You can't refresh them directly from the normal Data Pipeline experience, and you can't call the declarative pipeline from it either. It has its own little pipeline interface.

I know the blurb from MS says it will help build your medallion data lake etc, but it seems more suited to the analyst or citizen developer, so they can build out a data flow from pre-existing items in your lakehouse for something a bit more report-specific. If running via a notebook, items will have to be refreshed in a linear process, one after the other, so orchestration of items is going to be tricky, and you'll have cluster start-up time to add into the processing time.
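For example, a simple sketch of that linear refresh from a notebook (the names are made up):

# Refresh a small chain of MLVs in dependency order, one at a time - no parallelism.
mlvs_in_order = ["silver_sales", "silver_customers", "gold_sales_by_customer"]
for mlv in mlvs_in_order:
    print(f"Refreshing {mlv}...")
    spark.sql(f"REFRESH MATERIALIZED LAKE VIEW {mlv}")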

 

The other thing is, I don't like the name Materialised Lake View; I don't think it fits well with what they are and how they work. They are tables, found under the Tables section of the Lakehouse, and they need to be refreshed like tables. Are they the Jaffa Cakes of the Fabric Data Engineering experience (cakes for tax purposes, but sold and presented like biscuits)? Materialised Lake Queries would have been a more accurate name, in my humble opinion.


At the moment I have one combining a few tables to bring together items for a semantic model, so it's called near the end of my normal process and isn't a major part of the ETL. Until they can be properly orchestrated, I'll use them only if there's no other alternative. But you might find them useful in smaller projects, or in simpler parts of the process. As I mentioned earlier, they will struggle with some tasks like type 2 dimensions.

As they are in preview (as of Dec 2025), developers beware: things can break or change until general release.

 
 
 
