Fabric Runtime 2.0 - Identity Columns

Jon Lunn
Jun 9
2 min read

Fabric Runtime 2.0 is in public preview, and there is one small change for this that makes my life as a data engineer a little bit easier. Identity columns. I miss the good old T-SQL Identity column "Id INT IDENTITY(1,1)", now we have something like it in Spark/Delta.

So with the the old spark/delta table runtime, you couldn't have an default, automatically increasing column. But now we have these updates.

Apache Spark 4.1
Delta Lake: 4.1

One of the challenges I've had when building dimensions in a Lakehouse is generating surrogate keys I've had to use ROW_NUMBER(), and getting from the table I'm about to insert into the MAX(Id) value then using MAX(Id) + ROW_NUMBER(), to create new values. It was always a annoying pattern to use.

So now we can use 'generatedAlwaysAs=IdentityGenerator()'

Let's see an basic example:

from delta.tables import DeltaTable, IdentityGenerator

spark.sql("DROP TABLE IF EXISTS identity_demo")

# Create table
DeltaTable.create(spark) \
    .tableName("identity_demo") \
    .addColumn(
        "Id",
        "BIGINT",
        generatedAlwaysAs=IdentityGenerator()
    ) \
    .addColumn("Name", "STRING") \
    .execute()

This is just a basic table nothing fancy with columns Id and Name.

Let's add some data, just names, no Id's:

spark.sql("""
INSERT INTO identity_demo (Name)
VALUES
    ('John'),
    ('Jane'),
    ('Bob')
""")

And lets see the results:

Woo! A proper Id column!

Also 'IdentityGenerator()' has options like the TSQL Identity statement, with start and increments.

generatedAlwaysAs=IdentityGenerator(start=1, step=1)

Will give an Id starting at 1, and incrementing by 1 each time.

This may be a small update, but makes data engineering life so much easier! Fabric Runtime 2.0 is in preview, so you may have to do some testing before making it the default on your workspace.