Azure Data Lake Gen2

@mrpaulandrew
Jan 17, 2024

Updated: May 23, 2024

Our Snippets Of Knowledge Weekly Blog Series

Storage Accounts have always been a tier zero Resource for any Azure Region, meaning they must be available from the point of the region's inception. However, to act as a Data Lake the storage account must support a hierarchical namespace, which offers real directories with atomic directory manipulation rather than just a flat collection of Blobs. In the first generation of the Resource (Azure Data Lake Gen1) this capability came by default as part of a standalone offering. Now, becoming a Data Lake simply requires an additional feature to be enabled on the base Azure Storage Account, typically at deployment time. In all cases the storage is highly scalable and compatible with the Hadoop Distributed File System (HDFS), plus it has the ability to replicate content locally, across zones and geographically.
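To make the directory point concrete, here is a minimal sketch of why a rename behaves differently with and without a hierarchical namespace. It is a toy model in plain Python, not a call to Azure Storage: the function names, the dictionary layouts and the file names are all assumptions made for illustration.

```python
from __future__ import annotations

# Toy model only: it counts the operations a directory rename needs.
# Nothing here talks to Azure; all names are hypothetical.


def rename_flat(blobs: dict[str, bytes], old_prefix: str, new_prefix: str) -> int:
    """Flat namespace: a 'directory' is just a shared name prefix, so a
    rename has to rewrite the key of every blob under that prefix."""
    ops = 0
    for key in [k for k in blobs if k.startswith(old_prefix)]:
        blobs[new_prefix + key[len(old_prefix):]] = blobs.pop(key)
        ops += 1
    return ops


def rename_hierarchical(tree: dict, old_name: str, new_name: str) -> int:
    """Hierarchical namespace: the directory is a real node, so a rename
    re-links one entry however many files sit beneath it."""
    tree[new_name] = tree.pop(old_name)
    return 1


flat = {f"raw/2024/file{i}.parquet": b"" for i in range(1000)}
tree = {"raw": {"2024": {f"file{i}.parquet": b"" for i in range(1000)}}}

flat_ops = rename_flat(flat, "raw/", "archive/")
tree_ops = rename_hierarchical(tree, "raw", "archive")
print(flat_ops, tree_ops)  # 1000 1
```

In the flat case a failure part way through leaves some files under the old name and some under the new one, which is the situation a single atomic directory operation avoids.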

Further advancements in the way data is structured, using formats such as Parquet and Avro, mean we can do even more with fewer reads/writes for operational and analytical workloads.
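As a rough illustration of the "fewer reads" idea, the sketch below compares a row-by-row layout with a columnar layout of the kind Parquet uses. It is a toy model using plain Python lists, not the Parquet or Avro format itself, and the field names and sample data are made up for the example.

```python
# Toy comparison of values touched when summing one field.
# Assumption: each field access stands in for one unit of I/O.

rows = [
    {"id": i, "region": "uk", "amount": i * 1.5, "notes": "x" * 50}
    for i in range(1000)
]


def sum_amount_row_layout(records):
    """Row layout: every field of each record is read to reach 'amount'."""
    total, values_read = 0.0, 0
    for record in records:
        values_read += len(record)
        total += record["amount"]
    return total, values_read


def to_columns(records):
    """Pivot the records so each field is stored as its own list."""
    return {name: [r[name] for r in records] for name in records[0]}


def sum_amount_column_layout(columns):
    """Columnar layout: only the 'amount' column is read."""
    amounts = columns["amount"]
    return sum(amounts), len(amounts)


row_total, row_reads = sum_amount_row_layout(rows)
col_total, col_reads = sum_amount_column_layout(to_columns(rows))
print(row_reads, col_reads)  # 4000 1000
```

Both layouts give the same total; the columnar one simply touches a quarter of the values here because the query needs one field out of four.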

Finally, by adopting the open-source standard of Delta Lake, fully ACID-compliant, transactional, schema-defined entities can be created in the storage layer while remaining decoupled from compute resources. It is a standard that can be used across multiple cloud storage products, not just Microsoft Azure, and can be read and written by almost any compute.
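The way transactions can live in the storage layer, independent of any compute engine, can be sketched with an ordered log of commit files. The code below is loosely modelled on that idea and is heavily simplified: it is not the Delta Lake protocol or library, and the helper names `commit` and `snapshot`, the local temporary directory and the file names are assumptions for illustration.

```python
from __future__ import annotations

import json
import os
import tempfile


def commit(log_dir: str, version: int, actions: list[dict]) -> None:
    """Write one commit file. Mode 'x' fails if the file already exists,
    so two writers cannot both claim the same version number."""
    path = os.path.join(log_dir, f"{version:020d}.json")
    with open(path, "x") as f:
        for action in actions:
            f.write(json.dumps(action) + "\n")


def snapshot(log_dir: str) -> set[str]:
    """Replay the commits in version order to find the live data files."""
    live: set[str] = set()
    for name in sorted(os.listdir(log_dir)):
        with open(os.path.join(log_dir, name)) as f:
            for line in f:
                action = json.loads(line)
                if "add" in action:
                    live.add(action["add"]["path"])
                elif "remove" in action:
                    live.discard(action["remove"]["path"])
    return live


log_dir = tempfile.mkdtemp()
commit(log_dir, 0, [{"add": {"path": "part-0000.parquet"}}])
commit(log_dir, 1, [{"add": {"path": "part-0001.parquet"}}])
# One commit swaps a file out and a new one in as a single unit.
commit(log_dir, 2, [{"remove": {"path": "part-0000.parquet"}},
                    {"add": {"path": "part-0002.parquet"}}])

try:
    # A second writer trying to reuse version 2 is rejected.
    commit(log_dir, 2, [{"add": {"path": "part-9999.parquet"}}])
    conflict = False
except FileExistsError:
    conflict = True

live_files = snapshot(log_dir)
print(conflict, sorted(live_files))
```

Because the table state is just files plus a log, any compute that can read the log can reconstruct the same snapshot, which is the decoupling described above.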

See MS Learn for more information on this Resource here.

We hope you found this knowledge snippet helpful.

Check out all our posts in this series here.