Z-Ordering
A multidimensional clustering technique that organizes data files to optimize query filtering on multiple columns.
Z-Ordering is a multidimensional data clustering technique that maps the values of several columns onto a one-dimensional Z-order (Morton) space-filling curve. Analytical databases use it to control the physical layout of data, so that queries filtering on any of the chosen columns can skip more files.
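The mapping from several columns to one curve position can be sketched by interleaving bits. This is a minimal illustration that assumes each column has already been converted to a small non-negative integer (for example, a dictionary-encoded id). Real engines first apply their own order-preserving encodings, and the helper name z_value is hypothetical.

```python
from typing import Sequence


def z_value(coords: Sequence[int], bits: int) -> int:
    """Interleave the low `bits` bits of each non-negative coordinate into one Z-value.

    Bit i of dimension d is written to position i * len(coords) + d, so the
    first coordinate supplies the lowest bit of every interleaved group.
    (Hypothetical helper; assumes values are already small non-negative integers.)
    """
    for value in coords:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{value} does not fit in {bits} bits")
    z = 0
    for i in range(bits):
        for d, value in enumerate(coords):
            z |= ((value >> i) & 1) << (i * len(coords) + d)
    return z


# Visit a 4x4 grid of (x, y) points in Z-value order.
grid = [(x, y) for y in range(4) for x in range(4)]
order = sorted(grid, key=lambda p: z_value(p, bits=2))
print(order[:8])
# [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0), (3, 0), (2, 1), (3, 1)]
```

The curve finishes each 2x2 block before moving to the next, tracing the characteristic "Z" shape. Points that are close in both dimensions therefore receive close Z-values.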
How it Works
Unlike linear (lexicographic) sorting, where the first sort key dominates and later keys only order rows that tie on earlier ones, Z-ordering gives all designated columns equal weight.
- Interleaving Bits: Interleaves the bits of each chosen column's binary representation (for example, region and product_category) to assign every row a single position on the curve, called its Z-value.
- Locality Clustering: Arranges records in Z-value order, so that rows with similar values across all chosen dimensions tend to land in the same physical data files.
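- File Skipping: Lets query engines compare a filter against the per-file minimum and maximum column values stored in metadata and skip any file whose range for a filtered column cannot match. Because Z-ordering keeps these ranges narrow on every Z-ordered column, a filter on any one of them can prune files.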
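The three steps above can be simulated end to end on a toy table. This is a sketch under stated assumptions, not any engine's planner: the columns region_id and category_id are hypothetical integer encodings of region and product_category, "files" are in-memory chunks of 16 rows, and the only metadata kept per file is a min/max pair per column. It compares a linear ORDER BY region_id, category_id layout against a Z-ordered layout by counting how many files an equality filter must read.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

# Hypothetical integer-encoded columns (region -> region_id, product_category -> category_id).
COLUMNS = ("region_id", "category_id")
Row = Tuple[int, int]
FileStats = Dict[str, Tuple[int, int]]  # column name -> (min, max), like per-file metadata


def z_value(coords: Sequence[int], bits: int) -> int:
    """Step 1: interleave the low `bits` bits of each coordinate into one Z-value."""
    z = 0
    for i in range(bits):
        for d, value in enumerate(coords):
            z |= ((value >> i) & 1) << (i * len(coords) + d)
    return z


def write_files(rows: List[Row], key: Callable[[Row], Any], rows_per_file: int) -> List[FileStats]:
    """Step 2: sort rows by `key`, cut them into fixed-size "files", keep min/max per column."""
    ordered = sorted(rows, key=key)
    files: List[FileStats] = []
    for start in range(0, len(ordered), rows_per_file):
        chunk = ordered[start:start + rows_per_file]
        files.append({
            name: (min(row[i] for row in chunk), max(row[i] for row in chunk))
            for i, name in enumerate(COLUMNS)
        })
    return files


def files_to_scan(files: List[FileStats], column: str, value: int) -> int:
    """Step 3: count files whose [min, max] range for `column` could contain `value`."""
    return sum(1 for stats in files if stats[column][0] <= value <= stats[column][1])


# 16 regions x 16 categories, one row per combination, written as 16 files of 16 rows.
rows = [(region, category) for region in range(16) for category in range(16)]
linear = write_files(rows, key=lambda r: r, rows_per_file=16)                   # ORDER BY region_id, category_id
zorder = write_files(rows, key=lambda r: z_value(r, bits=4), rows_per_file=16)  # Z-order on both columns

for column in COLUMNS:
    print(column, "linear:", files_to_scan(linear, column, 5), "z-order:", files_to_scan(zorder, column, 5))
# region_id linear: 1 z-order: 4
# category_id linear: 16 z-order: 4
```

The linear layout is ideal for the leading key but forces a read of every file when only the second column is filtered. The Z-ordered layout gives up some of that best case to keep both columns prunable, since each file covers a 4x4 block of region and category values.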
Lakehouse & Agentic Relevance
In a data lakehouse, query workloads are diverse, and different queries often filter on different columns. Z-ordering Apache Iceberg tables on the commonly filtered columns lets file skipping work for all of these queries, not only for those that filter on the leading sort key. When an autonomous agent constructs SQL filters dynamically, Z-ordering reduces the chance that the execution engine falls back to a full table scan. Dremio takes advantage of Z-ordered Iceberg tables when planning multi-dimensional range filters, which can bring such queries down to sub-second latency.