Unpacking Parquet: Explicit SIMD, Scalar Baselines, and What HotSpot Makes of Them
On the JVM, optimizing a hot kernel is not only about writing faster code: it is also about understanding how much the result depends on the machine code HotSpot derives from the scalar loop. Using Parquet bit-unpacking as a concrete case, the piece shows how a SIMD speedup depends on which scalar baseline C2 is handed, when explicit vectorization is actually justified, and why a more specialized scalar routine is not necessarily faster.
Where Spark Changes Shape: UnsafeRow, the JVM, and the Relocation of Portability
Run df.explain on a Parquet scan and you find ColumnarToRow, an operator whose only job is to change the data’s shape. It is the seam where two architectural eras meet, and a record of how portability in analytical systems has relocated from the JVM runtime to the data format and the query plan.
The Hidden DSL in Catalyst
How Spark’s internal rewriting framework, Catalyst, exposes an embedded DSL with a public extension surface, the same one Delta Lake and Iceberg use to plug into the optimizer pipeline.
delta-explain: Making Delta Lake Pruning Visible
Partition pruning and data skipping are invisible by default. delta-explain reads the Delta log directly and shows, step by step, how a WHERE predicate narrows down candidate files, with no engine required.
Anti-patterns in Catalyst rules
Six concrete anti-patterns I encountered building a real Catalyst extension: from the wrong rule type for throws, to mutable state under AQE and Spark Connect, to JVM bootstrap traps in PySpark.
The Disaggregation of the Lakehouse Stack
How Delta Kernel, Arrow, and pluggable execution are disaggregating the lakehouse stack. Rather than converging on a new dominant engine, the stack is converging on a layered architecture in which protocol, data representation, and query execution are increasingly isolated behind stable interfaces.