IndexTables 0.5.5 — Native Rust Transaction Log, DataSource Short Name, and FFI Profiler
IndexTables 0.5.5 continues the shift toward a fully native execution stack.
The headline change is a complete reimplementation of the transaction log in Rust, replacing the previous Scala-based design with a native module built on Arrow FFI. This release also introduces a DataSource short name, adds built-in profiling for the FFI read path, and accepts CIDR notation on IP address fields.
Native Rust Transaction Log
The transaction log—responsible for index state, checkpoints, and coordinating concurrent writes—has been rebuilt entirely in Rust as part of the tantivy4java native layer.
This new NativeTransactionLog replaces both ScalaTransactionLog and OptimizedTransactionLog. All I/O now flows through the Arrow C Data Interface (FFI), enabling zero-copy data sharing between the JVM and Rust with no serialization overhead.
Moving this layer into Rust unlocks a set of capabilities that were previously difficult to implement cleanly:
- Optimistic concurrency with automatic retry — write conflicts are resolved natively without JVM round-trips
- Native LRU cache with TTL — hot log entries stay in memory, reducing object storage reads
- GZIP compression — checkpoints and manifests are compressed before write
- Auto-checkpointing — periodic checkpoints bound read amplification
- Graceful fallback — when no checkpoint exists, the log transparently falls back to version scanning
This change is fully transparent—existing indexes continue to work without migration.
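To make the optimistic-concurrency bullet concrete, here is a minimal sketch of the general pattern in Scala. It is not the NativeTransactionLog implementation, which lives in Rust. `InMemoryLog`, `tryCommit`, and `commitWithRetry` are hypothetical names, and an in-memory map stands in for object storage with put-if-absent semantics.

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.annotation.tailrec

object OptimisticLogSketch {
  // Hypothetical stand-in for a versioned log: each version number
  // can be written exactly once (put-if-absent).
  final class InMemoryLog {
    private val entries = new ConcurrentHashMap[Long, String]()

    // Highest committed version, or -1 when the log is empty.
    def latestVersion: Long = {
      var max = -1L
      val it = entries.keySet().iterator()
      while (it.hasNext) {
        val v = it.next()
        if (v > max) max = v
      }
      max
    }

    // Atomically claim `version`; false means another writer got there first.
    def tryCommit(version: Long, actions: String): Boolean =
      entries.putIfAbsent(version, actions) == null

    def read(version: Long): Option[String] = Option(entries.get(version))
  }

  // Retry loop: after a conflict, re-read the latest version and try the next slot.
  // Returns the committed version, or None once the attempts are exhausted.
  @tailrec
  def commitWithRetry(log: InMemoryLog, actions: String, attemptsLeft: Int = 5): Option[Long] =
    if (attemptsLeft <= 0) None
    else {
      val next = log.latestVersion + 1
      if (log.tryCommit(next, actions)) Some(next)
      else commitWithRetry(log, actions, attemptsLeft - 1)
    }
}
```

A production log would typically also check whether the winning commit logically conflicts with the pending one and back off between attempts; the sketch leaves both out for brevity.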
DataSource Short Name
IndexTables can now be referenced using the short name "indextables" when working with Spark.
spark.read.format("indextables").load("s3://bucket/events_index")
SELECT * FROM indextables.`s3://bucket/events_index`
FFI Read Path Profiler
This release introduces built-in profiling for the native FFI read path.
The profiler exposes timing and cache metrics directly through SQL, making it possible to understand where time is spent during indexed reads—without external tooling.
Enable or disable profiling:
ENABLE INDEXTABLES PROFILER
DISABLE INDEXTABLES PROFILER
Profiling is distributed—enabling it on the driver activates it across all executors.
Inspect timing metrics:
DESCRIBE INDEXTABLES PROFILER
Returns per-section metrics: calls, total_ms, avg_us, min_us, max_us.
Inspect cache metrics:
DESCRIBE INDEXTABLES PROFILER CACHE
Returns cache hits, misses, and hit_rate per executor.
Reset counters:
RESET INDEXTABLES PROFILER
RESET INDEXTABLES PROFILER CACHE
RESET reads the counters and clears them in a single atomic step, so no samples are lost between the read and the clear. This makes it safe for measure-then-reset workflows.
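The value of an atomic read-and-clear can be illustrated with a small Scala sketch. This is not the profiler's actual code: `Stats`, `SectionCounter`, and `readAndReset` are hypothetical names, and the fields only mirror the metric columns listed above. The counter state lives in one immutable snapshot that is swapped atomically, so a sample recorded concurrently lands either in the returned snapshot or in the next window.

```scala
import java.util.concurrent.atomic.AtomicReference

object ProfilerCounterSketch {
  // Immutable snapshot for one profiled section (illustrative field names).
  final case class Stats(calls: Long, totalUs: Long, minUs: Long, maxUs: Long) {
    def record(us: Long): Stats =
      Stats(calls + 1, totalUs + us, math.min(minUs, us), math.max(maxUs, us))

    def avgUs: Double = if (calls == 0) 0.0 else totalUs.toDouble / calls
  }

  val Empty: Stats = Stats(0L, 0L, Long.MaxValue, 0L)

  final class SectionCounter {
    private val ref = new AtomicReference[Stats](Empty)

    // Lock-free update; the JDK retries internally if another thread races us.
    def record(us: Long): Unit = {
      ref.updateAndGet(s => s.record(us))
      ()
    }

    // Read and clear in one atomic swap: no sample can fall between
    // the read and the reset.
    def readAndReset(): Stats = ref.getAndSet(Empty)
  }
}
```

A separate read followed by a separate clear would silently drop any sample recorded between the two calls, which is the gap the single swap closes.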
CIDR Notation for IP Address Fields
IP address fields now accept CIDR notation and wildcard patterns directly, expanded transparently at the native layer. No special configuration is required — pass the CIDR string wherever you would pass an IP.
// Match an entire subnet
df.filter($"client_ip" === "192.168.1.0/24")
// Multiple subnets with IN
df.filter($"client_ip".isin("10.0.0.0/8", "192.168.1.0/24"))
// IndexQuery with CIDR and boolean logic
df.filter($"client_ip" indexquery "10.0.0.0/8 AND NOT 10.0.1.0/24")
Wildcard patterns are also supported:
df.filter($"client_ip" === "192.168.1.*") // equivalent to /24
df.filter($"client_ip" === "10.0.*.*") // equivalent to /16
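For readers who want to check which addresses a pattern covers, the standard CIDR arithmetic can be sketched in a few lines of Scala. This is an IPv4-only illustration with hypothetical helper names (`cidrToRange`, `wildcardToCidr`); how the native layer performs the expansion internally is not described here.

```scala
object CidrSketch {
  // Parse a dotted-quad IPv4 address into an unsigned 32-bit value held in a Long.
  def ipToLong(ip: String): Long = {
    val parts = ip.split('.')
    require(parts.length == 4, s"not an IPv4 address: $ip")
    parts.foldLeft(0L) { (acc, p) =>
      val octet = p.toInt
      require(octet >= 0 && octet <= 255, s"bad octet in $ip")
      (acc << 8) | octet
    }
  }

  def longToIp(v: Long): String =
    Seq(24, 16, 8, 0).map(shift => (v >> shift) & 0xFF).mkString(".")

  // Expand "a.b.c.d/n" to the inclusive (first, last) address range.
  def cidrToRange(cidr: String): (Long, Long) = {
    val parts = cidr.split('/')
    require(parts.length == 2, s"not CIDR notation: $cidr")
    val prefix = parts(1).toInt
    require(prefix >= 0 && prefix <= 32, s"bad prefix length in $cidr")
    val hostMask = (1L << (32 - prefix)) - 1 // low bits that vary inside the block
    val first = ipToLong(parts(0)) & ~hostMask
    (first, first | hostMask)
  }

  // Rewrite trailing wildcards as CIDR: each "*" octet removes 8 prefix bits.
  def wildcardToCidr(pattern: String): String = {
    val parts = pattern.split('.')
    require(parts.length == 4, s"not an IPv4 pattern: $pattern")
    val fixed = parts.takeWhile(_ != "*")
    require(parts.drop(fixed.length).forall(_ == "*"), "wildcards must be trailing")
    val padded = fixed ++ Array.fill(4 - fixed.length)("0")
    s"${padded.mkString(".")}/${fixed.length * 8}"
  }
}
```

With these helpers, `wildcardToCidr("192.168.1.*")` yields `192.168.1.0/24`, and `cidrToRange` on that value spans `192.168.1.0` through `192.168.1.255`.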
IPv6 CIDR works the same way in DataFrame filters. In IndexQuery, quote the value to avoid the colon being parsed as a field separator:
df.filter($"client_ip" indexquery "\"2001:db8::/32\"")
See IP Address Fields for the full reference.
Additional Changes
- tantivy4java 0.34.4: performance and stability improvements in the native layer
- Range bucket aggregation fix: correct results when combining range buckets with nested GROUP BY
- Iceberg DATE partition handling: fixed partition value conversion for DATE columns
- Iceberg file:// path handling: corrected local URI resolution in companion operations
Get Started
<dependency>
<groupId>io.indextables</groupId>
<artifactId>indextables_spark</artifactId>
<version>0.5.5_spark_3.5.3</version>
<classifier>linux-x86_64-shaded</classifier>
</dependency>
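For sbt builds, the same coordinates can likely be expressed as below. This is a direct translation of the Maven snippet above, not something taken from the project's documentation, and it assumes the artifact is available from a resolver your build already has configured.

```scala
// build.sbt: same group, artifact, version, and classifier as the Maven snippet above
libraryDependencies += ("io.indextables" % "indextables_spark" % "0.5.5_spark_3.5.3")
  .classifier("linux-x86_64-shaded")
```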
For installation details, see the Installation guide.
For the full change list, see the GitHub release notes.
