IndexTables 0.5.5 — Native Rust Transaction Log, DataSource Short Name, and FFI Profiler

April 5, 2026 · 3 min read

IndexTables 0.5.5 continues the shift toward a fully native execution stack.

The headline change is a complete reimplementation of the transaction log in Rust, replacing the previous Scala-based design with a native module built on Arrow FFI. This release also introduces a DataSource short name, adds built-in profiling for the FFI read path, and lets IP address fields accept CIDR notation.

Native Rust Transaction Log

The transaction log, which tracks index state, manages checkpoints, and coordinates concurrent writes, has been rebuilt entirely in Rust as part of the tantivy4java native layer.

This new NativeTransactionLog replaces both ScalaTransactionLog and OptimizedTransactionLog. All I/O now flows through the Arrow C Data Interface (FFI), enabling zero-copy data sharing between the JVM and Rust with no serialization overhead.

Moving this layer into Rust unlocks a set of capabilities that were previously difficult to implement cleanly:

Optimistic concurrency with automatic retry — write conflicts are resolved natively without JVM round-trips
Native LRU cache with TTL — hot log entries stay in memory, reducing object storage reads
GZIP compression — checkpoints and manifests are compressed before write
Auto-checkpointing — periodic checkpoints bound read amplification
Graceful fallback — when no checkpoint exists, the log transparently falls back to version scanning

This change is fully transparent—existing indexes continue to work without migration.
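
To illustrate the optimistic-concurrency item in the list above, here is a minimal sketch of the general read, attempt, and retry pattern in plain Scala. It is not the native implementation: ToyLog, LogState, and commitWithRetry are hypothetical names, and an in-memory compare-and-set stands in for whatever conditional write the real log performs against storage.

```scala
import java.util.concurrent.atomic.AtomicReference
import scala.annotation.tailrec

// Hypothetical in-memory stand-in for a versioned log; not the IndexTables API.
final case class LogState(version: Long, actions: Vector[String])

final class ToyLog {
  private val state = new AtomicReference(LogState(0L, Vector.empty))

  def current: LogState = state.get()

  // Succeeds only if no other writer has advanced the log since `expected` was read.
  // compareAndSet compares references, so `expected` must be the snapshot from `current`.
  def tryCommit(expected: LogState, action: String): Boolean =
    state.compareAndSet(
      expected,
      LogState(expected.version + 1, expected.actions :+ action)
    )
}

object OptimisticWriter {
  // Read the latest snapshot, attempt the commit, and retry on conflict.
  // Returns the committed version, or None once the retry budget is exhausted.
  @tailrec
  def commitWithRetry(log: ToyLog, action: String, maxRetries: Int = 10): Option[Long] = {
    val snapshot = log.current
    if (log.tryCommit(snapshot, action)) Some(snapshot.version + 1)
    else if (maxRetries > 0) commitWithRetry(log, action, maxRetries - 1)
    else None
  }
}
```

A writer never takes a lock here. It assumes no conflict, and pays the cost of a re-read only when another writer got in first.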

DataSource Short Name

IndexTables can now be referenced by the short name "indextables" in Spark, both from the DataFrame reader and in SQL path queries:

spark.read.format("indextables").load("s3://bucket/events_index")

SELECT * FROM indextables.`s3://bucket/events_index`

FFI Read Path Profiler

This release introduces built-in profiling for the native FFI read path.

The profiler exposes timing and cache metrics directly through SQL, making it possible to understand where time is spent during indexed reads—without external tooling.

Enable or disable profiling:

ENABLE INDEXTABLES PROFILER
DISABLE INDEXTABLES PROFILER

Profiling is distributed—enabling it on the driver activates it across all executors.

Inspect timing metrics:

DESCRIBE INDEXTABLES PROFILER

Returns per-section metrics: calls, total_ms, avg_us, min_us, max_us.

Inspect cache metrics:

DESCRIBE INDEXTABLES PROFILER CACHE

Returns cache hits, misses, and hit_rate per executor.

Reset counters:

RESET INDEXTABLES PROFILER
RESET INDEXTABLES PROFILER CACHE

RESET returns the current counter values and clears them in a single atomic step, making it safe for measure-then-reset workflows.
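
A measure-then-reset workflow built from these commands might look like the following sketch. Only the SQL strings come from this release. The ProfilerSession helper and its runSql parameter are hypothetical, and the sketch assumes RESET returns the rows it clears. Passing the SQL executor as a function keeps the example free of a Spark dependency; in a real session it would typically be something like sql => spark.sql(sql).collect().toSeq.

```scala
// Hypothetical helper; only the SQL command strings come from the release notes.
object ProfilerSession {
  // `runSql` executes one statement and returns its result rows.
  // `workload` is the query (or queries) to be profiled.
  def measure[R](runSql: String => Seq[R])(workload: => Unit): Seq[R] = {
    runSql("ENABLE INDEXTABLES PROFILER")
    try {
      runSql("RESET INDEXTABLES PROFILER") // discard counters left by earlier work
      workload
      runSql("RESET INDEXTABLES PROFILER") // assumed to return this run's metrics, then clear them
    } finally {
      runSql("DISABLE INDEXTABLES PROFILER") // always switch profiling back off
    }
  }
}
```

Resetting before the workload means the rows returned afterward cover only that workload, and the finally block ensures profiling is disabled even if the workload throws.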

CIDR Notation for IP Address Fields

IP address fields now accept CIDR notation and wildcard patterns directly, expanded transparently at the native layer. No special configuration is required — pass the CIDR string wherever you would pass an IP.

// Match an entire subnet
df.filter($"client_ip" === "192.168.1.0/24")

// Multiple subnets with IN
df.filter($"client_ip".isin("10.0.0.0/8", "192.168.1.0/24"))

// IndexQuery with CIDR and boolean logic
df.filter($"client_ip" indexquery "10.0.0.0/8 AND NOT 10.0.1.0/24")

Wildcard patterns are also supported:

df.filter($"client_ip" === "192.168.1.*")   // equivalent to 192.168.1.0/24
df.filter($"client_ip" === "10.0.*.*")      // equivalent to 10.0.0.0/16
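
The equivalence between trailing wildcards and CIDR prefixes can be checked with a small standalone sketch. This is not how the native layer expands patterns. IpPatterns and its methods are hypothetical names, the code handles IPv4 only, and it assumes wildcards are valid only as trailing octets.

```scala
object IpPatterns {
  // Convert an IPv4 pattern with trailing "*" octets into CIDR notation.
  // Returns None unless the input is four octets with wildcards only at the end.
  def wildcardToCidr(pattern: String): Option[String] = {
    val parts = pattern.split("\\.", -1)
    if (parts.length != 4) None
    else {
      val fixed = parts.takeWhile(_ != "*")
      val rest = parts.drop(fixed.length)
      val octetsValid = fixed.forall { p =>
        p.nonEmpty && p.length <= 3 && p.forall(_.isDigit) && p.toInt <= 255
      }
      if (!octetsValid || !rest.forall(_ == "*")) None
      else {
        // Each fixed octet contributes 8 bits to the prefix length.
        val network = (fixed ++ Array.fill(rest.length)("0")).mkString(".")
        Some(s"$network/${fixed.length * 8}")
      }
    }
  }

  // Parse a dotted-quad address into a 32-bit value held in a Long.
  private def toLong(ip: String): Long =
    ip.split("\\.").foldLeft(0L)((acc, octet) => (acc << 8) | octet.toLong)

  // True when `ip` falls inside the IPv4 block described by `cidr`.
  def contains(cidr: String, ip: String): Boolean = {
    val parts = cidr.split("/")
    val prefix = parts(1).toInt
    val mask = (0xFFFFFFFFL << (32 - prefix)) & 0xFFFFFFFFL
    (toLong(parts(0)) & mask) == (toLong(ip) & mask)
  }
}
```

For example, IpPatterns.wildcardToCidr("10.0.*.*") yields Some("10.0.0.0/16"), and IpPatterns.contains("192.168.1.0/24", "192.168.1.77") is true.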

IPv6 CIDR works the same way in DataFrame filters. In IndexQuery, quote the value to avoid the colon being parsed as a field separator:

df.filter($"client_ip" indexquery "\"2001:db8::/32\"")

See IP Address Fields for the full reference.

Additional Changes

tantivy4java 0.34.4 — performance and stability improvements in the native layer
Range bucket aggregation fix — correct results when combining range buckets with nested GROUP BY
Iceberg DATE partition handling — fixed partition value conversion for DATE columns
Iceberg file:// path handling — corrected local URI resolution in companion operations

Get Started

<dependency>
  <groupId>io.indextables</groupId>
  <artifactId>indextables_spark</artifactId>
  <version>0.5.5_spark_3.5.3</version>
  <classifier>linux-x86_64-shaded</classifier>
</dependency>

For installation details, see the Installation guide.
For the full change list, see the GitHub release notes.
