Why IndexTables
The Data Lakehouse Revolution
In 2019, the data world flipped upside down.
A new idea emerged — the data lakehouse — combining the openness of data lakes with the performance of data warehouses.
It wasn't just an architecture. It was a revolution.
Data stopped belonging to vendors. It started belonging to you.
For the first time, teams could choose the right tools — based on innovation, cost, and skill fit — not lock-in. Vendors had to compete on merit, not monopoly.
Search Missed the Revolution
But one domain missed the revolution: search.
Observability and security search stacks are still dominated by closed, expensive ecosystems. You're locked into:
- Proprietary formats that only work with one vendor
- Server infrastructure that you have to manage and scale
- Licensing costs that grow with your data
- Vendor roadmaps that may not align with your needs
IndexTables: Open Search for the Lakehouse Era
IndexTables brings that same open revolution to search — with performance that rivals the biggest proprietary platforms, built entirely on open tech.
Built on Spark
IndexTables runs as a native Spark DataSource V2, so you read and write it through the same DataFrame API you already use for Delta Lake, Iceberg, and Parquet. No separate cluster. No new infrastructure. Just add the library to your existing Spark environment.
```python
# Write an index
df.write.format("io.indextables.spark.core.IndexTables4SparkTableProvider") \
    .option("spark.indextables.indexing.typemap.content", "text") \
    .save("s3://bucket/logs")

# Read and query with SQL
logs = spark.read.format("io.indextables.spark.core.IndexTables4SparkTableProvider") \
    .load("s3://bucket/logs")
logs.createOrReplaceTempView("logs")
spark.sql("SELECT * FROM logs WHERE content indexquery 'error AND timeout'")
```
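If the search terms come from user input or a script, the `indexquery` statement can be assembled programmatically. The sketch below is one possible way to do that. `build_indexquery_sql` is a hypothetical helper written for illustration, not part of IndexTables, and it assumes the view and column names are trusted identifiers. It rejects quotes and backslashes instead of guessing at escaping rules.

```python
def build_indexquery_sql(view: str, column: str, terms: list[str]) -> str:
    """Compose a SQL statement that uses the indexquery operator.

    Hypothetical helper for illustration only; terms are joined with AND.
    """
    if not terms:
        raise ValueError("at least one search term is required")
    for term in terms:
        # Refuse quotes and backslashes rather than guess at escaping rules.
        if "'" in term or "\\" in term:
            raise ValueError(f"unsupported character in term: {term!r}")
    query = " AND ".join(terms)
    return f"SELECT * FROM {view} WHERE {column} indexquery '{query}'"


sql = build_indexquery_sql("logs", "content", ["error", "timeout"])
print(sql)
# prints: SELECT * FROM logs WHERE content indexquery 'error AND timeout'
```

With a live Spark session and the `logs` view registered, the resulting string can be passed to `spark.sql(sql)`.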
Powered by Tantivy and Quickwit
IndexTables is built on Tantivy and Quickwit — Rust-based search technology that delivers Lucene-class performance with modern, memory-safe code.
Open Format
The QuickwitSplit format is documented and open. Your indexes are stored in standard object storage (S3, Azure Blob). No proprietary lock-in.
Community Driven
IndexTables is open source. You can inspect the code, contribute features, and shape the roadmap.
Who Is This For?
IndexTables is built for security teams and for log analytics and observability use cases, but it is useful for anyone who needs fast, interactive queries over very large datasets.
The Bottom Line
| Traditional Search | IndexTables |
|---|---|
| Separate cluster to manage | Runs in your Spark executors |
| Proprietary format | Open QuickwitSplit format |
| Per-GB licensing | Free and open source |
| Vendor lock-in | Your data, your choice |
It's your data. Your performance. Your choice.