Split Architecture
IndexTables uses a split-based architecture optimized for cloud object storage.
What is a Split?
A split is a self-contained index segment stored as a single file (.split) in object storage. Each split contains:
- Inverted index data
- Document store
- Fast fields (columnar data)
- Metadata
Why Splits?
Traditional search engines use many small files per index segment. This works well on local SSDs but is inefficient on object storage such as S3, where:
- Each file requires a separate HTTP request
- Small files can waste storage, because some storage classes bill a minimum size per object
- Listing operations are expensive
The QuickwitSplit format consolidates all of a segment's components into a single file, which suits the access patterns of cloud object storage.
Split Lifecycle
1. Write: Data is indexed in memory
2. Create: Index is serialized to QuickwitSplit format
3. Upload: Split file is uploaded to object storage
4. Commit: Transaction log records the new split
5. Query: Split is now visible to readers
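The ordering of these stages matters: a split is uploaded before it is committed, so readers never see a partially written split. The sketch below illustrates that ordering with a toy in-memory object store. It is not the IndexTables implementation. `InMemoryObjectStore`, `write_split`, `commit_split`, and `visible_splits` are hypothetical names, the JSON blob stands in for the real QuickwitSplit format, and the shape of the log entry is an assumption.

```python
import json
import uuid


class InMemoryObjectStore:
    """Hypothetical stand-in for an object store such as S3."""

    def __init__(self):
        self.objects = {}

    def put(self, key, data):
        self.objects[key] = data

    def get(self, key):
        return self.objects[key]

    def list(self, prefix):
        return sorted(k for k in self.objects if k.startswith(prefix))


def write_split(store, index_prefix, documents):
    """Write, Create, Upload: index in memory, serialize to one blob, upload it."""
    # Write: a toy inverted index mapping each term to the ids of documents containing it.
    inverted = {}
    for doc_id, text in enumerate(documents):
        for term in set(text.lower().split()):
            inverted.setdefault(term, []).append(doc_id)
    # Create: serialize everything into a single blob (JSON here, not the real split format).
    blob = json.dumps({"inverted": inverted, "docs": documents}).encode("utf-8")
    # Upload: the split now exists in storage but is not yet visible to readers.
    split_key = f"{index_prefix}/{uuid.uuid4()}.split"
    store.put(split_key, blob)
    return split_key


def commit_split(store, index_prefix, split_key):
    """Commit: add a transaction log entry that records the new split."""
    log_prefix = f"{index_prefix}/_transaction_log/"
    version = len(store.list(log_prefix)) + 1
    # Assumed entry shape; the real log format may differ.
    entry = json.dumps({"add": split_key}).encode("utf-8")
    store.put(f"{log_prefix}{version:020d}.json", entry)
    return version


def visible_splits(store, index_prefix):
    """Query: readers discover splits only through the transaction log."""
    log_prefix = f"{index_prefix}/_transaction_log/"
    return [json.loads(store.get(k))["add"] for k in store.list(log_prefix)]
```

In this sketch, calling `visible_splits` between `write_split` and `commit_split` returns an empty list, which mirrors the gap between the Upload and Commit stages.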
Split Files
Splits are stored with UUID-based names:
s3://bucket/my_index/
  _transaction_log/
    00000000000000000001.json
    00000000000000000002.json
  partition=2024-01-01/
    abc123-def456-789.split
    xyz789-abc123-456.split
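As a rough illustration of this naming scheme, the following sketch builds an object key for a new split and recovers partition values from an existing key. `split_key` and `partition_of` are hypothetical helpers, not part of IndexTables, and the `key=value` directory convention is inferred from the listing above.

```python
import uuid
from typing import Optional


def split_key(index_prefix: str, partition: Optional[str] = None) -> str:
    """Build an object key for a new split with a random UUID name."""
    parts = [index_prefix.rstrip("/")]
    if partition:
        parts.append(partition)
    parts.append(f"{uuid.uuid4()}.split")
    return "/".join(parts)


def partition_of(key: str) -> dict:
    """Extract key=value partition directories from a split's object key."""
    values = {}
    # Skip the final path segment, which is the split file name itself.
    for segment in key.split("/")[:-1]:
        if "=" in segment:
            name, _, value = segment.partition("=")
            values[name] = value
    return values
```

For example, `partition_of(split_key("s3://bucket/my_index", "partition=2024-01-01"))` yields a dictionary with the single entry `partition` mapped to `2024-01-01`.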
Maintenance
Over time, many small splits accumulate. Use MERGE SPLITS to consolidate them into fewer, larger splits:
MERGE SPLITS 's3://bucket/my_index' TARGET SIZE 4G;
See MERGE SPLITS for details.
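To give a feel for what consolidating toward a target size involves, here is a minimal sketch of a greedy grouping pass over split sizes. It is an illustration only and not the algorithm MERGE SPLITS actually uses. `parse_size` and `plan_merges` are hypothetical names, and reading `4G` as 4 GiB (binary units) is an assumption.

```python
from typing import List

# Assumed binary units; the real command may interpret suffixes differently.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(text: str) -> int:
    """Convert a size such as '4G' to bytes."""
    text = text.strip().upper()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def plan_merges(split_sizes: List[int], target_size: int) -> List[List[int]]:
    """Greedily group split sizes so each multi-split group stays at or under target_size."""
    groups: List[List[int]] = []
    current: List[int] = []
    current_total = 0
    for size in sorted(split_sizes):
        # Close the current group when adding this split would exceed the target.
        if current and current_total + size > target_size:
            groups.append(current)
            current, current_total = [], 0
        current.append(size)
        current_total += size
    if current:
        groups.append(current)
    # A group with a single split has nothing to merge, so drop it.
    return [group for group in groups if len(group) > 1]
```

With ten splits of 512 MiB each and a target of `parse_size("4G")`, this sketch plans one merge of eight splits and one merge of the remaining two.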