MERGE SPLITS
Consolidate small splits into larger ones for improved query performance.
Syntax
MERGE SPLITS '<path>'
[TARGET SIZE <size>]
[MAX DEST SPLITS <n>]
[MAX SOURCE SPLITS PER MERGE <n>]
[WHERE <partition_predicate>]
Parameters
| Parameter | Description | Default |
|---|---|---|
| TARGET SIZE | Maximum size of merged splits | 5GB |
| MAX DEST SPLITS | Maximum number of destination splits processed | unlimited |
| MAX SOURCE SPLITS PER MERGE | Maximum number of source splits combined in a single merge | 1000 |
| WHERE | Partition filter predicate | all partitions |
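To make the interaction between these limits concrete, here is a minimal Python sketch of one way small splits could be grouped under a size cap and a per-merge count cap. It is not the planner MERGE SPLITS actually uses. The names `parse_size` and `plan_merges`, the greedy smallest-first strategy, and the binary meaning of the `M`/`G` suffixes are all assumptions made for illustration.

```python
from typing import List, Optional

# Assumption: binary multipliers. The source does not say whether "G" means 10^9 or 2^30.
_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text: str) -> int:
    """Convert a size literal such as '500M' or '4G' to a byte count."""
    text = text.strip().upper()
    if text and text[-1] in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1]]
    return int(text)  # plain byte count without a suffix


def plan_merges(
    split_sizes: List[int],
    target_size: int,
    max_source_splits: int = 1000,
    max_dest_splits: Optional[int] = None,
) -> List[List[int]]:
    """Greedily group split sizes (hypothetical strategy, smallest first).

    Each group stays within target_size bytes and holds at most
    max_source_splits entries; at most max_dest_splits groups are returned.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0
    for size in sorted(split_sizes):
        full = (current_bytes + size > target_size
                or len(current) >= max_source_splits)
        if current and full:
            groups.append(current)
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += size
    if current:
        groups.append(current)
    # A group holding a single split has nothing to merge with, so drop it.
    mergeable = [g for g in groups if len(g) > 1]
    return mergeable if max_dest_splits is None else mergeable[:max_dest_splits]
```

In this sketch, lowering `max_source_splits` produces more, smaller groups, while `max_dest_splits` only bounds how many groups are handled in one run and leaves the rest for a later invocation.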
Examples
Basic Merge
MERGE SPLITS 's3://bucket/my_index';
With Target Size
MERGE SPLITS 's3://bucket/my_index' TARGET SIZE 4G;
Partition-Specific
MERGE SPLITS 's3://bucket/my_index'
WHERE date = '2024-01-01'
TARGET SIZE 500M;
Limit Scope
MERGE SPLITS 's3://bucket/my_index'
TARGET SIZE 4G
MAX DEST SPLITS 10
MAX SOURCE SPLITS PER MERGE 100;
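When maintenance jobs generate these statements programmatically, a small helper can keep the optional clauses consistent. The sketch below is a hypothetical helper, not part of any shipped API; the function name `build_merge_splits` and its parameters are invented here. It emits clauses in the order the examples above use (WHERE, then TARGET SIZE, then the two limits); whether other orderings are accepted is not confirmed by this page.

```python
from typing import List, Optional


def build_merge_splits(
    path: str,
    where: Optional[str] = None,
    target_size: Optional[str] = None,
    max_dest_splits: Optional[int] = None,
    max_source_splits_per_merge: Optional[int] = None,
) -> str:
    """Assemble a MERGE SPLITS statement string (hypothetical helper)."""
    if "'" in path:
        # Refuse rather than guess at quoting rules for embedded quotes.
        raise ValueError("path must not contain single quotes")
    parts: List[str] = [f"MERGE SPLITS '{path}'"]
    if where is not None:
        parts.append(f"WHERE {where}")
    if target_size is not None:
        parts.append(f"TARGET SIZE {target_size}")
    if max_dest_splits is not None:
        parts.append(f"MAX DEST SPLITS {int(max_dest_splits)}")
    if max_source_splits_per_merge is not None:
        parts.append(f"MAX SOURCE SPLITS PER MERGE {int(max_source_splits_per_merge)}")
    return "\n".join(parts) + ";"


# Reproduces the "Limit Scope" example above.
statement = build_merge_splits(
    "s3://bucket/my_index",
    target_size="4G",
    max_dest_splits=10,
    max_source_splits_per_merge=100,
)
print(statement)
```

The `where` argument is passed through verbatim, so callers remain responsible for supplying a trusted partition predicate.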
Configuration
spark.conf.set("spark.indextables.merge.maxSourceSplitsPerMerge", "1000")
Temp Directory Fallback
When spark.indextables.merge.tempDirectoryPath points to an invalid or inaccessible path, MERGE SPLITS automatically falls back to the JVM system temp directory instead of failing.
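The fallback rule can be pictured with a short Python sketch. This is only an analogue of the behavior described above, not the extension's JVM code: `resolve_temp_dir` is a hypothetical name, and "usable" is assumed here to mean an existing, writable directory.

```python
import os
import tempfile
from typing import Optional


def resolve_temp_dir(configured: Optional[str]) -> str:
    """Return the configured directory if usable, else the system temp directory."""
    usable = (
        bool(configured)
        and os.path.isdir(configured)
        and os.access(configured, os.W_OK | os.X_OK)
    )
    if usable:
        return configured
    # Fall back instead of failing; tempfile.gettempdir() plays the role
    # that the JVM system temp directory plays in the real command.
    return tempfile.gettempdir()
```

Calling `resolve_temp_dir("/no/such/dir")` returns the system temp directory rather than raising, which mirrors the "fall back instead of failing" behavior.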
Companion Mode
MERGE SPLITS works with companion mode splits and preserves companion metadata during the merge:
- companionSourceFiles: concatenated from all source splits
- companionDeltaVersion / source_version: the maximum value is retained
- companionFastFieldMode: preserved (must be consistent across merged splits)
No special syntax is needed; companion metadata is handled automatically.
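The three rules above can be expressed as a small Python sketch. It illustrates the stated semantics only and is not the extension's implementation: the `CompanionMeta` class, its field names, and the mode strings are hypothetical stand-ins for the real metadata.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CompanionMeta:
    """Hypothetical stand-in for the companion metadata of one split."""
    source_files: List[str]
    delta_version: int
    fast_field_mode: str


def merge_companion_meta(metas: List[CompanionMeta]) -> CompanionMeta:
    """Combine companion metadata from the source splits of one merge."""
    if not metas:
        raise ValueError("at least one source split is required")
    modes = {m.fast_field_mode for m in metas}
    if len(modes) != 1:
        # The fast field mode must be consistent across merged splits.
        raise ValueError(f"inconsistent fast field modes: {sorted(modes)}")
    return CompanionMeta(
        # Source files are concatenated in source-split order.
        source_files=[f for m in metas for f in m.source_files],
        # The highest version among the sources is retained.
        delta_version=max(m.delta_version for m in metas),
        fast_field_mode=modes.pop(),
    )


# Placeholder values; "mode-a" is not a real mode name.
a = CompanionMeta(["a.parquet"], 3, "mode-a")
b = CompanionMeta(["b.parquet", "c.parquet"], 7, "mode-a")
merged = merge_companion_meta([a, b])
```

Rejecting mixed modes up front, as this sketch does, is one way to honor the consistency requirement; how the real command reacts to inconsistent splits is not stated here.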
When to Use
- After many small writes
- During maintenance windows
- Before large analytical queries
- To optimize storage costs
Related
- PURGE INDEXTABLE - Clean up after merges
- Merge-On-Write - Automatic merging