MERGE SPLITS
Consolidate small splits into larger ones for improved query performance.
Syntax
MERGE SPLITS '<path>'
[TARGET SIZE <size>]
[MAX DEST SPLITS <n>]
[MAX SOURCE SPLITS PER MERGE <n>]
[WHERE <partition_predicate>]
Parameters
| Parameter | Description | Default |
|---|---|---|
| TARGET SIZE | Maximum size of each merged split | 5GB |
| MAX DEST SPLITS | Maximum number of destination splits to process | unlimited |
| MAX SOURCE SPLITS PER MERGE | Maximum number of source splits combined in a single merge | 1000 |
| WHERE | Partition filter predicate | all partitions |
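
When statements are generated from a job rather than typed by hand, a small builder keeps the optional clauses consistent. The following is a minimal sketch, not part of the product: `build_merge_splits` and its argument names are hypothetical, and the clause order (WHERE, then TARGET SIZE, then the two limits) is an assumption taken from the examples on this page.

```python
from typing import Optional


def build_merge_splits(
    path: str,
    where: Optional[str] = None,
    target_size: Optional[str] = None,
    max_dest_splits: Optional[int] = None,
    max_source_splits_per_merge: Optional[int] = None,
) -> str:
    """Assemble a MERGE SPLITS statement; omitted clauses are left to their defaults."""
    if "'" in path:
        # Quote escaping is not described on this page, so refuse rather than guess.
        raise ValueError("path must not contain a single quote")
    for name, value in (
        ("max_dest_splits", max_dest_splits),
        ("max_source_splits_per_merge", max_source_splits_per_merge),
    ):
        if value is not None and value < 1:
            raise ValueError(f"{name} must be a positive integer")

    clauses = [f"MERGE SPLITS '{path}'"]
    # Assumed clause order, mirroring the examples: WHERE comes first.
    if where is not None:
        clauses.append(f"WHERE {where}")
    if target_size is not None:
        clauses.append(f"TARGET SIZE {target_size}")
    if max_dest_splits is not None:
        clauses.append(f"MAX DEST SPLITS {max_dest_splits}")
    if max_source_splits_per_merge is not None:
        clauses.append(f"MAX SOURCE SPLITS PER MERGE {max_source_splits_per_merge}")
    return " ".join(clauses)
```

The returned string can be passed to `spark.sql(...)` in a session where the extension is loaded. The `where` argument is inserted verbatim, so it should come from trusted code and not from user input.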
Examples
Basic Merge
MERGE SPLITS 's3://bucket/my_index';
With Target Size
MERGE SPLITS 's3://bucket/my_index' TARGET SIZE 4G;
Partition-Specific
MERGE SPLITS 's3://bucket/my_index'
WHERE date = '2024-01-01'
TARGET SIZE 500M;
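
To merge several partitions one at a time, the partition-specific form can be generated per day. This is a sketch under stated assumptions: the helper `daily_merge_statements` is hypothetical, and it assumes a partition column named `date` holding ISO-formatted strings, as in the example above.

```python
from datetime import date, timedelta
from typing import List


def daily_merge_statements(
    path: str, start: date, end: date, target_size: str = "500M"
) -> List[str]:
    """Return one MERGE SPLITS statement per day in [start, end], inclusive."""
    if end < start:
        raise ValueError("end must not be before start")
    statements = []
    day = start
    while day <= end:
        # Assumes a `date` partition column with values like '2024-01-01'.
        statements.append(
            f"MERGE SPLITS '{path}' "
            f"WHERE date = '{day.isoformat()}' "
            f"TARGET SIZE {target_size}"
        )
        day += timedelta(days=1)
    return statements
```

Each statement can then be run in turn with `spark.sql(stmt)`, which keeps every merge scoped to a single partition.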
Limit Scope
MERGE SPLITS 's3://bucket/my_index'
TARGET SIZE 4G
MAX DEST SPLITS 10
MAX SOURCE SPLITS PER MERGE 100;
Configuration
spark.conf.set("spark.indextables.merge.maxSourceSplitsPerMerge", "1000")
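
In PySpark, the setting and the statement might be combined as below. This is a minimal sketch: the function name is hypothetical, `spark` is an existing `SparkSession` with the extension loaded, and it is an assumption (not confirmed on this page) that a statement without a MAX SOURCE SPLITS PER MERGE clause falls back to the session setting.

```python
def merge_with_session_default(spark, path: str, max_source_splits: int = 1000):
    """Set the session-level cap, then run MERGE SPLITS without the per-statement clause."""
    if max_source_splits < 1:
        raise ValueError("max_source_splits must be a positive integer")
    if "'" in path:
        # Quote escaping is not described on this page, so refuse rather than guess.
        raise ValueError("path must not contain a single quote")
    # Key taken from the Configuration line; the value is passed as a string.
    spark.conf.set(
        "spark.indextables.merge.maxSourceSplitsPerMerge", str(max_source_splits)
    )
    # Assumption: with the clause omitted, the session setting applies.
    return spark.sql(f"MERGE SPLITS '{path}'")
```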
When to Use
- After many small writes
- During maintenance windows
- Before large analytical queries
- To optimize storage costs
Related
- PURGE INDEXTABLE - Clean up after merges
- Merge-On-Write - Automatic merging