MERGE SPLITS

Consolidate small splits into larger ones for improved query performance.

Syntax

MERGE SPLITS '<path>'
[TARGET SIZE <size>]
[MAX DEST SPLITS <n>]
[MAX SOURCE SPLITS PER MERGE <n>]
[WHERE <partition_predicate>]

Parameters

Parameter                      Description                              Default
---------------------------    -------------------------------------    --------------
TARGET SIZE                    Maximum size of each merged split        5GB
MAX DEST SPLITS                Limit on destination splits processed    unlimited
MAX SOURCE SPLITS PER MERGE    Maximum source splits per merge          1000
WHERE                          Partition filter predicate               all partitions

Examples

Basic Merge

MERGE SPLITS 's3://bucket/my_index';

With Target Size

MERGE SPLITS 's3://bucket/my_index' TARGET SIZE 4G;

Partition-Specific

MERGE SPLITS 's3://bucket/my_index'
WHERE date = '2024-01-01'
TARGET SIZE 500M;

Limit Scope

MERGE SPLITS 's3://bucket/my_index'
TARGET SIZE 4G
MAX DEST SPLITS 10
MAX SOURCE SPLITS PER MERGE 100;

Configuration

spark.conf.set("spark.indextables.merge.maxSourceSplitsPerMerge", "1000")
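The session-level setting above establishes a default that individual statements can override with their own clauses. A minimal sketch of combining the two, assuming the command is issued through spark.sql and using an illustrative index path:

```scala
// Sketch: set a session-wide merge limit, then run MERGE SPLITS via spark.sql.
// The config key is from this page; the index path is illustrative.
spark.conf.set("spark.indextables.merge.maxSourceSplitsPerMerge", "1000")

// Per-statement clauses (TARGET SIZE, MAX DEST SPLITS, ...) still apply
// on top of the session default.
spark.sql("MERGE SPLITS 's3://bucket/my_index' TARGET SIZE 4G")
```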
Temp Directory Fallback

When spark.indextables.merge.tempDirectoryPath points to an invalid or inaccessible path, MERGE SPLITS automatically falls back to the JVM system temp directory instead of failing.
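The key mentioned above is set like any other Spark conf; a sketch, with an illustrative directory:

```scala
// Sketch: point merge scratch space at a known-writable directory.
// If this path is invalid or inaccessible, MERGE SPLITS falls back to the
// JVM system temp directory (java.io.tmpdir) rather than failing.
spark.conf.set("spark.indextables.merge.tempDirectoryPath", "/mnt/fast-disk/merge-tmp")
```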

Companion Mode

MERGE SPLITS works with companion mode splits and preserves companion metadata during the merge:

  • companionSourceFiles — concatenated from all source splits
  • companionDeltaVersion / source_version — the maximum value is retained
  • companionFastFieldMode — preserved (must be consistent across merged splits)

No special syntax is needed — companion metadata is handled automatically.

When to Use

  • After many small writes
  • During maintenance windows
  • Before large analytical queries
  • To optimize storage costs
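The maintenance-window case can be sketched as a small scheduled job; the path, date logic, and sizes below are illustrative, and the statement is assumed to run through spark.sql:

```scala
// Sketch: nightly maintenance merge of yesterday's partition only.
import java.time.LocalDate

val yesterday = LocalDate.now().minusDays(1).toString // ISO format, e.g. "2024-01-01"

spark.sql(
  s"""MERGE SPLITS 's3://bucket/my_index'
     |WHERE date = '$yesterday'
     |TARGET SIZE 4G
     |MAX DEST SPLITS 10""".stripMargin)
```

Scoping the merge with WHERE and MAX DEST SPLITS keeps each run bounded, so the job fits inside a fixed maintenance window.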