MERGE SPLITS

Consolidate small splits into larger ones for improved query performance.

Syntax

MERGE SPLITS '<path>'
[TARGET SIZE <size>]
[MAX DEST SPLITS <n>]
[MAX SOURCE SPLITS PER MERGE <n>]
[WHERE <partition_predicate>]
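
When the statement is generated from job parameters rather than written by hand, the optional clauses can be assembled programmatically. The following is a minimal sketch, not part of the library: `build_merge_splits` and its argument names are hypothetical, and it emits the clauses in the order listed in the syntax above. It omits the trailing semicolon on the assumption that the string will be passed to a SQL entry point rather than typed into a shell.

```python
from typing import Optional


def build_merge_splits(
    path: str,
    target_size: Optional[str] = None,
    max_dest_splits: Optional[int] = None,
    max_source_splits_per_merge: Optional[int] = None,
    where: Optional[str] = None,
) -> str:
    """Assemble a MERGE SPLITS statement (hypothetical helper)."""
    # The path is embedded in a single-quoted literal, so reject quotes
    # instead of guessing at the dialect's escaping rules.
    if "'" in path:
        raise ValueError("path must not contain single quotes")

    clauses = [f"MERGE SPLITS '{path}'"]
    if target_size is not None:
        # Passed through verbatim, e.g. "4G" or "500M" as in the examples.
        clauses.append(f"TARGET SIZE {target_size}")
    if max_dest_splits is not None:
        clauses.append(f"MAX DEST SPLITS {int(max_dest_splits)}")
    if max_source_splits_per_merge is not None:
        clauses.append(
            f"MAX SOURCE SPLITS PER MERGE {int(max_source_splits_per_merge)}"
        )
    if where is not None:
        # The predicate is inserted as-is; the caller is responsible for it.
        clauses.append(f"WHERE {where}")
    return " ".join(clauses)
```

Omitting an argument leaves the corresponding clause out, so the defaults in the parameter table apply.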

Parameters

Parameter                      Description                           Default
TARGET SIZE                    Maximum size of merged splits         5GB
MAX DEST SPLITS                Limit destination splits processed    unlimited
MAX SOURCE SPLITS PER MERGE    Max source splits per merge           1000
WHERE                          Partition filter predicate            all partitions

Examples

Basic Merge

MERGE SPLITS 's3://bucket/my_index';

With Target Size

MERGE SPLITS 's3://bucket/my_index' TARGET SIZE 4G;

Partition-Specific

MERGE SPLITS 's3://bucket/my_index'
WHERE date = '2024-01-01'
TARGET SIZE 500M;

Limit Scope

MERGE SPLITS 's3://bucket/my_index'
TARGET SIZE 4G
MAX DEST SPLITS 10
MAX SOURCE SPLITS PER MERGE 100;

Configuration

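The limit on source splits per merge can also be set at the session level through Spark configuration:

spark.conf.set("spark.indextables.merge.maxSourceSplitsPerMerge", "1000")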
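
As a rough illustration of combining the configuration key with a merge, the sketch below sets the key and then issues a basic merge from PySpark. It assumes that `spark` is a `SparkSession` on which the IndexTables SQL extension is already enabled so that `MERGE SPLITS` parses, and that the statement is run through `spark.sql`; neither is confirmed by this page, and the session setup is not shown. The function name `merge_with_source_limit` is hypothetical.

```python
def merge_with_source_limit(spark, path: str, max_source_splits: int):
    """Set the session-level source-split limit, then run a basic merge.

    Assumption: `spark` is a SparkSession with the IndexTables SQL
    extension enabled. This helper is illustrative, not a library API.
    """
    # Spark configuration values are strings, as in the page's example.
    spark.conf.set(
        "spark.indextables.merge.maxSourceSplitsPerMerge",
        str(max_source_splits),
    )
    # No MAX SOURCE SPLITS PER MERGE clause here, so the session-level
    # value set above is presumably the one that applies.
    return spark.sql(f"MERGE SPLITS '{path}'")
```

A call such as `merge_with_source_limit(spark, "s3://bucket/my_index", 100)` would then correspond to the Basic Merge example with a lower per-merge limit.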

When to Use

  • After many small writes
  • During maintenance windows
  • Before large analytical queries
  • To optimize storage costs