PURGE INDEXTABLE

Remove orphaned split files and old transaction logs.

Syntax

PURGE INDEXTABLE '<path>'
[OLDER THAN <n> DAYS|HOURS]
[TRANSACTION LOG RETENTION <n> DAYS|HOURS]
[DRY RUN]

Parameters

| Parameter | Description | Default |
| --- | --- | --- |
| OLDER THAN | Retention period for splits | 7 days |
| TRANSACTION LOG RETENTION | Retention for tx logs | 30 days |
| DRY RUN | Preview without deleting | disabled |

Examples

Preview Cleanup

PURGE INDEXTABLE 's3://bucket/my_index' DRY RUN;

Standard Cleanup

PURGE INDEXTABLE 's3://bucket/my_index'
OLDER THAN 7 DAYS;

With Transaction Log Retention

PURGE INDEXTABLE 's3://bucket/my_index'
OLDER THAN 7 DAYS
TRANSACTION LOG RETENTION 30 DAYS;
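
Retention in Hours with Preview

The Syntax section lists HOURS as an alternative unit and places DRY RUN as the final clause. As a sketch, assuming the optional clauses can be combined in the order given there, the following previews a purge with a 48-hour split retention, which stays above the 24-hour minimum:

```sql
-- Preview only: with DRY RUN nothing is deleted.
PURGE INDEXTABLE 's3://bucket/my_index'
OLDER THAN 48 HOURS
TRANSACTION LOG RETENTION 30 DAYS
DRY RUN;
```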

Configuration

spark.conf.set("spark.indextables.purge.defaultRetentionHours", "168")  // 7 days
spark.conf.set("spark.indextables.purge.minRetentionHours", "24")  // safety floor: minimum allowed retention
spark.conf.set("spark.indextables.purge.parallelism", "8")
spark.conf.set("spark.indextables.purge.maxFilesToDelete", "1000000")
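
When a session is driven entirely from SQL, Spark's SET command writes to the same session configuration as spark.conf.set. A sketch of the equivalent settings, assuming the purge command reads these keys from the session configuration when it runs:

```sql
-- Default retention for splits: 168 hours = 7 days
SET spark.indextables.purge.defaultRetentionHours=168;
-- Minimum retention the command will accept
SET spark.indextables.purge.minRetentionHours=24;
SET spark.indextables.purge.parallelism=8;
SET spark.indextables.purge.maxFilesToDelete=1000000;
```

Comments sit on their own lines because SET treats the text after the equals sign as the value.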

Safety Features

  • Minimum 24-hour retention is enforced (spark.indextables.purge.minRetentionHours)
  • DRY RUN mode previews deletions before anything is removed
  • A LEFT ANTI JOIN ensures that only orphaned files are deleted
  • Retry logic handles transient cloud errors
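
The anti-join check in the list above can be pictured in Spark SQL. This is only an illustrative sketch, not the command's actual implementation: the views listed_files and referenced_files and the columns path and modified_at are hypothetical names chosen for the example.

```sql
-- Hypothetical inputs:
--   listed_files      one row per split file found under the table path
--   referenced_files  one row per file still referenced by the transaction log
-- The LEFT ANTI JOIN keeps only listed files that have no matching reference,
-- and the WHERE clause applies a 7-day retention window on top of that.
SELECT l.path
FROM listed_files AS l
LEFT ANTI JOIN referenced_files AS r
  ON l.path = r.path
WHERE l.modified_at < current_timestamp() - INTERVAL 7 DAYS;
```

A file that is still referenced never appears in the result, whatever its age, which is why the join acts as a safety check and not just a filter on modification time.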

When to Use

  • After failed writes that leave orphaned files behind
  • After MERGE SPLITS operations
  • Regular maintenance (weekly/monthly)
  • Before archiving tables