DESCRIBE Commands

Monitor cache usage, storage statistics, and table information.

DESCRIBE DISK CACHE

View disk cache statistics across all executors:

DESCRIBE INDEXTABLES DISK CACHE;

Output

| Column | Description |
|---|---|
| executor_id | Executor identifier |
| host | IP:port |
| enabled | Cache enabled status |
| total_bytes | Current cache size |
| max_bytes | Maximum cache size |
| usage_percent | Usage percentage |
| splits_cached | Number of splits cached |
| components_cached | Number of components cached |
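
As an illustration of how this output might be consumed, the sketch below flags executors whose disk cache is close to full. It assumes the rows have already been collected into plain dictionaries keyed by the column names above (for example with PySpark's `spark.sql(...).collect()` and `Row.asDict()`), that `enabled` is a boolean, and that both size columns are byte counts. The function name and the 90% threshold are hypothetical choices, not part of IndexTables.

```python
def executors_near_capacity(rows, threshold_percent=90.0):
    """Return (executor_id, usage) pairs for enabled caches at or above the threshold.

    Usage is recomputed from total_bytes / max_bytes so the result does not
    depend on how usage_percent is scaled (an assumption-free fallback).
    """
    flagged = []
    for row in rows:
        max_bytes = int(row["max_bytes"])
        # Skip disabled caches and guard against division by zero.
        if not row["enabled"] or max_bytes <= 0:
            continue
        usage = 100.0 * int(row["total_bytes"]) / max_bytes
        if usage >= threshold_percent:
            flagged.append((row["executor_id"], round(usage, 1)))
    return flagged


# Hypothetical sample rows, for illustration only.
sample_rows = [
    {"executor_id": "0", "enabled": True, "total_bytes": 40, "max_bytes": 100},
    {"executor_id": "1", "enabled": True, "total_bytes": 95, "max_bytes": 100},
    {"executor_id": "2", "enabled": False, "total_bytes": 0, "max_bytes": 0},
]
print(executors_near_capacity(sample_rows))
```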

DESCRIBE STORAGE STATS

View object storage access statistics:

DESCRIBE INDEXTABLES STORAGE STATS;

Output

| Column | Description |
|---|---|
| executor_id | Executor identifier |
| host | IP:port |
| bytes_fetched | Total bytes fetched from storage |
| requests | Number of storage requests |
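
One way to read these counters is as an average request size per executor. The following is a minimal sketch under the assumption that the rows are available as dictionaries keyed by the column names above; the helper name is hypothetical.

```python
def bytes_per_request(rows):
    """Map executor_id to the average number of bytes fetched per storage request."""
    averages = {}
    for row in rows:
        requests = int(row["requests"])
        fetched = int(row["bytes_fetched"])
        # An executor with no requests yet reports an average of 0.0.
        averages[row["executor_id"]] = fetched / requests if requests else 0.0
    return averages


# Hypothetical sample rows, for illustration only.
storage_rows = [
    {"executor_id": "0", "bytes_fetched": 1000, "requests": 4},
    {"executor_id": "1", "bytes_fetched": 0, "requests": 0},
]
print(bytes_per_request(storage_rows))
```

A low average combined with a high request count may suggest many small reads, though what counts as "low" depends on the workload.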

DESCRIBE DATA SKIPPING STATS

View data skipping effectiveness and cache hit rates:

DESCRIBE INDEXTABLES DATA SKIPPING STATS;

Output

| Column | Description |
|---|---|
| metric_type | Category: data_skipping, filter_expr_cache, partition_filter_cache, filter_type_skips |
| metric_name | Name of the metric |
| metric_value | Metric value |

Metrics

Data Skipping:

  • total_files_considered - Files evaluated before pruning
  • partition_pruned_files - Files pruned by partition filters
  • data_skipped_files - Files pruned by min/max statistics
  • final_files_scanned - Files actually read
  • partition_skip_rate - Percentage of files skipped by partitions
  • data_skip_rate - Percentage of files skipped by statistics
  • total_skip_rate - Overall file skip rate

Filter Expression Cache:

  • simplified_hits/misses - Filter simplification cache stats
  • in_range_hits/misses - Range check cache stats
  • Hit rates and cache sizes
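
To make the relationship between the data skipping counters concrete, here is a small sketch that derives the rates from the raw file counts. It assumes that every considered file is either partition-pruned, skipped by statistics, or scanned, and that all three rates use `total_files_considered` as the denominator. The source does not state the exact formulas, so the values IndexTables reports may be computed differently (for instance, `data_skip_rate` could be relative to the files remaining after partition pruning).

```python
def skip_rates(total_files_considered, partition_pruned_files, data_skipped_files):
    """Derive scanned-file count and skip rates (percent) from raw counters.

    Assumption: rates are percentages of total_files_considered.
    """
    final = total_files_considered - partition_pruned_files - data_skipped_files
    if total_files_considered == 0:
        # Nothing was considered, so every rate is defined as zero here.
        return {"final_files_scanned": 0, "partition_skip_rate": 0.0,
                "data_skip_rate": 0.0, "total_skip_rate": 0.0}
    total = total_files_considered
    return {
        "final_files_scanned": final,
        "partition_skip_rate": 100.0 * partition_pruned_files / total,
        "data_skip_rate": 100.0 * data_skipped_files / total,
        "total_skip_rate": 100.0 * (total - final) / total,
    }


# Hypothetical counts: 1000 files considered, 600 pruned by partition, 300 by statistics.
print(skip_rates(1000, 600, 300))
```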

DESCRIBE STATE

View transaction log state format, version, and statistics:

DESCRIBE INDEXTABLES STATE 's3://bucket/my_index';

Output

| Column | Description |
|---|---|
| format | State format: "avro" or "json" |
| version | Current state version |
| total_files | Total active files in table |
| tombstone_count | Number of tombstone entries |
| tombstone_ratio | Tombstone percentage (triggers compaction at 10%) |
| manifest_count | Number of manifest files (Avro only) |
| protocol_version | Table protocol version |
| last_checkpoint | Most recent checkpoint version |

Use this command to monitor table health and to determine whether compaction or a checkpoint is needed.
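
A health check built on this output could look like the sketch below. It assumes a single result row held as a dictionary, that `tombstone_ratio` is expressed on a 0-100 scale (the table describes it as a percentage), and that `version` and `last_checkpoint` are integers. The function name is hypothetical; the 10% threshold is the one quoted for `tombstone_ratio`.

```python
COMPACTION_THRESHOLD_PERCENT = 10.0  # threshold quoted for tombstone_ratio


def state_health(state):
    """Summarize whether a table looks due for compaction or a checkpoint."""
    ratio = float(state["tombstone_ratio"])
    return {
        # At or above the threshold, compaction is expected to trigger.
        "needs_compaction": ratio >= COMPACTION_THRESHOLD_PERCENT,
        # Number of versions written since the most recent checkpoint.
        "versions_since_checkpoint": int(state["version"]) - int(state["last_checkpoint"]),
    }


# Hypothetical row, for illustration only.
sample_state = {"tombstone_ratio": 12.5, "version": 42, "last_checkpoint": 40}
print(state_health(sample_state))
```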

DESCRIBE COMPONENT SIZES

Analyze storage consumption at the index component level:

DESCRIBE INDEXTABLES COMPONENT SIZES 's3://bucket/my_index';

-- With partition filter for efficiency
DESCRIBE INDEXTABLES COMPONENT SIZES 's3://bucket/my_index'
WHERE date = '2024-01-15';

Output

| Column | Description |
|---|---|
| split_path | Split file path |
| partition_values | Partition column values |
| component | Component identifier |
| component_type | term, postings, positions, store, fastfield, fieldnorm |
| size_bytes | Size in bytes |
| field_name | Associated field (when applicable) |

Use Cases

  • Index size analysis: Identify which components consume the most storage
  • Schema optimization: Find fields that may benefit from different indexing strategies (e.g., switching from position to basic index record option)
  • Capacity planning: Estimate storage requirements based on component breakdowns
  • Debugging: Diagnose indexing issues by examining component-level details

Example Analysis

-- Find largest components by type
SELECT component_type, SUM(size_bytes) as total_bytes
FROM (
DESCRIBE INDEXTABLES COMPONENT SIZES 's3://bucket/logs'
)
GROUP BY component_type
ORDER BY total_bytes DESC;
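
The same kind of analysis can be done client-side once the rows are collected. The sketch below totals `size_bytes` per field, which is one way to approach the schema optimization use case. It assumes rows are dictionaries keyed by the output columns and that `field_name` is empty or missing for components not tied to a field; the helper name and the "(no field)" label are hypothetical.

```python
from collections import defaultdict


def bytes_by_field(rows):
    """Return (field_name, total_bytes) pairs, largest first."""
    totals = defaultdict(int)
    for row in rows:
        # Components without an associated field are grouped under one label.
        field = row.get("field_name") or "(no field)"
        totals[field] += int(row["size_bytes"])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample rows, for illustration only.
component_rows = [
    {"component_type": "positions", "field_name": "message", "size_bytes": 700},
    {"component_type": "postings", "field_name": "message", "size_bytes": 200},
    {"component_type": "fastfield", "field_name": "status", "size_bytes": 50},
    {"component_type": "store", "field_name": None, "size_bytes": 300},
]
print(bytes_by_field(component_rows))
```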

DESCRIBE TRANSACTION LOG

View the contents of a table's transaction log:

-- View current state (from latest checkpoint forward)
DESCRIBE INDEXTABLES TRANSACTION LOG 's3://bucket/my_index';

-- View complete history from version 0
DESCRIBE INDEXTABLES TRANSACTION LOG 's3://bucket/my_index' INCLUDE ALL;

Output

Returns detailed information about all transaction log actions including:

  • version - Transaction log version number
  • action_type - ADD, REMOVE, SKIP, PROTOCOL, or METADATA
  • path - Split file path
  • partition_values - Partition column values
  • size - File size in bytes
  • num_records - Document count
  • min_values/max_values - Column statistics for data skipping
  • And many more fields for debugging and analysis

TABLE ROOT Commands

Manage named table roots for cross-region companion reads. See Multi-Region Table Roots for details.

SET TABLE ROOT

Register a named table root:

SET INDEXTABLES TABLE ROOT 'us-east' = 's3://us-east-replica/events'
FOR 's3://warehouse/events_index';

UNSET TABLE ROOT

Remove a named table root:

UNSET INDEXTABLES TABLE ROOT 'us-east'
FOR 's3://warehouse/events_index';

DESCRIBE TABLE ROOTS

List all registered table roots for a companion index:

DESCRIBE INDEXTABLES TABLE ROOTS 's3://warehouse/events_index';

Output

| Column | Description |
|---|---|
| root_name | Named table root identifier |
| root_path | Storage path for the root |

DESCRIBE ENVIRONMENT

View Spark and Hadoop configuration across all executors:

DESCRIBE INDEXTABLES ENVIRONMENT;

Output

| Column | Description |
|---|---|
| host | Executor host:port |
| role | "driver" or "worker" |
| property_type | "spark" or "hadoop" |
| property_name | Configuration property name |
| property_value | Property value (sensitive values redacted) |

Useful for debugging configuration issues across a cluster.

DESCRIBE PREWARM JOBS

View the status of async prewarm jobs across all executors:

DESCRIBE INDEXTABLES PREWARM JOBS;

Output

| Column | Description |
|---|---|
| job_id | Unique job identifier |
| executor_id | Executor running the job |
| host | Executor hostname |
| table_path | Path being prewarmed |
| status | pending, running, completed, failed |
| splits_total | Total splits to prewarm |
| splits_completed | Splits prewarmed so far |
| progress_percent | Completion percentage |
| started_at | Job start timestamp |
| completed_at | Job completion timestamp (if finished) |
| error_message | Error details (if failed) |

This command provides cluster-wide visibility into all async prewarm operations, useful for monitoring long-running prewarm jobs or debugging failures.
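
For cluster-wide monitoring, the per-job rows can be rolled up into a single progress figure plus a list of failures. This is a minimal sketch that assumes the rows are dictionaries keyed by the output columns and that `status` uses the lowercase values listed above; the function name is hypothetical.

```python
def summarize_prewarm(jobs):
    """Aggregate prewarm job rows into overall progress and a failure list."""
    total = sum(int(job["splits_total"]) for job in jobs)
    done = sum(int(job["splits_completed"]) for job in jobs)
    failed = [
        (job["job_id"], job.get("error_message"))
        for job in jobs
        if job["status"] == "failed"
    ]
    # Weight progress by split count rather than averaging per-job percentages.
    percent = 100.0 * done / total if total else 0.0
    return {"progress_percent": percent, "failed": failed}


# Hypothetical sample rows, for illustration only.
job_rows = [
    {"job_id": "a", "status": "completed", "splits_total": 100, "splits_completed": 100},
    {"job_id": "b", "status": "failed", "splits_total": 100, "splits_completed": 50,
     "error_message": "example failure"},
]
print(summarize_prewarm(job_rows))
```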

WAIT FOR PREWARM JOBS

Block until all async prewarm jobs complete:

-- Wait indefinitely for all jobs to complete
WAIT FOR INDEXTABLES PREWARM JOBS;

-- Wait with a timeout (in seconds)
WAIT FOR INDEXTABLES PREWARM JOBS TIMEOUT 300;

Output

| Column | Description |
|---|---|
| jobs_completed | Number of jobs that completed successfully |
| jobs_failed | Number of jobs that failed |
| total_splits_prewarmed | Total splits prewarmed across all jobs |
| total_duration_ms | Total time waited in milliseconds |
| timed_out | Whether the wait timed out |

Use this command when you need to ensure prewarming is complete before running benchmarks or time-sensitive queries.

Examples

-- Start async prewarm, then wait before benchmark
PREWARM INDEXTABLES CACHE 's3://bucket/logs' ASYNC MODE;

-- Do other work...

-- Wait up to 10 minutes for prewarm to complete
WAIT FOR INDEXTABLES PREWARM JOBS TIMEOUT 600;

-- Now run your benchmark queries
SELECT COUNT(*) FROM indextables('s3://bucket/logs') WHERE status = 'error';
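
When this sequence is scripted, the row returned by the wait command can gate the benchmark. The sketch below assumes that row is available as a dictionary keyed by the output columns and that `timed_out` is a boolean; the function name and messages are hypothetical.

```python
def prewarm_ready(result):
    """Return (ok, reason) describing whether benchmarks should proceed."""
    if result["timed_out"]:
        return False, "wait timed out before all jobs finished"
    if int(result["jobs_failed"]) > 0:
        return False, f"{result['jobs_failed']} prewarm job(s) failed"
    return True, f"{result['jobs_completed']} job(s) completed"


# Hypothetical result row, for illustration only.
wait_result = {"jobs_completed": 3, "jobs_failed": 0, "timed_out": False}
ok, reason = prewarm_ready(wait_result)
print(ok, reason)
```

Checking `timed_out` first reflects the idea that a timeout means the cache state is unknown, whatever the failure count says.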

FLUSH Commands

FLUSH DISK CACHE

Clear the L2 disk cache across all executors:

FLUSH INDEXTABLES DISK CACHE;

Output

| Column | Description |
|---|---|
| executor_id | Executor identifier |
| cache_type | Type of cache flushed |
| status | success or error |
| bytes_freed | Bytes deleted |
| files_deleted | Files removed |
| message | Status message |

FLUSH SEARCHER CACHE

Clear the in-memory (L1) searcher cache:

FLUSH INDEXTABLES SEARCHER CACHE;

This clears:

  • Split cache managers
  • Driver-side locality assignments
  • Native tantivy4java caches

FLUSH DATA SKIPPING STATS

Reset data skipping statistics (keeps cache entries):

FLUSH INDEXTABLES DATA SKIPPING STATS;

INVALIDATE Commands

INVALIDATE TRANSACTION LOG CACHE

Force a refresh of the transaction log cache for a specific table:

-- Invalidate cache for a specific table
INVALIDATE INDEXTABLES TRANSACTION LOG CACHE FOR 's3://bucket/my_index';

Output

| Column | Description |
|---|---|
| table_path | Path that was invalidated |
| result | Success or error message |
| cache_hits_before | Cache hits before invalidation |
| cache_misses_before | Cache misses before invalidation |
| hit_rate_before | Cache hit rate before invalidation |

Use this command when the table has been modified externally and you want to force a refresh.

INVALIDATE DATA SKIPPING CACHE

Clear data skipping caches (both entries and statistics):

INVALIDATE INDEXTABLES DATA SKIPPING CACHE;

This clears:

  • Filter expression cache entries
  • Partition filter cache entries
  • All statistics