Databricks Deployment

IndexTables is optimized for Databricks with automatic detection of local NVMe storage.

Installation

  1. Download the shaded JAR from Maven Central:
    https://repo1.maven.org/maven2/io/indextables/indextables_spark/0.5.5_spark_3.5.3/indextables_spark-0.5.5_spark_3.5.3-linux-x86_64-shaded.jar
  2. Upload it to a Unity Catalog volume (e.g., /Volumes/my_catalog/my_schema/artifacts/)
  3. Create an init script that copies the JAR to the Databricks jars directory:
#!/bin/sh
cp /Volumes/my_catalog/my_schema/artifacts/indextables_spark-0.5.5_spark_3.5.3-linux-x86_64-shaded.jar /databricks/jars
  4. Upload the init script to your volume and configure it in your cluster settings under Advanced Options > Init Scripts

Requirements

Component            Version
Databricks Runtime   15.4 LTS or 16.4 LTS
Scala                2.12

Register SQL Extensions

SET spark.sql.extensions=io.indextables.spark.extensions.IndexTables4SparkExtensions

Auto-Detected Settings

When /local_disk0 is detected, these settings are automatically configured:

  • Temp directory: /local_disk0/temp
  • Cache directory: /local_disk0/cache
  • Disk cache: Enabled
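
The detection behavior can be sketched as follows. This is a minimal illustration of the fallback logic, not the library's actual implementation; the function name and the `/tmp` fallback are assumptions:

```python
import os

def resolve_storage_dirs(nvme_root: str = "/local_disk0") -> dict:
    """Illustrative sketch: prefer local NVMe when present, else fall back to /tmp."""
    if os.path.isdir(nvme_root):
        return {
            "temp_dir": os.path.join(nvme_root, "temp"),
            "cache_dir": os.path.join(nvme_root, "cache"),
            "disk_cache_enabled": True,
        }
    # No NVMe detected: fall back to the system temp dir with disk cache off
    return {"temp_dir": "/tmp", "cache_dir": "/tmp", "disk_cache_enabled": False}
```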

Unity Catalog Integration

If you access S3 data through Unity Catalog External Locations, configure the Unity Catalog credential provider. We recommend setting these as cluster Spark properties:

spark.sql.extensions io.indextables.spark.extensions.IndexTables4SparkExtensions
spark.indextables.databricks.workspaceUrl https://<workspace>.cloud.databricks.com
spark.indextables.databricks.apiToken <your-token>
spark.indextables.aws.credentialsProviderClass io.indextables.spark.auth.unity.UnityCatalogAWSCredentialProvider

Alternatively, configure in your notebook:

spark.conf.set("spark.indextables.databricks.workspaceUrl", "https://<workspace>.cloud.databricks.com")
spark.conf.set("spark.indextables.databricks.apiToken", dbutils.secrets.get("scope", "token"))

# Or use your notebook's token directly
spark.conf.set("spark.indextables.databricks.apiToken",
dbutils.notebook.entry_point.getDbutils().notebook().getContext().apiToken().get())

spark.conf.set("spark.indextables.aws.credentialsProviderClass",
"io.indextables.spark.auth.unity.UnityCatalogAWSCredentialProvider")

External Location Requirements

Your S3 path must be configured as a Unity Catalog External Location. The following are required:

  • The metastore must have external_access_enabled set to true
  • You must have the EXTERNAL_USE_LOCATION privilege on the external location

See the generateTemporaryPathCredentials API for details.
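
For example, the privilege can be granted with Unity Catalog SQL; the location and principal names below are placeholders:

GRANT EXTERNAL USE LOCATION ON EXTERNAL LOCATION my_external_location TO `data_engineers`;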

Credentials are resolved on the driver and broadcast to executors — no network calls from executors to Databricks.
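
The resolve-once pattern is analogous to the following plain-Python sketch, where a counter stands in for the Databricks API call; all names and values are illustrative:

```python
def make_resolver():
    """Stand-in for the driver-side call to the Databricks credentials API."""
    calls = {"count": 0}

    def resolve():
        calls["count"] += 1
        return {"access_key": "example-key", "secret_key": "example-secret"}

    return resolve, calls

resolve, calls = make_resolver()

# Driver: resolve once...
creds = resolve()

# ...then "broadcast": every simulated executor task reads the same pre-resolved
# credentials instead of calling the API itself.
results = [("task-%d" % i, creds["access_key"]) for i in range(4)]
```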

Photon

We recommend disabling Photon as it doesn't accelerate IndexTables workloads.

For all clusters, set the following to ensure caching and prewarming work properly:

spark.locality.wait 30s

Query Clusters

For query workloads, use instances with high memory and NVMe storage:

Instance Type   vCPUs   Memory   Storage
r6id.2xlarge    8       64 GB    NVMe
i4i.2xlarge     8       64 GB    NVMe

spark.executor.memory 27016m

Indexing Clusters

For write/indexing workloads, compute-optimized instances work well:

Instance Type   vCPUs   Memory   Storage
c6id.2xlarge    8       32 GB    NVMe

spark.executor.memory 16348m

Companion Mode with Unity Catalog

Companion Mode on Databricks supports Unity Catalog table name resolution for Delta tables. Pass a table name instead of a storage path, and IndexTables resolves the storage location and credentials automatically:

BUILD INDEXTABLES COMPANION FOR DELTA 'schema.events'
CATALOG 'my_catalog'
INDEXING MODES ('message':'text', 'src_ip':'ipaddress')
AT LOCATION 's3://warehouse/companion/events'

This requires the Unity Catalog credential provider to be configured (see Unity Catalog Integration above).

Scheduler Mode

Companion mode runs sync batches as concurrent Spark jobs. For this to work correctly, set spark.scheduler.mode=FAIR in your cluster configuration. This is the default on Databricks, but verify it is set if companion batches appear to run sequentially.
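
Driving several Spark jobs concurrently follows the usual thread-pool pattern; a generic Python sketch where the batch function is a placeholder, not an IndexTables API:

```python
from concurrent.futures import ThreadPoolExecutor

def run_sync_batch(batch_id: int) -> str:
    # Placeholder for one companion sync batch; in a real notebook this would
    # trigger a Spark job, and FAIR scheduling lets those jobs share the cluster
    # instead of queuing behind one another.
    return f"batch-{batch_id}: ok"

# Submit several batches concurrently, as companion mode does with Spark jobs.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_sync_batch, range(4)))
```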

Performance Settings

# Recommended for Databricks
spark.conf.set("spark.indextables.indexWriter.heapSize", "200M")
spark.conf.set("spark.indextables.s3.maxConcurrency", "8")