# Installation
Add IndexTables to your project.
## Maven

```xml
<dependency>
  <groupId>io.indextables</groupId>
  <artifactId>indextables4spark_2.12</artifactId>
  <version>0.4.0_3.5.3</version>
</dependency>
```
## SBT

```scala
libraryDependencies += "io.indextables" %% "indextables4spark" % "0.4.0_3.5.3"
```
## Spark Shell

```shell
spark-shell --packages io.indextables:indextables4spark_2.12:0.4.0_3.5.3
```
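The same package coordinates should also work with `spark-submit` or `pyspark` via the standard `--packages` flag; a sketch (the application JAR path is a placeholder):

```shell
# Launch PySpark with IndexTables on the classpath
pyspark --packages io.indextables:indextables4spark_2.12:0.4.0_3.5.3

# Or submit an application with the same dependency resolved at launch
spark-submit \
  --packages io.indextables:indextables4spark_2.12:0.4.0_3.5.3 \
  path/to/your-app.jar
```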
## Databricks

- Download the shaded JAR from the releases page.
- Upload it to a Unity Catalog volume (e.g., `/Volumes/my_catalog/my_schema/artifacts/`).
- Create an init script that copies the JAR to the Databricks jars directory:

  ```shell
  #!/bin/sh
  cp /Volumes/my_catalog/my_schema/artifacts/indextables_spark-0.4.0-linux-x86_64-shaded.jar /databricks/jars
  ```

- Upload the init script to your volume and configure it in your cluster settings under Advanced Options > Init Scripts.
## Requirements
| Component | Version |
|---|---|
| Apache Spark | 3.5.3 |
| Java | 11 or later |
| Scala | 2.12 |
## Register SQL Extensions

To use SQL commands such as MERGE SPLITS and PREWARM CACHE, register the extensions:

```scala
spark.sql("SET spark.sql.extensions=io.indextables.spark.extensions.IndexTables4SparkExtensions")
```

Or in `spark-defaults.conf`:

```
spark.sql.extensions=io.indextables.spark.extensions.IndexTables4SparkExtensions
```
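Note that in stock Spark, `spark.sql.extensions` is read when the SparkSession is created, so setting it at launch time is the most reliable approach. A sketch combining the package and the extensions configuration (coordinates and class name taken from above):

```shell
# Start spark-shell with IndexTables on the classpath and its
# SQL extensions registered before the session is created
spark-shell \
  --packages io.indextables:indextables4spark_2.12:0.4.0_3.5.3 \
  --conf spark.sql.extensions=io.indextables.spark.extensions.IndexTables4SparkExtensions
```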
## Next Steps
- Quickstart - Create your first index in 5 minutes
- First Production Index - Deploy to S3 or Azure