# Installation

Add IndexTables to your project.
## Maven

```xml
<dependency>
  <groupId>io.indextables</groupId>
  <artifactId>indextables_spark</artifactId>
  <version>0.5.3_spark_3.5.3</version>
  <classifier>linux-x86_64-shaded</classifier>
</dependency>
```
## SBT

```scala
libraryDependencies += "io.indextables" % "indextables_spark" % "0.5.3_spark_3.5.3" classifier "linux-x86_64-shaded"
```
## Spark Shell

```shell
spark-shell --packages io.indextables:indextables_spark:0.5.3_spark_3.5.3:linux-x86_64-shaded
```
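Once the shell starts, you can confirm the JAR is on the classpath by loading the extensions class used later in this guide (this only checks visibility; it does not register the extensions):

```scala
// Throws ClassNotFoundException if the IndexTables JAR is not on the classpath.
Class.forName("io.indextables.spark.extensions.IndexTables4SparkExtensions")
```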
## Databricks

1. Download the shaded JAR from Maven Central:

   ```
   https://repo1.maven.org/maven2/io/indextables/indextables_spark/0.5.3_spark_3.5.3/indextables_spark-0.5.3_spark_3.5.3-linux-x86_64-shaded.jar
   ```

2. Upload it to a Unity Catalog volume (e.g., `/Volumes/my_catalog/my_schema/artifacts/`).
3. Create an init script that copies the JAR to the Databricks jars directory:

   ```sh
   #!/bin/sh
   cp /Volumes/my_catalog/my_schema/artifacts/indextables_spark-0.5.3_spark_3.5.3-linux-x86_64-shaded.jar /databricks/jars
   ```

4. Upload the init script to your volume and configure it in your cluster settings under **Advanced Options > Init Scripts**.
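A slightly more defensive variant of the init script (a sketch, using the same example volume paths as above) fails cluster startup with a clear message if the JAR is missing, instead of silently starting a cluster without IndexTables:

```sh
#!/bin/sh
# Sketch of a defensive init script; paths are the example paths from this guide.
set -e
JAR=/Volumes/my_catalog/my_schema/artifacts/indextables_spark-0.5.3_spark_3.5.3-linux-x86_64-shaded.jar
if [ ! -f "$JAR" ]; then
  echo "IndexTables JAR not found: $JAR" >&2
  exit 1
fi
cp "$JAR" /databricks/jars
```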
## Requirements

| Component | Version |
|---|---|
| Apache Spark | 3.5.3 |
| Java | 11 or later |
| Scala | 2.12 |
## Register SQL Extensions

SQL commands such as `MERGE SPLITS` and `PREWARM CACHE` require the IndexTables extensions. Because `spark.sql.extensions` is a static configuration, it cannot be changed with a runtime `SET` command; supply it when the session is launched:

```shell
spark-shell --conf spark.sql.extensions=io.indextables.spark.extensions.IndexTables4SparkExtensions
```

Or in `spark-defaults.conf`:

```
spark.sql.extensions=io.indextables.spark.extensions.IndexTables4SparkExtensions
```
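In application code, the equivalent is to pass the extensions class (the one named above) when building the SparkSession; this uses the standard Spark builder API, and the app name is a hypothetical placeholder:

```scala
import org.apache.spark.sql.SparkSession

// spark.sql.extensions is a static config, so it must be in place
// before the session is first created.
val spark = SparkSession.builder()
  .appName("indextables-demo") // hypothetical app name
  .config("spark.sql.extensions",
    "io.indextables.spark.extensions.IndexTables4SparkExtensions")
  .getOrCreate()
```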
## Next Steps

- Quickstart - Create your first index in 5 minutes
- First Production Index - Deploy to S3 or Azure