Azure Configuration

Configure IndexTables for Azure Blob Storage.

Authentication

Account Key

spark.conf.set("spark.indextables.azure.accountName", "mystorageaccount")
spark.conf.set("spark.indextables.azure.accountKey", "YOUR_ACCOUNT_KEY")

OAuth Service Principal

spark.conf.set("spark.indextables.azure.accountName", "mystorageaccount")
spark.conf.set("spark.indextables.azure.tenantId", "YOUR_TENANT_ID")
spark.conf.set("spark.indextables.azure.clientId", "YOUR_CLIENT_ID")
spark.conf.set("spark.indextables.azure.clientSecret", "YOUR_CLIENT_SECRET")

Supported URL Formats

Format	Example
ABFSS	`abfss://container@account.dfs.core.windows.net/path`
WASBS	`wasbs://container@account.blob.core.windows.net/path`
ABFS	`abfs://container@account.dfs.core.windows.net/path`
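
The formats above differ only in the scheme and the endpoint host, so a URL can be assembled from its parts. The following is a minimal sketch, assuming a hypothetical helper `AzureUrl.build` (not part of IndexTables) and the endpoint suffixes shown in the table:

```scala
// Hypothetical helper, not part of IndexTables: assembles a storage URL
// from its parts using the endpoint suffixes listed in the table above.
object AzureUrl {
  // Maps each supported scheme to the endpoint host suffix it uses.
  private val endpoints: Map[String, String] = Map(
    "abfss" -> "dfs.core.windows.net",
    "abfs"  -> "dfs.core.windows.net",
    "wasbs" -> "blob.core.windows.net"
  )

  def build(scheme: String, container: String, account: String, path: String): String = {
    val host = endpoints.getOrElse(
      scheme,
      throw new IllegalArgumentException(s"Unsupported scheme: $scheme")
    )
    // Drop leading slashes so exactly one separator precedes the path.
    val cleanPath = path.dropWhile(_ == '/')
    s"$scheme://$container@$account.$host/$cleanPath"
  }
}
```

For example, `AzureUrl.build("abfss", "container", "account", "logs")` yields `abfss://container@account.dfs.core.windows.net/logs`, the same shape as the paths used in the Usage section.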

Usage

// Write to Azure Blob
df.write
  .format("io.indextables.spark.core.IndexTables4SparkTableProvider")
  .save("abfss://container@account.dfs.core.windows.net/logs")

// Read from Azure Blob
val df = spark.read
  .format("io.indextables.spark.core.IndexTables4SparkTableProvider")
  .load("abfss://container@account.dfs.core.windows.net/logs")

Environment Variables

Instead of being set in the Spark configuration, Azure credentials can also be supplied through environment variables or the ~/.azure/credentials file.
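
This page does not list the variable names involved, so one way to keep secrets out of source code is to read them yourself and map them onto the `spark.indextables.azure.*` keys shown under Authentication. The sketch below is illustrative only: `AzureEnvConfig` is a hypothetical helper, and the variable names (`AZURE_STORAGE_ACCOUNT`, `AZURE_STORAGE_KEY`, `AZURE_TENANT_ID`, `AZURE_CLIENT_ID`, `AZURE_CLIENT_SECRET`) are assumptions borrowed from common Azure tooling conventions, not names confirmed for IndexTables.

```scala
// Sketch: map environment variables onto the spark.indextables.azure.* keys
// documented under Authentication. The variable names are assumptions.
object AzureEnvConfig {
  private val prefix = "spark.indextables.azure."

  // Returns the config entries to set. A service principal is preferred when
  // all three of its variables are present; otherwise the account key is used.
  def fromEnv(env: Map[String, String]): Map[String, String] = {
    val account: Option[(String, String)] =
      env.get("AZURE_STORAGE_ACCOUNT").map(v => (prefix + "accountName") -> v)

    val servicePrincipal: Option[Map[String, String]] = for {
      tenant <- env.get("AZURE_TENANT_ID")
      client <- env.get("AZURE_CLIENT_ID")
      secret <- env.get("AZURE_CLIENT_SECRET")
    } yield Map(
      (prefix + "tenantId")     -> tenant,
      (prefix + "clientId")     -> client,
      (prefix + "clientSecret") -> secret
    )

    val accountKey: Option[Map[String, String]] =
      env.get("AZURE_STORAGE_KEY").map(v => Map((prefix + "accountKey") -> v))

    account.toList.toMap ++
      servicePrincipal.orElse(accountKey).getOrElse(Map.empty[String, String])
  }
}

// With an active SparkSession named `spark`, the entries could be applied as:
//   AzureEnvConfig.fromEnv(sys.env).foreach { case (k, v) => spark.conf.set(k, v) }
```

Taking the environment as a plain `Map` rather than reading `sys.env` inside the function keeps the mapping easy to test without a Spark session.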

Best Practices

Use OAuth Service Principal authentication for production workloads
Enable disk cache on Azure HDInsight VMs
Use ABFSS for best performance (hierarchical namespace)
