Field Types

IndexTables supports two primary field types for text data: string and text.

String Fields (Default)

String fields store exact values and support full filter pushdown.

// Default - no configuration needed
df.write.format("io.indextables.spark.core.IndexTables4SparkTableProvider")
  .save("path")

// Or explicitly
df.write.format("io.indextables.spark.core.IndexTables4SparkTableProvider")
  .option("spark.indextables.indexing.typemap.status", "string")
  .save("path")

Supported Operations

  • = (equals)
  • <> (not equals)
  • IN (set membership)
  • IS NULL / IS NOT NULL
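
As a sketch, the operations above map onto standard DataFrame filters, each of which is pushed down to the index (assuming a table with a `status` column indexed as string):

// Hypothetical table with a `status` string field; each filter below
// is pushed down rather than evaluated by a full scan.
val df = spark.read
  .format("io.indextables.spark.core.IndexTables4SparkTableProvider")
  .load("path")

df.filter($"status" === "active")               // = (equals)
df.filter($"status" =!= "deleted")              // <> (not equals)
df.filter($"status".isin("active", "pending"))  // IN (set membership)
df.filter($"status".isNull)                     // IS NULL
df.filter($"status".isNotNull)                  // IS NOT NULL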

Use Cases

  • Status codes, IDs, categories
  • Enum values
  • Exact matching requirements

Text Fields

Text fields are tokenized for full-text search using IndexQuery.

df.write.format("io.indextables.spark.core.IndexTables4SparkTableProvider")
  .option("spark.indextables.indexing.typemap.content", "text")
  .save("path")

Querying

import org.apache.spark.sql.indextables.IndexQueryExpression._

df.filter($"content" indexquery "machine learning")
df.filter($"content" indexquery "error AND database")
df.filter($"content" indexquery "\"exact phrase\"")

Index Record Options

Control what's stored in the inverted index:

Option     Description                              Index Size
basic      Document IDs only                        Smallest
freq       IDs + term frequency                     Medium
position   IDs + frequency + positions (default)    Largest

// Per-field configuration
spark.conf.set("spark.indextables.indexing.indexrecordoption.logs", "basic")

Fast Fields

For numeric aggregations, configure fast fields:

df.write.format("io.indextables.spark.core.IndexTables4SparkTableProvider")
  .option("spark.indextables.indexing.fastfields", "score,timestamp,value")
  .save("path")

Fast fields enable:

  • Aggregate pushdown (COUNT, SUM, AVG, MIN, MAX)
  • Bucket aggregations (DateHistogram, Histogram, Range)
  • Efficient sorting
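
To illustrate, a sketch of the kinds of queries that benefit, assuming `score`, `timestamp`, and `value` were configured as fast fields at write time:

import org.apache.spark.sql.functions._

val df = spark.read
  .format("io.indextables.spark.core.IndexTables4SparkTableProvider")
  .load("path")

// Aggregates computed from fast-field column data instead of full documents
df.agg(count("*"), sum("value"), avg("score"), min("score"), max("score"))

// Sorting on a fast field avoids loading full documents
df.orderBy($"timestamp".desc)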

IP Address Fields

For efficient IP address indexing and querying (both IPv4 and IPv6), use the ip field type:

// Per-field approach
df.write.format("io.indextables.spark.core.IndexTables4SparkTableProvider")
  .option("spark.indextables.indexing.typemap.client_ip", "ip")
  .save("path")

// List-based approach (multiple fields)
df.write.format("io.indextables.spark.core.IndexTables4SparkTableProvider")
  .option("spark.indextables.indexing.typemap.ip", "client_ip,server_ip")
  .save("path")

Supported Operations

  • = (exact match): client_ip = '192.168.1.1'
  • >, <, >=, <= (range queries): client_ip >= '192.168.1.0' AND client_ip <= '192.168.1.255'
  • IN (set membership): client_ip IN ('192.168.1.1', '10.0.0.1')
  • IPv6 support: client_ip = '2001:db8::1'
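
Expressed as DataFrame filters, the operations above look like the following (a sketch assuming a `client_ip` column indexed with the ip typemap):

df.filter($"client_ip" === "192.168.1.1")                // exact match
df.filter($"client_ip" >= "192.168.1.0" &&
          $"client_ip" <= "192.168.1.255")               // range query
df.filter($"client_ip".isin("192.168.1.1", "10.0.0.1"))  // IN (set membership)
df.filter($"client_ip" === "2001:db8::1")                // IPv6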

CIDR Notation

CIDR notation and wildcard patterns are transparently expanded at the native layer. Pass them as normal string values — no special syntax is required.

Equality filter with CIDR:

df.filter($"client_ip" === "192.168.1.0/24")   // matches all IPs in 192.168.1.0–255
df.filter($"client_ip" === "10.0.0.0/8")       // matches 10.0.0.0–10.255.255.255
df.filter($"client_ip" === "192.168.1.1/32")   // exact host match

IN filter with CIDR and exact IPs:

df.filter($"client_ip".isin("10.0.0.0/8", "192.168.1.0/24"))
df.filter($"client_ip".isin("10.0.0.0/8", "203.0.113.5")) // CIDR + exact IP

IndexQuery with CIDR:

df.filter($"client_ip" indexquery "192.168.1.0/24")
df.filter($"client_ip" indexquery "192.168.1.0/24 OR 10.0.0.0/8")
df.filter($"client_ip" indexquery "10.0.0.0/8 AND NOT 10.0.1.0/24")

Wildcard patterns:

df.filter($"client_ip" === "192.168.1.*")   // equivalent to /24
df.filter($"client_ip" === "10.0.*.*")      // equivalent to /16

IPv6 CIDR (requires quoting in IndexQuery due to colons):

// DataFrame filter — no quoting needed
df.filter($"client_ip" === "2001:db8::/32")

// IndexQuery — quote the value
df.filter($"client_ip" indexquery "\"2001:db8::/32\"")

CIDR patterns reference:

Pattern           Matches
192.168.1.0/24    192.168.1.0 – 192.168.1.255
10.0.0.0/8        10.0.0.0 – 10.255.255.255
192.168.1.1/32    exact host 192.168.1.1
0.0.0.0/0         all IPv4 addresses
192.168.1.*       192.168.1.0 – 192.168.1.255
10.0.*.*          10.0.0.0 – 10.0.255.255
2001:db8::/32     IPv6 range

Use Cases

  • Network traffic analysis
  • Access log filtering by source/destination IP
  • Subnet-level filtering with CIDR notation

Supported Schema Types

Spark Type            Tantivy Type    Notes
String                Text/String     Configurable via typemap
String (ip typemap)   IP              IPv4 and IPv6 support
Integer/Long          I64             -
Float/Double          F64             -
Boolean               Bool            -
Date                  Date            -
Timestamp             DateTime        -
Binary                Bytes           -
Struct/Array/Map      JSON            Auto-detected
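
As a sketch of how several of these mappings come together in one write (the case class and column names here are hypothetical, not part of the library):

import java.sql.Timestamp

case class LogEvent(
  message:  String,    // -> Text (via typemap below)
  clientIp: String,    // -> IP (via ip typemap below)
  count:    Long,      // -> I64
  score:    Double,    // -> F64
  ok:       Boolean,   // -> Bool
  ts:       Timestamp  // -> DateTime
)

val events = Seq(
  LogEvent("database error", "10.0.0.1", 3L, 0.9, true,
           Timestamp.valueOf("2024-01-01 00:00:00"))
)

spark.createDataFrame(events)
  .write.format("io.indextables.spark.core.IndexTables4SparkTableProvider")
  .option("spark.indextables.indexing.typemap.message", "text")
  .option("spark.indextables.indexing.typemap.clientIp", "ip")
  .save("path")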