Filter Pushdown

IndexTables pushes filters down to the search engine for efficient execution.

Equality Filters

// Exact equality
df.filter($"status" === "active")

// Null-safe equality
df.filter($"status" <=> "active")

// NOT equal
df.filter($"status" =!= "deleted")
// Or using NOT
df.filter(!($"status" === "deleted"))

Range Filters

All numeric, date, and timestamp fields support range queries:

// Greater than / less than
df.filter($"score" > 0.5)
df.filter($"count" < 100)

// Greater than or equal / less than or equal
df.filter($"count" >= 10)
df.filter($"count" <= 1000)

// Combined ranges
df.filter($"count" >= 10 && $"count" < 100)

// Date/timestamp ranges
df.filter($"timestamp" >= "2024-01-01")
df.filter($"event_date" < "2024-12-31")

IN Clause

df.filter($"region".isin("us-east", "us-west", "eu-west"))
df.filter($"status".isin("pending", "active", "completed"))

NULL Checks

// IS NOT NULL - fully pushed down
df.filter($"category".isNotNull)

// IS NULL - evaluated by Spark (not pushed down)
df.filter($"category".isNull)

note

IS NULL filters are not pushed down because the search engine doesn't index null values. Spark will post-filter these results.

Compound Filters

Combine filters with AND, OR, and NOT:

// AND
df.filter($"status" === "active" && $"region" === "us-east")

// OR
df.filter($"priority" === "high" || $"priority" === "critical")

// NOT
df.filter(!($"status" === "deleted"))

// Complex combinations
df.filter(($"status" === "error" || $"status" === "warning") && $"region" === "us-east")

String Pattern Filters

String pattern filters are supported but disabled by default for performance reasons. Enable via configuration:

// Enable all string pattern pushdowns
spark.conf.set("spark.indextables.filter.stringPattern.pushdown", "true")

// Or enable individually
spark.conf.set("spark.indextables.filter.stringStartsWith.pushdown", "true")  // Most efficient
spark.conf.set("spark.indextables.filter.stringEndsWith.pushdown", "true")
spark.conf.set("spark.indextables.filter.stringContains.pushdown", "true")    // Least efficient

Once enabled:

// StartsWith (most efficient - uses prefix queries)
df.filter($"message".startsWith("ERROR"))

// EndsWith
df.filter($"filename".endsWith(".log"))

// Contains
df.filter($"content".contains("exception"))

Full-Text Search with IndexQuery

For text fields, use the indexquery operator for full-text search:

import org.apache.spark.sql.indextables.IndexQueryExpression._

// Single term
df.filter($"content" indexquery "error")

// Boolean queries
df.filter($"content" indexquery "error AND database")
df.filter($"content" indexquery "error OR warning")
df.filter($"content" indexquery "error NOT timeout")

// Phrase search
df.filter($"content" indexquery "\"connection refused\"")

See IndexQuery Syntax for full details.

Partition Filters

Partition column filters enable partition pruning:

// Only reads matching partitions
df.filter($"date" === "2024-01-15")
  .filter($"message" indexquery "error")

// Range on partition columns
df.filter($"date" >= "2024-01-01" && $"date" < "2024-02-01")

Nested JSON Field Filters

Nested fields in Struct, Array, and Map types support filter pushdown:

// Struct fields
df.filter($"user.name" === "Alice")
df.filter($"user.age" > 28)

// Deeply nested
df.filter($"request.headers.contentType" === "application/json")

// Range on nested fields
df.filter($"metadata.score" >= 0.5)

What Gets Pushed Down

Filter Type	Pushed Down	Notes
`=` (EqualTo)	Yes	String fields only (text fields use IndexQuery)
`<=>` (EqualNullSafe)	Yes	String fields only
`>` (GreaterThan)	Yes	All fields including nested JSON
`>=` (GreaterThanOrEqual)	Yes	All fields including nested JSON
`<` (LessThan)	Yes	All fields including nested JSON
`<=` (LessThanOrEqual)	Yes	All fields including nested JSON
`IN`	Yes	Full pushdown
`IS NOT NULL`	Yes	Regular fields only (not nested JSON)
`IS NULL`	No	Search engine doesn't index nulls
`AND`	Yes	If both children are supported
`OR`	Yes	If both children are supported
`NOT`	Yes	If child is supported
`LIKE 'prefix%'`	Config	Enable with `stringStartsWith.pushdown`
`LIKE '%suffix'`	Config	Enable with `stringEndsWith.pushdown`
`LIKE '%sub%'`	Config	Enable with `stringContains.pushdown`
`indexquery`	Yes	Text fields - full-text search
UDF	No	Evaluated by Spark

Equality Filters​

Range Filters​

IN Clause​

NULL Checks​

Compound Filters​

String Pattern Filters​

Full-Text Search with IndexQuery​

Partition Filters​

Nested JSON Field Filters​

What Gets Pushed Down​