Index catalog, constraints, and searchable graph state

The LoraDB team
Engineering

The latest LoraDB engine work lands a real schema catalog and wires it through the parser, store, optimizer, executor, snapshots, WAL, and benchmarks.

This is not a shift away from the schema-free graph model. Labels, relationship types, and properties still appear when you write them. The new DDL gives developers explicit ways to say: keep a secondary structure for this hot predicate, and reject writes that violate this invariant.

What landed

Index management is now part of the Cypher surface:

CREATE INDEX user_email FOR (u:User) ON (u.email);
CREATE TEXT INDEX user_name FOR (u:User) ON (u.name);
CREATE POINT INDEX venue_location FOR (v:Venue) ON (v.location);
CREATE VECTOR INDEX movie_embedding FOR (m:Movie) ON (m.embedding)
OPTIONS {indexConfig: {`vector.dimensions`: 384, `vector.similarity_function`: 'cosine'}};
CREATE FULLTEXT INDEX article_search FOR (n:Article|Note) ON EACH [n.title, n.body];
CREATE INDEX rel_since FOR ()-[r:FOLLOWS]-() ON (r.since);

SHOW INDEXES;
SHOW VECTOR INDEXES;
DROP INDEX user_email IF EXISTS;

The supported catalog kinds are RANGE, TEXT, POINT, LOOKUP, VECTOR, and FULLTEXT. RANGE is the default for CREATE INDEX; TEXT and POINT activate dedicated string and spatial candidate registries; LOOKUP records label/type token indexes in the catalog; VECTOR records a kNN configuration; FULLTEXT builds an inverted index over one or more properties.

IF NOT EXISTS makes creation idempotent across both duplicate names and equivalent index schemas. Without it, a duplicate name surfaces as 22N71 and an equivalent schema as 22N70. Dropping a missing index without IF EXISTS returns 42N51.
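The rules above can be sketched as a toy catalog in Rust. This is an illustration, not the engine's code: `IndexSchema`, `DdlOutcome`, and `IndexCatalog` are hypothetical names, and reporting the name collision first when both name and schema collide is an assumption.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified index definition: kind, label, and properties.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct IndexSchema {
    pub kind: &'static str,
    pub label: String,
    pub properties: Vec<String>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum DdlOutcome {
    Applied,
    /// The statement was skipped because of IF NOT EXISTS / IF EXISTS.
    NoOp,
    /// Status code as quoted in the post, e.g. "22N71".
    Error(&'static str),
}

#[derive(Default)]
pub struct IndexCatalog {
    by_name: BTreeMap<String, IndexSchema>,
}

impl IndexCatalog {
    pub fn create(&mut self, name: &str, schema: IndexSchema, if_not_exists: bool) -> DdlOutcome {
        let name_taken = self.by_name.contains_key(name);
        let schema_taken = self.by_name.values().any(|s| *s == schema);
        if name_taken || schema_taken {
            if if_not_exists {
                return DdlOutcome::NoOp;
            }
            // Assumption: a name collision is reported before a schema collision.
            return DdlOutcome::Error(if name_taken { "22N71" } else { "22N70" });
        }
        self.by_name.insert(name.to_string(), schema);
        DdlOutcome::Applied
    }

    pub fn drop_index(&mut self, name: &str, if_exists: bool) -> DdlOutcome {
        match self.by_name.remove(name) {
            Some(_) => DdlOutcome::Applied,
            None if if_exists => DdlOutcome::NoOp,
            None => DdlOutcome::Error("42N51"),
        }
    }
}
```

In this sketch, re-running the same `create` call with `if_not_exists = true` is a no-op whether the name or only the schema already exists.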

Schema constraints are now part of the same surface:

CREATE CONSTRAINT book_isbn FOR (b:Book) REQUIRE b.isbn IS UNIQUE;
CREATE CONSTRAINT author_name FOR (a:Author) REQUIRE a.name IS NOT NULL;
CREATE CONSTRAINT actor_fullname FOR (a:Actor)
REQUIRE (a.first, a.last) IS NODE KEY;
CREATE CONSTRAINT movie_title FOR (m:Movie)
REQUIRE m.title IS :: STRING | LIST<STRING NOT NULL>;

SHOW CONSTRAINTS;
DROP CONSTRAINT book_isbn IF EXISTS;

The supported constraint family covers node and relationship uniqueness, existence, node keys, relationship keys, and property type constraints, including fixed-dimension VECTOR property types. Creating a constraint validates existing data before committing the catalog change, and later writes are checked at mutation time.
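The validate-then-enforce behavior can be pictured with a minimal uniqueness check. `UniqueConstraint` is a hypothetical stand-in over plain strings; the real engine works on graph entities and typed property values.

```rust
use std::collections::HashSet;

/// Hypothetical in-memory stand-in for a UNIQUE constraint on one property.
pub struct UniqueConstraint {
    seen: HashSet<String>,
}

impl UniqueConstraint {
    /// Validate existing values first; nothing is created if a duplicate exists.
    pub fn create(existing: &[&str]) -> Result<Self, String> {
        let mut seen = HashSet::new();
        for value in existing {
            if !seen.insert(value.to_string()) {
                return Err(format!("existing data violates uniqueness: {}", value));
            }
        }
        Ok(Self { seen })
    }

    /// Checked at mutation time: reject a write that would duplicate a value.
    pub fn check_insert(&mut self, value: &str) -> Result<(), String> {
        if self.seen.insert(value.to_string()) {
            Ok(())
        } else {
            Err(format!("write violates uniqueness: {}", value))
        }
    }
}
```

Creation fails outright on pre-existing duplicates, so a constraint that exists in this sketch always describes data that already satisfies it.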

Faster scan shapes

The optimizer now uses graph statistics and catalog state when it compiles a query. Eligible scan-and-filter patterns can lower to specialized operators:

  • NodeByPropertyScan and NodeByPropertyRangeScan
  • NodeByTextScan
  • NodeByPointScan
  • RelByPropertyRangeScan
  • RelByTextScan
  • RelByPointScan

That covers node and relationship predicates such as:

MATCH (u:User) WHERE u.age >= 18 AND u.age < 65 RETURN u;
MATCH (u:User) WHERE u.name STARTS WITH 'Al' RETURN u;
MATCH (v:Venue)
WHERE point.withinBBox(v.location, $southwest, $northeast)
RETURN v;
MATCH ()-[r:FOLLOWS]->() WHERE r.since > 2020 RETURN r;

TEXT indexes use a trigram candidate path and POINT indexes use spatial buckets, so both return conservative candidate sets that may include false positives. The executor still re-applies the original predicate to every candidate, so correctness does not depend on the index being exact.
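To make the candidate-then-refilter idea concrete, here is a small trigram sketch in Rust. `TrigramIndex` and its layout are assumptions for illustration only; the post does not describe the engine's actual registry structure.

```rust
use std::collections::{BTreeSet, HashMap};

/// Overlapping 3-character windows of a string (empty if shorter than 3 chars).
fn trigrams(s: &str) -> Vec<String> {
    let chars: Vec<char> = s.chars().collect();
    chars.windows(3).map(|w| w.iter().collect()).collect()
}

#[derive(Default)]
pub struct TrigramIndex {
    postings: HashMap<String, BTreeSet<usize>>,
    values: Vec<String>,
}

impl TrigramIndex {
    /// Stores a value and returns its id (its position in insertion order).
    pub fn insert(&mut self, value: &str) -> usize {
        let id = self.values.len();
        self.values.push(value.to_string());
        for gram in trigrams(value) {
            self.postings.entry(gram).or_default().insert(id);
        }
        id
    }

    /// Conservative candidates: ids whose value contains every trigram of the
    /// needle. Needles shorter than three characters fall back to all ids.
    pub fn candidates(&self, needle: &str) -> BTreeSet<usize> {
        let mut result: Option<BTreeSet<usize>> = None;
        for gram in &trigrams(needle) {
            let ids = self.postings.get(gram).cloned().unwrap_or_default();
            result = Some(match result {
                None => ids,
                Some(acc) => acc.intersection(&ids).copied().collect(),
            });
        }
        result.unwrap_or_else(|| (0..self.values.len()).collect())
    }

    /// STARTS WITH: narrow with the index, then refilter with the exact predicate.
    pub fn starts_with(&self, prefix: &str) -> Vec<usize> {
        self.candidates(prefix)
            .into_iter()
            .filter(|&id| self.values[id].starts_with(prefix))
            .collect()
    }
}
```

A value such as "XAlice" shares every trigram of the prefix "Ali" and therefore shows up as a candidate, but the final `starts_with` check drops it. That is the sense in which the index only has to be conservative, not exact.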

Search procedures

Two catalog-backed procedure families are now exposed through the limited built-in procedure dispatcher.

Full-text indexes support node and relationship scopes, multiple labels or relationship types, and multiple indexed properties:

CREATE FULLTEXT INDEX article_search
FOR (n:Article|Note) ON EACH [n.title, n.body]
OPTIONS {`fulltext.analyzer`: 'standard'};

CALL db.index.fulltext.queryNodes('article_search', 'graph powerful')
YIELD node, score;

The standard analyzer lowercases text and splits on non-alphanumeric boundaries. Queries use AND semantics across terms, score by summed term frequency, and return rows in descending score order. The feature is deliberately small: analyzer choice is currently limited to standard and simple, both on the same synchronous maintenance path.
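The analysis and scoring rules described here fit in a few lines. The sketch below is a simplified model over plain strings, with hypothetical function names `analyze` and `search`; it rebuilds term frequencies per query instead of maintaining an inverted index, and its tie-breaking (document order) is an assumption.

```rust
use std::collections::HashMap;

/// Lowercase, then split on any non-alphanumeric character.
pub fn analyze(text: &str) -> Vec<String> {
    text.to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(str::to_string)
        .collect()
}

/// Returns (document index, score) pairs sorted by descending score.
/// A document matches only if it contains every query term (AND semantics);
/// its score is the sum of the term frequencies of the query terms.
pub fn search(docs: &[&str], query: &str) -> Vec<(usize, u32)> {
    let terms = analyze(query);
    let mut hits = Vec::new();
    for (id, doc) in docs.iter().enumerate() {
        let mut tf: HashMap<String, u32> = HashMap::new();
        for token in analyze(doc) {
            *tf.entry(token).or_insert(0) += 1;
        }
        if !terms.is_empty() && terms.iter().all(|t| tf.contains_key(t)) {
            let score: u32 = terms.iter().map(|t| tf[t]).sum();
            hits.push((id, score));
        }
    }
    // Stable sort: ties keep document order (an assumption of this sketch).
    hits.sort_by(|a, b| b.1.cmp(&a.1));
    hits
}
```

For the query 'graph powerful', a document that mentions "graph" three times and "powerful" once scores 4, while a document containing only "powerful" is not returned at all.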

Vector indexes support node and relationship scopes with explicit dimensions and either cosine or euclidean similarity:

CREATE VECTOR INDEX movie_embedding FOR (m:Movie) ON (m.embedding)
OPTIONS {indexConfig: {`vector.dimensions`: 3, `vector.similarity_function`: 'cosine'}};

CALL db.index.vector.queryNodes('movie_embedding', 5, [1.0, 0.0, 0.0])
YIELD node, score;

This is not an ANN engine yet. VECTOR indexes are catalog entries that validate configuration and scope the query procedure; the current implementation still uses a flat scan over matching entities and returns the top k rows by score.
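A flat scan with top-k selection looks roughly like the following. This is a sketch under assumptions: entities are `(id, vector)` pairs, the score is raw cosine similarity (the post does not say whether the engine normalizes it), and vectors with the wrong dimension or zero norm are skipped.

```rust
/// Cosine similarity; None for mismatched lengths or zero-norm vectors.
pub fn cosine(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

/// Flat scan: score every entity, sort by descending score, keep the first k.
pub fn top_k(entities: &[(u64, Vec<f32>)], query: &[f32], k: usize) -> Vec<(u64, f32)> {
    let mut scored: Vec<(u64, f32)> = entities
        .iter()
        .filter_map(|(id, v)| cosine(v, query).map(|s| (*id, s)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

The cost is linear in the number of matching entities per query, which is the trade-off an ANN structure would later remove.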

Durability and snapshots

Index and constraint DDL travel through the same write path as graph mutations. WAL payloads now encode CreateIndex, DropIndex, CreateConstraint, and DropConstraint events, and recovery replays them into the in-memory catalog before queries run.
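Recovery can be pictured as folding the log into a catalog. The event names below come from the post, but their payloads are reduced to a bare name here, and `GraphMutation`, `Catalog`, and `replay` are hypothetical simplifications.

```rust
use std::collections::BTreeSet;

/// Simplified WAL events; real payloads carry full definitions.
#[derive(Debug, Clone)]
pub enum WalEvent {
    CreateIndex(String),
    DropIndex(String),
    CreateConstraint(String),
    DropConstraint(String),
    /// Stand-in for ordinary graph mutations, which share the same log.
    GraphMutation,
}

#[derive(Debug, Default, PartialEq, Eq)]
pub struct Catalog {
    pub indexes: BTreeSet<String>,
    pub constraints: BTreeSet<String>,
}

/// Replays events in log order to rebuild the in-memory catalog.
pub fn replay(events: &[WalEvent]) -> Catalog {
    let mut catalog = Catalog::default();
    for event in events {
        match event {
            WalEvent::CreateIndex(name) => {
                catalog.indexes.insert(name.clone());
            }
            WalEvent::DropIndex(name) => {
                catalog.indexes.remove(name);
            }
            WalEvent::CreateConstraint(name) => {
                catalog.constraints.insert(name.clone());
            }
            WalEvent::DropConstraint(name) => {
                catalog.constraints.remove(name);
            }
            // Graph mutations are applied to the store, not the catalog.
            WalEvent::GraphMutation => {}
        }
    }
    catalog
}
```

Because DDL and data share one ordered log, a create followed by a drop replays to the same end state it had before the crash.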

Snapshots also carry both catalogs. The LORACOL1 envelope remains format version 2, and the snapshot body is now version 4: version 3 added the index-catalog trailer, and version 4 adds the constraint-catalog trailer. Readers still accept older body formats, loading older snapshots with empty index or constraint lists as needed.
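The version gating can be sketched as follows. Only the version numbers come from the post; `load_catalogs`, the `Option` trailers, and the string-based catalog entries are hypothetical, and the sketch assumes body versions below 3 carry neither trailer.

```rust
/// Newest body version this sketch understands (version 4 per the post).
pub const CURRENT_BODY_VERSION: u32 = 4;

#[derive(Debug, Default, PartialEq, Eq)]
pub struct Catalogs {
    pub indexes: Vec<String>,
    pub constraints: Vec<String>,
}

/// Picks up each catalog trailer only if the body version defines it;
/// older bodies load with empty lists instead of failing.
pub fn load_catalogs(
    body_version: u32,
    index_trailer: Option<Vec<String>>,
    constraint_trailer: Option<Vec<String>>,
) -> Result<Catalogs, String> {
    if body_version > CURRENT_BODY_VERSION {
        return Err(format!("unsupported snapshot body version {}", body_version));
    }
    let indexes = if body_version >= 3 {
        index_trailer.ok_or("missing index-catalog trailer")?
    } else {
        Vec::new()
    };
    let constraints = if body_version >= 4 {
        constraint_trailer.ok_or("missing constraint-catalog trailer")?
    } else {
        Vec::new()
    };
    Ok(Catalogs { indexes, constraints })
}
```

The first branch also shows why an older binary cannot read a newer snapshot: it has no way to interpret a body version above the one it was built for.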

Snapshot and WAL encoding now share a small store-owned binary codec for nested property values and catalog records. That keeps WAL and snapshots aligned on the same catalog wire shape, including VECTOR/FULLTEXT index definitions and constraint property-type records.

Developer-facing improvements

Planner APIs now accept GraphStats, and the plan cache fingerprints catalog and cardinality state so adding or dropping an index invalidates stale plans. That means a query explained before CREATE INDEX can recompile into an indexed scan immediately after the catalog changes.
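One way to picture fingerprint-based invalidation is a cache keyed by query text plus a fingerprint. The field names `catalog_epoch` and `cardinality_bucket`, the `PlanCache` type, and plans represented as strings are all assumptions for illustration; the post does not specify what the real fingerprint contains.

```rust
use std::collections::HashMap;

/// Hypothetical fingerprint: changes whenever catalog or cardinality state does.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct Fingerprint {
    pub catalog_epoch: u64,
    pub cardinality_bucket: u32,
}

#[derive(Default)]
pub struct PlanCache {
    plans: HashMap<(String, Fingerprint), String>,
    /// Number of cache misses, exposed so the effect is observable.
    pub compiles: usize,
}

impl PlanCache {
    /// Returns the cached plan for (query, fingerprint) or compiles a new one.
    pub fn get_or_compile(
        &mut self,
        query: &str,
        fingerprint: Fingerprint,
        compile: impl FnOnce() -> String,
    ) -> String {
        let key = (query.to_string(), fingerprint);
        if let Some(plan) = self.plans.get(&key) {
            return plan.clone();
        }
        self.compiles += 1;
        let plan = compile();
        self.plans.insert(key, plan.clone());
        plan
    }
}
```

With this keying, bumping the catalog epoch after CREATE INDEX means the old entry is simply never looked up again, so the next execution recompiles against the new catalog.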

Read-only materialization also gained an optional native parallel feature through Rayon. It is default-on for native builds and disabled for WASM-style consumers through default-features = false.

SHOW INDEXES now accepts type filters such as SHOW RANGE INDEXES, SHOW FULLTEXT INDEXES, and SHOW VECTOR INDEXES. SHOW INDEXES and SHOW CONSTRAINTS also accept a YIELD-anchored projection tail:

SHOW INDEXES
YIELD name, type, entityType
WHERE type = 'FULLTEXT'
RETURN name
ORDER BY name;

Benchmark coverage was reshaped around intent:

  • query_implementations mirrors integration-test feature areas;
  • index_acceleration compares indexed and unindexed RANGE/TEXT scenarios;
  • existing scale, realistic, wal, concurrent, and concurrency_guard suites remain workload-specific tools.

Breaking changes and migration notes

For Rust callers of lora-compiler, Compiler::compile now requires a &GraphStats argument:

let compiled = Compiler::compile(&resolved, &store.graph_stats());

Use GraphStats::default() when compiling outside a store-backed runtime.

Snapshot writers now emit body version 4 inside the LORACOL1 envelope. Current readers accept older body versions, but older binaries should not be expected to read snapshots written after the index and constraint catalog trailers landed.

There are no data-model migrations for normal users. Existing graphs remain schema-free; add indexes where predicates are hot enough to justify the extra memory, and add constraints only for invariants you want enforced on every matching write.

Notable fixes

  • Plan-cache invalidation now accounts for catalog/cardinality changes.
  • Index and constraint catalog DDL survives WAL crash recovery.
  • TEXT and POINT indexes track property updates and label/type membership changes.
  • FULLTEXT indexes backfill existing data and track property updates/removals.
  • Constraint-owned backing indexes cannot be dropped directly; use DROP CONSTRAINT so the catalogs stay in sync.
  • Indexed scans preserve already-bound node and relationship variables instead of rebinding them.
  • Regex predicates cache compiled patterns per thread, avoiding repeated compilation for row-by-row =~ filters.
  • Numeric conversion helpers now reject non-finite or out-of-range float-to-int conversions instead of silently casting.
  • Ruby binding calls release the GVL more defensively and surface engine panics as query errors.
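The numeric conversion fix in the list above is worth a short illustration, because a plain Rust `as` cast saturates out-of-range floats and turns NaN into 0. The helper below is a hypothetical sketch of a checked conversion, not the engine's function; truncation toward zero is an assumption.

```rust
/// Converts a float to i64, rejecting NaN, infinities, and out-of-range values.
pub fn f64_to_i64_checked(x: f64) -> Option<i64> {
    if !x.is_finite() {
        return None;
    }
    // Assumption: fractional parts are truncated toward zero.
    let truncated = x.trunc();
    // -2^63 is exactly i64::MIN; 2^63 is the first float above i64::MAX.
    const LOWER: f64 = -9_223_372_036_854_775_808.0;
    const UPPER: f64 = 9_223_372_036_854_775_808.0;
    if truncated >= LOWER && truncated < UPPER {
        Some(truncated as i64)
    } else {
        None
    }
}
```

The upper bound is exclusive because `i64::MAX` itself is not representable as an `f64`; comparing against 2^63 avoids accepting a value that would saturate.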

Still open

This work does not add ANN structures for vector search, custom full-text analyzers, a full-text query language, general CALL/RETURN procedure pipelines, or sorted-index ORDER BY planning. Composite RANGE indexes are accepted and visible in SHOW INDEXES, but current optimizer rewrites target single-property predicates.