Aggregating Query Results

Aggregations collapse input rows into fewer rows. Non-aggregated columns in the same RETURN or WITH act as an implicit group-by — Cypher has no explicit GROUP BY keyword.

This page covers clause-level semantics. For each function's exact behaviour (empty input, null handling, return type) see Aggregation Functions.

Overview

Goal	Clause
Count rows	`count(*)`
Count non-nulls	`count(expr)`
Sum a property	`sum(expr)`
Mean	`avg(expr)`
Extremes	`min(expr)`, `max(expr)`
Collect into a list	`collect(expr)`
Distinct values	`collect(DISTINCT expr)`
Standard deviation	`stdev(expr)`, `stdevp(expr)`
Percentiles	`percentileCont(expr, p)`, `percentileDisc(expr, p)`
Group by	Any non-aggregated column in the same `RETURN`
HAVING-style filter	`WITH … WHERE`

A five-step walkthrough

Rather than list functions first, let's build up an aggregation query in stages. Assume a graph seeded with:

UNWIND [
{region: 'EU',   amount: 50,  status: 'paid'},
{region: 'EU',   amount: 75,  status: 'paid'},
{region: 'EU',   amount: 20,  status: 'cancelled'},
{region: 'US',   amount: 200, status: 'paid'},
{region: 'US',   amount: 120, status: 'paid'},
{region: 'APAC', amount: 90,  status: 'paid'}
] AS row
CREATE (:Order {region: row.region, amount: row.amount, status: row.status})

1. Count

Start with the simplest aggregate. One row, one column.

MATCH (o:Order)
RETURN count(*) AS orders
// orders: 6

2. Group by region

Add a non-aggregated column to group on. Now one row per region.

MATCH (o:Order)
RETURN o.region AS region, count(*) AS orders
// EU: 3, US: 2, APAC: 1

3. Add a second aggregate

Every aggregate operates on the same group. Mix as many as you like in one RETURN.

MATCH (o:Order)
RETURN o.region       AS region,
     count(*)       AS orders,
     sum(o.amount)  AS revenue,
     avg(o.amount)  AS avg_ticket
ORDER BY revenue DESC

4. Filter before aggregating

Filter with WHERE before the RETURN — this narrows the input rows that reach the aggregate.

MATCH (o:Order)
WHERE o.status = 'paid'
RETURN o.region AS region, sum(o.amount) AS paid_revenue
ORDER BY paid_revenue DESC

5. Filter after aggregating (HAVING-style)

Cypher has no HAVING. Aggregate into a WITH, then filter:

MATCH (o:Order)
WHERE o.status = 'paid'
WITH o.region AS region, sum(o.amount) AS paid_revenue
WHERE paid_revenue > 100
RETURN region, paid_revenue

Why WITH not WHERE directly? Aggregates are not permitted inside WHERE. WITH promotes aggregated values into projected columns so the next WHERE can filter on them.

You now have the shape of most real aggregation queries: group, aggregate, optionally filter. The rest of the page is a reference for every aggregate and its edge cases.

Built-in aggregates

Function	Supports `DISTINCT`	Empty input	Null input	Return type
`count(*)`	—	`0`	counted as 1 per row	`Int`
`count(expr)`	yes	`0`	skipped	`Int`
`collect(expr)`	yes	`[]`	included as `null`	`List`
`sum(expr)`	yes	`null`	skipped	`Int` if all-int, else `Float`
`avg(expr)`	yes	`null`	skipped	`Float`
`min(expr)`	yes	`null`	skipped	type of min element
`max(expr)`	yes	`null`	skipped	type of max element
`stdev(expr)`	—	`0.0`	skipped	`Float`, sample (n − 1)
`stdevp(expr)`	—	`0.0`	skipped	`Float`, population (n)
`percentileCont(expr, p)`	—	`null`	skipped	`Float`, linear interpolation
`percentileDisc(expr, p)`	—	`null`	skipped	`Float`, nearest rank

count

Row count

MATCH (n:User)
RETURN count(*) AS users

count(*) counts every input row, including rows where bound variables are null.

Non-null count

MATCH (n:User)
RETURN count(n.email) AS users_with_email

count(expr) skips rows where expr is null.

Distinct

MATCH (p:Person)
RETURN count(DISTINCT p.city) AS distinct_cities

count(*) vs count(expr) after OPTIONAL MATCH

This distinction matters when a row survives the left-join but has a null binding:

MATCH (u:User)
OPTIONAL MATCH (u)-[:WROTE]->(p:Post)
RETURN u.name,
     count(*)        AS rows,         // 1 even for users without posts
     count(p)        AS posts,        // 0 for users without posts
     count(DISTINCT p) AS unique_posts // same as count(p) when p is 1-1 per row

Use count(expr) when you want zeros for empty optional matches; see OPTIONAL MATCH with aggregation.

Zero input

MATCH (:LabelThatDoesNotExist)
RETURN count(*)   // 0

An empty match still produces one row from count(*) with the value 0.

collect

collect turns rows into a list — useful for attaching one-to-many results to each group.

MATCH (p:Person)-[:KNOWS]->(f:Person)
RETURN p.name AS person, collect(f.name) AS friends

Distinct values

MATCH (p:Person)-[:VISITED]->(c:City)
RETURN p.name, collect(DISTINCT c.name) AS unique_cities

Empty input

MATCH (:Never)
RETURN collect(1)   // []

Collect keeps nulls

Unlike count(expr) / sum / avg, collect keeps null values that make it through the pipeline. Filter first if you don't want them:

UNWIND [1, null, 2, null, 3] AS x
RETURN collect(x);           // [1, null, 2, null, 3]
RETURN collect(DISTINCT x)  // [1, null, 2, 3]  (distinct still includes null)

;// To drop nulls before collecting:
UNWIND [1, null, 2, null, 3] AS x
WITH x WHERE x IS NOT NULL
RETURN collect(x)           // [1, 2, 3]

sum, avg

MATCH (o:Order)
RETURN sum(o.amount) AS revenue,
     avg(o.amount) AS avg_ticket

sum returns an Int when every input is an Int; otherwise Float.
avg is always Float.
Both return null on empty input (nothing to average / sum).
Nulls in input are skipped — they don't affect the result.

UNWIND [1, 2, null, 4] AS x
RETURN sum(x), avg(x)   // 7, 2.333…

min, max

MATCH (e:Event)
RETURN min(e.when) AS earliest,
     max(e.when) AS latest

Works on numbers, strings, and temporal values under each type's native total order.
Nulls skipped.
Returns null on empty input.

UNWIND ['banana', 'apple', 'cherry'] AS s
RETURN min(s), max(s)   // 'apple', 'cherry'

Grouping

Any non-aggregated column in the same RETURN or WITH becomes part of the implicit group key.

MATCH (o:Order)
RETURN o.region AS region,
     count(*)      AS orders,
     sum(o.amount) AS revenue
ORDER BY revenue DESC

Equivalent in SQL:

SELECT region, COUNT(*), SUM(amount)
FROM Order
GROUP BY region
ORDER BY SUM(amount) DESC;

Two-level grouping

Every non-aggregated column participates in the key:

MATCH (o:Order)
RETURN o.region AS region,
     o.status AS status,
     count(*) AS orders
ORDER BY region, status

Grouping by a computed expression

The group key can be any expression, not just a property access:

MATCH (p:Person)
RETURN p.born / 100 * 100 AS century, count(*) AS n
ORDER BY century

HAVING-style filtering

Cypher has no HAVING. Aggregate in a WITH, then filter:

MATCH (p:Person)-[:WORKS_AT]->(c:Company)
WITH c.name AS company, count(p) AS employees
WHERE employees > 5
RETURN company, employees

Aggregates are not permitted in WHERE. Attempting WHERE count(x) > 5 is a semantic error — always pipe through WITH.

Aggregation after filtering

Filter before aggregating:

MATCH (o:Order)
WHERE o.amount > 100 AND o.status = 'paid'
RETURN count(*) AS big_paid_orders

Duplicates vs DISTINCT

The gap between count(x) and count(DISTINCT x) reveals repeats.

MATCH (p:Person)-[:BOUGHT]->(b:Book)
RETURN p.name,
     count(b)          AS total_purchases,
     count(DISTINCT b) AS distinct_books

Multiple aggregates

Any number of aggregates may appear in one RETURN:

MATCH (r:Review)
RETURN count(*)                       AS n,
     avg(r.stars)                   AS mean,
     stdev(r.stars)                 AS sd,
     percentileCont(r.stars, 0.5)   AS median,
     percentileCont(r.stars, 0.95)  AS p95,
     min(r.stars)                   AS worst,
     max(r.stars)                   AS best

percentileCont vs percentileDisc

percentileCont(expr, p) — linear interpolation between ordered values. Returns a value that may not appear in the input.
percentileDisc(expr, p) — nearest-rank. Returns an input value.

UNWIND [1, 2, 3, 4, 5] AS x
RETURN percentileCont(x, 0.5),  // 3.0
     percentileDisc(x, 0.5);   // 3

UNWIND [1, 2, 3, 4] AS x
RETURN percentileCont(x, 0.5),  // 2.5
     percentileDisc(x, 0.5)   // 2

Both accept p ∈ [0, 1].

Pipeline aggregation

You can aggregate at one stage and then re-aggregate downstream:

MATCH (u:User)-[:PLACED]->(o:Order)
WITH u.region AS region, o.amount AS amount
WITH region, sum(amount) AS revenue
WITH avg(revenue) AS avg_region_revenue,
   collect({region: region, revenue: revenue}) AS rows
RETURN avg_region_revenue, rows

Common patterns

Top-N per group

MATCH (u:User)-[:WROTE]->(p:Post)
WITH u, count(p) AS posts
ORDER BY posts DESC
LIMIT 10
RETURN u.name, posts

MATCH (u:User)-[:WROTE]->(p:Post)
WITH u, p ORDER BY p.published_at DESC
WITH u, collect(p.title)[..3] AS last_three
RETURN u.name, last_three

Rolling window — bucket by day

MATCH (e:Event)
RETURN temporal.truncate('month', e.at) AS month, count(*) AS events
ORDER BY month

Uses temporal.truncate.

Average per category, HAVING filter

MATCH (p:Product)
WITH p.category AS category, avg(p.rating) AS rating
WHERE rating >= 4.0
RETURN category, rating
ORDER BY rating DESC

Dedup with DISTINCT inside collect

MATCH (a:Author)-[:WROTE]->(:Book)-[:IN]->(g:Genre)
RETURN a.name, collect(DISTINCT g.name) AS genres

Conditional count (count-if)

count(expr) skips null. Combined with CASE whose ELSE branch is implicitly null, this gives you a clean "count rows matching X":

MATCH (o:Order)
RETURN count(*)                                              AS total,
     count(CASE WHEN o.status = 'paid'      THEN 1 END)   AS paid,
     count(CASE WHEN o.status = 'cancelled' THEN 1 END)   AS cancelled,
     count(CASE WHEN o.amount >= 1000       THEN 1 END)   AS large

MATCH (o:Order)
WITH o.region AS region, sum(o.amount) AS revenue
WITH collect({region: region, revenue: revenue}) AS rows,
   sum(revenue)                                 AS total
UNWIND rows AS r
RETURN r.region                              AS region,
     r.revenue                             AS revenue,
     toFloat(r.revenue) / total            AS share
ORDER BY share DESC

Two-pass aggregate: first compute each region's total, then divide by the grand total from the same pipeline.

Running metrics over time buckets

MATCH (e:Event)
WHERE e.at >= temporal.now() - 'P1Y'::DURATION
RETURN temporal.truncate('month', e.at)  AS month,
     count(*)                       AS events,
     count(DISTINCT e.user_id)      AS unique_users,
     avg(e.duration_ms) / 1000.0    AS avg_seconds
ORDER BY month

Top contributor per group (pipeline trick)

LoraDB has no window functions. Express "top-N per group" as aggregate-then-filter:

MATCH (u:User)-[:WROTE]->(p:Post)
WITH u, p ORDER BY p.views DESC
WITH u, collect(p)[..1] AS top
UNWIND top AS best_post
RETURN u.handle, best_post.title, best_post.views

One row per user with their highest-viewed post. The collect(…)[..1] slice picks the first element of the sort-ordered collect.

Limitations

No GROUP BY keyword. Grouping is always implicit on non-aggregated columns.
No HAVING keyword. Filter post-aggregate through WITH … WHERE.
Aggregates rejected in WHERE. Analysis-time error.
stdev, stdevp, percentileCont, percentileDisc do not support DISTINCT. Use collect(DISTINCT x) + UNWIND + aggregate if you need percentile of distinct values.

See Limitations for the full list.

Overview​

A five-step walkthrough​

1. Count​

2. Group by region​

3. Add a second aggregate​

4. Filter before aggregating​

5. Filter after aggregating (HAVING-style)​

Built-in aggregates​

count​

Row count​

Non-null count​

Distinct​

count(*) vs count(expr) after OPTIONAL MATCH​

Zero input​

collect​

Distinct values​

Empty input​

Collect keeps nulls​

sum, avg​

min, max​

Grouping​

Two-level grouping​

Grouping by a computed expression​

HAVING-style filtering​

Aggregation after filtering​

Duplicates vs DISTINCT​

Multiple aggregates​

percentileCont vs percentileDisc​

Pipeline aggregation​

Common patterns​

Top-N per group​

Collect top-N related​

Rolling window — bucket by day​

Average per category, HAVING filter​

Dedup with DISTINCT inside collect​

Conditional count (count-if)​

Percent share per group​

Running metrics over time buckets​

Top contributor per group (pipeline trick)​

Limitations​

See also​