Aggregation Functions

Aggregation collapses a group of input rows into a single value per group. For clause-level semantics (implicit GROUP BY, HAVING-style filtering via WITH, where aggregates are legal) see the Aggregation query page.

All aggregates skip null inputs except count(*) (counts rows) and collect(expr) (keeps nulls). Empty-input semantics vary per function — see the summary table.

Summary table

Function	`DISTINCT`	Empty input	Null input	Returns
`count(*)`	—	`0`	counted as 1 per row	`Int`
`count(expr)`	yes	`0`	skipped	`Int`
`collect(expr)`	yes	`[]`	included as `null`	`List`
`sum(expr)`	yes	`null`	skipped	`Int` if all-int, else `Float`
`avg(expr)`	yes	`null`	skipped	`Float`
`min(expr)` / `max(expr)`	yes	`null`	skipped	same type as element
`stdev(expr)`	—	`0.0`	skipped	`Float`, sample (n − 1)
`stdevp(expr)`	—	`0.0`	skipped	`Float`, population (n)
`percentileCont(expr, p)`	—	`null`	skipped	`Float`, linear interpolation
`percentileDisc(expr, p)`	—	`null`	skipped	`Float`, nearest rank

count

Row count

MATCH (n:User)
RETURN count(*) AS users

Non-null count

UNWIND [1, 2, null, 4] AS x
RETURN count(*), count(x)
-- 4, 3

count(*) counts every input row, including rows where bound variables are null; count(expr) skips null expr.

Distinct count

UNWIND ['a', 'a', 'b', 'c'] AS x
RETURN count(x), count(DISTINCT x)
-- 4, 3

count with OPTIONAL MATCH

This is the subtlety that trips up new Cypher users:

MATCH (u:User)
OPTIONAL MATCH (u)-[:WROTE]->(p:Post)
RETURN u.name,
       count(*) AS rows,   -- 1 per user, even if no posts
       count(p) AS posts   -- 0 if no posts

Always prefer count(expr) when you want zeros for optional matches.

Empty graph

MATCH (:NoSuchLabel)
RETURN count(*)       -- 0

One row out, with value 0 — count(*) never returns null.

collect

Basic collect

MATCH (p:Person)-[:KNOWS]->(f:Person)
RETURN p.name, collect(f.name) AS friends

Distinct values

UNWIND [1, 2, null, 2, 3] AS x
RETURN collect(x),           -- [1, 2, null, 2, 3]
       collect(DISTINCT x)   -- [1, 2, null, 3]

Collect keeps nulls

collect keeps nulls that survive to the aggregate. Filter before the aggregate if you don't want them:

UNWIND [1, 2, null, 3] AS x
WITH x WHERE x IS NOT NULL
RETURN collect(x)             -- [1, 2, 3]

Collect + slice for top-N

MATCH (u:User)-[:WROTE]->(p:Post)
WITH u, p ORDER BY p.published_at DESC
WITH u, collect(p.title)[..5] AS last_five
RETURN u.name, last_five

Use any list operation on the resulting list.

Collect of maps

MATCH (p:Project)-[:HAS_TASK]->(t:Task)
RETURN p.name,
       collect({id: t.id, name: t.name, done: t.done}) AS tasks

One row per project, with an array of task summaries.

sum

UNWIND [1, 2, null, 4] AS x
RETURN sum(x)                 -- 7

UNWIND [1.0, 2.5, 3.5] AS x
RETURN sum(x)                 -- 7.0

MATCH (:Never)
RETURN sum(1)                 -- null   (empty input)

Return type: Int when every contributing element is an Int; Float if any contributor is a Float.

Distinct sum

UNWIND [1, 1, 2, 2, 3] AS x
RETURN sum(x), sum(DISTINCT x)   -- 9, 6

avg

UNWIND [1, 2, 3, 4] AS x
RETURN avg(x)                 -- 2.5

UNWIND [1, null, 3] AS x
RETURN avg(x)                 -- 2.0

MATCH (:Never)
RETURN avg(1)                 -- null

Always returns Float (or null on empty input).

min / max

Works on numbers, strings, and temporal values under their natural total order.

UNWIND ['banana', 'apple', 'cherry'] AS s
RETURN min(s), max(s)         -- 'apple', 'cherry'

UNWIND [date('2024-01-01'), date('2024-06-30'), date('2024-12-15')] AS d
RETURN min(d), max(d)
-- 2024-01-01, 2024-12-15

MATCH (:Never)
RETURN min(1)                 -- null

Min/max with tiebreaker

min() / max() only return the value, not the node owning it. For "the node with the maximum", sort and take one:

MATCH (u:User)-[:WROTE]->(p:Post)
WITH u, count(p) AS posts
ORDER BY posts DESC
LIMIT 1
RETURN u.name, posts

stdev / stdevp

Sample vs population standard deviation.

UNWIND [2, 4, 4, 4, 5, 5, 7, 9] AS x
RETURN stdev(x),              -- 2.1380…  (n − 1)
       stdevp(x)              -- 2.0      (n)

stdev returns 0.0 when fewer than two non-null values are aggregated.
stdevp returns 0.0 on an empty input.

MATCH (r:Review)
WITH r.product AS product, avg(r.stars) AS mean, stdev(r.stars) AS sd
WHERE sd > 1.0
RETURN product, mean, sd

percentileCont / percentileDisc

Both take the column and a percentile p ∈ [0, 1].

UNWIND [1, 2, 3, 4, 5] AS x
RETURN percentileCont(x, 0.5),  -- 3.0    (exact median for odd count)
       percentileDisc(x, 0.5)   -- 3

UNWIND [1, 2, 3, 4] AS x
RETURN percentileCont(x, 0.5),  -- 2.5    (linear interpolation)
       percentileDisc(x, 0.5)   -- 2      (nearest rank)

UNWIND [10, 20, 30, 40, 50, 60, 70, 80, 90, 100] AS x
RETURN percentileCont(x, 0.9),  -- 91.0
       percentileDisc(x, 0.9)   -- 90

percentileCont interpolates between values.
percentileDisc picks an actual input value.

Use percentileCont when the "true" percentile matters (latency histograms); percentileDisc when you need a real observation (the actual data point at P50).

Typical patterns

Group + aggregate

MATCH (o:Order)-[:CONTAINS]->(i:Item)
RETURN o.region AS region,
       count(i)     AS items,
       sum(i.price) AS revenue
ORDER BY revenue DESC

Aggregate + filter (HAVING)

MATCH (p:Person)-[:WORKS_AT]->(c:Company)
WITH c.name AS company, count(p) AS employees
WHERE employees > 5
RETURN company, employees

Multiple aggregates in one RETURN

MATCH (r:Review)
RETURN count(*) AS n,
       avg(r.stars) AS mean,
       stdev(r.stars) AS sd,
       percentileCont(r.stars, 0.5)  AS median,
       percentileCont(r.stars, 0.95) AS p95

Re-aggregate after first aggregate

MATCH (o:Order)
WITH o.region AS region, sum(o.amount) AS revenue
RETURN count(region) AS regions,
       avg(revenue)  AS mean_regional_revenue

Rolling count by date bucket

MATCH (e:Event)
RETURN date.truncate('month', e.at) AS month,
       count(*) AS events
ORDER BY month

Uses date.truncate.

Percentile per group

MATCH (r:Review)
RETURN r.product AS product,
       percentileCont(r.stars, 0.5)  AS p50,
       percentileCont(r.stars, 0.95) AS p95
ORDER BY p95 DESC

Count-if via CASE

count(expr) skips null. A CASE with an omitted ELSE returns null, so the pattern cleanly expresses "count rows where condition holds":

MATCH (o:Order)
RETURN o.region,
       count(CASE WHEN o.status = 'paid'      THEN 1 END) AS paid,
       count(CASE WHEN o.status = 'cancelled' THEN 1 END) AS cancelled

See CASE.

Sum with a filter on the aggregated value

WHERE runs before aggregation. To filter aggregated values, pipe through WITH:

MATCH (o:Order)
WITH o.customer AS customer, sum(o.amount) AS lifetime
WHERE lifetime > 1000
RETURN customer, lifetime
ORDER BY lifetime DESC

Aggregation with ORDER BY inside collect

collect preserves input order. Sort the rows before the aggregate to produce an ordered list:

MATCH (u:User)-[:WROTE]->(p:Post)
WITH u, p ORDER BY p.published_at DESC
RETURN u.handle, collect(p.title)[..3] AS latest_three

Limitations

Aggregates are rejected inside WHERE. Use WITH … WHERE instead.
stdev, stdevp, percentileCont, percentileDisc don't support DISTINCT. For a percentile of distinct values, first collect(DISTINCT x), then UNWIND and aggregate.
count(*) counts rows, not non-null values. Use count(expr) when you want nulls skipped.
No GROUP BY keyword — non-aggregated columns in the same projection stage form the group key implicitly.

See Limitations for the full list.

Summary table​

count​

Row count​

Non-null count​

Distinct count​

count with OPTIONAL MATCH​

Empty graph​

collect​

Basic collect​

Distinct values​

Collect keeps nulls​

Collect + slice for top-N​

Collect of maps​

sum​

Distinct sum​

avg​

min / max​

Min/max with tiebreaker​

stdev / stdevp​

percentileCont / percentileDisc​

Typical patterns​

Group + aggregate​

Aggregate + filter (HAVING)​

Multiple aggregates in one RETURN​

Re-aggregate after first aggregate​

Rolling count by date bucket​

Percentile per group​

Count-if via CASE​

Sum with a filter on the aggregated value​

Aggregation with ORDER BY inside collect​

Limitations​

See also​