Learn what causes one of the most common PostgreSQL GROUP BY errors and how you can fix it!
As a PostgreSQL user, you surely encounter many error messages when writing an SQL code. Sometimes they are quite unclear, but you won’t need an interpreter for this one: “must appear in the GROUP BY clause or be used in an aggregate function”.
This error message mentions GROUP BY
and aggregate functions. These are fundamental PostgreSQL concepts you need to master to fix this error. Of course, fixing it also means not making this error too often when you move to advanced concepts, such as creating reports; using window functions, CTEs, and recursion; writing your own functions, or working with maps. All this is covered in our SQL from A to Z in PostgreSQL learning path. This comprehensive interactive course will make it easier for you to avoid PostgreSQL error messages. And when they appear, you’ll be able to solve them quickly.
As you’ll soon see, solving the error we mentioned in the article’s title is also relatively easy. It does, however, require an understanding of how GROUP BY in PostgreSQL works.
What Causes the “must appear in the GROUP BY clause or be used in an aggregate function” Error?
As always, we’ll be very practical and use SQL code to show you what causes the error and how to fix it.
Dataset
The dataset we’ll use consists of two tables. The first is freelancers
:
id
– The freelancer’s ID and the table’s primary key (PK).first_name
– The freelancer’s first name.last_name
– The freelancer’s last name.email
– The freelancer’s email.country
– The freelancer’s country.
id | first_name | last_name | country | |
---|---|---|---|---|
1 | Pete | Thompson | pthompson@gmail.com | UK |
2 | Nadine | Lopez | nlopez@gmail.com | Italy |
3 | Claudio | Stratos | cstratos@gmail.com | Italy |
4 | Miriam | Valetti | mvaletti@gmail.com | Italy |
The CREATE TABLE
query is available here.
The second table is named weekly_salary
and contains data on how much freelancers’ are paid each week. The columns are:
id
– The salary’s ID and the table’s primary key (PK).freelancers_id
– The freelancer’s ID and the table’s foreign key (FK) from the table freelancers.week_start
– The starting date for the salary calculation.week_end
– The ending date for the salary calculation.paid_amount
– The salary amount.
id | freelancers_id | week_start | week_end | paid_amount |
---|---|---|---|---|
1 | 1 | 2023-01-02 | 2023-01-08 | 623.56 |
2 | 1 | 2023-01-09 | 2023-01-15 | 987.41 |
3 | 1 | 2023-01-16 | 2023-01-22 | 874.54 |
4 | 1 | 2023-01-23 | 2023-01-29 | 354.78 |
5 | 1 | 2023-01-30 | 2023-02-05 | 478.65 |
6 | 2 | 2023-01-30 | 2023-02-05 | 1,457.17 |
7 | 3 | 2023-01-30 | 2023-02-05 | 1,105.94 |
8 | 1 | 2023-02-06 | 2023-02-12 | 3,418.95 |
9 | 2 | 2023-02-06 | 2023-02-12 | 1,547.98 |
10 | 3 | 2023-02-06 | 2023-02-12 | 1,549.36 |
11 | 4 | 2023-02-06 | 2023-02-12 | 415.78 |
Here’s the query for creating the table.
Throwing the Error
Let’s attempt to show the freelancers’ first and last names and the number of weekly salaries they have received so far:
SELECT first_name, last_name, COUNT(freelancers_id) AS number_of_paymentsFROM freelancers fLEFT JOIN weekly_salary wsON f.id = ws.freelancers_id;
We LEFT JOIN
the two tables so we can get all the required data.
That data is the first and the last name. Then we use the COUNT() aggregate function on the column freelancers_id
to count how many times this ID appears. The counted number is equal to the number of salaries the freelancer received.
So the salaries count for each freelancer is…an error!
The wording of this PostgreSQL message is quite clear, and it reflects the general rule: the columns listed in SELECT
should appear in GROUP BY
. If they don’t appear in GROUP BY
, then they have to be used in the aggregate function.
As you can see, our query doesn’t have GROUP BY
at all. We’re using the COUNT()
aggregate function and we should define the groups for aggregation, but we didn’t do that.
Fixing the “must appear in the GROUP BY clause or be used in an aggregate function” Error
The usual approach to fixing this error is to simply write all the columns from SELECT
in the GROUP BY
clause. This excludes the columns that are the result of the aggregate function.
In our case, listing the columns first_name
and last_name
in GROUP BY
will fix the error.
SELECT first_name, last_name, COUNT(freelancers_id) AS number_of_paymentsFROM freelancers fLEFT JOIN weekly_salary wsON f.id = ws.freelancers_idGROUP BY first_name, last_name;
The only change from the previous query is that now we use GROUP BY
. In it, we write all the columns from SELECT
except the one used in the aggregate function. Aggregate functions are not allowed in GROUP BY
– this would show a whole other Postgres error message.
first_name | last_name | number_of_payments |
---|---|---|
Miriam | Valetti | 1 |
Claudio | Stratos | 2 |
Nadine | Lopez | 2 |
Pete | Thompson | 6 |
The query returns the above output. It’s obvious we really did fix the error. This output shows Miriam Valetti got paid once, Claudio Stratos twice, and so on.
Bonus: PostgreSQL Optimizer, or Why This Error Doesn’t Always Appear
PostgreSQL uses an optimizer. It tries to “think” and do things you meant but maybe didn’t write explicitly.
The error we discussed won’t appear in PostgreSQL if you group by the primary key.
Take a look at this query:
SELECT f.id, first_name, last_name, SUM(paid_amount) AS total_paid_amountFROM freelancers fLEFT JOIN weekly_salary wsON f.id = ws.freelancers_idWHERE country = 'Italy'GROUP BY f.id;
It attempts to return the freelancers’ ID, first name, last name, and the total salary amount they’ve been paid so far (the SUM()
function). The two tables are LEFT JOINed
and the data is filtered to show only freelancers from Italy.
What seems strange is the GROUP BY
. Didn’t we just say that all the columns in SELECT
must also appear in GROUP BY
?
This code should return an error. Let’s run it and see:
id | first_name | last_name | total_paid_amount |
---|---|---|---|
2 | Nadine | Lopez | 3,005.15 |
3 | Claudio | Stratos | 2,655.30 |
4 | Miriam | Valetti | 415.78 |
Surprisingly, the query doesn’t return an error! Why is that?
The PostgreSQL optimizer allows you to group by primary key (PK) and have the non-PK columns in SELECT
. PostgreSQL understands that the PK columns determine the values of other columns in the same table.
In our case, column f.id
is the PK. Having it in GROUP BY
is enough; no need for including first_name
and last_name
(the non-PK columns) in GROUP BY
.
However, you shouldn’t rely on that – the optimizer is not that smart! For instance, it can’t identify PKs for subqueries, CTEs, and views.
Let’s rewrite the above query with a subquery. The table you use might be much bigger, so you might want to filter only the freelancers from Italy immediately. You’ll do this in a subquery:
SELECT ifr.id, first_name, last_name, SUM(paid_amount) AS total_paid_amountFROM (SELECT *FROM freelancersWHERE country = 'Italy') AS ifr LEFT JOIN weekly_salary wsON ifr.id = ws.freelancers_idGROUP BY ifr.id;
The SELECT
part is the same as earlier. But instead of using the whole freelancers
table and joining it with weekly_salary
, we write the ifr
(as in ‘italy_freelancers’) subquery.
This subquery selects all the columns from the table freelancers
and filters data by country in WHERE
.
As we use the subquery instead of the whole freelancers
table, we group the output by the column id
from the subquery.
This should work, as the subquery is basically just a limited version of the freelancers
table. However, PostgreSQL returns a familiar error:
We used this derived table as a subquery, so the PostgreSQL optimizer doesn’t recognize its PK. If you insist on having a subquery, then this error is fixed the same way as before: list all other columns from SELECT
in GROUP BY
.
SELECT ifr.id, first_name, last_name, SUM(paid_amount) AS total_paid_amountFROM (SELECT *FROM freelancersWHERE country = 'Italy') AS ifr LEFT JOIN weekly_salary wsON ifr.id = ws.freelancers_idGROUP BY ifr.id, first_name, last_name;
The query now works:
id | first_name | last_name | total_paid_amount |
---|---|---|---|
2 | Nadine | Lopez | 3,005.15 |
3 | Claudio | Stratos | 2,655.30 |
4 | Miriam | Valetti | 415.78 |
We About Learned Fixing Errors. Now It’s Time to Avoid Them.
Getting such error messages in PostgreSQL is very helpful, as they make you learn. Of course, even the most advanced PostgreSQL users will now and then see this (or any other) mistake, so you won’t avoid them completely.
It’s important that you know how to fix the errors. However, the goal is to see this GROUP BY error as rarely as possible. Of course, sometimes the PostgreSQL optimizer might save you. It’s a good tool to have, but you should rely more on your knowledge than the optimizer being able to read your mind. To ensure that, you need a lot of practice to make writing the columns in SELECT and then in GROUP BY automatic.
The SQL from A to Z in PostgreSQL will boost your knowledge and confidence in it.
There are also some additional ideas on how to practice PostgreSQL online. Use this to become your own code optimizer!