Surprise — this is a perfectly valid query in MySQL:
select X, Y from someTable group by X
If you tried this query in Oracle or SQL Server, you’d get an error instead. SQL Server’s message, for example, is:
Column 'Y' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
So how does MySQL determine which Y to show for each X? It just picks one. From what I can tell, it picks the first Y it finds. The rationale seems to be this: if Y is neither inside an aggregate function nor in the GROUP BY clause, then writing “select Y” in your query makes no sense to begin with, so I, the database engine, will return whatever I want, and you’ll like it.
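Here is a small sketch of what “it just picks one” means. It runs on SQLite through Python’s built-in sqlite3 module, used only as a stand-in because it runs anywhere. Like MySQL with the strict setting off, SQLite accepts the bare Y column instead of raising an error. The table contents are hypothetical. The only thing you can rely on is that each returned Y comes from some row of its X group:

```python
import sqlite3

# SQLite stands in for MySQL here: it also accepts a column that is neither
# grouped nor aggregated (a "bare" column) instead of raising an error.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE someTable (X TEXT, Y INTEGER)")
conn.executemany(
    "INSERT INTO someTable (X, Y) VALUES (?, ?)",
    [("a", 1), ("a", 2), ("a", 3), ("b", 10), ("b", 20)],  # hypothetical data
)

# The query from the question, plus ORDER BY X so the groups come back in a fixed order.
rows = conn.execute("SELECT X, Y FROM someTable GROUP BY X ORDER BY X").fetchall()

# Collect every Y that actually occurs in each X group.
candidates = {}
for x, y in conn.execute("SELECT X, Y FROM someTable"):
    candidates.setdefault(x, set()).add(y)

# One row per X is guaranteed; Y is taken from *some* row of that group,
# and the query itself does not say which one.
for x, y in rows:
    assert y in candidates[x]
    print(f"X={x}: got Y={y}, could have been any of {sorted(candidates[x])}")
```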
There’s even a MySQL configuration setting, the ONLY_FULL_GROUP_BY flag in sql_mode, that turns off this “looseness”.
This article even mentions how MySQL has been criticized for being ANSI-SQL non-compliant in this regard.
My question is: Why was MySQL designed this way? What was their rationale for breaking with ANSI-SQL?
I believe that it was to handle the case where grouping by one field would imply other fields are also being grouped:
SELECT user.id, user.name, COUNT(post.owner_id) AS posts FROM user LEFT OUTER JOIN post ON post.owner_id = user.id GROUP BY user.id
In this case user.name always has a single value for a given user.id, so it is convenient not to have to list user.name in the GROUP BY clause (although, as you say, there is definite scope for problems).
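The user/post case can be sketched as follows. SQLite is again a stand-in for MySQL, and the rows are hypothetical. Because user.id is the primary key, every row in a user.id group has the same user.name, so the bare column has exactly one possible value. For what it’s worth, MySQL 5.7.5 and later formalize this idea: even with ONLY_FULL_GROUP_BY enabled, they accept a bare column that is functionally dependent on the GROUP BY columns. The sketch counts post.owner_id rather than post.* so that a user with no posts gets 0 from the LEFT OUTER JOIN:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE post (owner_id INTEGER REFERENCES user(id));
    -- hypothetical sample data: carol has no posts
    INSERT INTO user (id, name) VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO post (owner_id) VALUES (1), (1), (2);
""")

# user.name depends on user.id (the primary key), so each group has one name
# and the bare column is well defined. COUNT(post.owner_id) skips the NULL
# produced by the outer join for users without posts.
query = """
    SELECT user.id, user.name, COUNT(post.owner_id) AS posts
    FROM user
    LEFT OUTER JOIN post ON post.owner_id = user.id
    GROUP BY user.id
    ORDER BY user.id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 'alice', 2), (2, 'bob', 1), (3, 'carol', 0)]
```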
According to this page (the 5.0 online manual), it’s for better performance and user convenience.
Unfortunately, almost every SQL dialect has situations where it breaks with ANSI and produces unpredictable results.
It sounds to me like they intended it to be treated like the “FIRST(Y)” function that many other systems have.
More than likely, this construct is something that the MySQL team regret, but don’t want to stop supporting because of the number of applications that would break.
MySQL treats this as a single-column DISTINCT when you use GROUP BY without an aggregate function. With the other options, you either have to make the whole row distinct or resort to subqueries, etc. The question is whether the results are truly predictable.
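The “single-column DISTINCT” reading can be checked by comparing row counts. This sketch again uses SQLite as a stand-in for MySQL, with hypothetical data. GROUP BY X with a bare Y yields one row per distinct X, just like DISTINCT X. DISTINCT X, Y instead keeps every distinct pair. The Y that accompanies each X is still taken from an arbitrary row, so the row count alone does not settle the predictability question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE someTable (X TEXT, Y INTEGER)")
conn.executemany(
    "INSERT INTO someTable (X, Y) VALUES (?, ?)",
    [("a", 1), ("a", 2), ("b", 10), ("b", 10), ("c", 5)],  # hypothetical data
)

# One row per X, with Y taken from some row of each group.
grouped = conn.execute("SELECT X, Y FROM someTable GROUP BY X").fetchall()
# One row per X, with no Y at all.
distinct_x = conn.execute("SELECT DISTINCT X FROM someTable").fetchall()
# One row per distinct (X, Y) pair: ('a', 1) and ('a', 2) both survive.
distinct_xy = conn.execute("SELECT DISTINCT X, Y FROM someTable").fetchall()

print(len(grouped), len(distinct_x), len(distinct_xy))  # 3 3 4
```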
Also, good info is in this thread.
The MySQL reference manual says:
“You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group.”
I suggest you read this page of the MySQL reference manual.
It’s actually a very useful feature that the other fields don’t have to be in an aggregate function when you group by a field. You can control which row is returned by ordering the rows first and then grouping them. For instance, if I wanted to get a user’s login information and see the last time that user logged in, I would do this.
users: user_id | name
user_login_history: user_id | date_logged_in
user_login_history has multiple rows per user, so joining users to it would return many rows. As I am only interested in the last entry, I would do this:
select user_id, name, date_logged_in
from (
    select u.user_id, u.name, ulh.date_logged_in
    from users as u
    join user_login_history as ulh on u.user_id = ulh.user_id
    where u.user_id = 1234
    order by ulh.date_logged_in desc
) as table1
group by user_id
This would return one row with the name of the user and the last time that user logged in.
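The ordered-subquery trick depends on MySQL keeping the first row it encounters in each group. The manual passage quoted earlier does not promise that. Newer MySQL versions (5.7 onward) can also drop an ORDER BY inside a derived table when they merge it into the outer query. When only the latest date is needed, an aggregate states the intent directly. Here is a minimal sketch, once more using SQLite as a stand-in for MySQL, with a hypothetical user and login dates stored as ISO-8601 strings so that MAX compares them chronologically:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_login_history (user_id INTEGER, date_logged_in TEXT);
    -- hypothetical sample data, deliberately inserted out of date order
    INSERT INTO users (user_id, name) VALUES (1234, 'jsmith');
    INSERT INTO user_login_history (user_id, date_logged_in) VALUES
        (1234, '2011-03-01 09:15:00'),
        (1234, '2011-03-07 18:40:00'),
        (1234, '2011-02-20 08:05:00');
""")

# MAX() picks the latest login explicitly; no reliance on row order inside a group.
query = """
    SELECT u.user_id, u.name, MAX(ulh.date_logged_in) AS last_login
    FROM users AS u
    JOIN user_login_history AS ulh ON u.user_id = ulh.user_id
    WHERE u.user_id = ?
    GROUP BY u.user_id, u.name
"""
row = conn.execute(query, (1234,)).fetchone()
print(row)  # (1234, 'jsmith', '2011-03-07 18:40:00')
```

If more columns from that latest login row were needed, MAX alone would not be enough, because it only returns the one column.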