Given this data set:
ID Name City Birthyear 1 Egon Spengler New York 1957 2 Mac Taylor New York 1955 3 Sarah Connor Los Angeles 1959 4 Jean-Luc Picard La Barre 2305 5 Ellen Ripley Nostromo 2092 6 James T. Kirk Riverside 2233 7 Henry Jones Chicago 1899
I need to find the 3 oldest persons, but only one of every city.
If it would just be the three oldest, it would be…
- Henry Jones / Chicago
- Mac Taylor / New York
- Egon Spengler / New York
However since both Egon Spengler and Mac Taylor are located in New York, Egon Spengler would drop out and the next one (Sarah Connor / Los Angeles) would come in instead.
Any elegant solutions?
Currently a variation of PConroy is the best/fastest solution:
SELECT P.*, COUNT(*) AS ct FROM people P JOIN (SELECT MIN(Birthyear) AS Birthyear FROM people GROUP by City) P2 ON P2.Birthyear = P.Birthyear GROUP BY P.City ORDER BY P.Birthyear ASC LIMIT 10;
His original query with “IN” is extremly slow with big datasets (aborted after 5 minutes), but moving the subquery to a JOIN will speed it up a lot. It took about 0.15 seconds for approx. 1 mio rows in my test environment. I have an index on “City, Birthyear” and a second one just on “Birthyear”.
Note: This is related to…
Probably not the most elegant of solutions, and the performance of
IN may suffer on larger tables.
The nested query gets the minimum
Birthyear for each city. Only records who have this
Birthyear are matched in the outer query. Ordering by age then limiting to 3 results gets you the 3 oldest people who are also the oldest in their city (Egon Spengler drops out..)
SELECT Name, City, Birthyear, COUNT(*) AS ct FROM table WHERE Birthyear IN (SELECT MIN(Birthyear) FROM table GROUP by City) GROUP BY City ORDER BY Birthyear DESC LIMIT 3; +-----------------+-------------+------+----+ | name | city | year | ct | +-----------------+-------------+------+----+ | Henry Jones | Chicago | 1899 | 1 | | Mac Taylor | New York | 1955 | 1 | | Sarah Connor | Los Angeles | 1959 | 1 | +-----------------+-------------+------+----+
Edit – added
GROUP BY City to outer query, as people with same birth years would return multiple values. Grouping on the outer query ensures that only one result will be returned per city, if more than one person has that minimum
ct column will show if more than one person exists in the city with that
This is probably not the most elegant and quickest solution, but it should work. I am looking forward the see the solutions of real database gurus.
select p.* from people p, (select city, max(age) as mage from people group by city) t where p.city = t.city and p.age = t.mage order by p.age desc
Something like that?
SELECT Id, Name, City, Birthyear FROM TheTable WHERE Id IN (SELECT TOP 1 Id FROM TheTable i WHERE i.City = TheTable.City ORDER BY Birthyear)
Not pretty but should work also with multiple people with the same dob:
select id, name, city, dob into people from (select 1 id,'Egon Spengler' name, 'New York' city , 1957 dob union all select 2, 'Mac Taylor','New York', 1955 union all select 3, 'Sarah Connor','Los Angeles', 1959 union all select 4, 'Jean-Luc Picard','La Barre', 2305 union all select 5, 'Ellen Ripley','Nostromo', 2092 union all select 6, 'James T. Kirk','Riverside', 2233 union all select 7, 'Henry Jones','Chicago', 1899 union all select 8, 'Blah','New York', 1955) a
select * from people p left join people p1 ON p.city = p1.city and (p.dob > p1.dob and p.id <> p1.id) or (p.dob = p1.dob and p.id > p1.id) where p1.id is null order by p.dob
just found that its good to use USING instead of ON. it will remove duplicate columns in result.
SELECT P.*, COUNT(*) AS ct FROM people P JOIN (SELECT City, MIN(Birthyear) AS Birthyear FROM people GROUP by City) P2 USING(Birthyear, City) GROUP BY P.City ORDER BY P.Birthyear ASC LIMIT 10;
hi, i’ve tried to use your updated query but i was getting wrong results until i’ve added extra condition to join (also extra column into join select). transfered to your query, i’am using this:
SELECT P.*, COUNT(*) AS ct FROM people P JOIN (SELECT City, MIN(Birthyear) AS Birthyear FROM people GROUP by City) P2 ON P2.Birthyear = P.Birthyear AND P2.City = P.City GROUP BY P.City ORDER BY P.Birthyear ASC LIMIT 10;
in theory you should not need last GROUP BY P.City, but i’ve left it there for now, just in case. will probably remove it later.