Search the MySQL manual:

6.3.7 Functions and Modifiers for Use with GROUP BY Clauses

Subsections

User Comments

Posted by wdbrennan on Thursday June 20 2002, @6:28pm[Delete] [Edit]

For medians in almost any SQL:
SELECT x.val from data x, data y
GROUP BY x.val
HAVING SUM(SIGN(1-SIGN(y.val-x.val))) = (COUNT(*)
+1)/2

assumes (COUNT(*)+1)/2 returns an int, if not
then use INT((COUNT(*)+1)/2)
There are variations for the varied defs of
median too.

See Rozenshtein's book on Optimizing T-SQL
Advanced Programming Techniques. Search Amazon.
Learn how to use encoded characteristic functions
(delta[]), then you can do all sorts of fun stuff
quickly - for instance you could pivot a table
into summary columns of totals where each column
counted different things in one table pass.
I have utilized this technique with great success
on large tables: a multi million row table using
the existing summary code took minutes while this
SQL trimmed that down to less than 3 seconds!
Once you get used to Delta functions, the code is
very readable too!



Posted by Ondra Zizka on Wednesday December 18 2002, @5:27pm[Delete] [Edit]

Another GROUP functions NEEDED!!! There's no way
to pick some value from grouped column, accordign
to another column. E.g., the newest value, acc.
to 'mydate' column, what is INSERTed as NOW().
Somebody should develop something like
SELECT user, DECICE(val,when = MAX(when)) AS
newest_val GROUP BY user;
DECIDE would be a group-usable fuction, what
would select the value from the column set in the
first arg from the row where the expression in
the 2nd arg is true. Who agrees, make the | :)
|

Posted by hudson on Friday May 17 2002, @6:24am[Delete] [Edit]

For what it's worth, the SUM() function does not
work on TIME values directly, although it appears
to work at first glance. To add a bunch of TIME
values (intervals, presumably), you need to do
something like this:

SEC_TO_TIME(SUM(TIME_TO_SEC(events.length)))

This will return another TIME value, suitable for
TIME_FORMAT() or whatever else.

Posted by Linda Wright on Friday May 17 2002, @6:24am[Delete] [Edit]

A function for doing Base64 encoding would be a
good thing to have. This can be used with
encryption, but also, binary files can be
converted to a convenient universal character
format and stored neatly as text.

Posted by Christian Sonntag on Thursday July 4 2002, @12:44pm[Delete] [Edit]

I would like to see a GROUP BY function 'ANY',
which just returns any of the values and optimises
the query instead. This could be quite useful.

Posted by Adam Hooper on Friday May 17 2002, @6:24am[Delete] [Edit]

I haven't tested this much, but I can say for
sure that in at least one single-table query
using SQL_CALC_FOUND_ROWS/FOUND_ROWS() is MUCH
slower than running the query with a LIMIT and
then running it again with a COUNT(*). This
particular query is run from PHP and returns
16720 rows with a LIMIT of 25. Running it with
two queries (i.e., one with LIMIT and the other
with COUNT(*)) gives a query time of around
450ms, while running it with SQL_CALC_FOUND_ROWS
(LIMIT 25)/FOUND_ROWS() takes slightly over
600ms. Pentium II 400, 512MB, InnoDB....

Posted by Aubrey McAuley on Thursday December 19 2002, @3:22pm[Delete] [Edit]

I agree that the limitations of GROUP BY when
used along with ORDER BY are very frustrating.
They've caused me quite a lot of grief. I think
the standard way to get around this limitation
is with a sub-select, which MySQL doesn't
support. However, there is a way around this
problem that's already documented:

http://www.mysql.com/doc/en/example-Maximum-column-group-row.html

I don't know why it's not mentioned here. Even
though it's not a simple solution, it does work.
If you sort things the way you need them while
populating the temporary table (without GROUP
BY), then subsequent queries that join the
temporary table will provide suitable results.
Thus, the GROUP BY can in a round-about way be
used with ORDER BY.

Posted by Csaba Gabor on Wednesday December 18 2002, @5:28pm[Delete] [Edit]

NULL handling seems to be a bit iffy in the
HAVING clause, which you might compensate for by
using IFNULL:
Suppose you have a Users table with column
(of unique) Id and a Logins table with Id,
LastUse colums such that Id's in Logins may be
repeated or might not appear at all. Then if you
wanted to know only about users not active in the
last three weeks you might think to write:
SELECT Users.Id, UNIX_TIMESTAMP()-UNIX_TIMESTAMP
(MAX(LastUse)) AS Diff FROM Users LEFT JOIN
Logins USING (Id) GROUP BY Users.Id HAVING
(Diff IS NULL) OR (Diff>504*3600);
which will give you nothing. Solve
the problem by using IFNULL in either the SELECT
or HAVING clause:
SELECT Users.Id, IFNULL(UNIX_TIMESTAMP()-
UNIX_TIMESTAMP(MAX(LastUse)),0) AS Diff FROM
Users LEFT JOIN Logins USING (Id) GROUP BY
Users.Id HAVING (Diff=0 OR Diff>504*3600);

Posted by Alan McKay on Friday June 14 2002, @9:45am[Delete] [Edit]

I see there are functions for Min, Max, Average
and Standard Deviation, but what about Median? Is
there some way to do that?

Posted by [name withheld] on Wednesday December 18 2002, @5:27pm[Delete] [Edit]

<a href="mailto:kangus@acats.com?
Subject=DUPLICATE"> Kevin A. </a>
<font color='red'>DUPLICATE RECORDS How to
find
them</font>
While DISTINCT works for finding duplicates <font
color='red'> IN ONE COLUMN
</front><font
color='black'>it does not
allow you to add other information to your query,
try:
CREATE TABLE Distinct_Address AS SELECT
StreetAddress, COUNT(StreetAddress)
AS
HouseHoldCount,First,Middle,Last,StreetAddress,City,
State,Zip,Phone
FROM Voters2002
GROUP BY StreetAddress
HAVING COUNT(Voters2002.StreetAddress)
>=1;
This will create a table with a HOUSE HOLD count
and only ONE address. This can be used as a mailing
list where you only want to send ONE piece of mail
to a house hold.</font>

Posted by Pablo Díaz on Wednesday December 18 2002, @5:28pm[Delete] [Edit]

FINDIND DUPLICATED RECORDS
------------------------------------------------
Well, I don't know toomuch about this but it works
for me, and fast enough.
SELECT columnname, COUNT(columnname)
FROM
tablename GROUP BY columnname HAVING COUNT
(columnname)>1
Bye community!

Posted by Anthony DiPierro on Sunday September 29 2002, @3:35pm[Delete] [Edit]

To select the latest 5 unique topics, using a datetime
field called time, use something like: "select topic,
max(time) as m from nuke_stories group by topic
order by m desc limit 5;"

Posted by Barbara Kreaseck on Thursday October 3 2002, @11:53pm[Delete] [Edit]

Another take on MEDIAN: The SELECT statement
listed in a comment above does not work for all
lists of values (especially when the median value
is not unique in the list, or when the median
should be the average of the two middle-most
values in an even-sized list). Here is a
three-step process that has worked for me:

CREATE TEMPORARY TABLE medians SELECT x.val FROM
data x, data y GROUP BY x.val HAVING
((COUNT(*)-(SUM(SIGN(1-SIGN(y.val-x.val))))) <=
floor((COUNT(*) +1)/2)) and
((COUNT(*)-(SUM(SIGN(1+SIGN(y.val-x.val))))) <=
floor((COUNT(*) +1)/2));

The resulting medians table will either have 1 or
2 rows. If the size of your original data is an
odd number or the two middle-most data values are
the same there will
only be 1 row. If the size of
your original data is an even number and the
middle data values are not equal, there will be 2
rows. Take the average to get the true median:

SELECT AVG(val) AS median FROM medians;

The last step is to drop your temporary table
medians:

DROP TABLE medians;

I couldn't think of another way to compute the
MEDIAN without using a temporary table. If you
know how to optimize my technique, I would love to
hear from you.

Posted by mysql on Sunday October 6 2002, @12:52am[Delete] [Edit]

Is it just me that finds the (lacking) description
of GROUP BY hard to follow?

> MySQL has extended the use of GROUP BY.
(okay)
> You can use columns or calculations in the
SELECT expressions that don't appear in the GROUP
BY part.
(fine)
> This stands for any possible value for this group.
(what?)

> Don't use this feature if the columns you omit
from the GROUP BY part aren't unique in the group!
You will get unpredictable results.

What does this last bit mean? "aren't unique in
the group" .. should that not read "aren't
IDENTICAL in the group"???

Good feature though, the ANSI GROUP BY is too
redundant, and limits you to 10 GROUP columns,
which while high, is still a limit.

Posted by Timo Boehme on Tuesday April 8 2003, @6:19am[Delete] [Edit]

It seems that MySQL 4.0.12 cannot handle
HAVING COUNT(DISTINCT col1)>=5 correctly because it returns 'Unknown column col1 ...', however
HAVING COUNT(col1)>=5 is ok.
Fortunately one can get the intended behaviour with
SELECT ... , COUNT(DISTINCT col1) AS cnt
...
GROUP BY ...
HAVING cnt>=5

Posted by [name withheld] on Wednesday April 23 2003, @9:56am[Delete] [Edit]

STDDEV is a BIASED estimate of standard deviation. Have a look at http://mathworld.wolfram.com/Variance.html to find out more about biased and unbiased estimates of standard deviation and varience.

It would be cool to have a function to calculate the unbiased version of the standard deviation too, though :)

Posted by Jeff Gage on Tuesday May 6 2003, @8:05am[Delete] [Edit]

Just a suggestion for the developers to add a non-distinct function, which is something I am always using a perl script to 'grep out' deletes from tables with. Having something like:

Delete from some_table where some_column is not distinct.

That would leave a single occurance and get rid of the duplicates.


I do it alot. I have a hunch others do as well.


Jeff

Posted by Paul Fernando on Thursday June 12 2003, @4:48am[Delete] [Edit]

What is the equivalent funtion for Decode used in oracle compared
Mysql

Posted by Paul Fernando on Thursday June 12 2003, @4:49am[Delete] [Edit]

What is the equivalent funtion for Decode used in oracle that can be used in Mysql - Reply if any clue available immedietely

Posted by Alberto Lepe on Thursday June 12 2003, @1:09pm[Delete] [Edit]

I was trying to get the percentage of a number rows contained in a table in one query. Normally you could do this with 2 queries:

SELECT COUNT(*) as total FROM table WHERE poll=1231 and question=2
SELECT COUNT(*) as result FROM table WHERE poll=1231 and question=2 and answer = 1

and then you get the percentage getting (result/total)*100

I looked over and over all the MySql documentation, and finally I tried this and works (thanks to Tim Ferrer):

SELECT (COUNT(DISTINCT T2.id)/COUNT(DISTINCT T1.id)*100) as percentage FROM table T1, table T2 WHERE T1.poll=1231 and T1.question=2 and T2.poll=1231 and T2.question=2 and T2.answer=1;

That works for me.

Posted by Bob Terrell on Sunday July 6 2003, @1:36am[Delete] [Edit]

If you want to perform a GROUP BY, but you want values in rows that you aren't grouping by and don't appear first in the table, you can do so by joining the table to itself.

For example, in a table that has the fields "tile_id", "name", "orientation" and "imagename":
1) I want the fields "tile_id", "name" and "imagename" to be returned.
2) I want to group by the "name".
3) I want the rows with an orientation of "up" to show up.
4) I only want rows in which there is more than one "orientation" for a given "name".

You can do this with the following query:

SELECT t1.tile_id, t1.name, t1.imagename, count(*) as thecount
FROM `tiles` AS t1, tiles AS t2
WHERE t1.orientation != t2.orientation AND t1.orientation='up' AND t1.name = t2.name
GROUP BY t1.name

Because it's joining where the two tables' orientations are not equal AND where the names ARE equal, that automatically finds results in which there is more than one.

The SAME QUERY could be written as follows:

SELECT t1.tile_id, t1.name, t1.imagename, count(*) as thecount FROM `tiles` AS t1, tiles AS t2 WHERE t1.orientation='up' AND t1.name = t2.name GROUP BY t1.name HAVING thecount > 1

If you aren't picky and don't mind resuts with only one orientation, of course, you can simply remove the HAVING clause from the example above.

Add your own comment.