5

I'm facing a problem in a data importation script in SQL(MySQL) where I need to GROUP rows by type to COUNT how much rows there are from each type. So far, it isn't really a problem, because I know that I can do:

SELECT 
  data.type, 
  COUNT(data.type) 
FROM data 
GROUP BY data.type;

So, by doing it, I have the result:

 -------------- --------------------- 
| type         | COUNT(data.type)    |
|--------------|---------------------|
| 0            |                   1 |
| 1            |                  46 |
| 2            |                  35 |
| 3            |                 423 |
| 4            |                  64 |
| 5            |                  36 |
| 9            |                   1 |
 -------------- ---------------------

I know that in the type column the values will always be in the range from 0 to 9, like the above result. So, I would like to list not only the existing values in the table content but the missing type values too, with their COUNT value set to 0.

Based on the above query result, the expected result would be:

 -------------- --------------------- 
| type         | COUNT(data.type)    |
|--------------|---------------------|
| 0            |                   1 |
| 1            |                  46 |
| 2            |                  35 |
| 3            |                 423 |
| 4            |                  64 |
| 5            |                  36 |
| 6            |                   0 |
| 7            |                   0 |
| 8            |                   0 |
| 9            |                   1 |
 -------------- --------------------- 

I could trickly INSERT one row of each type before GROUP/COUNT-1 the table content, flagging some other column on INSERT to be able to DELETE these rows after. So, the steps of my importation script would change to:

  1. TRUNCATE table; (I can't securily import new content if there were old data in the table)
  2. INSERT "control" rows;
  3. LOAD DATA INFILE INTO TABLE;
  4. GROUP/COUNT-1 the table content;
  5. DELETE "control" rows; (So I can still work with the table content)
  6. Do any other jobs;

But, I was looking for a cleaner way to reach the expected result. If possible, a single query, without a bunch of JOINs.

I would appreciate any suggestion or advice. Thank you very much!

EDIT

I would like to thank for the answers about CREATE a table to store all types to JOIN it. It really solves the problem. My approach solves it too, but does it storing the types, as you did.

So, I have "another" question, just a clarification, based on the received answers and my desired scope... is it possible to reach the expected result with some MySQL command that will not CREATE a new table and/or INSERT these types?

I don't see any problem, actually, in solve my question storing the types... I just would like to find a simplified command... something like a 'best practice'... some kind of filter... as I could run:

GROUP BY data.type(0,1,2,3,4,5,6,7,8,9)

and it could return these filtered values.

I am really interested to learn such a command, if it really exists/is possible.

And again, thank you very much!

Community
  • 1
  • 1
Felypp Oliveira
  • 2,147
  • 1
  • 17
  • 17
  • The best practice is to have a table called `types` that contains all the valid values for the type (along with other information such as what the values mean). – Gordon Linoff Jan 15 '13 at 01:41
  • Please don't edit answers into your question. If you have an answer, post it below as an answer. – user229044 Dec 22 '15 at 23:39
  • +1 for a good question. I once had a similar issue where I needed to create a time sheet showing things that happened at 9am, 10am, 11am, and so on. It was this exact same issue though where if nothing happened at those times, you had holes in your time sheet. – Laurence Frost Dec 23 '15 at 12:32
  • 1
    @meagar I've posted the answer... thanks! – Felypp Oliveira Dec 23 '15 at 16:32

4 Answers4

0

Let's assume that you have a types table with all the valid types:

SELECT t.type, 
       COUNT(data.type) 
FROM data join types t on data.type = t.type
GROUP BY t.type
order by t.type

You should include the explicit order by and not depend on the group by to produce results in a particular order.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • First of all, thanks for the help. I think that through this approach it will just be INSERTing the "control" rows in a different table, don't you think? My main intention was to find something like: GROUP BY data.type(0,1,2,3,4,5,6,7,8,9) as these values were a 'group by and value filter' – Felypp Oliveira Jan 14 '13 at 22:40
  • Your query doesn't say anything about a value filter. You can create the list of values using `select` statements with `union all`. – Gordon Linoff Jan 14 '13 at 22:46
  • Sorry about the "filter" part... I was not expecting that the answers were about to store the 'types', as my approach store them and I was trying to avoid it. I'm sorry. – Felypp Oliveira Jan 14 '13 at 22:59
  • @user1977836 . . . All three answers are essentially the same. Mine assumes you have a types table. BlueFeet creates one using `union all` and sgeddes assumes you have numbers table. If these don't answer your question, then you may need to revise the question or ask another one. – Gordon Linoff Jan 14 '13 at 23:02
0

The easiest way is to create a table of all type values and then join on that table when getting the count:

select t.type,
  count(d.type)
from types t
left join data d
  on t.type = d.type
group by t.type

See SQL Fiddle with demo

Or you can use the following:

select t.type,
  count(d.type)
from
(
  select 0 type
  union all
  select 1 
  union all
  select 2
  union all
  select 3
  union all
  select 4
  union all
  select 5 
  union all
  select 6
  union all
  select 7
  union all
  select 8
  union all
  select 9 
) t
left join data d
  on t.type = d.type
group by t.type

See SQL Fiddle with Demo

Taryn
  • 242,637
  • 56
  • 362
  • 405
  • First of all, thanks for the help. I expected to find something like: GROUP BY data.type(0,1,2,3,4,5,6,7,8,9) as these values were a 'group by and value filter' because seems that I have no other way than INSERT "control" rows, with each type, in some table. By UNION I'll basically CREATE/INSERT, as far as I understand how it works, but seems a better approach, because MySQL takes care of the CREATE/INSERT/DROP queries, instead of me. I'll wait a bit more to check any other suggestions. Thanks! – Felypp Oliveira Jan 14 '13 at 22:53
  • @FelyppOliveira the only way for you to return values that do not exists in your table is to create them, either with a table of numbers through the `union all` version. – Taryn Jan 14 '13 at 23:20
  • @Felypp I am not sure I understand your update to the question. You can easily use the union all version of the queries which doesn't insert or create a new table of data. – Taryn Jan 15 '13 at 00:36
0

One option would be having a static numbers table with the values 0-9. Not sure if this is the most elegant approach, and if you were using SQL Server, I could think of another approach.

Try something like this:

SELECT 
  numbers.number, 
  COUNT(data.type) 
FROM numbers 
left join data 
  on numbers.number = data.type
GROUP BY numbers.number;

And the SQL Fiddle.

wickedone
  • 542
  • 1
  • 6
  • 18
sgeddes
  • 62,311
  • 6
  • 61
  • 83
  • First of all, thanks for the help. I think that through this approach it will just be INSERTing the "control" rows in a different table, don't you think? My main intention was to find something like: GROUP BY data.type(0,1,2,3,4,5,6,7,8,9) as these values were a 'group by and value filter' – Felypp Oliveira Jan 14 '13 at 22:43
0

Okay... I think I found it! Thank you all!!! I'm accepting my own answer.

I agree with the @GordonLinoff comment that the best practice refers to store the types values and describe them, so you can keep a concise/understandable database and queries.

But, as far as I've learned, if you have some data which might be an irrelevant information, it is preferable to treat it in some other way than storing it.

So, I developed this query:

SELECT 
  SUM(IF(data.type = 0, 1, 0)) AS `0`, 
  SUM(IF(data.type = 1, 1, 0)) AS `1`, 
  SUM(IF(data.type = 2, 1, 0)) AS `2`, 
  SUM(IF(data.type = 3, 1, 0)) AS `3`, 
  SUM(IF(data.type = 4, 1, 0)) AS `4`, 
  SUM(IF(data.type = 5, 1, 0)) AS `5`, 
  SUM(IF(data.type = 6, 1, 0)) AS `6`, 
  SUM(IF(data.type = 7, 1, 0)) AS `7`, 
  SUM(IF(data.type = 8, 1, 0)) AS `8`, 
  SUM(IF(data.type = 9, 1, 0)) AS `9` 
FROM data;

Not a so faster, optimized and beauty query, but to the size of data I'll manage (less than 100.000 rows each importation) it "manually" does the GROUP/COUNT job, running in 0.13 sec in a common developer machine.

It differs from my expected result just in the way rows and columns are selected - instead of 10 rows with 2 columns I've got 1 row with 10 columns, labeled with the matching type. Also, as we have a standardization to the type value (and we'll not change it for sure) which gives it a name and description, I'm now able to use the type name as the column label, instead of joining to a table with the types info to select a third column in the result (which really, is not that important as it's an importation script based on some standards).

Thank you all so much for the help!

Felypp Oliveira
  • 2,147
  • 1
  • 17
  • 17
  • In the future, do not layer on additional "I have a new Question to ask" as seen in your edits. And copy your answer down here – Drew Dec 22 '15 at 18:40
  • @Drew I've copied the answer in this post now and, althrough I've said "... I have another question..." my intention with this "another" question was just to clarify the original one within my desired scope... I'm sorry my bad sentence... – Felypp Oliveira Dec 23 '15 at 16:30