4

Here is a sqlFiddle that shows what I'm trying to do.
Here is @lad2025 sqlFiddle that shows it better

I have two indexes on my table plus a column with the column names and a column with the values.

In the fiddle I show a query that does what I want to do. But it is very slow.

I have a crosstab request that does almost the same thing, very fast, but with few errors. (it will fuse some lines)

SELECT 
    end_user_id, 
    tms, 
    coalesce(max(IN_VEHICLE), 0) as IN_VEHICLE, 
    coalesce(max(ON_BICYCLE), 0) as ON_BICYCLE, 
    coalesce(max(ON_FOOT),    0) as ON_FOOT, 
    coalesce(max(RUNNING),    0) as RUNNING, 
    coalesce(max(STILL),      0) as STILL, 
    coalesce(max(TILTING),    0) as TILTING, 
    coalesce(max(UNKNOWN),    0) as UNKNOWN, 
    coalesce(max(WALKING),    0) as WALKING 
FROM
    crosstab (            
        'SELECT end_user_id, tms, type, max(confidence) FROM activities group by 1,2,3 ',
        'SELECT DISTINCT type FROM activities order by type'
    )as newtable (
        end_user_id text, 
        tms         timestamp,
        IN_VEHICLE  float,
        ON_BICYCLE  float,
        ON_FOOT     float,
        RUNNING     float,
        STILL       float,
        TILTING     float,
        UNKNOWN     float,
        WALKING     float
    )  
GROUP BY end_user_id, tms
ORDER BY end_user_id, tms

I don't know why postgres asks me to GROUP BY end_user_id, tms at the end... It is supposed to be unique.
Also I don't know why, but if I don't group by in the crosstab query, I'll only have one row per end_user_id :(

How can I correct that crosstab request ?

EDIT: @lad2025 response is a better example than mine, more elegant and I'm sure faster. Still, I want to know how to do it with a crosstab.

Lukasz Szozda
  • 162,964
  • 23
  • 234
  • 275
Borbag
  • 597
  • 4
  • 21

1 Answers1

3

You could avoid crosstab/multiple left join as in your Fiddle and use simple conditional aggregation:

SELECT 
 end_user_id,
 tms,
 COALESCE(MAX(CASE WHEN type = 'IN_VEHICLE' THEN confidence END),0) AS IN_VEHICLE,
 COALESCE(MAX(CASE WHEN type = 'ON_BICYCLE' THEN confidence END),0) AS ON_BICYCLE,
 COALESCE(MAX(CASE WHEN type = 'ON_FOOT'    THEN confidence END),0) AS ON_FOOT,
 COALESCE(MAX(CASE WHEN type = 'RUNNING'    THEN confidence END),0) AS RUNNING,
 COALESCE(MAX(CASE WHEN type = 'STILL'      THEN confidence END),0) AS STILL,
 COALESCE(MAX(CASE WHEN type = 'TILTING'    THEN confidence END),0) AS TILTING,
 COALESCE(MAX(CASE WHEN type = 'UNKNOWN'    THEN confidence END),0) AS UNKNOWN,
 COALESCE(MAX(CASE WHEN type = 'WALKING'    THEN confidence END),0) AS WALKING
FROM activities
GROUP BY end_user_id, tms
ORDER BY end_user_id, tms;

SqlFiddleDemo

Output:

╔═══════════════════╦════════════════════════════╦═════════════╦═════════════╦══════════╦══════════╦════════╦══════════╦══════════╦═════════╗
║   end_user_id     ║            tms             ║ in_vehicle  ║ on_bicycle  ║ on_foot  ║ running  ║ still  ║ tilting  ║ unknown  ║ walking ║
╠═══════════════════╬════════════════════════════╬═════════════╬═════════════╬══════════╬══════════╬════════╬══════════╬══════════╬═════════╣
║ 64e8394876a5b7f1  ║ October, 28 2015 08:24:20  ║         21  ║          8  ║       2  ║       0  ║     2  ║       0  ║      68  ║       2 ║
║ 64e8394876a5b7f1  ║ October, 28 2015 08:24:41  ║         15  ║          0  ║       3  ║       0  ║    72  ║       0  ║      10  ║       3 ║
║ 64e8394876a5b7f1  ║ October, 28 2015 08:25:17  ║          5  ║          0  ║       5  ║       0  ║    77  ║     100  ║      13  ║       5 ║
║ 64e8394876a5b7f1  ║ October, 28 2015 08:25:32  ║          0  ║          0  ║       0  ║       0  ║   100  ║       0  ║       0  ║       0 ║
║ 64e8394876a5b7f1  ║ October, 28 2015 08:25:36  ║          0  ║          0  ║       0  ║       0  ║    92  ║       0  ║       8  ║       0 ║
║ 64e8394876a5b7f1  ║ October, 28 2015 08:27:24  ║         48  ║         48  ║       0  ║       0  ║     0  ║       0  ║       5  ║       0 ║
║ 64e8394876a5b7f1  ║ October, 28 2015 08:27:54  ║          0  ║          0  ║       0  ║       0  ║     0  ║     100  ║       0  ║       0 ║
║ 64e8394876a5b7f1  ║ October, 28 2015 08:28:11  ║         62  ║          8  ║       3  ║       0  ║    15  ║       0  ║      13  ║       3 ║
║ 64e8394876a5b7f1  ║ October, 28 2015 08:28:53  ║         35  ║          0  ║       6  ║       0  ║    37  ║       0  ║      23  ║       6 ║
║ 64e8394876a5b7f1  ║ October, 28 2015 08:29:16  ║         54  ║          2  ║       0  ║       0  ║    10  ║       0  ║      35  ║       0 ║
║ e86b0b91546194cc  ║ October, 28 2015 08:24:41  ║         13  ║         13  ║      69  ║       3  ║     0  ║     100  ║       5  ║      67 ║
║ e86b0b91546194cc  ║ October, 28 2015 08:33:33  ║          0  ║          0  ║     100  ║       0  ║     0  ║       0  ║       0  ║     100 ║
║ e86b0b91546194cc  ║ October, 28 2015 08:33:38  ║          0  ║          0  ║     100  ║       0  ║     0  ║       0  ║       0  ║     100 ║
║ e86b0b91546194cc  ║ October, 28 2015 08:34:06  ║         19  ║          6  ║      31  ║       2  ║    29  ║       0  ║      16  ║      29 ║
║ e86b0b91546194cc  ║ October, 28 2015 08:34:34  ║          3  ║          0  ║       0  ║       0  ║    95  ║       0  ║       3  ║       0 ║
╚═══════════════════╩════════════════════════════╩═════════════╩═════════════╩══════════╩══════════╩════════╩══════════╩══════════╩═════════╝

COALESCE is also redundant(if only positive/zero values are allowed):

SELECT 
 end_user_id,
 tms,
 MAX(CASE WHEN type = 'IN_VEHICLE' THEN confidence ELSE 0 END) AS IN_VEHICLE,
 MAX(CASE WHEN type = 'ON_BICYCLE' THEN confidence ELSE 0 END) AS ON_BICYCLE,
 MAX(CASE WHEN type = 'ON_FOOT'    THEN confidence ELSE 0 END) AS ON_FOOT,
 MAX(CASE WHEN type = 'RUNNING'    THEN confidence ELSE 0 END) AS RUNNING,
 MAX(CASE WHEN type = 'STILL'      THEN confidence ELSE 0 END) AS STILL,
 MAX(CASE WHEN type = 'TILTING'    THEN confidence ELSE 0 END) AS TILTING,
 MAX(CASE WHEN type = 'UNKNOWN'    THEN confidence ELSE 0 END) AS UNKNOWN,
 MAX(CASE WHEN type = 'WALKING'    THEN confidence ELSE 0 END) AS WALKING
FROM activities
GROUP BY end_user_id, tms
ORDER BY end_user_id, tms;

SqlFiddleDemo2

You could also consider making lookup table for type column like activities_type (type_id, type_name) instead of storing directly in table string ('IN_VEHICLE', 'ON_BICYCLE', ...).

Addendum

I am not Postgresql Expert but after some playing:

SELECT 
  LEFT(end_user_id, strpos(end_user_id, '_')-1) AS end_user_id,
  RIGHT(end_user_id, LENGTH(end_user_id) - strpos(end_user_id, '_'))::timestamp AS tms,
  COALESCE(IN_VEHICLE,0) AS IN_VEHICLE, 
  COALESCE(ON_BICYCLE,0) AS ON_BICYCLE, 
  COALESCE(ON_FOOT,0)    AS ON_FOOT, 
  COALESCE(RUNNING,0)    AS RUNNING,  
  COALESCE(STILL,0)      AS STILL,
  COALESCE(TILTING,0)    AS TILTING, 
  COALESCE("UNKNOWN",0)  AS "UNKNOWN", 
  COALESCE(WALKING,0)    AS WALKING 
FROM crosstab(
    'SELECT (end_user_id || ''_'' || tms) AS row_id, type, confidence
    FROM activities
    ORDER BY row_id, type, confidence',
    'SELECT DISTINCT type FROM activities order by type'
    ) AS newtable (
            end_user_id text, 
            IN_VEHICLE  int,
            ON_BICYCLE  int,
            ON_FOOT     int,
            RUNNING     int,
            STILL       int,
            TILTING     int,
            "UNKNOWN"   int,
            WALKING     int)  
ORDER BY end_user_id, tms;  

enter image description here

Why concatenate and split end_user_id + tms?

Because crosstab(text,text) needs:

row_id     <=> end_user_id + tms
category   <=> type
value      <=> confidence

Please note that there is no GROUP BY in this version.

Addendum2 - Final version

Based on tablefunc module doc F.37.1.4. crosstab(text, text):

This is much better because it can handle row_id, extra_col1, extra_col2, category, value). So now:

row_id      <=> id
extra_col1  <=> end_user_id
extra_col2  <=> tms
... 

And final query:

SELECT 
    end_user_id, 
    tms,
    coalesce(max(IN_VEHICLE), 0) as IN_VEHICLE, 
    coalesce(max(ON_BICYCLE), 0) as ON_BICYCLE, 
    coalesce(max(ON_FOOT),    0) as ON_FOOT, 
    coalesce(max(RUNNING),    0) as RUNNING, 
    coalesce(max(STILL),      0) as STILL, 
    coalesce(max(TILTING),    0) as TILTING, 
    coalesce(max("UNKNOWN"),  0) as "UNKNOWN", 
    coalesce(max(WALKING),    0) as WALKING 
FROM crosstab(
'SELECT id,end_user_id , tms,  type, confidence
FROM activities',
'SELECT DISTINCT type FROM activities order by type'
) AS newtable (
        id INT,
        end_user_id text, 
        tms         timestamp,
        IN_VEHICLE  int,
        ON_BICYCLE  int,
        ON_FOOT     int,
        RUNNING     int,
        STILL       int,
        TILTING     int,
        "UNKNOWN"   int,
        WALKING     int
    )  
GROUP BY end_user_id, tms    
ORDER BY end_user_id, tms;

enter image description here

What would be the point of the activities_type table ?

Database normalization and you can use:

SELECT DISTINCT type FROM activities order by type
vs
SELECT type_name FROM activities_types ORDER BY type_name;

This version uses id as row_id so it still needs GROUP BY to squash multiple rows.

To sum up: conditional aggregation is most readable solution.

Community
  • 1
  • 1
Lukasz Szozda
  • 162,964
  • 23
  • 234
  • 275
  • Hi, tanks for your response! What would be the point of the activities_type table ? Also, conditionnal aggregation is a far better example than mine, my question would still remain the same. Just switch my fiddle with yours: how do I obtain the same result with a crosstab request. The main goal here is two figure out how to use crosstab with multiple indexes. – Borbag Jan 05 '16 at 14:15
  • I can't see the fiddles – Borbag Jan 05 '16 at 14:47
  • @RemiDelassus activities_table is for data normalization (this is just suggestion). It looks like `SQLFiddle` does not work as it should. Timeouts for any query. – Lukasz Szozda Jan 05 '16 at 14:48
  • @RemiDelassus See my updated answer. Please not that I have no prior knowledge about `tablefunc` module and all of this is made based on documentation. You can track all my 4 attempts as my learning process. If anyone could suggest improvement I would be grateful – Lukasz Szozda Jan 05 '16 at 16:47
  • coalesce was useful, in case of negative values. – Borbag Jan 05 '16 at 16:51
  • 1
    COALESCE doesn't do anything to negative values, only NULL values. With the CASE statement and using MAX it shouldn't be possible for any NULL values to exist in the results anyway. – Tom H Jan 05 '16 at 17:23
  • My point is, with the Max statement, any negative value is overridden by 0, not with the coalesce one. – Borbag Jan 05 '16 at 17:42
  • for the crosstab, it works, thanks! I see that you added 'id'. is that why you don't have to 'group by 1,2,3' ? (I'm talking about the first quoted line inside the crosstab request itself) – Borbag Jan 05 '16 at 17:56
  • @lad2025 quick one : is it better to use conditional aggregation instead of `crosstab` or are there occasions when you can crosstab are more fitted? Thanks for your insight – Andy K Jan 05 '16 at 17:58
  • @RemiDelassus Yes you can skip `COALESCE` when there is no negative values. If not, use version with COALESCE. I didn't add group by 1,2,3 because end_user_id and tms should be UNIQUE. – Lukasz Szozda Jan 05 '16 at 18:09
  • @AndyK I like conditional aggragation because it is multiplatform (to some extent at least). `Crosstab` is Postgresql specific and I didn't know about it until today :). In SQL Server/Oracle you have also `PIVOT/UNPIVOT` construct. To sum up, to move rows to columns and columns to rows use construct that are SQL dialect agnostic. – Lukasz Szozda Jan 05 '16 at 18:11