
I am trying to restructure my big data set so I can process it more easily. I have about 20 tables with the same structure as the input table displayed below, one for each year from 1996 to 2015.

This is one of my input tables (mytable2015)

cell   day1      day2      day3      day4    ......   day365
1      3,7167    0         0         0,1487  ......   0,3256
2      0         0         0,2331    0,1461  ......   1,8765
3      1,431     0,4121    0         1,4321  ......   0
...
...
...
64800

I would like to have all the data in one big dataset and, if possible, day1, day2, ... replaced by a real date value (e.g. 01.01.2015 or 20150101), so my result should look something like this:

cell   date      value
1      20150101  3,7167
1      20150102  0
1      20150103  0
1      20150104  0,1487
...    ........  ......
...    ........  ......
...    ........  ......
2      20150101  0
2      20150102  0
...    ........  ......
...    ........  ......
...    ........  ......
64800  20150101  0,1035

Cell carries geographical information: the cells form a grid spanning the whole world, and each cell is exactly one degree high and one degree wide (hence the 180 × 360 = 64,800 cells).

I have two main questions:

  1. Is it possible to convert something like day1, day2, ... into a date format?

  2. How do I convert my table into this new structure?

Any help is highly appreciated, thanks in advance!

marius

2 Answers


Query

Example data:

create table example2015 (cell int, day1 real, day2 real, day3 real, day4 real);
insert into example2015 values
(1,      3.7167,    0,         0,         0.1487),  
(2,      0,         0,         0.2331,    0.1461),  
(3,      1.431,     0.4121,    0,         1.4321);  

Here is how to build the query, step by step.

Step 1. Use json_each_text(row_to_json(t)) to unnest the columns into key/value pairs:

select cell, json_each_text(row_to_json(t)) val
from example2015 t

 cell |      val      
------+---------------
    1 | (cell,1)
    1 | (day1,3.7167)
    1 | (day2,0)
    1 | (day3,0)
    1 | (day4,0.1487)
    2 | (cell,2)
    2 | (day1,0)
    2 | (day2,0)
    2 | (day3,0.2331)
    2 | (day4,0.1461)
    3 | (cell,3)
    3 | (day1,1.431)
    3 | (day2,0.4121)
    3 | (day3,0)
    3 | (day4,1.4321)
(15 rows)   

Step 2. Skip the cell pair, convert dayn to the integer n, and add it to the base date (here 2014-12-31):

select cell, '2014-12-31'::date + ltrim((val).key, 'day')::int "date", (val).value::real
from (
    select cell, json_each_text(row_to_json(t)) val
    from example2015 t
    ) sub
where (val).key <> 'cell'

 cell |    date    | value  
------+------------+--------
    1 | 2015-01-01 | 3.7167
    1 | 2015-01-02 |      0
    1 | 2015-01-03 |      0
    1 | 2015-01-04 | 0.1487
    2 | 2015-01-01 |      0
    2 | 2015-01-02 |      0
    2 | 2015-01-03 | 0.2331
    2 | 2015-01-04 | 0.1461
    3 | 2015-01-01 |  1.431
    3 | 2015-01-02 | 0.4121
    3 | 2015-01-03 |      0
    3 | 2015-01-04 | 1.4321
(12 rows)
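
The date arithmetic is easy to sanity-check in isolation. Note that ltrim(key, 'day') strips the characters d, a and y from the left of the key, which is safe for keys of the form dayn:

select '2014-12-31'::date + ltrim('day365', 'day')::int;  -- 2015-12-31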

Conversion

You can use the query from step 2 to insert values from mytable2015 into result_table:

create table result_table (
    "cell" integer,
    "date" date,
    "value" real
);

You are going to generate a table with 23,652,000 rows (64,800 cells × 365 days). It is likely that a one-shot conversion will exhaust memory resources and may take longer than you can accept, so I would suggest dividing the operation into several stages, say up to 10,000 source rows (3,650,000 new rows) at a time.

insert into result_table
select cell, '2014-12-31'::date + ltrim((val).key, 'day')::int "date", (val).value::real
from (
    select cell, json_each_text(row_to_json(t)) val
    from mytable2015 t
    ) sub
where (val).key <> 'cell'
and cell > 0 and cell <= 10000

Repeat the insertion for cell > 10000 and cell <= 20000 and so on.
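
If you do not want to type each batch by hand, the staging can also be scripted with a PL/pgSQL DO block. This is only a sketch, assuming cell values run from 1 to 64800 as in the question; note that a DO block executes as a single transaction, so if each stage must be committed independently, keep issuing the batches as separate statements from the client:

do $$
declare
    lo   int := 0;      -- lower bound of the current batch (exclusive)
    step int := 10000;  -- source rows per batch
begin
    while lo < 64800 loop
        insert into result_table
        select cell, '2014-12-31'::date + ltrim((val).key, 'day')::int, (val).value::real
        from (
            select cell, json_each_text(row_to_json(t)) val
            from mytable2015 t
            where cell > lo and cell <= lo + step
            ) sub
        where (val).key <> 'cell';
        raise notice 'cells % .. % done', lo + 1, lo + step;
        lo := lo + step;
    end loop;
end $$;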

klin

If the table and column names are consistent, you should be able to determine the date of each final row by date arithmetic alone, requiring only one date literal per table, e.g. '2011-01-01' for table mytable2011.

Most of the "unpivot" operation is done with JSON: each source row is first converted into a JSON object, and rows are then created from its key/value pairs, as shown in stages below.

SQL Fiddle

PostgreSQL 9.3 Schema Setup:

CREATE TABLE MyTable2011
    ("cell" int, "day1" numeric, "day2" numeric, "day3" numeric, "day4" numeric, "day365" numeric);

INSERT INTO MyTable2011
    ("cell", "day1", "day2", "day3", "day4", "day365")
VALUES
    (1, 3.7167, 0.00, 0.00, 0.1487, 0.3256),
    (2, 0, 0, 0.2331, 0.1461, 1.8765),
    (3, 1.431, 0.4121, 0, 1.4321, 0.00);

Query 1:

SELECT row_to_json(MyTable2011) as jstring FROM MyTable2011

Results:

|                                                                         jstring |
|----------------------------------------------------------------------------------|
|  {"cell":1,"day1":3.7167,"day2":0.00,"day3":0.00,"day4":0.1487,"day365":0.3256} |
|        {"cell":2,"day1":0,"day2":0,"day3":0.2331,"day4":0.1461,"day365":1.8765} |
|      {"cell":3,"day1":1.431,"day2":0.4121,"day3":0,"day4":1.4321,"day365":0.00} |

Query 2:

SELECT
      jstring->>'cell' as cell
    , json_each_text(jstring) as pairs
     FROM (
           SELECT
                row_to_json(MyTable2011) as jstring 
           FROM MyTable2011
          ) as jrows

Results:

| cell |           pairs |
|------|-----------------|
|    1 |        (cell,1) |
|    1 |   (day1,3.7167) |
|    1 |     (day2,0.00) |
|    1 |     (day3,0.00) |
|    1 |   (day4,0.1487) |
|    1 | (day365,0.3256) |
|    2 |        (cell,2) |
|    2 |        (day1,0) |
|    2 |        (day2,0) |
|    2 |   (day3,0.2331) |
|    2 |   (day4,0.1461) |
|    2 | (day365,1.8765) |
|    3 |        (cell,3) |
|    3 |    (day1,1.431) |
|    3 |   (day2,0.4121) |
|    3 |        (day3,0) |
|    3 |   (day4,1.4321) |
|    3 |   (day365,0.00) |

Query 3:

SELECT
      date '2011-01-01' + CAST(REPLACE((pairs).key,'day','') as integer) - 1 as thedate
    , CAST(REPLACE((pairs).key,'day','') as integer) as daynum
    , cell
    , (pairs).value as thevalue 
FROM (
      SELECT
            jstring->>'cell' as cell
          , json_each_text(jstring) as pairs
     FROM (
           SELECT
                row_to_json(MyTable2011) as jstring 
           FROM MyTable2011
          ) as jrows
     ) as unpiv
WHERE (pairs).key <> 'cell'

Results:

|                    thedate | daynum | cell | thevalue |
|----------------------------|--------|------|----------|
|  January, 01 2011 00:00:00 |      1 |    1 |   3.7167 |
|  January, 02 2011 00:00:00 |      2 |    1 |     0.00 |
|  January, 03 2011 00:00:00 |      3 |    1 |     0.00 |
|  January, 04 2011 00:00:00 |      4 |    1 |   0.1487 |
| December, 31 2011 00:00:00 |    365 |    1 |   0.3256 |
|  January, 01 2011 00:00:00 |      1 |    2 |        0 |
|  January, 02 2011 00:00:00 |      2 |    2 |        0 |
|  January, 03 2011 00:00:00 |      3 |    2 |   0.2331 |
|  January, 04 2011 00:00:00 |      4 |    2 |   0.1461 |
| December, 31 2011 00:00:00 |    365 |    2 |   1.8765 |
|  January, 01 2011 00:00:00 |      1 |    3 |    1.431 |
|  January, 02 2011 00:00:00 |      2 |    3 |   0.4121 |
|  January, 03 2011 00:00:00 |      3 |    3 |        0 |
|  January, 04 2011 00:00:00 |      4 |    3 |   1.4321 |
| December, 31 2011 00:00:00 |    365 |    3 |     0.00 |
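
Because each yearly table differs only in its name and its base date, query 3 can be repeated per table and the output appended into a single target, for example the result_table defined in the first answer. Below is a sketch for two of the twenty tables; MyTable2012 and its base date are assumed to follow the same naming pattern:

INSERT INTO result_table ("cell", "date", "value")
SELECT cell::int, thedate, thevalue::real
FROM (
    -- one SELECT per yearly table; only the table name
    -- and the base date literal change between blocks
    SELECT
          date '2011-01-01' + CAST(REPLACE((pairs).key,'day','') as integer) - 1 as thedate
        , cell
        , (pairs).value as thevalue
    FROM (
          SELECT jstring->>'cell' as cell
               , json_each_text(jstring) as pairs
          FROM (SELECT row_to_json(MyTable2011) as jstring FROM MyTable2011) as jrows
         ) as unpiv
    WHERE (pairs).key <> 'cell'

    UNION ALL

    SELECT
          date '2012-01-01' + CAST(REPLACE((pairs).key,'day','') as integer) - 1
        , cell
        , (pairs).value
    FROM (
          SELECT jstring->>'cell' as cell
               , json_each_text(jstring) as pairs
          FROM (SELECT row_to_json(MyTable2012) as jstring FROM MyTable2012) as jrows
         ) as unpiv2
    WHERE (pairs).key <> 'cell'
) as all_years;

Leap years take care of themselves here: json_each_text enumerates whatever dayn columns a given table actually has, so a day366 column in a leap-year table simply yields one more row per cell.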
Paul Maxwell