0

I have a table with the following structure (header):

ProjID,Cost2001,Cost2002,Cost2003

So for example the first two rows look like so:

projectA,10,32,30
projectB,42,22,122

I would like to transform this table into the following structure (header):

ProjID,CostYear,Value

So that coming back to the sample data, post the transformation, it would look like so:

ProjectA,Cost2001,10
ProjectA,Cost2002,32
ProjectA,Cost2003,30
ProjectB,Cost2001,42
ProjectB,Cost2002,22
ProjectB,Cost2003,122

How do I do this? I'm using Google BigQuery, which supports Standard SQL. I only need to do this one time to fix the table, so I don't mind importing the data to another RDBMS in order to be able to use pivot functionality.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
orcaman
  • 6,263
  • 8
  • 54
  • 69
  • 1
    please tag right RDBMS, `Mysql or T-sql` ? – Ven Mar 05 '18 at 15:31
  • This functionality is commonly called `UNPIVOT`. How to do this depends on your database (actual `UNPIVOT` function or a `UNION`). – JNevill Mar 05 '18 at 15:34
  • Ideally once you get your data in this format you fix your tables so they are permanently like this. What you have violates 1NF with what is known as repeating groups. It should be rows instead. Ideally you would have a column named CostYear which would only have the year value. As already said you need to decided which DBMS you are using. But regardless you need an UNPIVOT – Sean Lange Mar 05 '18 at 15:34
  • @BHouse I added the info about the RDMBS. Thanks – orcaman Mar 05 '18 at 15:42
  • Maybe this will help you [unpivot google big query](https://stackoverflow.com/questions/27832170/how-to-unpivot-in-bigquery) – hkravitz Mar 05 '18 at 15:42

4 Answers4

3

IMHO the easiest way is using UNION ALL.

CREATE TABLE projects(ProjID VARCHAR(20), Cost2001 int, Cost2002 int, Cost2003 int);
INSERT INTO projects VALUES
('projectA', 10, 32, 30),
('projectB', 42, 22, 122);

CREATE TABLE new_projects (ProjID VARCHAR(20), ProjYear INT, Cost int);
GO
2 rows affected
INSERT INTO new_projects
SELECT ProjID, 2001, Cost2001
FROM   projects
UNION ALL
SELECT ProjID, 2002, Cost2002
FROM   projects
UNION ALL
SELECT ProjID, 2003, Cost2003
FROM   projects;

SELECT * FROM new_projects;
GO
ProjID   | ProjYear | Cost
:------- | -------: | ---:
projectA |     2001 |   10
projectB |     2001 |   42
projectA |     2002 |   32
projectB |     2002 |   22
projectA |     2003 |   30
projectB |     2003 |  122

dbfiddle here

McNets
  • 10,352
  • 3
  • 32
  • 61
3

Below are truly BigQuery style :o)

Both versions for BigQuery Standard SQL

##standardSQL
SELECT 
  projID,  
  ([2001, 2002, 2003])[SAFE_OFFSET(pos)] year, 
  cost
FROM `project.dataset.table`,
UNNEST([Cost2001,Cost2002,Cost2003]) cost WITH OFFSET pos

you can test / play wit above using dummy data from your question as below

##standardSQL
WITH `project.dataset.table` AS (
  SELECT 'projectA' projID, 10 Cost2001, 32 Cost2002, 30 Cost2003 UNION ALL
  SELECT 'projectB', 42, 22, 122 
)
SELECT 
  projID,  
  ([2001, 2002, 2003])[SAFE_OFFSET(pos)] year, 
  cost
FROM `project.dataset.table`,
UNNEST([Cost2001,Cost2002,Cost2003]) cost WITH OFFSET pos   

with result as

Row projID      year    cost     
1   projectA    2001    10   
2   projectA    2002    32   
3   projectA    2003    30   
4   projectB    2001    42   
5   projectB    2002    22   
6   projectB    2003    122    

as you can see in above query you had to pre-set values for respective years in below line

([2001, 2002, 2003])[SAFE_OFFSET(pos)] year   

If for whatever reason you want to be more generic and be able to derive those values off of original columns' names - you can use below generic approach

##standardSQL
WITH `project.dataset.table` AS (
  SELECT 'projectA' projID, 10 Cost2001, 32 Cost2002, 30 Cost2003 UNION ALL
  SELECT 'projectB', 42, 22, 122 
)
SELECT 
  projID,  
  SPLIT(x,':')[SAFE_OFFSET(0)] year,
  SPLIT(x,':')[SAFE_OFFSET(1)] cost 
FROM `project.dataset.table` t,
UNNEST(SPLIT(REGEXP_REPLACE(TO_JSON_STRING(t), r'[{}"]', ''))) x
WHERE SPLIT(x,':')[OFFSET(0)] != 'projID'

obviously, with same result

Row projID      year        cost     
1   projectA    Cost2001    10   
2   projectA    Cost2002    32   
3   projectA    Cost2003    30   
4   projectB    Cost2001    42   
5   projectB    Cost2002    22   
6   projectB    Cost2003    122  
Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230
1

As everyone mentioned, you need UNPIVOT for this, which would look something like:

DECLARE @projects TABLE (projid nvarchar(max), cost2001 int, cost2002 int, cost2003 int);

INSERT @projects VALUES ('projectA', 10, 32, 30)
                    , ('projectB', 42, 22, 122);


SELECT PROJID, PROJECT_ATTRIBUTE, PROJECT_COST
FROM @projects
UNPIVOT (PROJECT_COST FOR PROJECT_ATTRIBUTE in (cost2001, cost2002, cost2003) ) AS UNPVT

I wouldn't go for using the UNION ALL version for performance reasons. Essentially you would be scanning the table 3 times or as many "CostYear" columns you have, plus you have to add a whole new query for that.

As opposed to scanning the table once with a UNPIVOT.

Radu Gheorghiu
  • 20,049
  • 16
  • 72
  • 107
0

Unpivot your data

DECLARE @ProjectTbl TABLE (ProjectID VARCHAR(225),Cost2001 INT, Cost2002 INT,Cost2003  INT)
INSERT INTO @ProjectTbl VALUES
('projectA',10,32,30),
('projectB',42,22,122);

;WITH Unpivots 
AS
(SELECT 
*
FROM @ProjectTbl
UNPIVOT 
(
  Value FOR CostYear IN (Cost2001, Cost2002, Cost2003)
) AS up 
)
SELECT
ProjectID,
CostYear,
Value
FROM Unpivots

OutPut

ProjectID   CostYear    Value
projectA    Cost2001    10
projectA    Cost2002    32
projectA    Cost2003    30
projectB    Cost2001    42
projectB    Cost2002    22
projectB    Cost2003    122
JonWay
  • 1,585
  • 17
  • 37