2

Currently I'm using R to transform my table data of the form:

ID    Code    Condition    WT
104   KEENTRAN  CON4      .30577
.     .         .          .
.     .         .          .

The link should work for anyone who would like to download my dataframe, otherwise here's a subset:

>dput(head(df))

structure(list(ID = c(104L, 368L, 10632L, 20385L, 24361L, 34378L
), Code = c("KEENTRAN", "ALEXEXPR", "MINNEXPMN", "JACKMOVWI", 
            "FREICOIN", "JBXEXPGA"), Condition = c("CON4", "CON4", "CON2", 
                                                   "CON2", "CON6", "CON5"), WT = c(0.3057717456, 0.7909870604, 1, 
                                                                                   1, 0.4301040524, 0.5977268575)), .Names = c("ID", "Code", "Condition", 
                                                                                                                               "WT"), class = c("tbl_df", "data.frame"), row.names = c(NA, -6L
                                                                                                                               ))

Background

My example data is in long form where the Condition variable ranges from "CON1" to "CON6", I want to recast my data into wide form, where the ID and Code values will be primary keys and the levels from Condition will be columns whose values take on the maximum value of WT for that particular ID, Code, Condition grouping (or zero if no such pairing exists). This can easily be accomplished in R using the dcast() function from the reshape2 package:

library(reshape2)

Result <- df %>% group_by(ID, Condition) %>% 
  summarise(value = max(as.numeric(WT))) %>% 
  dcast(ID ~ Condition)

Result[is.na(Result)] <- 0

I would like to replicate this data manipulation procedure in SQL Server, but I'm not sure how best to do so. Any help or insight is greatly appreciated.

Rick Arko
  • 680
  • 1
  • 8
  • 27

2 Answers2

2

Consider SQL Server's PIVOT operation:

SELECT t.ID, 
       ISNULL(t.[CON1], 0) AS [CON1],
       ISNULL(t.[CON2], 0) AS [CON2],
       ISNULL(t.[CON3], 0) AS [CON3],
       ISNULL(t.[CON4], 0) AS [CON4],
       ISNULL(t.[CON5], 0) AS [CON5],
       ISNULL(t.[CON6], 0) AS [CON6]    
FROM RDataFrame As r                  
PIVOT 
  (
    MAX(r.[WT]) 
    FOR r.Condition IN ([CON1], [CON2], [CON3], [CON4], [CON5], [CON6])
  ) AS t

-- ID   CON1    CON2    CON3    CON4            CON5    CON6
-- 8    0       0       0       0.4394051665    0       0
-- 10   0       0       0       0.6013843825    0       0
-- 15   0       0       0       0.07231002554   0       0
-- 21   0       0       0       0.6013843825    0       0
-- 23   0       0       0       0.7720454793    0       0
-- 80   0       1       0       0               0       0
-- 104  0       0       0       0.3057717456    0       0
-- 144  0       0       0       0.1430937996    0       0.2646439667
-- 145  0       0       0       0.8276574       0       0
-- 155  0       1       0       0.8977280575    0       0
-- 156  0       0       0       0.8453629338    0       0
-- 158  0       0       0       0.5221399019    0       0
Parfait
  • 104,375
  • 17
  • 94
  • 125
0

I was able to answer my question by doing the following in SQL Server 2016:

First I dumped my posted data (which was the result of a stored proc) into a temp table:

DROP TABLE IF EXISTS #InputDataFrame
CREATE TABLE #InputDataFrame ( ID int, Code varchar(25), Condition varchar(25), WT float)
INSERT INTO #InputDataFrame
exec dbo.[Stored_Proc_to_Create_Sample_Data]   -- stored proc to create my posted data

Then I was able to recreate the desired transformation as follows:

DROP TABLE IF EXISTS #DistinctIDs
CREATE TABLE #DistinctIDs ( ID int, Code varchar(25) )
INSERT INTO #DistinctIDs ( ID, Code)
Select Distinct
I.ID,
I.Code
From #InputDataFrame I

Select
D.*,
CASE WHEN CON1.WT IS NULL THEN 0 ELSE CON1.WT END as CON1,
CASE WHEN CON2.WT IS NULL THEN 0 ELSE CON2.WT END as CON2,
CASE WHEN CON3.WT IS NULL THEN 0 ELSE CON3.WT END as CON3,
CASE WHEN CON4.WT IS NULL THEN 0 ELSE CON4.WT END as CON4,
CASE WHEN CON5.WT IS NULL THEN 0 ELSE CON5.WT END as CON5,
CASE WHEN CON6.WT IS NULL THEN 0 ELSE CON6.WT END as CON6

From #DistinctIDs D

Left Join ( Select I.ID, MAX( I.WT) as WT From #InputDataFrame I Where I.Condition = 'CON1' Group By I.ID) CON1 on CON1.ID = D.ID
Left Join ( Select I.ID, MAX( I.WT) as WT From #InputDataFrame I Where I.Condition = 'CON2' Group By I.ID) CON2 on CON2.ID = D.ID
Left Join ( Select I.ID, MAX( I.WT) as WT From #InputDataFrame I Where I.Condition = 'CON3' Group By I.ID) CON3 on CON3.ID = D.ID
Left Join ( Select I.ID, MAX( I.WT) as WT From #InputDataFrame I Where I.Condition = 'CON4' Group By I.ID) CON4 on CON4.ID = D.ID
Left Join ( Select I.ID, MAX( I.WT) as WT From #InputDataFrame I Where I.Condition = 'CON5' Group By I.ID) CON5 on CON5.ID = D.ID
Left Join ( Select I.ID, MAX( I.WT) as WT From #InputDataFrame I Where I.Condition = 'CON6' Group By I.ID) CON6 on CON6.ID = D.ID

I doubt this is the most elegant solution, but it worked for my purposes and hopefully it will be useful to anyone else who may be struggling to imitate R's dcast() functionality in a SQL setting.

Rick Arko
  • 680
  • 1
  • 8
  • 27