1

I am looking for swift way of reshaping the data from long to wide format. Right now I have tried a code with nest for loops, though the job gets done, it takes a long time to generate the output.

SN NN EE Service_tier
A  B  C  economy
B  C  C  economy
P  Q  R  regular
Q  S  R  regular
S  R  R  regular
H  I  L  economy
I  J  L  economy
J  K  L  economy
K  L  L  economy

The output expected is as below

SN hop1 hop2 hop3 hop4 service_tier
A   B    C              economy
P   Q    S    R         regular
H   I    J    K    L    economy

currently the below code gets the job done. Am sure there is an effective and clean way to do this.

for (i in 1:lasrow){
  sn <- raw_d[i,1]
  nn <- raw_d[i,2]
  en <- raw_d[i,3]

  lc <- 1

  if(nn == en){
    d[lr,lc]<-sn
    d[lr,lc+1]<-nn
    d[lr,lc+2]<-en
    lr <- lr+1
  }
  else{
    while(nn!=en){
      d[lr,lc]<-sn
      lc <- lc+1
      next_d <- filter(raw_d,raw_d$SN==sn,raw_d$EN==en)
      if(dim(next_d)[1]==0){
        d[lr,lc]<-"broken bf"
        lc <- lc+1
        break
      }else{
        sn <- next_d$NN
        nn <- next_d$NN
        }
      }
    d[lr,lc]<-en
    lr<-lr+1
  }
}
Vasuki Rao
  • 37
  • 3
  • It seems to stand that your sets are mutually exclusive, for example, regular = {P, Q, R, S} but P,Q,R, and S are part of no other "set". Will this always be true? Could "A" ever be part of regular and economy? The proper method to widen this data will depend on your answer. – Marian Minar May 30 '19 at 04:20
  • economy = {P,Q,R,S} can appear in economy as well but for a give P and S we will have only one path not multiple paths. – Vasuki Rao May 30 '19 at 05:47

3 Answers3

0

One option is to create a unique sequence using rleid from data.table, gather the dataframe to long format, remove the duplicates from each group, assign column names and spread it back to wide format.

library(dplyr)
library(tidyr)

df %>%
  mutate(row = data.table::rleid(Service_tier)) %>%
  gather(key, value, -Service_tier, -row) %>%
  group_by(row) %>%
  filter(!duplicated(value)) %>%
  mutate(key = c("SN", paste0("hop", 1:(n() - 1)))) %>%
  spread(key, value) %>%
  ungroup() %>%
  select(-row) %>%
  select(SN, starts_with("hop"), Service_tier)

# A tibble: 3 x 6
#  SN    hop1  hop2  hop3  hop4  Service_tier
#  <chr> <chr> <chr> <chr> <chr> <fct>       
#1  A     B     C     NA    NA    economy     
#2  H     I     J     K     L     economy     
#3  P     Q     S     R     NA    regular    
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
0

We can use data.table. Convert the 'data.frame' to 'dat.table' (setDT(df1), grouped by rleid on the 'Service_tier', change the value of 'SN' to first element grouped by 'grp', then grouped by 'Service_tier', 'SN', get the unique element of Subset of Data.table and dcast from 'long' to 'wide' format

library(data.table)
dcast(setDT(df1)[, SN := first(SN), rleid(Service_tier)][, 
    unique(unlist(.SD)), .(SN, Service_tier)], 
    SN + Service_tier ~ paste0("hop", rowid(SN)), value.var = "V1", fill = "")
#  SN Service_tier hop1 hop2 hop3 hop4
#1:  A      economy    B    C          
#2:  H      economy    I    J    K    L
#3:  P      regular    Q    S    R     

data

df1 <- structure(list(SN = c("A", "B", "P", "Q", "S", "H", "I", "J", 
"K"), NN = c("B", "C", "Q", "S", "R", "I", "J", "K", "L"), EE = c("C", 
"C", "R", "R", "R", "L", "L", "L", "L"), Service_tier = c("economy", 
"economy", "regular", "regular", "regular", "economy", "economy", 
"economy", "economy")), class = "data.frame", row.names = c(NA, 
-9L))
akrun
  • 874,273
  • 37
  • 540
  • 662
0

The crucial point here is to identify which rows belong to which group. The answers by Ronak and akrun both use rleid(Service_tier) assuming that a change in Service_tier indicate the begin of a new group.

This might be suggested by the sample dataset but cannot be taken as guaranteed. IMHO, Service_tier is rather an attribute than a key. As a matter of fact, the OP is testing for NN == EE in his code snippet to switch to a new group.

In the data.table solutions below, grouping is determined by cumsum(shift(NN == EE, fill = TRUE)) which tests for equality fo NN and EE, lags the result to the next row where the next group starts, and enumerates the groups by counting TRUE using cumsum().

In the simplified version (without reshaping), the hops are aggregated by the toString() function:

library(data.table)
setDT(d)[, .(SN = first(SN), hops = toString(NN), Service_tier = first(Service_tier)), 
  by = .(grp = cumsum(shift(NN == EE, fill = TRUE)))][]
   grp SN       hops Service_tier
1:   1  A       B, C      economy
2:   2  P    Q, S, R      regular
3:   3  H I, J, K, L      economy

For reshaping from long to wide format, dcast() is used:

library(data.table)
library(magrittr)   # piping used to improve readability
w <- setDT(d)[, .(SN = first(SN), hops = NN, Service_tier = first(Service_tier)), 
  by = .(grp = cumsum(shift(NN == EE, fill = TRUE)))] %>% 
  dcast(grp + ... ~ rowid(grp, prefix = "hop"), value.var = "hops", fill = "")  %>%  
  setcolorder(c(1:2, 4:ncol(.), 3))

w
   grp SN hop1 hop2 hop3 hop4 Service_tier
1:   1  A    B    C                economy
2:   2  P    Q    S    R           regular
3:   3  H    I    J    K    L      economy

setcolorder() is used to rearrange columns in the order expected by the OP. This is done in-place, i.e., without copying the whole data object.

Data

library(data.table)

d <- fread("SN NN EE Service_tier
A  B  C  economy
B  C  C  economy
P  Q  R  regular
Q  S  R  regular
S  R  R  regular
H  I  L  economy
I  J  L  economy
J  K  L  economy
K  L  L  economy")
Community
  • 1
  • 1
Uwe
  • 41,420
  • 11
  • 90
  • 134
  • I am getting an error...Error in `[.data.frame`(x, , .(SN = first(SN), hops = toString(NN), Service_tier = first(Service_tier)), : unused argument (by = .(grp = cumsum(shift(NN == EE, fill = TRUE)))) – Vasuki Rao May 31 '19 at 10:51
  • @VasukiRao It seems I might have forgotten to coerce `d` to data.table using `setDT(d)`. I have amended the code. Please, tray again. And thank you for reporting. – Uwe May 31 '19 at 10:54
  • when i run the code, Error in ncol(w) : object 'w' not found – Vasuki Rao Jun 02 '19 at 14:29
  • My bad. Another copy & paste error. Please, see amended code. – Uwe Jun 02 '19 at 14:32
  • great, works flawlessly, every interesting solution compared to for loop – Vasuki Rao Jun 05 '19 at 02:44