2

I have a data frame like this:

enter image description here

Created with this:

companies = c("ABC Ltd", "ABC Ltd", "ABC Ltd", "Derwent plc", "Derwent plc")
sic = c("12345", "24155", "31231", "55346", "34234")

df = data.frame(companies, sic)


As you can see, the companies column is duplicated due to the SIC code.

I want to pivot wider so that each SIC code has its own column and that it is only 1 company per row.

Something like the following where I don't know how many columns there might be (i.e. there could be some companies with 20 sic codes).

enter image description here

I have tried pivoting it using pivot_wider but I cannot get it to do what I need it to do.

Any help is much appreciated.

jay.sf
  • 60,139
  • 8
  • 53
  • 110
Nicholas
  • 3,517
  • 13
  • 47
  • 86

3 Answers3

4

With the packages dplyr and tidyr you can use

library(dplyr)
library(tidyr)

df %>% 
  group_by(companies) %>% 
  mutate(row_n = row_number()) %>% 
  pivot_wider(companies, names_from = row_n, values_from = sic, names_glue = "sic.{row_n}")

Output

# A tibble: 2 x 4
# Groups:   companies [2]
#   companies   sic.1 sic.2 sic.3
#   <chr>       <chr> <chr> <chr>
# 1 ABC Ltd     12345 24155 31231
# 2 Derwent plc 55346 34234 NA   
Ric S
  • 9,073
  • 3
  • 25
  • 51
  • Perfect. Just what I needed. Thank you so much 'Ric S' for your quick response :)... that 'names_glue' is really nifty! – Nicholas Jul 14 '20 at 10:10
  • 1
    Yeah it is a nice functionality that helps you renaming the columns as you like without renaming them in an explicit step – Ric S Jul 14 '20 at 10:11
2

You can split sic by companies, call [ with 1:max(lengths(x)) and rbind the result.

x <- split(df$sic, df$companies)
do.call(rbind, lapply(x, "[", 1:max(lengths(x))))
#            [,1]    [,2]    [,3]   
#ABC Ltd     "12345" "24155" "31231"
#Derwent plc "55346" "34234" NA     
GKi
  • 37,245
  • 2
  • 26
  • 48
1

You have problems because there's not yet a "time" variable that differentiates the measures for each ID. You could use ave to make one and use reshape.

res <- reshape(transform(df, t=ave(companies, companies, FUN=seq)), 
        idvar="companies", timevar="t", direction="wide")
res
#     companies sic.1 sic.2 sic.3
# 1     ABC Ltd 12345 24155 31231
# 4 Derwent plc 55346 34234  <NA>

However, you may want to reconsider your data on which measurements of the IDs correspond to each other!

jay.sf
  • 60,139
  • 8
  • 53
  • 110
  • 1
    Thank you jay.sf, appreciate your quick response. I take your point. I have never seen 'ave' been used before, learnt a lot today! – Nicholas Jul 14 '20 at 10:16