1

I have the dataset that looks like this:


    ISIN        Variable    Value
    182         Sales2008   276
    182         Sales2009   800
    182         Sales2010   900
    182         Sales2011   273
    182         Sales2012   276
    182         Sales2013   123
    182         Sales2014   345
    182         Sales2015   444
    182         Sales2016   666
    182         Sales2017   777
    182         Sales2018   999
    182         Sales2019   235
    182         Salesf2008   276
    182         Salesf2009   800
    182         Salesf2010   900
    182         Salesf2011   273
    182         Salesf2012   276
    182         Salesf2013   123
    182         Salesf2014   345
    182         Salesf2015   444
    182         Salesf2016   666
    182         Salesf2017   777
    182         Salesf2018   999
    182         Salesf2019   235
    182         ROA2008      333
    182         ROA2009      333
    182         ROA2010      333
    182         ROA2011      333
    182         ROA2012      333
    182         ROA2013      333
    182         ROA2014      333
    182         ROA2015      333
    182         ROA2016      333
    182         ROA2017      333
    182         ROA2018      333
    182         ROA2019      333

Do you have any suggestion how to create a new df where the values from Sales2008 to Sales2019 would be stored in one column? Also, the values for Salesf2008 to Salesf2019 would be stored in the next column, and the values for ROA2008 to ROA2019 in the next column?

Thanks in advance!

Daniel O
  • 4,258
  • 6
  • 20
Jana L
  • 15
  • 4

1 Answers1

0

Hard to say without your expected output, but it sounds like you want this. This uses extract() to make two columns by extracting the 1) letters and 2) 4 digits, and pivot_wider() to go from long to wide using the new column: "Col" and your "Values" column. Hope it helps!!

library(tidyr)
library(dplyr) # for %>%

df1 %>%
  extract(Variable, c("Col", "Year"), "([A-Za-z]+)(\\d{4})", convert = TRUE) %>% 
  pivot_wider(names_from = Col, values_from = Value) 

# A tibble: 12 x 5
    ISIN  Year  Sales Salesf   ROA
   <int> <int> <int>  <int> <int>
 1   182 2008    276    276   333
 2   182 2009    800    800   333
 3   182 2010    900    900   333
 4   182 2011    273    273   333
 5   182 2012    276    276   333
 6   182 2013    123    123   333
 7   182 2014    345    345   333
 8   182 2015    444    444   333
 9   182 2016    666    666   333
10   182 2017    777    777   333
11   182 2018    999    999   333
12   182 2019    235    235   333

Data:

df1 <- read.table(header = TRUE, text = "ISIN        Variable    Value
182         Sales2008   276
182         Sales2009   800
182         Sales2010   900
182         Sales2011   273
182         Sales2012   276
182         Sales2013   123
182         Sales2014   345
182         Sales2015   444
182         Sales2016   666
182         Sales2017   777
182         Sales2018   999
182         Sales2019   235
182         Salesf2008   276
182         Salesf2009   800
182         Salesf2010   900
182         Salesf2011   273
182         Salesf2012   276
182         Salesf2013   123
182         Salesf2014   345
182         Salesf2015   444
182         Salesf2016   666
182         Salesf2017   777
182         Salesf2018   999
182         Salesf2019   235
182         ROA2008      333
182         ROA2009      333
182         ROA2010      333
182         ROA2011      333
182         ROA2012      333
182         ROA2013      333
182         ROA2014      333
182         ROA2015      333
182         ROA2016      333
182         ROA2017      333
182         ROA2018      333
182         ROA2019      333")
Andrew
  • 5,028
  • 2
  • 11
  • 21
  • It would be nicer to share the output of `dput(df1)` rather than an entire copy/paste of the the data in the question which takes half the page. – Daniel O Jul 07 '20 at 19:08
  • 1
    thank you very for your help! It worked. The output of dataframe is too big to share, so instead, i tried to take just a couple of lines just to show how the dataset is structured. – Jana L Jul 07 '20 at 19:32