1

Below is the sample data. Pct10,pct25,median,and pct75 are hourly wages. Their counterparts below are annual salaries. The quandary is how to basically fold only these below while duplicating the descriptive items; industry and occupation. The desired result is below. Ratetype would be a new field/column with 1 representing hourly and 4 being annual. My previous attempts have resulted in far too many rows being created. In this case, it would 4 original * 2 for an end result of 8.

First, is melt the best way to go about this?

Second, is it as simple as occmelt2<- melt(occmelt, id.vars = c()) and simply specifying the correct Id variables

industry<-c(000000,000000,000000,000000)
occupation<-c(110000,120000,130000,140000)
pct10 <-c(11,15,26.25,19.50)
pct25 <-c(13,14.50,15.65,15.95)
median <-c(22,22.50,23.50,29)
pct75<-c(24,23.75,22,26.50)
Annualpct10 <-c(44000,45000,46000,49000)
Annualpct25 <-c(48000,49000,52000,54000)
Annualmedian <-c(54000,55000,56700,59800)
Annualpct75 <-c(64000,65000,71000,72700)

occmelt <- data.frame(industry,occupation,pct10,pct25,median,pct75,Annualpct10,Annualpct25,Annualmedian,Annualpct75)

industry   occupation    rateype    pct10   pct25   median   pct75
000000      110000          1         11      13     22        24 
000000      110000          4        44000   48000   54000     54000 
000000      120000          1         15     14.50   22.5      23.75
000000      120000          4        45000   49000   55000     65000    
Tim Wilcox
  • 1,275
  • 2
  • 19
  • 43

1 Answers1

2

Below is an option with pivot_longer

  1. Rename the columns that starts with 'pct' or 'median' to add a prefix 'Hourly'
  2. Use pivot_longer with names_sep to split between a character and the 'pct' or 'median'
  3. Then, recode the 'Hourly', 'Annual' to 1 and 4 respectively
library(dplyr)
library(tidyr)
library(stringr)
occmelt %>% 
   rename_with(~ str_c('Hourly', .), matches('^(pct|median)')) %>%
   pivot_longer(cols = matches('pct|median'), 
      names_to = c('ratetype', '.value'), 
            names_sep = "(?<=.)(?=(pct|median))") %>%
    mutate(ratetype = recode(ratetype, Hourly = "1", Annual = "4"))

-output

# A tibble: 8 x 7
  industry occupation ratetype   pct10   pct25  median   pct75
     <dbl>      <dbl> <chr>      <dbl>   <dbl>   <dbl>   <dbl>
1        0     110000 1           11      13      22      24  
2        0     110000 4        44000   48000   54000   64000  
3        0     120000 1           15      14.5    22.5    23.8
4        0     120000 4        45000   49000   55000   65000  
5        0     130000 1           26.2    15.6    23.5    22  
6        0     130000 4        46000   52000   56700   71000  
7        0     140000 1           19.5    16.0    29      26.5
8        0     140000 4        49000   54000   59800   72700  
akrun
  • 874,273
  • 37
  • 540
  • 662
  • this worked great. However, what I would do if I was to add more columns beyond the annualpct75? – Tim Wilcox Jun 24 '21 at 23:25
  • 1
    @TimWilcox In the code, I am matching columns that starts with `pct` or `median` by renaming it. In the pivot_longer, only those columns that have pct, median are used. If the column names are different, may have to modify the pattern – akrun Jun 24 '21 at 23:26