I have one column in a CSV of salary data with the following types of data:
- Nothing in cell
- "£2.5-£3.5 an hour"
- "£4.8 an hour"
- "£40,000-£60,000 a year"
- "£60,000 a year"
My objective is to:
- Split the data$salary column into as many columns as needed (using " " as the delimiter).
- For the rows where column 5 is blank (the salary is fixed), take the first value of the split data frame and put it in a new column as a currency value, multiplying by a set number based on whether the 3rd column is "hour" (multiply by 8*262 to get the annual salary), etc.
- For the rows where column 5 is not blank (the salary is a range), take the midpoint of the numbers in the 2nd and 4th columns of the split data frame, add that to the same new salary column, and annualise based on what column 5 of the split data frame is (year, month, etc.).
- Append this new salary column to the old data frame with the full data.
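To make the goal concrete, here is a rough sketch of the end result I am after, written in base R. It skips the split on " " and pulls the numbers out with a regular expression instead, so it is a swapped-in technique rather than the plan described above. The sample data frame, the annualise() helper and the per_year lookup are hypothetical names used for illustration, and every multiplier except 8*262 for "hour" is an assumption.

```r
# Hypothetical sample data mirroring the formats listed above.
data <- data.frame(
  salary = c("", "£2.5-£3.5 an hour", "£4.8 an hour",
             "£40,000-£60,000 a year", "£60,000 a year"),
  stringsAsFactors = FALSE
)

# Multipliers to annualise each period. "hour" uses 8 * 262 as in the
# question; the other values are assumptions for illustration.
per_year <- c(hour = 8 * 262, day = 262, week = 52, month = 12, year = 1)

annualise <- function(x) {
  x <- as.character(x)
  # Pull out every number (digits, commas, decimal point) in each string.
  nums <- regmatches(x, gregexpr("[0-9][0-9,]*\\.?[0-9]*", x))
  # The mean of one number is the number itself; the mean of two is the
  # midpoint of the range. Blank cells yield NA.
  amount <- vapply(nums, function(n) {
    if (length(n) == 0) NA_real_ else mean(as.numeric(gsub(",", "", n)))
  }, numeric(1))
  # The period is taken to be the last word of the string (hour, year, ...).
  period <- sub(".*\\s", "", x)
  # Unknown or missing periods look up as NA, so the result is NA too.
  amount * unname(per_year[period])
}

# Append the annualised salary to the original data frame.
data$annual_salary <- annualise(data$salary)
```

With these assumptions, "£2.5-£3.5 an hour" becomes 3 * 8 * 262 = 6288 and the blank cell stays NA.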
So far I have used
data_split <- strsplit(as.character(data$salary), split = " ")
to split the column, but I get this output:
> tail(data_split)
[[1]]
[1] "£26,000" "a" "year"
[[2]]
character(0)
[[3]]
[1] "£100" "a" "day"
[[4]]
[1] "£16,107" "a" "year"
[[5]]
[1] "£15,747" "a" "year"
[[6]]
[1] "£9.00" "-" "£15.50" "an" "hour"
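One possible way to act on this list, sketched under some assumptions: the number of pieces per row (0, 3 or 5) can stand in for the "is column 5 blank" test, and padding each element to five pieces gives a rectangular matrix to index by column. Note that in the output printed above the numeric values of a range sit in positions 1 and 3. The names n_parts, parts and to_num are hypothetical, and the list below is rebuilt by hand from some of the printed values.

```r
# Rebuild a list like the one printed above (values copied from the output).
data_split <- list(
  c("£26,000", "a", "year"),
  character(0),
  c("£100", "a", "day"),
  c("£9.00", "-", "£15.50", "an", "hour")
)

# Number of pieces per row: 0 = blank, 3 = fixed salary, 5 = range.
n_parts <- lengths(data_split)

# Pad every element to 5 pieces so the list becomes a rectangular matrix;
# missing positions are filled with NA.
parts <- t(vapply(data_split, function(p) { length(p) <- 5; p }, character(5)))

# Strip everything except digits and the decimal point, then convert.
to_num <- function(s) as.numeric(gsub("[^0-9.]", "", s))

is_range <- n_parts == 5
low  <- to_num(parts[, 1])
high <- to_num(parts[, 3])

# Midpoint for ranges, the single value otherwise (NA for blank rows).
amount <- ifelse(is_range, (low + high) / 2, low)
# The period word is in position 5 for ranges and position 3 otherwise.
period <- ifelse(is_range, parts[, 5], parts[, 3])
```

From here, amount could be multiplied by a factor chosen from period (for example 8*262 when period is "hour") and attached to the original data frame.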
Any idea what to do next? The difficulty compared with other answers I found on Stack Overflow is that my manipulation depends on an if condition on column 5 of the split data. I would have loved to consult my R materials from this online course, but access to them is blocked!