I have R code like the following, which I cannot run because I do not have rights to install the packages it needs, so I need some help understanding what it does.

raw_data<- read.csv("raw_data.csv")
attach(raw_data)
raw_data$new_col<- raw_data$Employee.Name
raw_data <- select(raw_data, - Employee.Name)

Am I correct that line 3 creates a new field called new_col and assigns it the values from the CSV field "Employee Name"? The . is supposed to stand in for the space between Employee and Name.

In the 4th line, we are just dropping the original column from the dataset?

Victor
  • yup, that's right – Pawel Jan 25 '18 at 20:53
  • There are no non-base packages used here. And there is no need for the `attach()` here (in general it should be avoided). It's unclear exactly what your problem is. – MrFlick Jan 25 '18 at 20:53
  • @MrFlick, `select()` is in tidyverse, not in base R. – Ben Bolker Jan 25 '18 at 20:55
  • @BenBolker Doh. I get it mixed up with `subset()`. Good point. But a very weird use of it in this case. These 4 lines of code blow my mind as to how bizarre they are. If you are going to use `dplyr`/tidyverse, why not use `mutate()` here? – MrFlick Jan 25 '18 at 20:56
  • You do not need admin rights to install R, RStudio, or R packages. If I am not mistaken, you need to define a local installation folder. I have no admin rights either and I can run R code at will. I had a problem with the library path that I solved this way: https://stackoverflow.com/a/42643674/2344958 – Marco Jan 25 '18 at 21:05

1 Answer

Yes, the fourth line (`raw_data <- select(raw_data, -Employee.Name)`) uses the `select()` function from the dplyr package to drop a column/variable from the data set. The base R equivalents would be

subset(raw_data, select = -Employee.Name)

or

raw_data[,!(names(raw_data)=="Employee.Name")]
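Since the question says dplyr cannot be installed, here is a minimal base-R-only sketch of the same steps. The CSV contents are a hypothetical stand-in (the real raw_data.csv is not shown), and `dropped_subset` / `dropped_index` are illustrative names. It also shows where the `.` comes from: by default `read.csv()` uses `check.names = TRUE`, which converts headers into syntactically valid names, so a space becomes a dot.

```r
# Hypothetical stand-in for raw_data.csv; the real file's contents are unknown.
csv_text <- "Employee Name,Dept,Salary
Ann,Sales,100
Bob,IT,120"

# check.names = TRUE (the default) runs the headers through make.names(),
# so the header "Employee Name" is read in as Employee.Name.
raw_data <- read.csv(text = csv_text)
print(names(raw_data))  # "Employee.Name" "Dept" "Salary"

# Both base R forms return a new data frame, so the result must be
# assigned if it is to be kept.
dropped_subset <- subset(raw_data, select = -Employee.Name)

# drop = FALSE keeps the result a data frame even if only one column remains.
dropped_index <- raw_data[, names(raw_data) != "Employee.Name", drop = FALSE]

print(names(dropped_subset))
print(names(dropped_index))
```

The `drop = FALSE` argument is an addition to the one-liner above: without it, a data frame that is left with a single column collapses to a plain vector.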

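Almost every modern R lesson recommends avoiding `attach()`, and even its own help page warns about the confusion it can cause. It is not needed here anyway, because every column reference in the code goes through `raw_data` explicitly.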
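One reason `attach()` is discouraged can be sketched as follows: it places a copy of the data frame on the search path, so later changes to the data frame are not visible through the attached names. The data frame `df` and the variables `attached_x` and `updated_x` are hypothetical, and the sketch assumes no other object called `x` exists in the workspace.

```r
# Hypothetical data frame for illustration.
df <- data.frame(x = 1:3)
attach(df)

# Modify the data frame after attaching it.
df$x <- df$x * 10

attached_x <- x     # resolved via the attached copy, which was not updated
updated_x  <- df$x  # the column as it is now in the data frame

print(attached_x)   # 1 2 3
print(updated_x)    # 10 20 30

detach(df)          # undo attach() so the stale copy leaves the search path
```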

The operation here creates a new column by copying the employee name column, then drops the employee name column. It might be more efficient and easier to understand to rename the column instead.

names(raw_data)[names(raw_data)=="Employee.Name"] <- "new_col"

or in tidyverse

raw_data <- rename(raw_data, new_col = Employee.Name)
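As a rough comparison of the two approaches in base R only, the sketch below runs copy-then-drop and rename on the same hypothetical data frame (the values and the names `copied` and `renamed` are made up for illustration). The column values end up the same; the one visible difference is that copy-then-drop moves the column to the end, while renaming keeps it in place.

```r
# Hypothetical data for illustration.
raw_data <- data.frame(Employee.Name = c("Ann", "Bob"),
                       Dept = c("Sales", "IT"))

# Approach from the question, written in base R:
# copy the column, then remove the original.
copied <- raw_data
copied$new_col <- copied$Employee.Name
copied$Employee.Name <- NULL

# Rename the column in place instead.
renamed <- raw_data
names(renamed)[names(renamed) == "Employee.Name"] <- "new_col"

print(names(copied))   # "Dept" "new_col"
print(names(renamed))  # "new_col" "Dept"

# The values are the same either way; only the position differs.
print(identical(copied$new_col, renamed$new_col))  # TRUE
```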

Ben Bolker