-1

I have a column in a database with this organization:

Example:

Location    
A_1
A_1
A_2
A_3
A_3
B_1
B_2

I want to group them by the first part ("A"), using R; that is, I want to create a new column based on the letter, so the database would look like this:

Location    Location_1
A_1         A
A_1         A
A_2         A
A_3         A
A_3         A
B_1         B
B_2         B

I already tried the mutate() and ifelse() functions following another post here (Create column with grouped values based on another column), but I get this error:

"Error in UseMethod("mutate_") : no applicable method for 'mutate_' applied to an object of class "character""

Does someone know how to fix this problem or another method?

Here is part of the .csv file I am using:

 Location    Species    Time
    A_1         FC       0.52
    A_1         JC       0.64
    A_2         JC       0.31
    A_2         FC       0.02 
    A_2         FC       0.01
    A_3         FC       0.13
    A_3         JC       0.97
    A_3         OT       0.86
    A_3         JC       0.55
    B_1         JC       0.32
    B_1         OT       0.04
    B_1         OT       0.06
    B_2         OT       0.12
    B_2         JC       0.13
    B_2         JC       0.14
    B_2         OT       0.56
    C_1         OT       0.57
    C_1         OT       0.86
    C_1         FC       0.58
    C_1         FC       0.76
    ...         ...       ...
mto23
  • 303
  • 1
  • 5
  • 15

2 Answers2

1

You can use strsplit to split the first column by "_". This should do what you want:

dat <- data.frame(Location=c("A_1","A_1","A_2","A_3","A_3","B_1","B_2"),
   stringsAsFactors = FALSE)

dat$Location1 <- sapply(strsplit(dat$Location, "_"), "[[", 1)

dat

> dat
  Location Location1
1      A_1         A
2      A_1         A
3      A_2         A
4      A_3         A
5      A_3         A
6      B_1         B
7      B_2         B
Liza
  • 1,066
  • 2
  • 16
  • 26
1

There is a simple way using gsub or sub to get text before _. It can be achieved as:

#data
df <- data.frame(Location=c("A_1","A_1","A_2","A_3","A_3","B_1","B_2"), 
             State=c("S_1","S_1","S_2","T_3","T_3","T_1","T_2"),
             City=c("X_1","X_1","X_2","X_3","X_3","Y_1","Y_2"),
             stringsAsFactors = FALSE)
# single column
df$Location_1 <- gsub("_.*", "", df$Location, perl = TRUE)

df
#  Location Location_1
#1      A_1          A
#2      A_1          A
#3      A_2          A
#4      A_3          A
#5      A_3          A
#6      B_1          B
#7      B_2          B


# using mutate_at for multiple columns. Its applying on all columns
library(dplyr)
df %>% mutate_at(names(df), .funs = funs(new = gsub("_.*", "", ., perl = TRUE)))

#Result
#Location State City Location_new State_new City_new
#1      A_1   S_1  X_1            A         S        X
#2      A_1   S_1  X_1            A         S        X
#3      A_2   S_2  X_2            A         S        X
#4      A_3   T_3  X_3            A         T        X
#5      A_3   T_3  X_3            A         T        X
#6      B_1   T_1  Y_1            B         T        Y
#7      B_2   T_2  Y_2            B         T        Y

Option 3

Read from csv file:

df <- read.table("d:/Files/data.csv", header = TRUE, stringsAsFactors = FALSE) 

df$Location_1 <- gsub("_.*", "", df$Location, perl = TRUE)
MKR
  • 19,739
  • 4
  • 23
  • 33
  • Thank you! But since I have more records (around 300), is there a way to represent the "data.frame()" automatically? – mto23 Mar 02 '18 at 20:53
  • what format do you have your data in? Excel? Csv? R object? – Liza Mar 02 '18 at 20:54
  • @Teresa Yes. Do you have vector/data.frame? – MKR Mar 02 '18 at 20:55
  • @Teresa If you have many such column then you can use `mutate_at`. We can help if you can provide additional info. – MKR Mar 02 '18 at 20:58
  • @Liza yes, it is a csv file that I imported and have been working on in R. How can I apply the "mutate_at" in thi case? Thank you very much for the help! – mto23 Mar 02 '18 at 21:05
  • @Teresa Okay. Please provide few names of columns that you want to change and I can append answer. – MKR Mar 02 '18 at 21:08
  • 1
    @Teresa what's the object name for that csv in R. can you paste the code how you read it in – Liza Mar 02 '18 at 21:09
  • Maybe I didn't explain myself way (or I am not understanding what you are asking, sorry); I have only two columns I want to work with (Location and Location_1) but I do have more rows (around 300). Therefore, I wanted to know if there is a way to represent the "data.frame()" automatically so I don't have to write each one down? This is the error I got when I applied your code `Error in `$<-.data.frame`(`*tmp*`, loc, value = list(Location = c("SAM_C1", : replacement has 8 rows, data has 282` – mto23 Mar 02 '18 at 22:12
  • @Teresa Understood. I thought you have got many columns. Many rows are not any issue at all. Can you paste part of your `csv` as part of question so that I can modify my answer to match your need? – MKR Mar 02 '18 at 22:16
  • There it is, it goes on like that; for this part, the only issue I have is with the "location" column – mto23 Mar 02 '18 at 22:39
  • @Teresa Thanks. I will modify my answer in a while. – MKR Mar 02 '18 at 22:41
  • You just need to read in that csv into R like this: dat<-read.csv("C:/Desktop/FileName.csv") just change the location and filename to what you have – Liza Mar 02 '18 at 22:44
  • @Teresa I have added way to handle your csv file in my answer. Use those and it should work. You need to correct the path and name of `csv` file – MKR Mar 02 '18 at 22:51
  • @Liza Simple `read.csv` will not work in file from OP as there is no `,` deliminator in file data. Hence better to use `read.table`. – MKR Mar 02 '18 at 22:52
  • @Teresa happy to help. You can modify your question to make it clear that you want to read from `csv` and apply rules to make it readable for future user. – MKR Mar 03 '18 at 16:39