Can anyone help me with this simple task?
I'm getting started in R, and I can't understand why this piece of code doesn't run inside a for loop.
I've tried the function strsplit() outside the loop and it worked well, but I'm not succeeding in running the code the way I would like to: inside a for loop, on a data frame.

Here is the code:

mpg <- ggplot2::mpg
df_sort <- data.frame(uni_model = sort(unique(mpg$model)))
df_sort$model1 <- ''
for (x in seq_along(df_sort$model)){
    df_sort[x, 'model1'] <- strsplit(df_sort[x, 'model'], ' ')    
}
Phil
  • Where does the `mpg` data frame come from? Also, what's the reason for insisting on a for loop if you don't need one? – Phil May 14 '20 at 17:11
  • @Phil, the "mpg" dataset comes with the ggplot2 package. I'm just studying the ggplot2 library a little so I can use it in my job. If there is another way to do this, could you show me? As I said, I'm getting started in R. The first programming language I studied was Python. However, I'm a forestry engineer, so I'm not very familiar with programming, but I'm managing somehow! – caio.valente14 May 14 '20 at 17:42
  • @Onyambu, the result of ```seq_along(df_sort$model)``` is a sequence of the row indexes. I don't know if I explained it very well, but it returns the numbers ```1:length(df_sort$model)```. I thought I could use these numbers to iterate over each row. Why can't I? – caio.valente14 May 14 '20 at 17:50
  • I see. Sorry, I thought you had `seq_along(df_sort$model1)`; I will delete my comment. Note, though, that the result of strsplit is a list. Since strsplit is vectorized, just do `df_sort$model1 <- strsplit(df_sort$model,' ')` – Onyambu May 14 '20 at 17:54
  • @Onyambu, thanks a lot for answering my question. But I'm trying to use only the first word in the string. How can I do that? I tried this code ```mpg$model1 <- strsplit(mpg$model, ' ')```, and it creates a column with the strings ```c('a', 'b', 'c'...)```, then ```mpg$model2 <- mpg$model1[1]```, but in the column "model2" I only get the first word of the first observation. How can I solve this? – caio.valente14 May 14 '20 at 18:10
  • If you are trying to use the first word in the string, then do `df$model1 <- sub(" .*","",df$model)` or, in your for loop, do `df_sort[x, 'model1'] <- unlist(strsplit(df_sort[x, 'model'], ' '))[1]` – Onyambu May 14 '20 at 19:20

1 Answer

I would suggest learning more about the tidyverse, as it provides a nice framework to learn and apply R tools without having to deal with the idiosyncrasies of base R. The following code does what you want, using dplyr and stringr for string manipulation:

library(dplyr)
library(stringr)

mutate(df_sort, model1 = word(uni_model, 1))

                uni_model      model1
1             4runner 4wd     4runner
2                      a4          a4
3              a4 quattro          a4
4              a6 quattro          a6
5                  altima      altima

etc...

Note that you don't need a for loop here because many R functions, including the string functions used in this answer, are vectorized. That is, when you apply such a function to a vector, it operates on each element of that vector.
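
As a rough illustration of that point, here is a small base R sketch. The `models` vector is a made-up sample (a few values like those in the output above), not the full `mpg` data, and the `sub()` pattern is the one suggested in the comments under the question:

```r
# Hypothetical sample of model names, not the full mpg data
models <- c("4runner 4wd", "a4 quattro", "altima")

# nchar() and toupper() act on every element without a loop
nchar(models)
toupper(models)

# sub() is vectorized too: drop everything from the first space onward
first_word <- sub(" .*", "", models)

# strsplit() is also vectorized, but it returns a list
# holding one character vector per input string
pieces <- strsplit(models, " ")
```

Because `strsplit()` hands back a list rather than a plain vector, its result usually needs one more step (such as taking the first element of each piece) before it fits into a single data frame column.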

Using base R, borrowing from here:

df_sort$model1 <- sapply(strsplit(df_sort$uni_model, "\\s"), `[`, 1)
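
To unpack that one-liner, here is a sketch on a small stand-in data frame (the five values are copied from the output above, so it runs without ggplot2). It swaps in `vapply()` for `sapply()` as a stricter variant, and also shows a loop version that uses the actual column name `uni_model` and extracts a single string from the list that `strsplit()` returns. The column `model2` is only there for comparison:

```r
# Stand-in data: five values copied from the output shown above
df_sort <- data.frame(
  uni_model = c("4runner 4wd", "a4", "a4 quattro", "a6 quattro", "altima"),
  stringsAsFactors = FALSE
)

# strsplit() returns a list with one character vector per input string
parts <- strsplit(df_sort$uni_model, "\\s")

# `[` is the subsetting function, so this takes element 1 of each vector;
# vapply() works like sapply() but checks each result is a single string
df_sort$model1 <- vapply(parts, `[`, character(1), 1)

# The same result with an explicit loop: index the real column name,
# then pull the first word out of the one-element list
df_sort$model2 <- ""
for (x in seq_along(df_sort$uni_model)) {
  df_sort[x, "model2"] <- strsplit(df_sort[x, "uni_model"], " ")[[1]][1]
}
```

The loop is shown only for comparison with the question; the vectorized line does the same work in one step.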
Phil
  • Thanks a lot, this solved my question, but is there another way of doing this without using libraries? As I said, I'm getting started in R and I want to learn as much as possible from you guys, who are more experienced. – caio.valente14 May 14 '20 at 18:19
  • Provided a base R version. I admit I'm not sure why one would be intent on not using packages while being OK with using `ggplot2`. R packages are actually what makes R a great tool for data analysis. – Phil May 14 '20 at 18:28
  • I see, @Phil, and I agree with you. I'm not saying that I won't use any library other than ggplot2; I'm just curious about how I can do it without libraries. Thanks again! I appreciate the time you spent answering the question. – caio.valente14 May 14 '20 at 18:36