
I have a data set with about 75,000 observations that I would like to prepare a little as a first step.

For example, I want to set a variable when a certain condition is met.

My usual approach would be to iterate over the complete data set row by row, check the condition in each row, and then set the variable.

Is this the right approach, especially with regard to computing time?

[Image: initial data]

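# Row-by-row loop: copy im_team_seit into gespielt_von wherever the
# season is not later than the year the player joined the team
for (row in seq_len(nrow(kader_test))) {
  if (kader_test[row, ]$saison <= kader_test[row, ]$jahr_im_team_seit) {
    kader_test[row, ]$gespielt_von <- kader_test[row, ]$im_team_seit
  }
}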

After the for loop, you can see that something changed in rows 1 and 6. Is there a more elegant way to do this?

[Image: result]
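Because the data was only shared as images, here is a minimal reproducible sketch of the loop. The column names come from the question; the values are invented, and so are the types (assumed: `saison` and `jahr_im_team_seit` are integer years, `im_team_seit` and `gespielt_von` are `Date` columns). The values are chosen so that, as described above, rows 1 and 6 meet the condition.

```r
# Hypothetical toy data: column names from the question, values invented.
# Assumption: saison / jahr_im_team_seit are integer years, im_team_seit is a
# Date, and gespielt_von starts out empty (NA dates).
kader_test <- data.frame(
  saison            = c(2015L, 2018L, 2019L, 2020L, 2017L, 2016L),
  jahr_im_team_seit = c(2015L, 2016L, 2017L, 2018L, 2015L, 2016L),
  im_team_seit      = as.Date(c("2015-07-01", "2016-01-15", "2017-07-01",
                                "2018-08-20", "2015-01-01", "2016-07-01")),
  gespielt_von      = as.Date(rep(NA, 6))
)

# The loop from the question; seq_len() also handles a data frame with 0 rows
for (row in seq_len(nrow(kader_test))) {
  if (kader_test[row, ]$saison <= kader_test[row, ]$jahr_im_team_seit) {
    kader_test[row, ]$gespielt_von <- kader_test[row, ]$im_team_seit
  }
}

# Which rows were updated?
which(!is.na(kader_test$gespielt_von))
# [1] 1 6
```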

Thank you.

edstrinova
  • Hi edstrinova. You haven't shown us your data. You have shown _pictures_ of your data, and we can't use these to test solutions. Could you please edit your question with the results of `dput(kader_test[1:11,])`? Thanks – Allan Cameron Nov 01 '20 at 15:13
  • This can be done in base R: `cond <- kader_test$saison <= kader_test$jahr_im_team_seit`, then `kader_test$gespielt_von[cond] <- kader_test$im_team_seit[cond]` – Abdessabour Mtk Nov 01 '20 at 15:25
  • You probably need to be more careful about your comparison. You could either extract the year component of `im_team_seit` and compare numerically, *or* make the `saison` variable into a date (not sure whether this should be the first or the last day of the year ...) – Ben Bolker Nov 01 '20 at 15:55
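The two suggestions in the comments can be sketched in base R on invented toy data (not the asker's data). The helper names `cond_jahr`, `jahr_aus_datum`, `saison_start` and `cond_datum` are hypothetical, and treating January 1 as the start of a season is an assumption; as the comment points out, the choice matters: with this toy data, using December 31 instead would select no rows at all.

```r
# Hypothetical toy data (invented values; assumes im_team_seit is a Date)
kader_test <- data.frame(
  saison            = c(2015L, 2018L, 2019L, 2020L, 2017L, 2016L),
  jahr_im_team_seit = c(2015L, 2016L, 2017L, 2018L, 2015L, 2016L),
  im_team_seit      = as.Date(c("2015-07-01", "2016-01-15", "2017-07-01",
                                "2018-08-20", "2015-01-01", "2016-07-01")),
  gespielt_von      = as.Date(rep(NA, 6))
)

# Suggestion 1: evaluate the condition for all rows at once, then assign only
# where it is TRUE. Other rows keep their previous value, just like the loop.
cond <- kader_test$saison <= kader_test$jahr_im_team_seit
kader_test$gespielt_von[cond] <- kader_test$im_team_seit[cond]

# Suggestion 2a: extract the year of the im_team_seit date and compare numbers
jahr_aus_datum <- as.integer(format(kader_test$im_team_seit, "%Y"))
cond_jahr <- kader_test$saison <= jahr_aus_datum

# Suggestion 2b: turn saison into a Date. Which day represents the season is a
# modelling choice; January 1 of the season year is assumed here.
saison_start <- as.Date(paste0(kader_test$saison, "-01-01"))
cond_datum <- saison_start <= kader_test$im_team_seit
```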

3 Answers


Because R is vectorized, you can replace the loop with a single base R `ifelse()` call:

# Vectorized: rows that meet the condition get im_team_seit, all other rows get NA
kader_test$gespielt_von <- ifelse(kader_test$saison <= kader_test$jahr_im_team_seit, kader_test$im_team_seit, NA)
SteveM
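One caveat, assuming `im_team_seit` is stored as a `Date` (the screenshots don't show the types): base R's `ifelse()` takes the attributes of its result from the test vector, so the `Date` class is dropped and the new column holds plain day counts. The same applies wherever base `ifelse()` is used, including inside `dplyr::mutate()`. Also note that the `NA` branch overwrites rows that fail the condition, whereas the original loop leaves them unchanged. The sketch below uses invented toy values; `with_ifelse` and `with_replace` are hypothetical names. (`dplyr::if_else()` is stricter about types and keeps the class.)

```r
# Hypothetical toy data: invented values, assuming im_team_seit is a Date
kader_test <- data.frame(
  saison            = c(2015L, 2018L, 2016L),
  jahr_im_team_seit = c(2015L, 2016L, 2016L),
  im_team_seit      = as.Date(c("2015-07-01", "2016-01-15", "2016-07-01"))
)
cond <- kader_test$saison <= kader_test$jahr_im_team_seit

# ifelse() builds its result from the logical test vector, so the Date class
# is lost and only the underlying day counts remain
with_ifelse <- ifelse(cond, kader_test$im_team_seit, NA)
class(with_ifelse)   # "numeric"

# One base R way to keep the class: start from the Date column and blank out
# the rows that fail the condition (assignment into a Date keeps its class)
with_replace <- replace(kader_test$im_team_seit, !cond, NA)
class(with_replace)  # "Date"
```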

I guess a good solution would be the dplyr package:

library(dplyr)

# Compute gespielt_von for all rows in one step and store the result
kader_test <- kader_test %>%
  dplyr::mutate(gespielt_von = ifelse(saison <= jahr_im_team_seit, im_team_seit, NA))
DPH

Sorry that I did not show the data. I will do better next time.

First of all, thank you for your answers; they worked very well. The calculations now finish within milliseconds.
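A rough timing sketch makes the difference visible by comparing the question's loop with the logical-indexing approach on randomly generated data. The column layout, the row count of 2,000 (kept small so the loop finishes quickly) and the function names `loop_version` and `vector_version` are assumptions for illustration; actual timings depend on the machine.

```r
set.seed(42)
n <- 2000  # assumption: far fewer than the ~75,000 rows in the question

# Hypothetical random data with the assumed column types
kader_test <- data.frame(
  saison            = sample(2010:2020, n, replace = TRUE),
  jahr_im_team_seit = sample(2010:2020, n, replace = TRUE),
  im_team_seit      = as.Date("2010-01-01") + sample(0:3650, n, replace = TRUE),
  gespielt_von      = as.Date(rep(NA, n))
)

# Row-by-row version from the question
loop_version <- function(df) {
  for (row in seq_len(nrow(df))) {
    if (df[row, ]$saison <= df[row, ]$jahr_im_team_seit) {
      df[row, ]$gespielt_von <- df[row, ]$im_team_seit
    }
  }
  df
}

# Vectorized version: one comparison and one assignment for all rows
vector_version <- function(df) {
  cond <- df$saison <= df$jahr_im_team_seit
  df$gespielt_von[cond] <- df$im_team_seit[cond]
  df
}

t_loop <- system.time(res_loop <- loop_version(kader_test))
t_vec  <- system.time(res_vec <- vector_version(kader_test))

# Elapsed seconds for each version
elapsed <- c(loop = t_loop[["elapsed"]], vectorized = t_vec[["elapsed"]])
print(elapsed)

# Both versions should produce the same data frame
identical(res_loop, res_vec)
```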

Now I still have a few more hurdles ahead. Here we go :)

edstrinova