I imported a large dataset (~6 million rows) into R using the ffbase package; it lists people enrolled in high school in Brazil. Essentially, I have two columns: Id (student ID number) and University (the institution's name).

I would like to create a column - named Group in my example - relating each university to its educational group:

Id           University       Group
000001       Anhanguera       Kroton
000002       Unopar           Kroton
000003       Anhembi          Laureate
000004       FMU              Laureate

PS: My dataset contains no information about educational groups, but I do know which group each university belongs to, so I need to attach this detail to my data.

PS2: The class of the University column is ff_vector.

I appreciate any contribution you might make.

phill
Please add the information that maps groups to universities to the post; otherwise we have nothing to work with. It sounds like a normal merge/join of the two `data.frame`s should do the job. – Maurits Evers May 02 '19 at 23:31

1 Answer

This may not be the quickest way if you have a long list of groups, but you can use mutate from the dplyr package with nested ifelse() calls:

library(dplyr)

# Example data; sprintf() keeps the leading zeros that 000001:000004 would drop
data <- data.frame(Id = sprintf("%06d", 1:4),
                   University = c("Anhanguera", "Unopar", "Anhembi", "FMU"))

# Map each university to its group; universities not listed become NA
data <- mutate(data, Group = as.factor(
    ifelse(University %in% c("Anhanguera", "Unopar"), "Kroton",
        ifelse(University %in% c("Anhembi", "FMU"), "Laureate", NA))))
data
str(data)

I used a plain data.frame column named University here; substitute your ff_vector column for it.

If you would like to keep Group as character, remove the as.factor().

I'm not familiar with ffbase, but see ffbase2 for using dplyr with ff data.
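For a longer list of groups, the merge/join idea from the comment on the question may be easier to maintain than nested ifelse() calls. Here is a minimal base R sketch, assuming the university-to-group mapping is stored in a lookup data.frame. The name `groups` and its contents are hypothetical and only cover the four universities from the question. This is untested on ffdf objects; it only illustrates the idea on a regular data.frame.

```r
# Example data (Id stored as character so the leading zeros survive)
data <- data.frame(Id = sprintf("%06d", 1:4),
                   University = c("Anhanguera", "Unopar", "Anhembi", "FMU"),
                   stringsAsFactors = FALSE)

# Hypothetical lookup table: one row per university
groups <- data.frame(University = c("Anhanguera", "Unopar", "Anhembi", "FMU"),
                     Group = c("Kroton", "Kroton", "Laureate", "Laureate"),
                     stringsAsFactors = FALSE)

# match() gives, for each row of data, the row of its university in the lookup;
# universities missing from the lookup get NA
data$Group <- groups$Group[match(data$University, groups$University)]
data
```

merge(data, groups, by = "University", all.x = TRUE) should give the same columns, but by default it sorts the result by the key, whereas match() keeps the original row order.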

Mohamed Benkedadra
Keta5