I have a data.frame with 203 rows and 111 numeric variables. I first import my data. At this stage there is no problem: filtering returns what I expect (154 observations for Q5==1 and 11 observations for Q6_6==1):

> cred.res = read_excel("C:/Users/whatever")
> filter(cred.res, Q6_6==1)
# A tibble: 11 x 111
> filter(cred.res, Q5==1)
# A tibble: 154 x 111

Then, since I want to run an MCA, I need to convert some of my variables into factors:

cred.res = data.frame(apply(cred.res,2, as.factor))

num.col <- c("Q10_sum", "Q11_sum", "Q12_sum", "NB_DE_BI","NB_DE_B1", "POURC", "POP","E101", "TOTAL_EM", "E110", "ENFANTS_", "DEPENSES", "F314", "F501", "TOTAL_DE","Nbpartenaire")

cred.res[, num.col] = apply(cred.res[, num.col], 2, as.numeric)

That's where the trouble begins. I still get my 154 observations when I filter on Q5, but filtering on Q6_6 no longer works (I get 0 obs):

> de=filter(cred.res, Q5==1)
> str(de)
'data.frame':   154 obs. of  111 variables:

> se=filter(cred.res, Q6_6==1)
> str(se)
'data.frame':   0 obs. of  111 variables:

I tried wrapping the column in as.numeric, but that doesn't work either; I now get 38 obs:

> ze=filter(cred.res, as.numeric(Q6_6)==1)
> str(ze)
'data.frame':   38 obs. of  111 variables:

However, filtering with the operator > 1 returns the 11 obs I expect:

> qe=filter(cred.res, as.numeric(Q6_6)>1)
> str(qe)
'data.frame':   11 obs. of  111 variables:

It seems that converting my variables into factors has changed the values. Can someone explain how this happens? Should I always apply filters before converting the variables?

I hope this is understandable; I'm not a native English speaker. Thank you!

Aflatoun
  • This could be a case of [floating-point issues](https://stackoverflow.com/questions/9508518/why-are-these-numbers-not-equal), since it works with `>`. Can you please post the output of `dput(head(cred.res, 20))` **in the question**? – Rui Barradas Aug 16 '18 at 14:36
  • Going from `factor` straight to `numeric` is not going to work. You need the `character` values first. Check this quick example: `x = factor(5:7); x; as.numeric(x); as.numeric(as.character(x))`. The key is what happens when you transform to `factor` initially. We need data for that. – AntoniosK Aug 16 '18 at 14:52
  • Well, you were right @AntoniosK. I wasn't aware of this "subtlety", thank you! – Aflatoun Aug 16 '18 at 15:22
  • Also, don't use `apply` on the columns of a data frame - `apply` is built for matrices and it will convert to matrix. Use `lapply` instead. – Gregor Thomas Sep 02 '18 at 20:57
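
The two points raised in the comments can be reproduced on a toy example. This is only a sketch: the data frame `toy` and its columns `id` and `q` are made up for illustration and are not the asker's data, and the padding shown in step 2 is one plausible reason `Q6_6==1` returned 0 rows, not a confirmed diagnosis (it assumes the imported data contained at least one non-numeric column and that `Q6_6` had missing values).

```r
# Hypothetical toy data, not the asker's: a 0/1 column with a missing value,
# next to a character column so that as.matrix() yields a character matrix.
toy <- data.frame(id = c("a", "b", "c", "d"), q = c(0, 1, NA, 1))

# 1) factor -> numeric returns the internal level codes, not the labels.
f      <- factor(toy$q)
codes  <- as.numeric(f)                 # 1 2 NA 2  (codes of levels "0", "1")
values <- as.numeric(as.character(f))   # 0 1 NA 1  (the original values)

# 2) apply() first coerces the data frame with as.matrix(). In a character
#    matrix, numeric columns are passed through format(), which pads every
#    entry to a common width (here the width of "NA").
padded <- as.matrix(toy)[, "q"]         # " 0" " 1" NA " 1"

# 3) Converting column by column with lapply() skips the matrix step,
#    so the factor labels stay unpadded.
toy[] <- lapply(toy, as.factor)
levels(toy$q)                           # "0" "1"
```

With labels such as `" 1"`, a comparison like `q == 1` matches nothing, while `as.numeric(q) == 1` silently selects the rows of the first level (`"0"` in this toy case). Both symptoms look like the ones described in the question.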
