I have a data frame with a column that is read as a factor when I import it from a CSV file.

   Month_considered   pct ATC_Count 
   <fct>            <dbl> <fct>     
 1 Apr-17            54.9 198,337   
 2 May-17            56.4 227,681   
 3 Jun-17            58.0 251,664   
 4 Jul-17            57.7 251,934   
 5 Aug-17            55.5 259,617   
 6 Sep-17            55.7 245,588   
 7 Oct-17            56.6 247,051   
 8 Nov-17            57.6 256,375   
 9 Dec-17            56.9 277,784   
10 Jan-18            56.7 272,818   
11 2/1/18            59.1 266,277.00
> sapply(ab, class)
Month_considered              pct        ATC_Count 
        "factor"        "numeric"         "factor"

When I try to convert ATC_Count to integer, I get the following output, where ATC_Count shows different values. What might be wrong here?

ab$ATC_Count <- as.integer(ab$ATC_Count)

   Month_considered   pct ATC_Count
   <fct>            <dbl>     <int>
 1 Apr-17            54.9     36571
 2 May-17            56.4     37325
 3 Jun-17            58.0     37780
 4 Jul-17            57.7     37781
 5 Aug-17            55.5     37885
 6 Sep-17            55.7     37682
 7 Oct-17            56.6     37714
 8 Nov-17            57.6     37855
 9 Dec-17            56.9     38099
10 Jan-18            56.7     38060
11 2/1/18            59.1     37990
SNT
  • Use `as.integer(as.character(ab$ATC_Count))`. BTW, there is a `,` in `ATC_Count`, so you may need to remove it before converting: `as.integer(sub(",", "", ab$ATC_Count))`, or using `tidyverse`, `ab %>% mutate(ATC_Count = as.integer(str_remove(ATC_Count, ",")))` – akrun Jul 11 '18 at 02:55
  • use `as.numeric(gsub(',',"",ab$ATC_Count))` – Onyambu Jul 11 '18 at 02:56
  • I get NAs introduced by coercion when I do as.integer(as.character(ab$ATC_Count)) – SNT Jul 11 '18 at 02:57
  • Yes, because there is a `,`. Please read the full comment – akrun Jul 11 '18 at 02:58
  • Possible duplicate https://stackoverflow.com/questions/31944103/converting-currency-with-commas-into-numeric/ – Ronak Shah Jul 11 '18 at 03:06

1 Answer

There is a , in the 'ATC_Count' values, which can be removed with sub before converting

as.integer(sub(",", "", ab$ATC_Count))

Or using tidyverse

library(tidyverse)
ab %>% 
    mutate(ATC_Count = as.integer(str_remove(ATC_Count, ",")))

Or with parse_number from readr (it expects a character vector, so the factor is converted with as.character first)

ab %>%
    mutate(ATC_Count = parse_number(as.character(ATC_Count)))

Regarding the different numbers after converting the factor to integer: as.integer on a factor returns the internal integer codes of its levels, not the values that are displayed. The usual way to convert is

as.integer(as.character(ab$ATC_Count))

which would not work here (it returns NA) because of the , within the column values
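
To make the difference concrete, here is a minimal sketch on a small made-up factor. The vector `x` and the names `codes`, `with_commas`, and `clean` are illustrative assumptions, not the asker's actual data. It uses gsub rather than sub so that values with more than one comma would also be handled.

```r
# Hypothetical toy data mimicking the ATC_Count column (not the asker's file)
x <- factor(c("198,337", "227,681", "266,277.00"))

# as.integer() on a factor returns the internal level codes, not the labels
codes <- as.integer(x)

# Going through character keeps the labels, but "," cannot be parsed as a number,
# so every element becomes NA (with a coercion warning, suppressed here)
with_commas <- suppressWarnings(as.integer(as.character(x)))

# Strip all commas first (gsub removes every match; sub only the first one)
clean <- as.integer(gsub(",", "", as.character(x)))

print(codes)        # level positions
print(with_commas)  # NA values
print(clean)        # the intended counts
```

With the asker's full data the factor has many more levels, which is presumably why the codes appear as large numbers such as 36571 instead of small ones.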

akrun