0

I am trying to convert a large .CSV file into an .Xdf file by using the rxImport() function with the below code:

rxImport(inData = "/poc/revor/data/ext_roll36_chrg_vol.csv",
         outFile = "/poc/revor/data/ext_roll36_chrg_vol.xdf", 
         overwrite = TRUE, rowsPerRead = 100000,
         colClasses = c(SE_NO = "character", 
                        HIER_ROLLUP_CD = "character", 
                        CUR_MO_CT ="numeric", 
                        CUR_MO_AM = "numeric", 
                        AD_LINE_1_TX = "character",
                        AD_LINE_2_TX = "character",
                        SUBMIT_DT = "character", 
                        UPDT_TS = "character"),
         transforms = list(SUBMIT_DT = as.Date(SUBMIT_DT, format="%d%b%Y")))

But this file contains many records like:

0200001097,SS,625,236899.000,"KRAV MAGA WORLDWIDE, INC.","KRAV MAGA WORLDWIDE, INC.",01MAY2014,07JUN2014:01:08:57.000000

As you can see the columns AD_LINE_1_TX & AD_LINE_2_TX contain commas inside the double quotes.

I have tried using the type = "text" argument, but then it reads the first column i.e SE_NO as numeric even though its type is showing as character. This is the issue with all the numeric fields which I want to read as a character.

And If I transform the column using the transform argument into character as:

transforms = list(SE_NO = as.character(as.numeric(SE_NO)))

Then the value of SE_NO column changes from 0200001097 to 0200001000 in the transformation from character(exponential representation) 2.000011e+08 to numeric.

So Is there any other way to suppress the comma inside the double quotes with out affecting other columns?

Please let me know in case any further information is needed on the same.

jcdyer
  • 18,616
  • 5
  • 42
  • 49
Amit Mishra
  • 33
  • 1
  • 10

1 Answers1

0

This should give you what you need...

input_file <- "/poc/revor/data/ext_roll36_chrg_vol.csv"
output_file <- "/poc/revor/data/ext_roll36_chrg_vol.xdf"

my_colInfo <- list(list(index = 1, type = "character", newName = "SE_NO"),
                   list(index = 2, type = "character", newName = "HIER_ROLLUP_CD"),
                   list(index = 3, type = "numeric", newName = "CUR_MO_CT"),
                   list(index = 4, type = "numeric", newName = "CUR_MO_AM"),
                   list(index = 5, type = "character", newName = "AD_LINE_1_TX"),
                   list(index = 6, type = "character", newName = "AD_LINE_2_TX"),
                   list(index = 7, type = "character", newName = "SUBMIT_DT"),
                   list(index = 8, type = "character", newName = "UPDT_TS"))

input_source <- RxTextData(file = input_file, 
                           colInfo = my_colInfo,
                           delimiter = ",",
                           quotedDelimiters = TRUE,
                           useFastRead = TRUE)

rxImport(inData = input_source,
         outFile = output_file, 
         overwrite = TRUE, rowsPerRead = 100000,
         transforms = list(SUBMIT_DT = as.Date(SUBMIT_DT, format="%d%b%Y")))
  • Thanks a lot for the solution, It does suppress the quotes however it is showing a error again as : Error in rxCall("Rx_ImportDataSource", params) : may be due to value of column "AD_LINE_1_TX" having value as """KIMBEL PLUMBING, INC.""" i.e having three quotes before & after the value. So is there any way to remove the extra quotes present entirely in this column? – Amit Mishra Jul 13 '15 at 10:25