Read numeric input as string R

Question

So, I have an input CSV of this form:

id,No.,V,S,D
1,0100000109,623,233,331
2,0200000109,515,413,314
3,0600000109,611,266,662

I need to read the No. column as is (i.e., as character). I know I can use something like this for that:

data <- read.csv("input.csv", colClasses = c("No." = "character"))

I'm using the following code to read the CSV file in chunks:

chunk_size <- 2
con <- file("input.csv", open = "r")
# first chunk: the header is present, so colClasses can refer to the column by name
data_frame <- read.csv(con, nrows = chunk_size, colClasses = c("No." = "character"), quote = "", header = TRUE)
header <- names(data_frame)
print(header)
print(data_frame)
if (nrow(data_frame) == chunk_size) {
  repeat {
    # later chunks: no header and no colClasses, so column types are guessed
    data_frame <- read.csv(con, nrows = chunk_size, header = FALSE, quote = "")
    names(data_frame) <- header
    print(header)
    print(data_frame)
    if (nrow(data_frame) < chunk_size) {
      break
    }
  }
}

close(con)

The issue is that only the first chunk reads the No. column as character; the remaining chunks read it as a number.
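To illustrate the symptom, here is a small sketch that reads two of the sample rows without a header, once with guessed types and once with explicit classes. The names `csv_rows`, `guessed`, and `forced` are made up for this illustration.

```r
# Two data rows from the sample above, read without a header line
csv_rows <- c("1,0100000109,623,233,331",
              "2,0200000109,515,413,314")

# Types are guessed: the second column becomes an integer and loses its leading zero
guessed <- read.csv(text = csv_rows, header = FALSE, quote = "")

# Types are given per position: the second column stays a string
forced <- read.csv(text = csv_rows, header = FALSE, quote = "",
                   colClasses = c("integer", "character", "integer",
                                  "integer", "integer"))

print(guessed$V2)
print(forced$V2)
```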

How can I resolve this?

PS: The original input file has 150+ columns and about 20 million rows.

Your final `read.csv` does not use `colClasses` like the other two. – Remko Duursma, Feb 10 '17 at 09:17
@Remko In the final read.csv I can't add colClasses because I've set header = FALSE in that statement. – Raymond, Feb 10 '17 at 10:22
One straightforward solution would be using `readLines` to read the file as strings and `strsplit` to get the columns ... – holzben, Feb 10 '17 at 12:34

phileas · Answer 1 · 2017-02-13T08:07:51.013

0

You need to pass colClasses to the read.csv() call inside the repeat loop as well. Those chunks are read with header = FALSE, so the column names are not available; specify colClasses as an unnamed vector with one entry per column, in column order. Let's say there are 150 columns.

myColClasses <- rep("numeric", 150)
myColClasses[2] <- "character"
repeat {
  data_frame <- read.csv(con, nrows = chunk_size, colClasses = myColClasses, header = FALSE, quote = "")
  ...
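If typing out 150 classes is impractical, one possible variation (a sketch, not part of the original answer) builds the unnamed vector from the header line: `NA` entries leave a column's type to be guessed, and only the `No.` position is forced to `"character"`. The temporary sample file and the names `path`, `col_classes`, and `chunks` are assumptions made for illustration.

```r
# Hypothetical sample file so the sketch is self-contained
path <- tempfile(fileext = ".csv")
writeLines(c("id,No.,V,S,D",
             "1,0100000109,623,233,331",
             "2,0200000109,515,413,314",
             "3,0600000109,611,266,662"), path)

chunk_size <- 2
con <- file(path, open = "r")

# Consume the header line once and split it into column names
header <- strsplit(readLines(con, n = 1), ",", fixed = TRUE)[[1]]

# NA lets read.csv guess a column's type; only "No." is forced to character
col_classes <- rep(NA_character_, length(header))
col_classes[header == "No."] <- "character"

chunks <- list()
repeat {
  # Every chunk is read the same way: no header, same names, same classes
  chunk <- tryCatch(
    read.csv(con, nrows = chunk_size, header = FALSE, quote = "",
             col.names = header, colClasses = col_classes,
             check.names = FALSE),
    error = function(e) NULL  # guard in case reading past the end signals an error
  )
  # Stop when nothing more could be read
  if (is.null(chunk) || nrow(chunk) == 0) break
  chunks[[length(chunks) + 1]] <- chunk
  # A short chunk means the end of the file was reached
  if (nrow(chunk) < chunk_size) break
}
close(con)

print(chunks)
```

Collecting the chunks in a list is only for demonstration; with about 20 million rows, each chunk would presumably be processed and discarded inside the loop.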

edited Feb 13 '17 at 08:07

answered Feb 10 '17 at 09:37

phileas


The input I've provided is just a sample. The original file contains 150+ columns, so it would be very difficult to use your solution. Is there an alternative? – Raymond Feb 10 '17 at 10:25

score 0 · Answer 2 · answered Feb 10 '17 at 19:24

0

You can read the data as strings with readLines and split each line on the commas, so every column comes back as text:

fileName <- "input.csv"
df <- do.call(rbind.data.frame, strsplit(readLines(fileName), ",")[-1]) # drop the header line
colnames(df) <- c("id", "No.", "V", "S", "D") # set the column names

Or use the direct approach with read.csv, giving a class for every column:

fileName <- "input.csv"
col <- c("integer","character","integer","integer","integer")
df <- read.csv(file = fileName,
               sep = ",", 
               colClasses=col, 
               header = TRUE, 
               stringsAsFactors = FALSE)

answered Feb 10 '17 at 19:24

holzben


As I've already mentioned in the question, the input file contains 150+ columns, and manually specifying data types for all of them is very difficult. – Raymond Feb 13 '17 at 07:13
In my first code snippet you don't need to do that. Setting the column names (third line) can be automated as well, e.g. with `readLines`. – holzben Feb 13 '17 at 07:25
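A minimal sketch of that idea, assuming the first line of the file holds the column names and no field contains a quoted comma (`strsplit` does not understand CSV quoting). The temporary file and the names `path`, `rows`, and `num_cols` are illustrative only.

```r
# Hypothetical sample file so the sketch is self-contained
path <- tempfile(fileext = ".csv")
writeLines(c("id,No.,V,S,D",
             "1,0100000109,623,233,331",
             "2,0200000109,515,413,314",
             "3,0600000109,611,266,662"), path)

# Split every line on commas; the first element holds the column names
rows <- strsplit(readLines(path), ",", fixed = TRUE)
header <- rows[[1]]

# Bind the data rows into a character matrix, then into a data frame
df <- as.data.frame(do.call(rbind, rows[-1]), stringsAsFactors = FALSE)
colnames(df) <- header

# Convert every column except "No." back to its natural type
num_cols <- setdiff(header, "No.")
df[num_cols] <- lapply(df[num_cols], type.convert, as.is = TRUE)

str(df)
```

Note that `readLines(path)` loads the whole file into memory at once; for a file with about 20 million rows, passing `n` and reading from an open connection in batches would likely be needed.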
