Issues reading data as csv in R

Question

I have a large data set (~20000x1). Not all the fields are filled; in other words, the data has missing values. Each feature is a string.

I have tried the following:

Input:

data <- read.csv("data.csv", header=TRUE, quote = "")
datan <- read.table("data.csv", header = TRUE, fill = TRUE)

Output of the second call:

Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 1 did not have 80 elements

Input:

datar <- read.csv("data.csv", header = TRUE, na.strings = NA)

Output:

Warning message: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : EOF within quoted string

I run into essentially four problems. Two of them are the error and the warning shown above. The third is that, when no error is raised, the global environment window shows that not all of my rows were read: roughly 14000 samples are missing, although the number of features is right. The fourth is that, in other runs, samples are again missing and the number of features is also wrong.

How can I solve this?

Generally this means that you don't fully understand the format that your file is in. Somewhere there's an unusual character, an unmatched quote, a field that contains a comma, etc. But there's no way for _us_ to figure that out, because we don't have your file. — joran, Feb 21 '18 at 22:11
No, but does it matter if the inputs have like periods at the ends? An example of one would be "#DogRules!!! I am feeling happy to see dogs." — Jayganesh Kalla, Feb 21 '18 at 22:12
You could also try using `comment.char = ""`. That should help when you have a pound sign. — desc, Feb 21 '18 at 22:26
Try to disable quoting like `datar <- read.csv("data.csv", quote = "", row.names = NULL, stringsAsFactors = FALSE)` — Aleh, Feb 21 '18 at 22:42
Best go to the `bash` or `dos` command line for a moment, depending on your OS. Type `head -3 data.csv` and have a look at it. If you are still unsure, then add this example to your question. Otherwise this is a "how long is my piece of string" question. — Stephen Henderson, Feb 21 '18 at 23:07

score 0 · Answer 1 · answered Feb 21 '18 at 22:28

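Try the argument `comment.char = ""` as well as `quote = ""`. In `read.table` the default is `comment.char = "#"`, so a hash (#) in the text is treated as the start of a comment and the rest of the line is cut off. `read.csv` already defaults to `comment.char = ""`, so there the quote setting is the one that matters.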
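
As a rough sketch of this advice, the snippet below writes a small made-up file and reads it with both quote handling and comment handling switched off. The asker's data.csv is not available, so the sample lines and the column name `text` are assumptions. `sep = ","` is set explicitly because `read.table` otherwise splits on whitespace.

```r
# Hypothetical one-column sample; the real data.csv is not available here.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "text",
  "#DogRules!!! I am feeling happy to see dogs.",
  "She said \"hello and left",
  "Cats are fine too."
), path)

# quote = ""        : treat quote characters as ordinary text
# comment.char = "" : keep '#' as data instead of starting a comment
# sep = ","         : override read.table's whitespace default
fixed <- read.table(path, header = TRUE, sep = ",", quote = "",
                    comment.char = "", fill = TRUE,
                    stringsAsFactors = FALSE)

print(nrow(fixed))
print(fixed$text)
```

With these settings, the line starting with `#` and the line containing an unmatched double quote are each kept as one row.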

David Foster

score 0 · Answer 2 · answered Feb 23 '18 at 01:55

Can you open the CSV using Notepad++? This will allow you to see 'invisible' characters and any other non-printable characters. That file may not contain what you think it contains! When you get the sourcing issue resolved, you can choose the CSV file with a selector tool.

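# Pick the file interactively
filename <- file.choose()
# skip = 1 drops the first line of the file before reading
data <- read.csv(filename, skip = 1)
# File name without the directory part
name <- basename(filename)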

Or, hard-code the path, and read the data into R.

# Read CSV into R
MyData <- read.csv(file="c:/your_path_here/Data.csv", header=TRUE, sep=",")
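
A similar inspection can be done from inside R. The following is a minimal sketch on a made-up sample file, since the real data.csv is unknown. It prints the first raw lines, counts the fields on each line, and flags lines that contain an odd number of double quotes, which is one common cause of the "EOF within quoted string" warning.

```r
# Hypothetical sample standing in for data.csv.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "text",
  "#DogRules!!! I am feeling happy to see dogs.",
  "plain line",
  "She said \"hello and left"
), path)

# First raw lines, exactly as stored on disk
raw <- readLines(path, n = 3)
print(raw)

# Fields per line, with quoting and comments disabled so nothing is hidden
n_fields <- count.fields(path, sep = ",", quote = "", comment.char = "")
print(table(n_fields))

# Lines with an odd number of double quotes are candidates for an unmatched quote
all_lines <- readLines(path)
n_quotes <- lengths(regmatches(all_lines,
                               gregexpr("\"", all_lines, fixed = TRUE)))
suspect <- which(n_quotes %% 2 == 1)
print(suspect)
```

If `table(n_fields)` shows more than one distinct count, some lines contain extra separators. The line numbers in `suspect` are a reasonable place to start looking.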
