I have used both `read.table()` (with arguments `sep = "\t"`, `header = TRUE`, `na.strings = "NA"`) and `read_csv()` (with arguments `col_names = TRUE`, `na = "NA"`) from the readr package to read in a csv file. When I estimate a model, the summary shows vastly different results, although the number of observations is the same. Now I don't know which of the two models is based on correctly imported data. How can I go about debugging this?
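For context, a minimal sketch of why the two calls can disagree on a comma-separated file (the tiny example file here is a stand-in for the real data): with `sep = "\t"`, base R finds no tabs and collapses each line into a single field.

```r
# Write a small comma-separated example file to a temp path
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "1,2,3", "4,5,6"), tmp)

# sep = "\t" on a comma-separated file: no tabs found,
# so each line becomes one field -> a single column
wrong <- read.table(tmp, sep = "\t", header = TRUE)
ncol(wrong)  # 1

# sep = "," (or read.csv) parses the commas correctly
right <- read.table(tmp, sep = ",", header = TRUE)
ncol(right)  # 3
```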
- Check the data imported into memory in R (the object) against the file on your hard drive, and see which one is correct. Also, why are you specifying tab separators for `read.table` but using `read_csv` from the readr package? If you have tab-separated data, use either `read.delim()` or `readr::read_delim()`. – Karolis Koncevičius Mar 10 '18 at 22:44
- Well, `sep = "\t"` is vastly different from a csv, so differing results are not surprising. An example might help! – Rich Scriven Mar 10 '18 at 22:58
1 Answer
Question: How to debug an unexpected result after reading data into R.

Answer: The first step, even before you run into a problem, should be to look at the data. You'll develop your own workflow, but mine starts with the raw file in a text editor, to check whether my assumptions about it hold (delimiter, header, quoting); I might also search for particular values there. Then I inspect the imported object in R with `str(my_data)`, `head(my_data)`, `colSums(is.na(my_data))`, `View(my_data)`, and, depending on its structure, `summary(my_data)`, either for the entire data frame or for subsets of it (depending on how many variables it has).
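The inspection routine above can be sketched as follows; `my_data` stands in for whatever object your read call returned (here a tiny made-up data frame):

```r
# A quick first-look routine for any freshly imported data frame
my_data <- data.frame(x = c(1, 2, NA), y = c("a", "b", "c"))

str(my_data)              # column types and a preview of the values
head(my_data)             # first rows, to spot shifted or misread columns
colSums(is.na(my_data))   # count of missing values per column
summary(my_data)          # per-column summaries
# View(my_data)           # spreadsheet-style viewer (interactive sessions only)
```

Running this on the objects produced by both `read.table()` and `read_csv()` and comparing the output side by side usually reveals which import went wrong.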

De Novo
- Thanks! What actually happened was that in the original dataset the headers were shifted to the left. For some reason `read.table` was able to assign the headers correctly, while `read_csv` did not. – Tea Tree Mar 11 '18 at 00:16
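One plausible mechanism for that difference (an assumption, not confirmed by the comment): when the header line has one fewer field than the data rows, base R's `read.table`/`read.csv` treat the first data column as row names and shift the header into place, while readr does not apply that heuristic. A sketch:

```r
# Hypothetical reproduction of a "shifted" header:
# the header row has one fewer field than the data rows
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b", "r1,1,2", "r2,3,4"), tmp)

df <- read.csv(tmp)
# Base R's documented heuristic: the first column becomes row names,
# so the short header lines up with the remaining columns
names(df)     # "a" "b"
rownames(df)  # "r1" "r2"
```

Comparing `names()` of the two imported objects is therefore a quick way to catch this kind of misalignment.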