I have used both `read.table()` (with arguments `sep = "\t"`, `header = TRUE`, `na.strings = "NA"`) and `read_csv()` (with arguments `col_names = TRUE`, `na = "NA"`) from the readr package to read in a csv file. When I estimate a model, the summary shows vastly different results, although the number of observations is the same. Now I don't know which of the two models is based on correctly imported data. How can I go about debugging this?
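For context, a minimal sketch of why the two calls can disagree on a comma-separated file (the tiny example file here is a stand-in for the real data): with `sep = "\t"`, base R finds no tabs and collapses each line into a single field.

```r
# Write a small comma-separated example file to a temp path
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "1,2,3", "4,5,6"), tmp)

# sep = "\t" on a comma-separated file: no tabs found,
# so each line becomes one field -> a single column
wrong <- read.table(tmp, sep = "\t", header = TRUE)
ncol(wrong)  # 1

# sep = "," (or read.csv) parses the commas correctly
right <- read.table(tmp, sep = ",", header = TRUE)
ncol(right)  # 3
```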
- Check the data imported into memory in R (the object) against the file on your hard drive, and see which one is correct. Also, why are you specifying tab separators for `read.table` but using `read_csv` from the readr package? If you have tab-separated data, use either `read.delim()` or `readr::read_delim()`. – Karolis Koncevičius Mar 10 '18 at 22:44
- Well, `sep = "\t"` is vastly different from a csv, so differing results are not surprising. An example might help! – Rich Scriven Mar 10 '18 at 22:58
1 Answer
Question: How to debug an unexpected result after reading data into R.

Answer: The first step, even before you run into a problem, should be to look at the data. You'll develop your own workflow, but mine starts with the raw file in a text editor, to check whether my assumptions about it hold (delimiter, header, quoting); I might also search for particular values there. Then I inspect the imported object in R with `str(my_data)`, `head(my_data)`, `colSums(is.na(my_data))`, `View(my_data)`, and, depending on its structure, `summary(my_data)`, either for the entire data frame or for subsets of it (depending on how many variables it has).
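The inspection routine above can be sketched as follows; `my_data` stands in for whatever object your read call returned (here a tiny made-up data frame):

```r
# A quick first-look routine for any freshly imported data frame
my_data <- data.frame(x = c(1, 2, NA), y = c("a", "b", "c"))

str(my_data)              # column types and a preview of the values
head(my_data)             # first rows, to spot shifted or misread columns
colSums(is.na(my_data))   # count of missing values per column
summary(my_data)          # per-column summaries
# View(my_data)           # spreadsheet-style viewer (interactive sessions only)
```

Running this on the objects produced by both `read.table()` and `read_csv()` and comparing the output side by side usually reveals which import went wrong.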

De Novo
- Thanks! What actually happened was that in the original dataset the headers were shifted to the left. For some reason `read.table` was able to assign the headers correctly, while `read_csv` did not. – Tea Tree Mar 11 '18 at 00:16
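One plausible mechanism for that difference (an assumption, not confirmed by the comment): when the header line has one fewer field than the data rows, base R's `read.table`/`read.csv` treat the first data column as row names and shift the header into place, while readr does not apply that heuristic. A sketch:

```r
# Hypothetical reproduction of a "shifted" header:
# the header row has one fewer field than the data rows
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b", "r1,1,2", "r2,3,4"), tmp)

df <- read.csv(tmp)
# Base R's documented heuristic: the first column becomes row names,
# so the short header lines up with the remaining columns
names(df)     # "a" "b"
rownames(df)  # "r1" "r2"
```

Comparing `names()` of the two imported objects is therefore a quick way to catch this kind of misalignment.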