
I have used both read.table() (with sep = "\t", header = TRUE, na.strings = "NA") and read_csv() (with col_names = TRUE, na = "NA") from the readr package to read in a csv file. When I estimate a model, the summary shows vastly different results, although the number of observations is the same. Now I don't know which of these two models is based on correctly imported data. How can I go about debugging this?
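A minimal sketch of the two imports being compared (`mydata.csv`, `y`, `x1`, and `x2` are hypothetical stand-ins for the actual file and variables):

```r
library(readr)

# Same file, two readers
d1 <- read.table("mydata.csv", sep = "\t", header = TRUE, na.strings = "NA")
d2 <- read_csv("mydata.csv", col_names = TRUE, na = "NA")

# Same model fitted to each import
fit1 <- lm(y ~ x1 + x2, data = d1)
fit2 <- lm(y ~ x1 + x2, data = d2)
summary(fit1)  # the summaries differ even though the number of observations matches
summary(fit2)
```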

Tea Tree
  • Check the data imported into memory in R (the object) against the file on your hard drive and see which one is correct. Also, why are you specifying a tab separator for `read.table` but using `read_csv` from the readr package? If you have tab-separated data, use either `read.delim()` or `readr::read_delim()` (see the sketch after these comments). – Karolis Koncevičius Mar 10 '18 at 22:44
  • Well, `sep = "\t"` is vastly different from a csv, so differing results are not surprising. An example might help! – Rich Scriven Mar 10 '18 at 22:58
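As the first comment notes, tab-separated data calls for a delimited reader rather than `read_csv`. A minimal sketch, assuming a hypothetical file `mydata.tsv`:

```r
# Base R: read.delim() defaults to sep = "\t" and header = TRUE
d_base  <- read.delim("mydata.tsv", na.strings = "NA")

# readr: read_delim() with an explicit delimiter
d_readr <- readr::read_delim("mydata.tsv", delim = "\t", na = "NA")
```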

1 Answer


Question: How to go about debugging an unexpected result after reading data into R.

Answer: The first step, even before you run into a problem, should be to look at the data. You'll develop your own workflow, but mine starts with opening the file in a text editor to check whether my assumptions about it hold, searching for particular values if necessary. Then I inspect it in R with str(my_data), head(my_data), colSums(is.na(my_data)), View(my_data), and, depending on its structure, summary(my_data), either for the entire data frame or for subsets of it (depending on how many variables it has).
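A minimal sketch of that inspection workflow, assuming the two imports are stored in hypothetical objects `d1` (from `read.table`) and `d2` (from `read_csv`):

```r
# Inspect one import at a time
str(d1)             # column types and a preview of values
head(d1)            # first rows: do the headers line up with the data?
colSums(is.na(d1))  # NA count per column
summary(d1)         # ranges / levels, column by column
View(d1)            # spreadsheet-style view (RStudio)

# Then compare the two imports directly
names(d1)
names(d2)
all.equal(as.data.frame(d1), as.data.frame(d2))
```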

De Novo
  • Thanks! What actually happened was that in the original dataset the headers were shifted to the left. For some reason read.table was able to assign the headers correctly, while read_csv did not (see the check sketched after these comments). – Tea Tree Mar 11 '18 at 00:16
  • Glad you were able to figure it out! – De Novo Mar 11 '18 at 00:24
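A hypothetical check for the header shift described in the comment above: compare the column names each reader assigned against the first line of the raw file (`mydata.csv` again stands in for the actual file).

```r
names(d1)                      # headers as assigned by read.table
names(d2)                      # headers as assigned by read_csv
readLines("mydata.csv", n = 2) # the file's own header line and first data row
```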