I have a TSV dataset with 19,150,868 rows. I am sure that number is correct because (a) the owner of the file specified it, and (b) I checked it myself with `wc -l` on UNIX.
Yet when I ran:

    MyData = read.table("dataset.tsv", header = FALSE, sep = "\t",
                        col.names = c_names, colClasses = "character",
                        comment.char = "", quote = "", nrows = 19150868)
only the first 835,873 rows were imported. No error was thrown, and the whole call took just 20.33 seconds.
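A minimal diagnostic sketch (not a fix) for narrowing this down from inside R: it compares how many lines R's own line reader sees with what `wc -l` reports, and tabulates the number of fields per line using the same `sep`, `quote` and `comment.char` settings as the call above. The file here is a tiny made-up stand-in written to a temporary file (an assumption for illustration). On the real 19M-row file, `readLines()` would hold every line in memory, so keep that cost in mind when swapping in the real path.

```r
# Hypothetical stand-in for "dataset.tsv": three tab-separated rows.
path <- tempfile(fileext = ".tsv")
writeLines(c("a\tb\tc", "d\te\tf", "g\th\ti"), path)

# Number of lines as R's line reader sees them (compare with `wc -l`).
n_lines <- length(readLines(path))

# Fields per line, with quoting and comments disabled to match read.table().
n_fields <- count.fields(path, sep = "\t", quote = "", comment.char = "")

# Read with the same settings and compare the row count to n_lines.
dat <- read.table(path, header = FALSE, sep = "\t",
                  colClasses = "character", comment.char = "",
                  quote = "", nrows = n_lines)

cat("lines seen:", n_lines, " rows read:", nrow(dat), "\n")
print(table(n_fields))  # more than one distinct count flags irregular lines
```

On the real file, if `nrow(dat)` stops short of `n_lines`, the line just after the last imported row is the natural place to look for unusual characters.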