If you want a sample of a fixed size but do not know ahead of time how many rows the file has, the following code draws a simple random sample of the data (by reservoir sampling) without storing the whole dataset in memory:
n <- 1000
con <- file("jan08.csv", open = "r")
head <- readLines(con, 1)        # keep the header line separately
sampdat <- readLines(con, n)     # fill the reservoir with the first n rows
k <- n
while (length(curline <- readLines(con, 1))) {
  k <- k + 1
  # the k-th row replaces a random reservoir entry with probability n/k
  if (runif(1) < n / k) {
    sampdat[sample(n, 1)] <- curline
  }
}
close(con)
delaysamp <- read.csv(textConnection(c(head, sampdat)))
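To see why every row ends up in the sample with equal probability, the reservoir step can be exercised on its own with a small simulated stream. This is a self-contained sketch using only base R; the stream of integers stands in for the rows of the file:

```r
set.seed(1)
n <- 10            # reservoir size
stream <- 1:200    # stand-in for the rows of the file
reservoir <- stream[1:n]   # the first n items fill the reservoir
k <- n
for (x in stream[-(1:n)]) {
  k <- k + 1
  # keep the k-th item with probability n/k, evicting a uniform victim;
  # this leaves every item seen so far in the reservoir with probability n/k
  if (runif(1) < n / k) {
    reservoir[sample(n, 1)] <- x
  }
}
length(reservoir)
```

The reservoir always holds exactly n items, and after the loop each of the 200 stream elements is equally likely to be among them.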
If you will be working with the large dataset more than once, it may be better to load the data into a database and sample from there.
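A sketch of the database approach, assuming the DBI and RSQLite packages are available. The in-memory database and the synthetic `delays` table are stand-ins for illustration; with a real file you would point `dbConnect` at a database file and load the CSV once. SQLite's `random()` shuffles the rows, so `ORDER BY random() LIMIT n` returns a simple random sample of n rows:

```r
library(DBI)

# in-memory database as a stand-in; use a file path for persistent storage
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# hypothetical stand-in for the flight data; with the real file you would
# load it once, e.g. dbWriteTable(con, "delays", read.csv("jan08.csv"))
dbWriteTable(con, "delays", data.frame(id = 1:5000, delay = rnorm(5000)))

# let the database do the sampling: shuffle, then keep the first 1000 rows
samp <- dbGetQuery(con, "SELECT * FROM delays ORDER BY random() LIMIT 1000")
dbDisconnect(con)
nrow(samp)
```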
The ff package is another option: it stores a large data object in a file on disk while still letting you grab parts of it from within R in a simple manner.
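A sketch of the ff route, assuming the ff package is installed. The temporary CSV of synthetic data stands in for the real file; `read.csv.ffdf` keeps the table on disk, and row-indexing the resulting ffdf pulls only the selected rows into RAM:

```r
library(ff)

# synthetic stand-in for the real CSV, written to a temp file
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:5000, delay = rnorm(5000)), tmp,
          row.names = FALSE)

# read into a disk-backed ffdf rather than an in-memory data frame
ffdat <- read.csv.ffdf(file = tmp)

# indexing with a random row vector materializes just those rows
samp <- ffdat[sample(nrow(ffdat), 1000), ]
nrow(samp)
```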