-1

I have a 6 gb csv file that I am trying to read into R using read.csv.sql from the sqldf package. For some reason, the result returns 0 results. What is wrong with my code? I get a warning message of "closing unused connection" which may not be related to the fact that no results are returned. My code is below.

     TestData <- read.csv.sql("2025_nonroad_ff10_NCD20130831_23feb2015_v3_part1.csv", sql = "select * from file where poll == 'EXH__100414';", header=TRUE, skip=27, eol="\n", sep=",")

If I run a simpler SQL statement: select * from file limit 2, the result is:

result of sql limit 2

clarity123
  • 1,956
  • 10
  • 16

1 Answers1

0

Perhaps this revision might help:

TestData <- read.csv.sql("2025_nonroad_ff10_NCD20130831_23feb2015_v3_part1.csv", sql = "select * from file where poll = 'EXH__100414'", header=TRUE, skip=27, eol="\n", sep=",")

They were only minor changes:

  • removed double equals in the sql statement
  • removed closing semi-colon, in other programs a closing semi-colon is needed but in read.csv.sql it is not

If that doesn't work we need to try to isolate the problem

Try a simpler SQL such as showing just first two records. select * from file limit 2, does that even work?

If it works, then it means everything else is working but your original sql condition is bad, wrong etc.

If not, it means there is something else wrong with the rest of the read.csv.sql arguments, or perhaps with the file, or read.csv.sql itself.

clarity123
  • 1,956
  • 10
  • 16
  • Unfortunately, the minor revisions did not work. Fortunately, the code "select * from file limit 2" does work! Now I just don't know why my original query didn't work. I noticed that the results in the data frame were with quotes and format was character. Are the quotes throwing this off? – user225205 Oct 06 '15 at 16:54
  • 1
    Glad to hear of it. Now the better way to proceed is probably if you could post the result of "select * from file limit 2" in the original post , if there are no privacy concerns. If there are privacy concerns, then if possible could you post from the result, just what the "poll" column has, so possibly we can see if there are any new clues about the values, formatting etc. Your question about >data frame were with quotes and format was character , you mean you did an str(TestData) and shows format for most of them are character? Well anyhow an excerpt or showing results of the limit helps. – clarity123 Oct 06 '15 at 17:08
  • Posted an image of the results above. Click "results". Also I was playing around with the other data in the other columns, and instead I got a different error: "expected 45 columns of data but found 21". The original file has a total of 45 columns but only 21 actually have data. – user225205 Oct 06 '15 at 18:15
  • I assume the poll column is character because the TestData object in the global environment shows the variable poll as chr. – user225205 Oct 06 '15 at 18:42
  • So you tried the sql `select * from file where poll = 'EXH__100414'` and it doesnt work yet I see the screen shot such a value `EXH__100414` does exist. hmm, what about try `select * from file where ann_value = 0.001080156`, where `0.001080156` is a value known to exist according to your screenshot (also double check if entire value is shown, or if the column is cropping the value). We know ann_value is *not* a chr so if it works for non-chr type then it means we need to look further into how we are matching chr type. – clarity123 Oct 06 '15 at 19:14
  • Querying on ann_value = 0.001080156 works and the entire value is shown (i.e. the column does not crop the values). I think the character may have something to do with literal character strings? @ Grothendieck, thanks for the suggestion. Question 11 mentions eol characters, which is already in my code. – user225205 Oct 07 '15 at 11:17
  • Update: I made a smaller csv subset from the original file, and I am able to filter using %VOC in the sql expression from this smaller csv file. When I apply the exact same code to the original 6gb file, I get an odd warning message with 0 observations: In .local(conn, ...) : Unfinalized prepared statements. Odd that the exact same code worked with the same data on a smaller dataset but not on the original file. – user225205 Oct 07 '15 at 12:46