Questions tagged [sqldf]

SQLDF is an R package for running SQL statements on R data frames.

SQLDF is an R package for running SQL statements on R data frames.

The user simply specifies an SQL statement in R using data frame names in place of table names and a database with appropriate table layouts/schema is automatically created, the data frames are automatically loaded into the database, the specified SQL statement is executed, the result is read back into R and the database is deleted all automatically behind the scenes.

sqldf supports:

  • the SQLite backend database (by default),
  • the H2 java database,
  • the PostgreSQL database and
  • sqldf 0.4-0 onwards also supports MySQL.

Resources:

684 questions
5
votes
3 answers

NA values using sqldf

If I try to get an average of c(NA, NA, 3, 4, 5, 6, 7, 8, 9, 10) using AVG from SQL, I get a value of 5.2, instead of the expected 6.5. # prepare data and write to file write.table(data.frame(col1 = c(NA, NA, 3:10)), "my.na.txt", row.names =…
Roman Luštrik
  • 69,533
  • 24
  • 154
  • 197
5
votes
1 answer

Error in R Using SQLDF: too many SQL variables

I have a large dataset with nearly 2000 variables in r. I then use sqldf to write a few case statements to create new columns on the original dataset. However I get the following error: Error in rsqlite_send_query(conn@ptr, statement) : too many…
greeny
  • 425
  • 1
  • 6
  • 20
5
votes
4 answers

How to get date using sqldf?

I have a data frame that has a "DATE" field. e.g.: "24-10-2015" The variable is in the date format. When I use sqldf, e.g.: select min(DATE), MAX (DATE) from table ... the output is a number like 16623. Tried FORMAT and CONVERT but they don't work…
saxo
  • 81
  • 1
  • 2
  • 7
5
votes
4 answers

R- sqldf -need explicit units for numeric conversion

I need to join 2 tables using a Date field > class(pagos$pseudo_1mes) [1] "Date" > class(pseudo_meses$pseudo_1mes) [1] "Date" My code is: library(sqldf) pagos<-sqldf("select a.*, b.mes_atras from pagos a left join pseudo_meses b …
GabyLP
  • 3,649
  • 7
  • 45
  • 66
5
votes
3 answers

Error: No Such Column using SQLDF

Below are the scripts > library(sqldf) > turnover = read.csv("turnover.csv") > names(turnover) [1] "Report.Date" "PersID" "Status" "DOB" [5] "Age" "Tenure" "Current.Hire.Date" "Term.Date" [9]…
PMa
  • 1,751
  • 7
  • 22
  • 28
5
votes
3 answers

RODBC: merge tables from different databases (channel)

I'm using RODBC package to connect to Oracle databases from R but I didn't succeed in merging tables from different databases without "downloading" the tables (I don't want to download them as they are too big!). I'd like to use something…
5
votes
3 answers

How to calculate number of occurrences per minute for a large dataset

I have a dataset with 500k appointments lasting between 5 and 60 minutes. tdata <- structure(list(Start = structure(c(1325493000, 1325493600, 1325494200, 1325494800, 1325494800, 1325495400, 1325495400, 1325496000, 1325496000, 1325496600,…
TimV
  • 53
  • 5
5
votes
0 answers

R stuck in loading sqldf package

I have been trying to use the sqldf package in R, but it simply stopped at loading required package "proto". Like this: library(sqldf) Loading required package: DBI Loading required package: gsubfn Loading required package: proto I'm using R…
J S
  • 113
  • 2
  • 6
5
votes
2 answers

Animals in the zoo: can we aggregate a daily time series of factors and flag activity by ID?

Suppose there is a daily time series of animal activity in a zoo over many years. A subset of a very large dataset might look like this: library(data.table) type <- c(rep('giraffe',90),rep('monkey',90),rep('anteater',90)) status <-…
hubert_farnsworth
  • 797
  • 2
  • 9
  • 21
4
votes
2 answers

using sqldf() to select rows that match a million items

This is a follow up on the answer provided here on using sqldf() https://stackoverflow.com/a/1820610 In my particular case, I have a tab-delimited file with over 110 million rows. I'd like to select the rows that match 4.6 million tag IDs. In the…
andrewj
  • 2,965
  • 8
  • 36
  • 37
4
votes
3 answers

Cumulative sum by group in sqldf?

I have a data frame with 3 variables: place, time, and value (P, T, X). I want to create a fourth variable which will be the cumulative sum of X. Normally I like to do grouping calculations with sqldf, but can't seem to find an equivalent for…
user702432
  • 11,898
  • 21
  • 55
  • 70
4
votes
2 answers

sqldf package in R, querying a data frame

I'm trying to rewrite some code using the sqldf library in R, which should allow me to run SQL queries on data frames, but I am having an issue in that whenever I try to run a query, R seems like it tries to query the actual real MySQL db con that…
Michael Discenza
  • 3,240
  • 7
  • 30
  • 41
4
votes
2 answers

sqldf, csv, and fields containing commas

Took me a while to figure this out. So, I am answering my own question. You have some .csv, you want to load it fast, you want to use the sqldf package. Your usual code is irritated by a few annoying fields. Example: 1001, Amy,9:43:00,…
Ryogi
  • 5,497
  • 5
  • 26
  • 46
4
votes
2 answers

Saving a flat file as an SQL database in R without loading it 100% into RAM

I hope that what I am about to write makes some sense. If you look at How to deal with a 50GB large csv file in r language? it is explained how to query à la SQL, a csv file from R. In my case, I have vast amount of data stored as large (or larger…
larry77
  • 1,309
  • 14
  • 29
4
votes
0 answers

Progress bar in sqldf

I am working with relatively large datasets and I was wondering if there is any option to track the progress in sqldf? The merge operations I am doing take a long time. I was searching for quite a while, but I don't think there is anything available…
EDC
  • 613
  • 2
  • 7
  • 16
1 2
3
45 46