Questions tagged [sqldf]

SQLDF is an R package for running SQL statements on R data frames.

SQLDF is an R package for running SQL statements on R data frames.

The user simply specifies an SQL statement in R using data frame names in place of table names and a database with appropriate table layouts/schema is automatically created, the data frames are automatically loaded into the database, the specified SQL statement is executed, the result is read back into R and the database is deleted all automatically behind the scenes.

sqldf supports:

  • the SQLite backend database (by default),
  • the H2 java database,
  • the PostgreSQL database and
  • sqldf 0.4-0 onwards also supports MySQL.

Resources:

684 questions
2
votes
1 answer

SQLDF error - syntax error

I'm running the following command: sqldf("SELECT * FROM dat WHERE Form LIKE '%compare%' AND Create.Date LIKE '%7/01/11%' AND Post.Status = 'Success'") but end up getting the following error. Error in sqliteExecStatement(con, statement, bind.data) :…
ATMathew
  • 12,566
  • 26
  • 69
  • 76
2
votes
3 answers

Find the next date greater than date in date from column in R

I want to find a first date greater than the given date in a column. eg: Pnp, Date1 Date2 A100,1/1/2020,1/1/2020 A100,1/1/2020,1/7/2020 A100,1/1/2020,1/1/2021 A100,1/1/2020,1/7/2021 Sample output: Pnp,Date1,Date2,Date3,New Column …
2
votes
1 answer

pandasql is not working in a function python pandas

I am using pandasql to do sql join on pandas dataframe. it's working well without putting them in a function. But after putting everything together in a function, it gave me an error said: NameError: global name 'sqldf' is not defined the code i am…
vivianna
  • 99
  • 1
  • 11
2
votes
2 answers

Select all columns except one in R sqldf package

Is there a way in R using the sqldf package to select all columns except one?
2
votes
1 answer

Automate column choice in R base order()

I am working on an automation project in R. I want to order an input dataset (i.e. dataframe) by row values. No hardcoded values are allowed, since it's automated code. Reproducible example below: data("mtcars") groupBY <- c('cyl','carb') Normally,…
Pittoro
  • 83
  • 6
2
votes
1 answer

Summary report of count within subclusters

I have a dataset as below. Requirement is to count the number of IP only - Both Parent & Child in a subcluster within a cluster is IP, P&T only - Both Parent & Child in a subcluster within a cluster is P&T IP->P&T - When Parent is IP & Child is…
2
votes
1 answer

Conditional row wise aggregation using "or" in data.table

I have a reasonably large (3 million rows) data.table containing invoices and payments from many customers, where each customer has a number of (unique) documents that in turn each have a creation date, and a date where the document is payed. If…
2
votes
1 answer

R SQLDF how to handle NAs, missing values in operations such as divisions?

I typically use functions such as is.na and COALESCE but I cannot find such as functions for R's SQLDF which makes its usage hard with messy data with missing values. Is there some way to force SQLDF to return NA in the case when NAs are occurring…
hhh
  • 50,788
  • 62
  • 179
  • 282
2
votes
1 answer

Preserving times Class with sqldf

I am using sqldf to join several tables, but I am having trouble preserving the times class set with the chron package on a column. I used the method="name__class" argument to the sqldf function and appropriately named my column with the class, but…
StatsStudent
  • 1,384
  • 2
  • 10
  • 28
2
votes
2 answers

Use sqldf to join exactly on id and on the most recent date in a lagged window

I would like to join two data sets, A and B. I would like to join A and B exactly on their id variables, but keep only the most recent observation in B that is between three months and three years old. The data sets are big enough that I need to use…
Richard Herron
  • 9,760
  • 12
  • 69
  • 116
2
votes
1 answer

Replace String Pattern in sqldf

I have a dataframe like below: Col1 Col2 Col3 ten: end 5 10 five: nb 7 11 12:4 12 10 13:56 15 16 Using the sqldf package in R, I want to do the following: Replace values in Col1 with:…
nak5120
  • 4,089
  • 4
  • 35
  • 94
2
votes
1 answer

If NA exists in column, replace with value in another column - sqldf

I have a dataframe below: df ColA ColB ColC NA BN 6 JH NA 8 NA rewr 9 NA NA 10 Expected Output: newdf ColA ColB ColC New_Col NA BN 6 BN JH NA …
nak5120
  • 4,089
  • 4
  • 35
  • 94
2
votes
1 answer

result_create(conn@ptr, statement) : Result too large

t1_DA <- sqldf("select decile, count(decile) as count, avg(pred_spent) as avg_pred_spent, avg(exp(total_spent)) as avg_total_spent, avg(log(pred_spent)) as ln_avg_pred_spent, …
Random user
  • 33
  • 1
  • 7
2
votes
1 answer

enabling SQLite's generate_sequence to be used in R's sqldf library under MS windows

I am hoping to use SQLite3's generate_series Table-Valued Function from within R's sqldf library, which in turn uses the RSQLite library. I expect once installed/configured, I will be able to call it as: sqldf('SELECT value FROM…
malcook
  • 1,686
  • 16
  • 16
2
votes
1 answer

Index by category with sorting by column in R sqldf package

How to add index by category in R with sorting by column in sqldf package. I look for equivalent of SQL: ROW_NUMBER() over(partition by [Category] order by [Date] desc Suppose we have a table: +----------+-------+------------+ | Category | Value | …
Przemyslaw Remin
  • 6,276
  • 25
  • 113
  • 191