Is there a way in R using the sqldf
package to select all columns except one?

- 502,043
- 27
- 286
- 360

- 83
- 9
-
Requesting you to please provide sample data ```dput```and revised the question with details.as ```sqldf``` is for aggregating the data – Tushar Lad Aug 28 '20 at 07:07
-
Why is that needed for that kind of question? It is not specific data issue, but rather R package related question. – Giedrius Urbonas Aug 28 '20 at 07:10
2 Answers
Your call to sqldf
based on some query should return a data frame, where each DF column corresponds to one of the columns appearing in the select clause of your SQL query. Consider the following example:
sql <- "SELECT * FROM yourTable WHERE <some conditions>"
df <- sqldf(sql)
drop <- c("some_column")
df <- df[, !(names(df) %in% drop)]
Note in the above I am doing a SELECT *
to fetch all columns in the table (what I assume is your use case). I then subset off a column some_column
from the resulting data frame.
Note that doing this from SQL directly generally is not possible. That is, once you do SELECT *
, the cat is out of the bag, and you end up with all columns.

- 502,043
- 27
- 286
- 360
-
Ok, thanks. But is there any way to avoid double columns after joining 2 tables? I mean if I have table ```a``` and table ```b``` and I am joining both them using same column ```customer_id``` in the final table I see two columns with same name ```customer_id```. Here is the code example ```data <- sqldf("selec a.*, b.* from a inner join b on (a.customer_id = b.customer_id)")``` – Giedrius Urbonas Aug 28 '20 at 07:25
-
No, there is no way to reliable way to do this, _unless_ the two columns named the same also coincidentally happen to have the same exact data. In that case, you could, from R, arbitrarily just retain one of those two columns. But, I don't recommend going down this road. In general, you should _always_ explicitly list all columns you want to select. – Tim Biegeleisen Aug 28 '20 at 07:26
-
I have answered your original question and you should not be giving follow-up questions here. – Tim Biegeleisen Aug 28 '20 at 07:27
-
But if the question is rising from your answer I think this is the right place to ask additional questions. Anyway, thanks. – Giedrius Urbonas Aug 28 '20 at 07:42
-
Anyway, I think it was best place to answer that additional question, because it is related to same topic - excluding column when selecting data in SQL. – Giedrius Urbonas Aug 28 '20 at 07:45
-
@GiedriusUrbonas It's a good question, at least from an `sqldf` point of view. While a SQL result set can have duplicate column names, an R data frame _cannot_ have this, i.e. all column names must be unique. You may play around with `sqldf` to see how you might handle this other challenge. – Tim Biegeleisen Aug 28 '20 at 07:49
1) SQLite Using the default SQLite backend, suppose we want to return the first 3 rows of all columns in mtcars
except for the cyl
column. First create a comma separated string, sel
, of all such column names and then use fn$sqldf
to allow string interpolation referring to it in the SQL statement as $sel
. Add verbose=TRUE
argument to sqldf
if you want to see the SQL statement that was generated.
library(sqldf)
sel <- toString(setdiff(names(mtcars), "cyl"))
fn$sqldf("select $sel from mtcars limit 3")
giving:
mpg disp hp drat wt qsec vs am gear carb
1 21.0 160 110 3.90 2.620 16.46 0 1 4 4
2 21.0 160 110 3.90 2.875 17.02 0 1 4 4
3 22.8 108 93 3.85 2.320 18.61 1 1 4 1
2) H2 The H2 backend supports alter table ... drop column ...
so we can write the following. Since alter
does not return anything we add a select
which returns the altered table.
library(RH2)
library(sqldf)
sqldf(c("alter table mtcars drop column cyl",
"select * from mtcars limit 3"))

- 254,981
- 17
- 203
- 341