2

Is there a way in R using the sqldf package to select all columns except one?

Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
  • Requesting you to please provide sample data ```dput```and revised the question with details.as ```sqldf``` is for aggregating the data – Tushar Lad Aug 28 '20 at 07:07
  • Why is that needed for that kind of question? It is not specific data issue, but rather R package related question. – Giedrius Urbonas Aug 28 '20 at 07:10

2 Answers2

1

Your call to sqldf based on some query should return a data frame, where each DF column corresponds to one of the columns appearing in the select clause of your SQL query. Consider the following example:

sql <- "SELECT * FROM yourTable WHERE <some conditions>"
df <- sqldf(sql)
drop <- c("some_column")
df <- df[, !(names(df) %in% drop)]

Note in the above I am doing a SELECT * to fetch all columns in the table (what I assume is your use case). I then subset off a column some_column from the resulting data frame.

Note that doing this from SQL directly generally is not possible. That is, once you do SELECT *, the cat is out of the bag, and you end up with all columns.

Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
  • Ok, thanks. But is there any way to avoid double columns after joining 2 tables? I mean if I have table ```a``` and table ```b``` and I am joining both them using same column ```customer_id``` in the final table I see two columns with same name ```customer_id```. Here is the code example ```data <- sqldf("selec a.*, b.* from a inner join b on (a.customer_id = b.customer_id)")``` – Giedrius Urbonas Aug 28 '20 at 07:25
  • No, there is no way to reliable way to do this, _unless_ the two columns named the same also coincidentally happen to have the same exact data. In that case, you could, from R, arbitrarily just retain one of those two columns. But, I don't recommend going down this road. In general, you should _always_ explicitly list all columns you want to select. – Tim Biegeleisen Aug 28 '20 at 07:26
  • I have answered your original question and you should not be giving follow-up questions here. – Tim Biegeleisen Aug 28 '20 at 07:27
  • But if the question is rising from your answer I think this is the right place to ask additional questions. Anyway, thanks. – Giedrius Urbonas Aug 28 '20 at 07:42
  • Anyway, I think it was best place to answer that additional question, because it is related to same topic - excluding column when selecting data in SQL. – Giedrius Urbonas Aug 28 '20 at 07:45
  • @GiedriusUrbonas It's a good question, at least from an `sqldf` point of view. While a SQL result set can have duplicate column names, an R data frame _cannot_ have this, i.e. all column names must be unique. You may play around with `sqldf` to see how you might handle this other challenge. – Tim Biegeleisen Aug 28 '20 at 07:49
0

1) SQLite Using the default SQLite backend, suppose we want to return the first 3 rows of all columns in mtcars except for the cyl column. First create a comma separated string, sel, of all such column names and then use fn$sqldf to allow string interpolation referring to it in the SQL statement as $sel. Add verbose=TRUE argument to sqldf if you want to see the SQL statement that was generated.

library(sqldf)

sel <- toString(setdiff(names(mtcars), "cyl"))
fn$sqldf("select $sel from mtcars limit 3")

giving:

   mpg disp  hp drat    wt  qsec vs am gear carb
1 21.0  160 110 3.90 2.620 16.46  0  1    4    4
2 21.0  160 110 3.90 2.875 17.02  0  1    4    4
3 22.8  108  93 3.85 2.320 18.61  1  1    4    1

2) H2 The H2 backend supports alter table ... drop column ... so we can write the following. Since alter does not return anything we add a select which returns the altered table.

library(RH2)
library(sqldf)

sqldf(c("alter table mtcars drop column cyl", 
        "select * from mtcars limit 3"))
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341