I'm working with a program that outputs a database of results. I have hundreds of these databases, all identical in structure, and I'd like to combine them into ONE big database. I'm mostly interested in one table from each database. I don't work with databases/SQL very much, but skipping the CSV export would simplify other steps in the process.
Previously I did this by exporting a CSV from each database and combining all the CSVs with these steps:

library(DBI)
library(RSQLite)
library(dplyr)

# vector of all CSV locations, then row-bind them into one data frame
csv_locs <- list.files(newdir, recursive = TRUE, pattern = "\\.csv$", full.names = TRUE)
pic_dat <- do.call("rbind", lapply(csv_locs, function(f) {
  data.table::fread(f, data.table = FALSE)
}))
How do I do the same thing with SQL-type database tables?
I'm basically pulling out the first table, then unioning on the rest with a loop:
db_locs <- list.files(directory, recursive = TRUE, pattern = "\\.ddb$", full.names = TRUE)

# first table
con1 <- DBI::dbConnect(RSQLite::SQLite(), db_locs[1])
start <- tbl(con1, "DataTable")

# open connection to location[i], get table, union, disconnect; repeat
for (i in 2:length(db_locs)) {
  con <- DBI::dbConnect(RSQLite::SQLite(), db_locs[i])
  y <- tbl(con, "DataTable")
  start <- union(start, y, copy = TRUE)  # copy = TRUE pulls y into con1 as a temp table
  dbDisconnect(con)
}
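One thing I noticed while writing this up: union also deduplicates rows, whereas the rbind step in the CSV version kept every row. If deduplication isn't actually wanted, I'm guessing dplyr::union_all is the closer equivalent (same loop as above, just swapping the union line):

  start <- union_all(start, y, copy = TRUE)  # union_all keeps duplicate rows, like rbind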
This is exceptionally slow! Well, to be fair, it's large data and the CSV version is also slow.
I think I honestly wrote the slowest possible way to do this :) I could not get the do.call/lapply option to work here, but maybe I'm missing something.
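In case it helps, this is roughly the do.call/lapply version I had in mind: read "DataTable" out of every file with DBI::dbReadTable and row-bind the results in R, mirroring the CSV workflow. (A sketch only; the helper name pull_table is just for illustration.)

pull_table <- function(db) {
  con <- DBI::dbConnect(RSQLite::SQLite(), db)
  on.exit(DBI::dbDisconnect(con))      # always close the connection, even on error
  DBI::dbReadTable(con, "DataTable")   # read the whole table into a data.frame
}
pic_dat <- data.table::rbindlist(lapply(db_locs, pull_table))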