2

I'm new at using Postgresql, and I'm having trouble populating a table I created with multiple *.csv files. I was working first in pgAdmin4, then I decide to work on RPostgreSQL as R is my main language.

Anyway, I am dealing (for now) with 30 csv files located in one folder. All have the same headers and general structure, for instance:

Y:/Clickstream/test1/video-2016-04-01_PARSED.csv
Y:/Clickstream/test1/video-2016-04-02_PARSED.csv
Y:/Clickstream/test1/video-2016-04-03_PARSED.csv

... and so on.

I tried to load all csv files by using a following the RPostgresql specific answer from Parfait. Sadly, it didn't work. My code is specified below:

library(RPostgreSQL)

dir = list.dirs(path = "Y:/Clickstream/test1")
num = (length(dir))

psql.connection <- dbConnect(PostgreSQL(), 
                  dbname="coursera", 
                  host="127.0.0.1", 
                  user = "postgres", 
                  password="xxxx")

for (d in dir){
  filenames <- list.files(d)

for (f in filenames){  
  csvfile <- paste0(d, '/', f)

# IMPORT USING COPY COMMAND
sql <- paste("COPY citl.courses FROM '", csvfile , "' DELIMITER ',' CSV ;")
dbSendQuery(psql.connection, sql)

  }
}

# CLOSE CONNNECTION
dbDisconnect(psql.connection)

I'm not understanding the error I got:

Error in postgresqlExecStatement(conn, statement, ...) : 
RS-DBI driver: (could not Retrieve the result : ERROR:  could not open file
" Y:/Clickstream/test1/video-2016-04-01_PARSED.csv " for reading: Invalid
argument
)

If I'm understanding correctly, there is an invalid argument in the name of my first file. I'm not very sure about it, but again I am recently using PostgreSQL and this RPostgreSQL in R. Any help will be much appreciated.

Thanks in advance!

Edit: I found the problem, but cannot solve it for some reason. When I copy the path while in the for loop:

# IMPORT USING COPY COMMAND
sql <- paste("COPY citl.courses FROM '",csvfile,"' DELIMITER ',' CSV ;")

I have the following result:

sql 
[1] "COPY citl.courses FROM ' Y:/Clickstream/test1/video-2016-04-01_PARSED.csv ' DELIMITER ',' CSV ;"

This means that the invalid argument is the blank space between the file path. I've tried to change this unsuccessfully. Any help will be deeply appreciated!

Community
  • 1
  • 1
  • 1. COPY command run on server, and file also must be on a server. 2. Please check rights on files - it must be allowed to read for `postgres` user – Roman Tkachuk Feb 23 '17 at 23:50
  • Hi Roman. Thank you very much for your comment. I'm afraid I'm not very familiar with psql, so I'm having a little trouble understanding your 2 answers. Please, correct me if I'm wrong interpreting them: 1) What do you mean by having the file on a server? I actually have all my files on a server, but they are not loaded in psql yet. 2) I'm a superuser, and have all rights in my database. I have also administrative rights in the server where my files are. – lescobedo21 Feb 24 '17 at 16:07

1 Answers1

1

Try something like this

Files <- list.files("Y:/Clickstream/test1", pattern = "*.csv", full.names = TRUE)

CSVs <- lapply(Files, read.csv)

psql.connection <- dbConnect(PostgreSQL(), 
                  dbname="coursera", 
                  host="127.0.0.1", 
                  user = "postgres", 
                  password="xxxx")


for(i in 1:length(Files)){

  dbWriteTable(psql.connection
    # schema and table
    , c("citl", "courses")
    , CSVs[i]
    , append = TRUE # add row to bottom
    , row.names = FALSE
    )

}
JackStat
  • 1,593
  • 1
  • 11
  • 17
  • 2
    Hi JackStat. Thanks for your answer. I'm trying to avoid loading the files in R, as they add up to 11 gigabytes (only for the test folder). If they were not as "heavy", your code will certainly work for my case. My intention was to load the files in psql to make things quicker. Again, thank you very much! – lescobedo21 Feb 24 '17 at 16:04