1

I need to extract data from a BYTEA field of a postgeSQL database. In the BYTEA field, the data is stored in the RDS format (it is the contents of a .rds file created by the command saveRDS corresponding to a dataframe). Using the package RPostgreSQL I am able to retrieve the bytea column in a character object thanks to the code given in this link https://github.com/codeinthehole/rpostgresql/blob/master/RPostgreSQL/tests/bytea.R

But how can I transform then my character object (in RDS format) in the corresponding dataframe ? I try to use readRDS but it seems only to work with a file.

Thank you

## (supposing the connection to the database)
## I do the select statement to retrieve the bytea column
##
rows <- dbGetQuery(con, "SELECT image_bytea from tabdata")
str(rows)
# 'data.frame': 1 obs. of  1 variable:
#   $ image_bytea: chr "\\x1f8b08000000000000065| __truncated__
str(rows[1,1])
# chr "\\x1f8b08000000000000065c9d655456585"| __truncated__
test <- postgresqlUnescapeBytea(rows[1,1])
conraw <- rawConnection(test,open="r")
mydf <- readRDS(file=conraw)
# Error in readRDS(file = conraw) : format d'entrée inconnu

When I use the psql in a cmd MSwindows, it works fine:

psql -At -p 5432 -h localhost -d cabri -U postgres -W -c "select lo_from_bytea(0,image_bytea) from tabdata'
# 31385
psql -At -p 5432 -h localhost -d cabri -U postgres -W -c "\lo_export 31385 'c:/upload-postgre/test-out.rds'"

And in R:

mydf <- readRDS(file = "c:/upload-postgre/test-out.rds")
class(mydf)
##   [1] "data.frame"
head(mydf)

##     tps      pc1     pc1f pc1f_partiel
## 1 -0.4000000 154.2554 70.62402     157.4627  93.89868 
## 2 -0.3999995 154.2127 70.94351     157.4627 153.9946 
## etc... 

Laurent

Laurent
  • 21
  • 4
  • What does your code look like? It seems like you can use `postgresqlUnescapeBytea` to get the byte data in a raw vector, and then you can use `rawConnection()` as an input to `readRDS` to read from that raw vector. – MrFlick Mar 25 '21 at 20:29
  • Thank you Mr Flick. Here is my code: – Laurent Mar 26 '21 at 16:11
  • I added also the psql commands because when I use psql it works fine. – Laurent Mar 26 '21 at 16:33
  • @MrFlick I tried to use rawConnection without success. However the format is correct since the psql method works fine. – Laurent Mar 27 '21 at 07:38
  • The page of test on github doesn't use `lo_from_bytea` so i'm not sure what that function does. I do not have access to a postgeSQL so I can't run any tests or anything myself. Sorry. – MrFlick Mar 27 '21 at 17:11
  • Indeed, the data was not stored directly by R but by the postgre commands. There is certainly a difference between the binary format stored in the bytea column and the .rds file created by \lo_export. I will try to see that. Thank you @MrFlick – Laurent Mar 28 '21 at 08:06

1 Answers1

1

Here's what worked for me - do note that it makes use of dbplyr (here assuming that you're trying to get a single row from some table "tablename" in schema "schemaname", and the column with the RDS data is called "rds_output"):

library(tidyverse)
library(dbplyr)

con <- DBI::dbConnect(RPostgres::Postgres(), dbname = "example_db", host = "localhost", port = 5432, user = "username", password = "supersafepassword")

proj_tbl <- tbl(con, in_schema("schemaname", "tablename"))

proj_id <- 1 ## just something to grab a specific row

data <- proj_tbl %>%
  filter(pr_id == proj_id) %>% ## filtering for a specific row
  pull(rds_output) %>% unlist %>% rawConnection %>% gzcon %>% readRDS
pharmhax
  • 51
  • 6