
I am using a SQL Server database with RJDBC and the pool package, and the sqljdbc42.jar driver.

Code:

library(DBI)
library(RJDBC)
library(pool)

jar.path.ms.sql.driver <- "./www/base/sql_drivers/sqljdbc42.jar"

jdbc.sql.driver <- JDBC(
  driverClass = "com.microsoft.sqlserver.jdbc.SQLServerDriver",
  classPath   = jar.path.ms.sql.driver
)
sql.url <- "jdbc:sqlserver://sql01" # our server url
sql.dname <-  "my_projects"
sql.username <- "user"
sql.password <- "password"

pool <- dbPool(
  drv      = jdbc.sql.driver,
  url      = sql.url,
  dname    = sql.dname,
  username = sql.username,
  password = sql.password
)

So far, so good. I can read tables from the database using dbReadTable. I can also write tables to the database using dbWriteTable. For the write case, I have columns in the data frame to be written with very long strings. In some cases the strings are longer than 255 characters.
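For context, reads go straight through the pool object, since pool dispatches the DBI generics directly (a minimal sketch, assuming the pool object from the setup above; the table name is illustrative):

```r
library(DBI)
library(pool)

# pool objects dispatch DBI generics, so no explicit checkout is
# needed for a simple read; "comments" is an illustrative table name
comments_from_db <- dbReadTable(pool, "comments")
head(comments_from_db)
```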

I had hoped to use the field.types parameter of the dbWriteTable function to configure the SQL Server columns correctly (comments is my data frame to write to the database):

conn <- poolCheckout(pool)

DBI::dbWriteTable(
  conn = conn,
  name = DBI::Id(cluster = "my_projects", schema = "dbo", table = "comments"),
  value = comments,
  overwrite = TRUE,
  row.names = FALSE,
  field.types = c(
               STRATEGIC_AREA = "varchar(255)",
               OBJECTIVE = "varchar(255)",
               METHOD_OF_MEASURE = "varchar(MAX)",
               TARGET = "float",
               UNIT = "varchar(255)",
               MIN_MAX = "varchar(255)",
               JUL = "varchar(1024)",
               AUG = "varchar(MAX)",
               SEP = "varchar(MAX)",
               OCT = "varchar(MAX)",
               NOV = "varchar(MAX)",
               DEC = "varchar(MAX)",
               JAN = "varchar(MAX)",
               FEB = "varchar(MAX)",
               MAR = "varchar(MAX)",
               APR = "varchar(MAX)",
               MAY = "varchar(MAX)",
               JUN = "varchar(MAX)"
  ))
poolReturn(conn)
poolClose(pool)

The problem is that SQL Server ignores the column types I specify: the table is always created with varchar(255) columns. Types such as varchar(1024) or varchar(MAX) in field.types are ignored. The function appears to follow a fixed mapping based on the data frame column classes:

  • character -> varchar(255)
  • numeric -> float

It makes no difference if I replace DBI::Id(cluster = "my_projects", schema = "dbo", table = "comments") with "my_projects.dbo.comments". It also does not matter whether I quote the left-hand side of the named character vector I pass to field.types.
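To rule out an SSMS display quirk, the created column types can also be checked directly against SQL Server's INFORMATION_SCHEMA metadata view (the database/schema/table names below match my setup):

```r
# Inspect what SQL Server actually created for each column;
# CHARACTER_MAXIMUM_LENGTH is -1 for varchar(MAX) columns
DBI::dbGetQuery(conn, "
  SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
  FROM my_projects.INFORMATION_SCHEMA.COLUMNS
  WHERE TABLE_SCHEMA = 'dbo' AND TABLE_NAME = 'comments'
")
```

Every character column comes back with length 255, confirming that field.types was ignored.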

As a result, inserts of data frame rows that contain strings longer than 255 characters fail (those rows are skipped), and I end up with fewer rows in the SQL Server table than in the data frame. The error message (the very long string sits in the FEB column):

Error in .local(conn, statement, ...) :
execute JDBC update query failed in dbSendUpdate (String or binary data would be truncated in table 'my_projects.dbo.comments', column 'FEB'. Truncated value:

What am I missing?

UPDATE

Based on sgoley's update below I changed the code as follows:

dbCreateTable(
  conn = conn,
  Id(
    database = "my_projects",
    schema = "dbo",
    table = "comments"
  ),
  fields = c(
    STRATEGIC_AREA = "varchar(255)",
    OBJECTIVE = "varchar(255)",
    METHOD_OF_MEASURE = "varchar(255)",
    TARGET = "float",
    UNIT = "varchar(255)",
    MIN_MAX = "varchar(255)",
    JUL = "varchar(MAX)",
    AUG = "varchar(MAX)",
    SEP = "varchar(MAX)",
    OCT = "varchar(MAX)",
    NOV = "varchar(MAX)",
    DEC = "varchar(MAX)",
    JAN = "varchar(MAX)",
    FEB = "varchar(MAX)",
    MAR = "varchar(MAX)",
    APR = "varchar(MAX)",
    MAY = "varchar(MAX)",
    JUN = "varchar(MAX)"
  ),
  row.names = NULL
)


values <- DBI::sqlAppendTable(con = conn, 
                              table = Id(database = "my_projects", schema = "dbo", table = "comments"), 
                              values = comments,
                              row.names = FALSE)
RJDBC::dbSendUpdate(conn, values)

This now works without any problems and error messages. Verification using Microsoft SQL Server Management Studio confirmed that the columns are now the correct class and follow the fields specification.
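A small refinement: since poolCheckout() hands out a raw connection, wrapping the insert with on.exit() guarantees the connection is returned to the pool even if dbSendUpdate fails. A sketch (the helper name is illustrative):

```r
# Hypothetical helper: insert with guaranteed connection return
write_comments <- function(pool, comments) {
  conn <- poolCheckout(pool)
  on.exit(poolReturn(conn), add = TRUE)  # runs even on error
  insert_sql <- DBI::sqlAppendTable(
    con = conn,
    table = Id(database = "my_projects", schema = "dbo", table = "comments"),
    values = comments,
    row.names = FALSE
  )
  RJDBC::dbSendUpdate(conn, insert_sql)
}
```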

Paul van Oppen
  • I would consider `sqljdbc42` to be relatively old since it's intended for JRE 8 applications. Have you tried using `varchar(8000)`, which is the maximum field length for older implementations? Also possibly `varchar(-1)` works to trick some implementations into sending `varchar(max)` but I don't know how R DBI will react to it. – AlwaysLearning Mar 30 '21 at 07:29
  • I changed to `mssql-jdbc-9.2.1.jre8.jar` which appears to not change this behaviour. I used your suggestions with `varchar(-1)` and `varchar(8000)`. Again, no changes. I use MS SQL server management studio 18 to inspect the resulting table in the database. And it tells me that the `character` columns are always `varchar(255)`. – Paul van Oppen Mar 30 '21 at 23:23
  • Do you have the possibility to test SQL SERVER Native Client : from the few tests I did with it, I don't see the 255 characters limit. – Waldi Apr 06 '21 at 06:56
  • My previous comment implies using `odbc` instead of `jdbc` – Waldi Apr 06 '21 at 07:15
  • @PaulvanOppen What edition and version of SQL Server are you using? – sgdata Apr 06 '21 at 13:26
  • We are using Microsoft SQL Server Standard 2019 version 15.0.4073.23 at the moment – Paul van Oppen Apr 06 '21 at 23:57
  • `RJDBC` is ignoring `field.types`, see https://github.com/cran/RJDBC/blob/master/R/class.R#L271. I suggest you write a feature-request for Simon to add `field.types=` to the method (or some other fashion better for that architecture ... though I don't see an easy other-way). – r2evans Apr 07 '21 at 15:56
  • Thanks @r2evans, I may do that. – Paul van Oppen Apr 08 '21 at 08:33
  • FWIW neither `fields` nor `row.names` are arguments to `dbWriteTable`, the signature is `dbWriteTable(conn, name, value, overwrite = FALSE, append = FALSE, force = FALSE, ..., max.batch = 10000L)` (see [documentation](http://www.rforge.net/doc/packages/RJDBC/JDBCConnection-methods.html)) so the values passed above have no meaning (`field.types` was not part of the DBI API). The documentation suggests essentially the above except using `dbSendUpdate()` which is more efficient (or `dbWriteTable(..,append=TRUE)` instead of `INSERT`). – Simon Urbanek Mar 24 '22 at 09:08

1 Answer


Trying to build as complete an answer as possible - let's start with the RJDBC project first


RJDBC

Repo: https://github.com/s-u/RJDBC

It looks to me like Simon may be inadvertently overwriting your explicit field type declarations with his driver-level type mappings here:

maptypes.R

I say that only as very high-level speculation, because I'm definitely out of my depth when it comes to the intersection of R, Java, and DBI. But if the shoe fits?

When I run something like:

> dbDataType(jdbc.sql.driver,titanic)
   Class   Sex      Age      Survived  Freq 
  "TEXT"   "TEXT"   "TEXT"   "TEXT"    "DOUBLE" 

And it seems like the driver maps those types to varchar(255) which is why you are seeing all your created columns set to that by default.

Anyways. Just speculation on causes, moving on to solving the problem.


Columns and Types

Drawing on all the comments and suggestions above, the best (and so far only) way I've found to implement these data types in SQL Server (short of doing it completely explicitly through dbExecute, which you could) is:

pool <- dbPool(
  drv      = jdbc.sql.driver,
  url      = sql.url,
  dname    = sql.dname,
  schema   = sql.schema,
  username = sql.username,
  password = sql.password
)

conn <- poolCheckout(pool)

dbCreateTable(
  conn = conn,
  Id(database = "my_project",
     schema = "dbo",
     table = "titanic"),
  fields = c(
    Class    = "nvarchar(max)",
    Sex      = "varchar(max)",
    Age      = "ntext",
    Survived = "text",
    Freq     = "integer"
  ),
  row.names = NULL
)

Which produces:

SSMS Table Screenshot
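For reference, the "explicitly through dbExecute" route mentioned above would just be plain DDL, which bypasses the DBI/RJDBC type mapping entirely. An untested sketch using the same illustrative names:

```r
# Explicit CREATE TABLE; no type mapping is involved, so the
# declared column types are used verbatim
DBI::dbExecute(conn, "
  CREATE TABLE my_project.dbo.titanic (
    Class    nvarchar(max),
    Sex      varchar(max),
    Age      ntext,
    Survived text,
    Freq     integer
  )
")
```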


Inserting Data

And then from there the idea would be to insert rows with something like:

dbAppendTable(
  conn = conn,
  name = Id(database = "my_project", schema = "dbo", table = "titanic"),
  value = titanic
)

But I'm getting the following error with that:

Error in .verify.JDBC.result(r, "Unable to retrieve JDBC result set",  : 
  Unable to retrieve JDBC result set
  JDBC ERROR: The value is not set for the parameter number 2.
  Statement: INSERT INTO "my_project"."dbo"."titanic"
  ("Class", "Sex", "Age", "Survived", "Freq")
VALUES
  (?, ?, ?, ?, ?)
In addition: Warning message:
In if (is.na(v)) { :
  the condition has length > 1 and only the first element will be used

So the better option for now is to use:

values <- DBI::sqlAppendTable(con = conn,
                              table = Id(database = "my_project", schema = "dbo", table = "titanic"),
                              values = titanic)
DBI::dbExecute(conn = conn, values)

Resulting in the intended rows with requested column types: Data values with correct column types
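And, as in the question, remember to hand the checked-out connection back when done:

```r
# Return the raw connection to the pool, then shut the pool down
poolReturn(conn)
poolClose(pool)
```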

sgdata