
I'm using the R package "bigrquery" to upload data from an R data frame into an existing BigQuery table as follows:

mybq = bq_table(project='...', dataset='...', table=...)
bq_table_upload(x=mybq, values=..., create_disposition='CREATE_NEVER', 
                write_disposition='WRITE_APPEND')

But I'm getting the following error message:

Error: Invalid schema update. Field NewID has changed type from STRING to INTEGER

BigQuery appears to be auto-detecting the data format and mistakenly treats the NewID column, which has values like "00487", as numeric when it is actually a string. When I append an "x" character to the NewID values, the error goes away and the upload works perfectly. Is there any way to disable this auto-detection when uploading data with the "bigrquery" package?
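
For reference, here's a minimal sketch of the kind of data frame I'm uploading (the value column is made up for illustration; only NewID matters here):

# NewID is already a character vector in R, which is how the leading
# zeros are preserved; the corresponding BigQuery column is STRING.
df <- data.frame(
  NewID = c("00487", "00512", "01023"),
  value = c(1, 2, 3),
  stringsAsFactors = FALSE
)
str(df)  # NewID shows as chr, yet the append still fails with the type error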

user1769120
  • maybe convert NewID column into character class using `as.character` before uploading? – chinsoon12 Jul 05 '18 at 01:13
  • It's already a character in R. Otherwise it can't have leading zeros. The corresponding column in the BigQuery table is a STRING data type. The problem is the auto-detection that ignores the explicit data type; I can't figure out how to turn that feature off. – user1769120 Jul 05 '18 at 03:14

1 Answer

The bq_perform_load function in the same package should solve this. It lets you specify the schema explicitly through its fields parameter, so BigQuery will not auto-detect the schema, as explained here.

I tested this and it worked on my end. I created a table with two columns (STRING, STRING), and this is my source data:

0017    0015
0123    1234
1022    1202

I ran the following command in R to make it work:

bq_perform_load("MY_PROJECT.MYDATASET.MYTABLE",
                "MY_GCS_OBJECT_LINK",
                nskip = 0,
                fields = list(bq_field("test1", "string"),
                              bq_field("test2", "string")),
                source_format = "CSV",
                create_disposition = "CREATE_NEVER",
                write_disposition = "WRITE_APPEND")

Note: I first tried running the same command with fields = NULL and it failed.
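
Note that bq_perform_load reads from Google Cloud Storage rather than directly from an R data frame, so the data has to be staged in a bucket first. A minimal sketch of that step, assuming the googleCloudStorageR package (the bucket and file names are placeholders):

library(googleCloudStorageR)

# df is the data frame to upload (placeholder name). Write it as a
# headerless CSV so that nskip = 0 in the load call above is correct.
write.table(df, "mydata.csv", sep = ",",
            row.names = FALSE, col.names = FALSE)

# Upload the file to a bucket; the object is then addressable as
# "gs://my-bucket/mydata.csv" in the source URI argument of bq_perform_load.
gcs_upload("mydata.csv", bucket = "my-bucket", name = "mydata.csv")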

enle lin
  • Thank you! That was very helpful. – user1769120 Jul 07 '18 at 14:37
  • I am getting an error `Error: Source URI must be a Google Cloud Storage location` while using the above code. I have created a bucket on GCS. How can I direct R to specify the source? Thanks. – marine8115 Aug 13 '20 at 16:16
  • @PriyankaIndapurkar you should give a fully-qualified URL as explained [here](https://github.com/r-dbi/bigrquery/blob/master/R/bq-perform.R#L161); see the example below. How are you specifying the source_uris param? – enle lin Aug 14 '20 at 11:20
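
For example, a minimal sketch of the fully-qualified form (bucket and object names are hypothetical):

# The source URI must use the gs:// scheme; a local path or an https
# URL triggers the "Source URI must be a Google Cloud Storage location" error.
bq_perform_load("MY_PROJECT.MYDATASET.MYTABLE",
                "gs://my-bucket/mydata.csv",
                nskip = 0,
                fields = list(bq_field("test1", "string"),
                              bq_field("test2", "string")),
                source_format = "CSV",
                create_disposition = "CREATE_NEVER",
                write_disposition = "WRITE_APPEND")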