I am using `rethinkdb import` to import a CSV file in which one of the fields holds a valid JSON object. However, RethinkDB appears to store this field as a string, because nested filters on it do not work.

How do I specify the data type of each field at import time or modify the assumed data type after the import is finished?

Alienfluid
  • Do you know beforehand which field will contain JSON? Why not just parse it before inserting? – Tholle Feb 29 '16 at 23:02
  • Yes, it's the same field for all the records. What do you mean by parsing it before inserting? I am using rethinkdb's import command for mass ingestion -- ingesting the records one by one will take forever. – Alienfluid Mar 01 '16 at 01:22
  • My apologies. I misunderstood the question. – Tholle Mar 01 '16 at 01:30

1 Answer

From the documentation:

JSON files are preferred to CSV files, as JSON can represent RethinkDB documents fully. If you’re importing from a CSV file, you should include a header row with the field names, or use the --no-header option with the --custom-header option to specify the names.

rethinkdb import -f users.csv --format csv --table test.users --no-header \
    --custom-header id,username,email,password

Values in CSV imports will always be imported as strings. If you want to convert those fields after import to the number data type, run an update query that does the conversion. An example runnable in the Data Explorer:

r.table('tablename').update(function(doc) {
    return doc.merge({
        field1: doc('field1').coerceTo('number'),
        field2: doc('field2').coerceTo('number')
    })
});
Tholle
  • Thank you -- not sure how I missed the second part. – Alienfluid Mar 01 '16 at 15:10
  • Hmm .. now getting this error from RethinkDB: `Cannot coerce STRING to OBJECT.` The documentation does not list string -> object as a valid coerce operation unfortunately. https://rethinkdb.com/api/javascript/coerce_to/ – Alienfluid Mar 01 '16 at 15:12
  • @Alienfluid Ah, bummer. Maybe nested fields can be specified in the first step..? Just shooting in the dark here. – Tholle Mar 01 '16 at 15:14
  • What format are the objects in? Are they JSON objects? In that case you can use `field1: r.json(doc('field1'))` inside the mentioned merge command. – Daniel Mewes Mar 03 '16 at 18:55
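
Putting Daniel Mewes's suggestion together with the update query from the answer, a minimal sketch might look like the following. The helper name `buildJsonFieldUpdate` and its parameters are hypothetical and exist only for illustration; `r` is assumed to be the RethinkDB query namespace (available as a global in the Data Explorer, or obtained from the official driver elsewhere), and the table and field names are placeholders.

```javascript
// Hypothetical helper (not part of RethinkDB): builds an update query that
// replaces a JSON string field with the object it encodes.
// `r` is assumed to be the RethinkDB query namespace.
function buildJsonFieldUpdate(r, tableName, fieldName) {
  return r.table(tableName).update(function (doc) {
    // r.json() parses the JSON string on the server; merge() then
    // overwrites the string field with the parsed value.
    return doc.merge({ [fieldName]: r.json(doc(fieldName)) });
  });
}

// Inline equivalent for the Data Explorer, with placeholder names:
// r.table('tablename').update(function(doc) {
//     return doc.merge({ field1: r.json(doc('field1')) });
// });
```

Because `r.json` expects a string, this is presumably a one-time conversion: rows whose field is not valid JSON, or has already been converted to an object, should be expected to report an error rather than update.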