I am trying to load a CSV file into DSE Cassandra using the dsbulk utility, and I am running into issues when a column is defined as a set.

The COPY command successfully loads values such as "{'bible', 'moses', 'ramses'}" and "{'televison'}". But dsbulk fails when there are multiple values, with com.datastax.driver.core.exceptions.InvalidTypeException: Could not parse as Json.

CREATE TABLE killrvideo.videos (
    videoid uuid,
    added_date timestamp,
    description text,
    location text,
    location_type int,
    name text,
    preview_image_location text,
    tags SET<text>,
    userid uuid,
    PRIMARY KEY (videoid)
)

The data file is: https://github.com/KillrVideo/killrvideo-cdm/blob/master/data/videos.csv

Command:

dsbulk load --driver.auth.provider PlainTextAuthProvider -u *** -p *** -header false -url /data/videos.csv -k killrvideo -t videos

com.datastax.driver.core.exceptions.InvalidTypeException: Could not parse '{'aunt', 'black stereotype', 'blood on shirt', 'butt bolo', 'chest', 'death of family', 'flasher', 'kicked in the face', 'masturbation', 'renovation', 'stabbed in the'}' as Json

adutra
Prak_Rum

1 Answer

This occurs because the videos.csv file was originally created with CQLSH COPY, and in that format collections are surrounded by curly braces: {}. DSBulk expects collection values to be JSON arrays, whose syntax surrounds the collection with square brackets: [].

It turns out there is an open ticket in DSBulk to handle CQL literals for collections, tuples, and UDTs. In the meantime, please use CQLSH COPY to load the data into your table.
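
If staying with CQLSH COPY is not an option, another possibility is to rewrite the set values into JSON arrays before loading. The following is only a minimal sketch of that per-value conversion: the function name `cql_set_to_json_array` is hypothetical, it assumes the set holds single-quoted text values in which an embedded quote is doubled (''), and it does not deal with how the resulting value must be quoted inside the CSV file, which depends on your DSBulk connector settings.

```python
import json


def cql_set_to_json_array(literal: str) -> str:
    """Convert a CQL set<text> literal such as {'a', 'b'} to a JSON array string.

    Assumption: every element is a single-quoted string, and an embedded
    single quote is written as two single quotes ('').
    """
    text = literal.strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError(f"not a CQL set literal: {literal!r}")

    body = text[1:-1]
    items = []
    i, n = 0, len(body)
    while i < n:
        ch = body[i]
        if ch in " \t,":
            # Skip separators and whitespace between elements.
            i += 1
            continue
        if ch != "'":
            raise ValueError(f"expected a quoted string at position {i}")
        i += 1  # step over the opening quote
        buf = []
        while True:
            if i >= n:
                raise ValueError("unterminated string in set literal")
            if body[i] == "'":
                if i + 1 < n and body[i + 1] == "'":
                    # A doubled quote stands for one literal quote.
                    buf.append("'")
                    i += 2
                    continue
                i += 1  # step over the closing quote
                break
            buf.append(body[i])
            i += 1
        items.append("".join(buf))

    # json.dumps takes care of JSON escaping and the square brackets.
    return json.dumps(items)


if __name__ == "__main__":
    print(cql_set_to_json_array("{'bible', 'moses', 'ramses'}"))
```

Applied to the tags column of each row, this turns a value like {'bible', 'moses', 'ramses'} into ["bible", "moses", "ramses"], which is the square-bracket form described above. I have not verified the converted file against DSBulk, so treat it as a starting point.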

MrSandman
  • Thank you, that makes sense. Glad that it's being worked on. Backward compatibility should always be considered. By any chance, are these tickets open to the public? I was hoping to take a look and possibly track it. It doesn't look like it's open source yet. – Prak_Rum Sep 22 '18 at 04:24
  • DSBulk is indeed not open-source, but your request has been recorded. If you have any urgent needs, please consider reaching out to DataStax support. Thanks for trying it out! – adutra Sep 22 '18 at 09:12