
I'm setting up a demo landscape for Cassandra, Apache Spark and Flume on my Mac (Mac OS X Yosemite with Oracle jdk1.7.0_55). The landscape shall serve as a proof of concept for a new analytics platform, and therefore I need some test data in my Cassandra database. I am using Cassandra 2.0.8.

I created some demo data in Excel and exported it as a CSV file. The structure looks like this:

ProcessUUID;ProcessID;ProcessNumber;ProcessName;ProcessStartTime;ProcessStartTimeUUID;ProcessEndTime;ProcessEndTimeUUID;ProcessStatus;Orderer;VorgangsNummer;VehicleID;FIN;Reference;ReferenceType
0F0D1498-D149-4FCC-87C9-F12783FDF769;AbmeldungKlärfall;1;Abmeldung Klärfall;2011-02-03 04:05+0000;;2011-02-17 04:05+0000;;Finished;SIXT;4278;A-XA 1;WAU2345CX67890876;KLA-BR4278;internal

I then created a keyspace and a column family in cqlsh using:

CREATE KEYSPACE dadcargate 
WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : '1' };

use dadcargate;

CREATE COLUMNFAMILY Process (
  ProcessUUID uuid, ProcessID varchar, ProcessNumber bigint, ProcessName varchar, 
  ProcessStartTime timestamp, ProcessStartTimeUUID timeuuid, ProcessEndTime timestamp, 
  ProcessEndTimeUUID timeuuid, ProcessStatus varchar, Orderer varchar,
  VorgangsNummer varchar, VehicleID varchar, FIN varchar, Reference varchar,
  ReferenceType varchar, 
PRIMARY KEY (ProcessUUID))
WITH COMMENT='A process is like a bracket around multiple process steps';

The column family name and all columns in it are created in all lower case - I will have to investigate this some day, but it is not so relevant at the moment.
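As far as I can tell, CQL folds unquoted identifiers to lower case; only double-quoted identifiers keep their exact case. A quick sketch (not the table I actually use):

CREATE COLUMNFAMILY "Process" (
  "ProcessUUID" uuid PRIMARY KEY,
  "ProcessName" varchar
);

-- every later statement must then quote the names as well:
SELECT "ProcessUUID", "ProcessName" FROM "Process";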

Now I take my CSV file, which has around 1600 entries, and want to import it into my table named process, like this:

cqlsh:dadcargate> COPY process (processuuid, processid, processnumber, processname, 
processstarttime, processendtime, processstatus, orderer, vorgangsnummer, vehicleid,
fin, reference, referencetype) 
FROM 'Process_BulkData.csv' WITH DELIMITER = ';' AND HEADER = TRUE;

It gives the following error:

Record #0 (line 1) has the wrong number of fields (15 instead of 13).
0 rows imported in 0.050 seconds.

Which is essentially true, as I do NOT have the timeUUID fields in my CSV export.

If I try the COPY command without explicit column names (given the fact that I am actually missing two fields), like this:

cqlsh:dadcargate> COPY process from 'Process_BulkData.csv' 
WITH DELIMITER = ';' AND HEADER = TRUE;

I end up with another error:

Bad Request: Input length = 1
Aborting import at record #0 (line 1). Previously-inserted values still present.
0 rows imported in 0.009 seconds.

Hm. Kind of strange, but okay. Maybe the COPY command does not like the fact that two fields are missing. I still find this strange, as the missing fields are of course there from a structural point of view - they are only empty.
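To illustrate what I mean (row values shortened by me): the exported rows contain empty slots for the two timeuuid columns rather than omitting them, so every row still carries all 15 fields:

...;2011-02-03 04:05+0000;;2011-02-17 04:05+0000;;Finished;...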

I still have another shot: I deleted the missing columns in Excel, exported the file again as CSV, and tried to import WITHOUT a header line in my CSV but WITH explicit column names, like this:

cqlsh:dadcargate> COPY process (processuuid, processid, processnumber, processname, 
processstarttime, processendtime, processstatus, orderer, vorgangsnummer, vehicleid, 
fin, reference, referencetype) 
FROM 'Process_BulkData-2.csv' WITH DELIMITER = ';' AND HEADER = TRUE;

I get this error:

Bad Request: Input length = 1
Aborting import at record #0 (line 1). Previously-inserted values still present.
0 rows imported in 0.034 seconds.

Can ANYONE tell me what I'm doing wrong here? According to the documentation of the COPY command, the way I set up my commands should work for at least two of them. Or so I would think.

But nah, I'm obviously missing something important here.

  • In your last attempt, you mention that you did not include a header line in your .csv file, but I clearly see `AND HEADER = TRUE` in your `COPY` command. – Aaron Feb 21 '15 at 15:59
  • That is correct. I changed the file (no header now) and look what happens: `Bad Request: Input length = 1 Aborting import at record #0 (line 1). Previously-inserted values still present.` – siliconchris Feb 21 '15 at 16:26

2 Answers


cqlsh's COPY command can be touchy. However, the COPY documentation contains this line:

The number of columns in the CSV input is the same as the number of columns in the Cassandra table metadata.

Keeping that in mind, I did manage to get your data to import with a COPY FROM, by naming the empty fields (processstarttimeuuid and processendtimeuuid, respectively):

aploetz@cqlsh:stackoverflow> COPY process (processuuid, processid, processnumber, 
processname, processstarttime, processstarttimeuuid, processendtime, 
processendtimeuuid, processstatus, orderer, vorgangsnummer, vehicleid, fin, reference, 
referencetype) FROM 'Process_BulkData.csv' WITH DELIMITER = ';' AND HEADER = TRUE;

1 rows imported in 0.018 seconds.
aploetz@cqlsh:stackoverflow> SELECT * FROM process ;

 processuuid                          | fin               | orderer | processendtime            | processendtimeuuid | processid         | processname        | processnumber | processstarttime          | processstarttimeuuid | processstatus | reference  | referencetype | vehicleid | vorgangsnummer
--------------------------------------+-------------------+---------+---------------------------+--------------------+-------------------+--------------------+---------------+---------------------------+----------------------+---------------+------------+---------------+-----------+----------------
 0f0d1498-d149-4fcc-87c9-f12783fdf769 | WAU2345CX67890876 |    SIXT | 2011-02-16 22:05:00+-0600 |               null | AbmeldungKlärfall | Abmeldung Klärfall |             1 | 2011-02-02 22:05:00+-0600 |                 null |      Finished | KLA-BR4278 |      internal |    A-XA 1 |           4278

(1 rows)
  • Hi Bryce, thanks for the hint. I did manage to get my data imported. Thanks a lot. – siliconchris Feb 22 '15 at 07:21
  • In case it helps someone, note that you have to use single quotes. If you use double quotes around the CSV file name, you will get an improper COPY command error. – morpheus Jan 09 '17 at 19:27

Loading a CSV file into a Cassandra table

Step 1) Install cassandra-loader from this URL:

sudo wget https://github.com/brianmhess/cassandra-loader/releases/download/v0.0.23/cassandra-loader

Step 2) Make it executable:

sudo chmod +x cassandra-loader

In this example:

a) the CSV file name is "pt_bms_tkt_success_record_details_new_2016_12_082017-01-0312-30-01.csv"

b) the keyspace name is "bms_test"

c) the table name is "pt_bms_tkt_success_record_details_new"

d) the columns are "trx_id......trx_day"

Step 3) The CSV file and cassandra-loader are both located in "cassandra3.7/bin/".

Step 4) Run the loader:

[stp@ril-srv-sp3 bin]$ ./cassandra-loader -f pt_bms_tkt_success_record_details_new_2016_12_082017-01-0312-30-01.csv -host 192.168.1.29 -schema "bms_test.pt_bms_tkt_success_record_details_new(trx_id,max_seq,trx_type,trx_record_type,trx_date,trx_show_date,cinema_str_id,session_id,ttype_code,item_id,item_var_sequence,trx_booking_id,venue_name,screen_by_tnum,price_group_code,area_cat_str_code,area_by_tnum,venue_capacity,amount_currentprice,venue_class,trx_booking_status_committed,booking_status,amount_paymentstatus,event_application,venue_cinema_companyname,venue_cinema_name,venue_cinema_type,venue_cinema_application,region_str_code,venue_city_name,sub_region_str_code,sub_region_str_name,event_code,event_type,event_name,event_language,event_genre,event_censor_rating,event_release_date,event_producer_code,event_item_name,event_itemvariable_name,event_quantity,amount_amount,amount_bookingfee,amount_deliveryfee,amount_additionalcharges,amount_final,amount_tax,offer_isapplied,offer_type,offer_name,offer_amount,payment_lastmode,payment_lastamount,payment_reference1,payment_reference2,payment_bank,customer_loginid,customer_loginstring,offer_referral,customer_mailid,customer_mobile,trans_str_sales_status_at_venue,trans_mny_trans_value_at_venue,payment_ismypayment,click_recordsource,campaign,source,keyword,medium,venue_multiplex,venue_state,mobile_type,transaction_range,life_cyclestate_from,transactions_after_offer,is_premium_transaction,city_type,holiday_season,week_type,event_popularity,transactionrange_after_discount,showminusbooking,input_source_name,channel,time_stamp,life_cyclestate_to,record_status,week_name,number_of_active_customers,event_genre1,event_genre2,event_genre3,event_genre4,event_language1,event_language2,event_language3,event_language4,event_release_date_range,showminusbooking_range,reserve1,reserve2,reserve3,reserve4,reserve5,payment_mode,payment_type,date_of_first_transaction,transaction_time_in_hours,showtime_in_hours,trx_day)";
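In general terms, cassandra-loader takes the CSV file (-f), a contact host (-host) and a schema string that lists the target columns in the same order as the CSV fields (-schema). A minimal sketch with hypothetical names (mykeyspace, mytable, col1 to col3):

./cassandra-loader -f mydata.csv -host 127.0.0.1 -schema "mykeyspace.mytable(col1,col2,col3)"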
