
I am getting an rpc_timeout on an insert of a single row (the cluster has 3 nodes and can handle more than 10,000 inserts/min on another table).

Here is the table:

CREATE TABLE test_table (
    agent text,
    run_id text,
    process_id text,
    datetime timestamp,
    tracking_time timestamp,
    email text,
    ip text,
    event_id uuid,
    event_name text,
    message_id text,
    source text,
    url text,
    test_table text,
PRIMARY KEY ((process_id, event_name), event_id));

CREATE INDEX test_table_process_id ON test_table (process_id);

and here is the insert:

BEGIN BATCH
INSERT INTO test_table (message_id, run_id, event_id, ip, process_id, agent, datetime, event_name, url, test_table, email, tracking_time) VALUES ('exampleaaaaaaaaaaaaaaaaaaaaaaaaa', 'bar', 376d8e20-35ca-4615-8e9f-f0b5b4431981, 'None', 'test-dummy', 'None', '2014-08-31 17:20:24', 'hard_bounce', 'None', 'mandrill', 'example.webhook@mandrillapp.com', '2014-09-01T18:04:40');
APPLY BATCH;

I don't know if the timeout is due to the secondary index.

There is nothing about any errors in system.log.

tahir

1 Answer


You are probably hitting the timeout because of the batch; the secondary index (2i) is not much overhead, but the batch's atomicity guarantees are. Try using an unlogged batch instead:

BEGIN UNLOGGED BATCH
INSERT INTO test_table (message_id, run_id, event_id, ip, process_id, agent, datetime, event_name, url, test_table, email, tracking_time) VALUES ('exampleaaaaaaaaaaaaaaaaaaaaaaaaa', 'bar', 376d8e20-35ca-4615-8e9f-f0b5b4431981, 'None', 'test-dummy', 'None', '2014-08-31 17:20:24', 'hard_bounce', 'None', 'mandrill', 'example.webhook@mandrillapp.com', '2014-09-01T18:04:40');
APPLY BATCH;

Unless there is a good case for it, use individual writes rather than batching; it's more performant.
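
For a single-row write like this one, that just means issuing the INSERT on its own (a sketch reusing the same values as above, with the batch wrapper removed):

-- single write, no batch log overhead
INSERT INTO test_table (message_id, run_id, event_id, ip, process_id, agent, datetime, event_name, url, test_table, email, tracking_time) VALUES ('exampleaaaaaaaaaaaaaaaaaaaaaaaaa', 'bar', 376d8e20-35ca-4615-8e9f-f0b5b4431981, 'None', 'test-dummy', 'None', '2014-08-31 17:20:24', 'hard_bounce', 'None', 'mandrill', 'example.webhook@mandrillapp.com', '2014-09-01T18:04:40');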

Chris Lohfink
  • actually I've changed the table definition to PRIMARY KEY ((process_id), event_id) and removed the secondary index; I can query by process_id, and the insert seems far more efficient, same for the select where process_id = ... (see the sketch after these comments) – tahir Sep 01 '14 at 22:27
  • a batch would be for when you have multiple inserts that you want applied as a single atomic operation. It's significantly expensive, so only use it when absolutely needed. A secondary index on something with high cardinality would be very expensive and may just have been pushing you over your cluster's capabilities. Custom indexes will always be better. – Chris Lohfink Sep 02 '14 at 00:56
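
For reference, the revised design described in the comments would look roughly like this (a sketch; the non-key columns are assumed unchanged from the original table, and the query value is taken from the insert above):

CREATE TABLE test_table (
    process_id text,
    event_id uuid,
    agent text,
    run_id text,
    datetime timestamp,
    tracking_time timestamp,
    email text,
    ip text,
    event_name text,
    message_id text,
    source text,
    url text,
    test_table text,
    PRIMARY KEY ((process_id), event_id)
);

-- process_id is the partition key, so no secondary index is needed for this query
SELECT * FROM test_table WHERE process_id = 'test-dummy';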