
First of all, thanks to everyone here who answers all our questions.

I am stuck on one point and hope someone can help me out.

I have a 6-node Apache Cassandra 2.1 cluster and have created a table with 3 columns: the first column is of type text and the other two are map types. When I insert data into the table and then read it back, fetching one row takes around 20 milliseconds, but if I create the table with text type for all 3 columns it takes only about 5 ms. Please suggest what I am missing: why does the read take longer when the columns are map types? I am not sure where to start investigating the map-type read latency.

Below are the cfstats output, the table definition, and the query trace:

Table: PRODUCT_TYPE
SSTable count: 1
SSTables in each level: [1, 0, 0, 0, 0, 0, 0, 0, 0]
Space used (live): 81458
Space used (total): 81458
Space used by snapshots (total): 0
Off heap memory used (total): 87
SSTable Compression Ratio: 0.15090414689301526
Number of keys (estimate): 6
Memtable cell count: 0
Memtable data size: 0
Memtable off heap memory used: 0
Memtable switch count: 0
Local read count: 5
Local read latency: 22.494 ms
Local write count: 0
Local write latency: NaN ms
Pending flushes: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.00000
Bloom filter space used: 16
Bloom filter off heap memory used: 8
Index summary off heap memory used: 15
Compression metadata off heap memory used: 64
Compacted partition minimum bytes: 73458
Compacted partition maximum bytes: 105778
Compacted partition mean bytes: 91087
Average live cells per slice (last five minutes): 1.0
Maximum live cells per slice (last five minutes): 1.0
Average tombstones per slice (last five minutes): 0.0
Maximum tombstones per slice (last five minutes): 0.0

CREATE TABLE TEST.PRODUCT_TYPE (
type text PRIMARY KEY,
col1 map<int, boolean>,
timestamp_map map<int, timestamp>
) WITH bloom_filter_fp_chance = 0.1
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy', 'max_threshold': '32'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
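
For reference, the inserts into this table look roughly like the following (the key and the map contents here are illustrative placeholders, not the actual data):

INSERT INTO TEST.PRODUCT_TYPE (type, col1, timestamp_map)
VALUES ('TYPE_A', {1: true, 2: false}, {1: '2015-06-03 00:00:00+0000', 2: '2015-06-04 00:00:00+0000'});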


 activity                                                                                                              | timestamp                  | source        | source_elapsed
 -----------------------------------------------------------------------------------------------------------------------+----------------------------+---------------+----------------
                                                                                                    Execute CQL3 query | 2015-06-03 21:57:36.841000 | 10.65.133.202 |              0
                                            Parsing SELECT * from location_eligibility_by_type5; [SharedPool-Worker-1] | 2015-06-03 21:57:36.842000 | 10.65.133.202 |             54
                                                                             Preparing statement [SharedPool-Worker-1] | 2015-06-03 21:57:36.842000 | 10.65.133.202 |             86
                                                                       Computing ranges to query [SharedPool-Worker-1] | 2015-06-03 21:57:36.842000 | 10.65.133.202 |            165
  Submitting range requests on 1537 ranges with a concurrency of 1 (0.0 rows per range expected) [SharedPool-Worker-1] | 2015-06-03 21:57:36.842000 | 10.65.133.202 |            410
                                                             Enqueuing request to /10.65.137.191 [SharedPool-Worker-1] | 2015-06-03 21:57:36.849000 | 10.65.133.202 |           7448
                                                                      Message received from /10.65.133.202 [Thread-15] | 2015-06-03 21:57:36.849000 | 10.65.137.191 |             15
                                      Submitted 1 concurrent range requests covering 1537 ranges [SharedPool-Worker-1] | 2015-06-03 21:57:36.849000 | 10.65.133.202 |           7488
                                                              Sending message to /10.65.137.191 [WRITE-/10.65.137.191] | 2015-06-03 21:57:36.849000 | 10.65.133.202 |           7515
 Executing seq scan across 0 sstables for [min(-9223372036854775808), min(-9223372036854775808)] [SharedPool-Worker-1] | 2015-06-03 21:57:36.850000 | 10.65.137.191 |            105
                                                              Read 1 live and 0 tombstoned cells [SharedPool-Worker-1] | 2015-06-03 21:57:36.866000 | 10.65.137.191 |          16851
                                                              Read 1 live and 0 tombstoned cells [SharedPool-Worker-1] | 2015-06-03 21:57:36.882000 | 10.65.137.191 |          33542
                                                              Read 1 live and 0 tombstoned cells [SharedPool-Worker-1] | 2015-06-03 21:57:36.899000 | 10.65.137.191 |          50206
                                                              Read 1 live and 0 tombstoned cells [SharedPool-Worker-1] | 2015-06-03 21:57:36.915000 | 10.65.137.191 |          66556
                                                              Read 1 live and 0 tombstoned cells [SharedPool-Worker-1] | 2015-06-03 21:57:36.932000 | 10.65.137.191 |          82814
                                                                    Scanned 5 rows and matched 5 [SharedPool-Worker-1] | 2015-06-03 21:57:36.932000 | 10.65.137.191 |          82839
                                                            Enqueuing response to /10.65.133.202 [SharedPool-Worker-1] | 2015-06-03 21:57:36.933000 | 10.65.137.191 |          82878
                                                              Sending message to /10.65.133.202 [WRITE-/10.65.133.202] | 2015-06-03 21:57:36.933000 | 10.65.137.191 |          83054
                                                                     Message received from /10.65.137.191 [Thread-151] | 2015-06-03 21:57:36.944000 | 10.65.133.202 |         102134
                                                         Processing response from /10.65.137.191 [SharedPool-Worker-2] | 2015-06-03 21:57:36.944000 | 10.65.133.202 |         102191
                                                                                                      Request complete | 2015-06-03 21:57:36.948916 | 10.65.133.202 |         107916

Thanks in advance for all your support and answers.

Thanks, John

john cena

1 Answer


Collection types in Cassandra are implemented as blobs under the hood; there is no real magic here.

To measure the difference, you can enable tracing in C* and see for yourself:

create table no_collections(id int, value text, primary key (id));
create table with_collections(id int, value set<text>, primary key (id));
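
The rows queried below were inserted along these lines (reconstructed from the output that follows, so treat the exact statements as a sketch):

insert into no_collections (id, value) values (1, 'foo,bar,baz');
insert into no_collections (id, value) values (2, 'xxx,yyy,zzz');
insert into no_collections (id, value) values (3, 'aaa,bbb,ccc');
insert into with_collections (id, value) values (1, {'foo', 'bar', 'baz'});
insert into with_collections (id, value) values (2, {'xxx', 'yyy', 'zzz'});
insert into with_collections (id, value) values (3, {'aaa', 'bbb', 'ccc'});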

cqlsh:stackoverflow> select * from no_collections ;

 id | value
----+-------------
  1 | foo,bar,baz
  2 | xxx,yyy,zzz
  3 | aaa,bbb,ccc

(3 rows)
cqlsh:stackoverflow> select * from with_collections ;

 id | value
----+-----------------------
  1 | {'bar', 'baz', 'foo'}
  2 | {'xxx', 'yyy', 'zzz'}
  3 | {'aaa', 'bbb', 'ccc'}

(3 rows)

Now let's enable tracing to see what's going on:

cqlsh:stackoverflow> TRACING ON ;
Now Tracing is enabled
cqlsh:stackoverflow> select * from with_collections where id=3;

 id | value
----+-----------------------
  3 | {'aaa', 'bbb', 'ccc'}

(1 rows)

Tracing session: 7c3d4ed0-09c8-11e5-b4cd-2988e70b20cb

activity                                                                                            | timestamp                  | source    | source_elapsed
-------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------
                                                                              Execute CQL3 query | 2015-06-03 11:13:58.717000 | 127.0.0.1 |              0
                        Parsing select * from with_collections where id=3; [SharedPool-Worker-1] | 2015-06-03 11:13:58.718000 | 127.0.0.1 |             72
                                                       Preparing statement [SharedPool-Worker-1] | 2015-06-03 11:13:58.718000 | 127.0.0.1 |            218
                      Executing single-partition query on with_collections [SharedPool-Worker-3] | 2015-06-03 11:13:58.718000 | 127.0.0.1 |            547
                                              Acquiring sstable references [SharedPool-Worker-3] | 2015-06-03 11:13:58.718000 | 127.0.0.1 |            556
                                               Merging memtable tombstones [SharedPool-Worker-3] | 2015-06-03 11:13:58.719000 | 127.0.0.1 |            574
 Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-3] | 2015-06-03 11:13:58.719000 | 127.0.0.1 |            636
                                Merging data from memtables and 0 sstables [SharedPool-Worker-3] | 2015-06-03 11:13:58.719000 | 127.0.0.1 |            644
                                        Read 1 live and 0 tombstoned cells [SharedPool-Worker-3] | 2015-06-03 11:13:58.719000 | 127.0.0.1 |            673
                                                                                Request complete | 2015-06-03 11:13:58.717847 | 127.0.0.1 |            847

As you can see, it took only ~800 µs to parse and execute a query that uses a collection. Without collections the situation looks mostly the same:

cqlsh:stackoverflow> select * from no_collections where id=3;

 id | value
----+-------------
  3 | aaa,bbb,ccc

(1 rows)

Tracing session: 7e9ac6d0-09c8-11e5-b4cd-2988e70b20cb

 activity                                                                                        | timestamp                  | source    | source_elapsed
-------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------
                                                                              Execute CQL3 query | 2015-06-03 11:14:02.685000 | 127.0.0.1 |              0
                          Parsing select * from no_collections where id=3; [SharedPool-Worker-1] | 2015-06-03 11:14:02.686000 | 127.0.0.1 |             77
                                                       Preparing statement [SharedPool-Worker-1] | 2015-06-03 11:14:02.686000 | 127.0.0.1 |            209
                        Executing single-partition query on no_collections [SharedPool-Worker-3] | 2015-06-03 11:14:02.686000 | 127.0.0.1 |            525
                                              Acquiring sstable references [SharedPool-Worker-3] | 2015-06-03 11:14:02.686000 | 127.0.0.1 |            534
                                               Merging memtable tombstones [SharedPool-Worker-3] | 2015-06-03 11:14:02.687000 | 127.0.0.1 |            553
 Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-3] | 2015-06-03 11:14:02.687000 | 127.0.0.1 |            598
                                Merging data from memtables and 0 sstables [SharedPool-Worker-3] | 2015-06-03 11:14:02.688000 | 127.0.0.1 |            606
                                        Read 1 live and 0 tombstoned cells [SharedPool-Worker-3] | 2015-06-03 11:14:02.688000 | 127.0.0.1 |            630
                                                                                Request complete | 2015-06-03 11:14:02.685789 | 127.0.0.1 |            789

So I see no real difference here.

The timing shown by cqlsh tracing is approximate and not statistically rigorous. To explore the difference you need to run at least a few dozen experiments and compare their results. But the results may depend on several factors:

  • Network latency between nodes. It can be the cause of many latency problems in shared infrastructure like AWS.
  • Cluster load. If your cluster is not idle, it may be doing background work that interferes with your measurements.
  • Background jobs. If you have a dataset with frequent updates/deletes, C* may be running compaction tasks under the hood, which can also interfere with other queries.
  • Heavy updates/deletes with low memory. If you have (or had in the past) a heavy update/delete workload, your data may be spread across multiple small SSTables that have not yet been compacted. C* must read most of them to reassemble your row, which leads to high query latency.

So I suggest running your queries with tracing enabled to see where the time goes, but I bet it isn't related to collections at all.
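
For example, against the table from your question, a traced single-row read would look something like this (the key value 'TYPE_A' is just a placeholder for one of your real keys):

cqlsh> TRACING ON;
cqlsh> SELECT type, col1, timestamp_map FROM TEST.PRODUCT_TYPE WHERE type = 'TYPE_A';

Restricting the query by the partition key keeps it a single-partition read, like the traces above, so the timings are directly comparable.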

shutty
  • Thanks for the reply. I updated the question with the cfstats report; could you take a look and tell me if anything there explains the read latency of around 22 ms? Thanks. – john cena Jun 03 '15 at 17:30
  • I do not see anything suspicious in your cfstats. Again, I suggest running your queries with tracing enabled to see what's going on inside. – shutty Jun 04 '15 at 08:15
  • I updated the question with the query stats after turning tracing on. Could you tell me if you see anything wrong? Thank you very much. – john cena Jun 04 '15 at 20:51