
First of all, thanks to everyone here who answers all our questions.

I am stuck on one point and hope someone can help me out.

I have a 6-node Apache Cassandra 2.1 cluster and have created a table with 3 columns: the first column is of type text and the other two are map types. When I insert data into the table and then read it back, fetching one row takes around 20 milliseconds, but if I create the table with text type for all 3 columns it takes only about 5 ms. Please suggest what I am missing: why does the read take longer when the columns are map types? I am not sure where to start investigating the map-type read latency.

Below are the cfstats output, the table definition, and the query trace:

Table: PRODUCT_TYPE
SSTable count: 1
SSTables in each level: [1, 0, 0, 0, 0, 0, 0, 0, 0]
Space used (live): 81458
Space used (total): 81458
Space used by snapshots (total): 0
Off heap memory used (total): 87
SSTable Compression Ratio: 0.15090414689301526
Number of keys (estimate): 6
Memtable cell count: 0
Memtable data size: 0
Memtable off heap memory used: 0
Memtable switch count: 0
Local read count: 5
Local read latency: 22.494 ms
Local write count: 0
Local write latency: NaN ms
Pending flushes: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.00000
Bloom filter space used: 16
Bloom filter off heap memory used: 8
Index summary off heap memory used: 15
Compression metadata off heap memory used: 64
Compacted partition minimum bytes: 73458
Compacted partition maximum bytes: 105778
Compacted partition mean bytes: 91087
Average live cells per slice (last five minutes): 1.0
Maximum live cells per slice (last five minutes): 1.0
Average tombstones per slice (last five minutes): 0.0
Maximum tombstones per slice (last five minutes): 0.0

CREATE TABLE TEST.PRODUCT_TYPE (
type text PRIMARY KEY,
col1 map<int, boolean>,
timestamp_map map<int, timestamp>
) WITH bloom_filter_fp_chance = 0.1
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy', 'max_threshold': '32'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
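
For reference, the inserts into this table look roughly like the following (the key and the map contents here are illustrative placeholders, not the actual data):

INSERT INTO TEST.PRODUCT_TYPE (type, col1, timestamp_map)
VALUES ('TYPE_A', {1: true, 2: false}, {1: '2015-06-03 00:00:00+0000', 2: '2015-06-04 00:00:00+0000'});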


 activity                                                                                                              | timestamp                  | source        | source_elapsed
 -----------------------------------------------------------------------------------------------------------------------+----------------------------+---------------+----------------
                                                                                                    Execute CQL3 query | 2015-06-03 21:57:36.841000 | 10.65.133.202 |              0
                                            Parsing SELECT * from location_eligibility_by_type5; [SharedPool-Worker-1] | 2015-06-03 21:57:36.842000 | 10.65.133.202 |             54
                                                                             Preparing statement [SharedPool-Worker-1] | 2015-06-03 21:57:36.842000 | 10.65.133.202 |             86
                                                                       Computing ranges to query [SharedPool-Worker-1] | 2015-06-03 21:57:36.842000 | 10.65.133.202 |            165
  Submitting range requests on 1537 ranges with a concurrency of 1 (0.0 rows per range expected) [SharedPool-Worker-1] | 2015-06-03 21:57:36.842000 | 10.65.133.202 |            410
                                                             Enqueuing request to /10.65.137.191 [SharedPool-Worker-1] | 2015-06-03 21:57:36.849000 | 10.65.133.202 |           7448
                                                                      Message received from /10.65.133.202 [Thread-15] | 2015-06-03 21:57:36.849000 | 10.65.137.191 |             15
                                      Submitted 1 concurrent range requests covering 1537 ranges [SharedPool-Worker-1] | 2015-06-03 21:57:36.849000 | 10.65.133.202 |           7488
                                                              Sending message to /10.65.137.191 [WRITE-/10.65.137.191] | 2015-06-03 21:57:36.849000 | 10.65.133.202 |           7515
 Executing seq scan across 0 sstables for [min(-9223372036854775808), min(-9223372036854775808)] [SharedPool-Worker-1] | 2015-06-03 21:57:36.850000 | 10.65.137.191 |            105
                                                              Read 1 live and 0 tombstoned cells [SharedPool-Worker-1] | 2015-06-03 21:57:36.866000 | 10.65.137.191 |          16851
                                                              Read 1 live and 0 tombstoned cells [SharedPool-Worker-1] | 2015-06-03 21:57:36.882000 | 10.65.137.191 |          33542
                                                              Read 1 live and 0 tombstoned cells [SharedPool-Worker-1] | 2015-06-03 21:57:36.899000 | 10.65.137.191 |          50206
                                                              Read 1 live and 0 tombstoned cells [SharedPool-Worker-1] | 2015-06-03 21:57:36.915000 | 10.65.137.191 |          66556
                                                              Read 1 live and 0 tombstoned cells [SharedPool-Worker-1] | 2015-06-03 21:57:36.932000 | 10.65.137.191 |          82814
                                                                    Scanned 5 rows and matched 5 [SharedPool-Worker-1] | 2015-06-03 21:57:36.932000 | 10.65.137.191 |          82839
                                                            Enqueuing response to /10.65.133.202 [SharedPool-Worker-1] | 2015-06-03 21:57:36.933000 | 10.65.137.191 |          82878
                                                              Sending message to /10.65.133.202 [WRITE-/10.65.133.202] | 2015-06-03 21:57:36.933000 | 10.65.137.191 |          83054
                                                                     Message received from /10.65.137.191 [Thread-151] | 2015-06-03 21:57:36.944000 | 10.65.133.202 |         102134
                                                         Processing response from /10.65.137.191 [SharedPool-Worker-2] | 2015-06-03 21:57:36.944000 | 10.65.133.202 |         102191
                                                                                                      Request complete | 2015-06-03 21:57:36.948916 | 10.65.133.202 |         107916

Thanks in advance for all your support and answers.

Thanks, John

john cena

1 Answer


Collection types in Cassandra are implemented as blobs under the hood; there is no real magic here.

To measure the difference, you can enable tracing in C* and see for yourself:

create table no_collections(id int, value text, primary key (id));
create table with_collections(id int, value set<text>, primary key (id));
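
The rows queried below were inserted along these lines (reconstructed from the output that follows, so treat the exact statements as a sketch):

insert into no_collections (id, value) values (1, 'foo,bar,baz');
insert into no_collections (id, value) values (2, 'xxx,yyy,zzz');
insert into no_collections (id, value) values (3, 'aaa,bbb,ccc');
insert into with_collections (id, value) values (1, {'foo', 'bar', 'baz'});
insert into with_collections (id, value) values (2, {'xxx', 'yyy', 'zzz'});
insert into with_collections (id, value) values (3, {'aaa', 'bbb', 'ccc'});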

cqlsh:stackoverflow> select * from no_collections ;

 id | value
----+-------------
  1 | foo,bar,baz
  2 | xxx,yyy,zzz
  3 | aaa,bbb,ccc

(3 rows)
cqlsh:stackoverflow> select * from with_collections ;

 id | value
----+-----------------------
  1 | {'bar', 'baz', 'foo'}
  2 | {'xxx', 'yyy', 'zzz'}
  3 | {'aaa', 'bbb', 'ccc'}

(3 rows)

Now let's enable tracing to see what's going on:

cqlsh:stackoverflow> TRACING ON ;
Now Tracing is enabled
cqlsh:stackoverflow> select * from with_collections where id=3;

 id | value
----+-----------------------
  3 | {'aaa', 'bbb', 'ccc'}

(1 rows)

Tracing session: 7c3d4ed0-09c8-11e5-b4cd-2988e70b20cb

activity                                                                                            | timestamp                  | source    | source_elapsed
-------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------
                                                                              Execute CQL3 query | 2015-06-03 11:13:58.717000 | 127.0.0.1 |              0
                        Parsing select * from with_collections where id=3; [SharedPool-Worker-1] | 2015-06-03 11:13:58.718000 | 127.0.0.1 |             72
                                                       Preparing statement [SharedPool-Worker-1] | 2015-06-03 11:13:58.718000 | 127.0.0.1 |            218
                      Executing single-partition query on with_collections [SharedPool-Worker-3] | 2015-06-03 11:13:58.718000 | 127.0.0.1 |            547
                                              Acquiring sstable references [SharedPool-Worker-3] | 2015-06-03 11:13:58.718000 | 127.0.0.1 |            556
                                               Merging memtable tombstones [SharedPool-Worker-3] | 2015-06-03 11:13:58.719000 | 127.0.0.1 |            574
 Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-3] | 2015-06-03 11:13:58.719000 | 127.0.0.1 |            636
                                Merging data from memtables and 0 sstables [SharedPool-Worker-3] | 2015-06-03 11:13:58.719000 | 127.0.0.1 |            644
                                        Read 1 live and 0 tombstoned cells [SharedPool-Worker-3] | 2015-06-03 11:13:58.719000 | 127.0.0.1 |            673
                                                                                Request complete | 2015-06-03 11:13:58.717847 | 127.0.0.1 |            847

As you can see, it took only ~800 µs to parse and execute a query that uses a collection. Without collections the situation looks mostly the same:

cqlsh:stackoverflow> select * from no_collections where id=3;

 id | value
----+-------------
  3 | aaa,bbb,ccc

(1 rows)

Tracing session: 7e9ac6d0-09c8-11e5-b4cd-2988e70b20cb

 activity                                                                                        | timestamp                  | source    | source_elapsed
-------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------
                                                                              Execute CQL3 query | 2015-06-03 11:14:02.685000 | 127.0.0.1 |              0
                          Parsing select * from no_collections where id=3; [SharedPool-Worker-1] | 2015-06-03 11:14:02.686000 | 127.0.0.1 |             77
                                                       Preparing statement [SharedPool-Worker-1] | 2015-06-03 11:14:02.686000 | 127.0.0.1 |            209
                        Executing single-partition query on no_collections [SharedPool-Worker-3] | 2015-06-03 11:14:02.686000 | 127.0.0.1 |            525
                                              Acquiring sstable references [SharedPool-Worker-3] | 2015-06-03 11:14:02.686000 | 127.0.0.1 |            534
                                               Merging memtable tombstones [SharedPool-Worker-3] | 2015-06-03 11:14:02.687000 | 127.0.0.1 |            553
 Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-3] | 2015-06-03 11:14:02.687000 | 127.0.0.1 |            598
                                Merging data from memtables and 0 sstables [SharedPool-Worker-3] | 2015-06-03 11:14:02.688000 | 127.0.0.1 |            606
                                        Read 1 live and 0 tombstoned cells [SharedPool-Worker-3] | 2015-06-03 11:14:02.688000 | 127.0.0.1 |            630
                                                                                Request complete | 2015-06-03 11:14:02.685789 | 127.0.0.1 |            789

So I see no real difference here.

The timing shown by cqlsh tracing is approximate and not statistically rigorous. To explore the difference you need to run at least a few dozen experiments and compare their results. But the results may depend on several factors:

  • Network latency between nodes. It can be the cause of many latency problems in shared infrastructure like AWS.
  • Cluster load. If your cluster is not idle, it may be doing background work that interferes with your measurements.
  • Background jobs. If you have a dataset with frequent updates/deletes, C* may be running compaction tasks under the hood, which can also interfere with other queries.
  • Heavy updates/deletes with low memory. If you have (or had in the past) a heavy update/delete workload, your data may be spread across multiple small SSTables that have not yet been compacted. C* must read most of them to reassemble your row, which leads to high query latency.

So I suggest running your queries with tracing enabled to see where the time goes, but I bet it isn't related to collections at all.
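
For example, against the table from your question, a traced single-row read would look something like this (the key value 'TYPE_A' is just a placeholder for one of your real keys):

cqlsh> TRACING ON;
cqlsh> SELECT type, col1, timestamp_map FROM TEST.PRODUCT_TYPE WHERE type = 'TYPE_A';

Restricting the query by the partition key keeps it a single-partition read, like the traces above, so the timings are directly comparable.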

shutty
  • Thanks for the reply. I updated the question with the cfstats report; could you take a look and tell me if anything there explains the read latency of around 22 ms? Thanks. – john cena Jun 03 '15 at 17:30
  • I do not see anything suspicious in your cfstats. Again, I suggest running your queries with tracing enabled to see what's going on inside. – shutty Jun 04 '15 at 08:15
  • I updated the question with the query stats after turning tracing on. Could you tell me if you see anything wrong? Thank you very much. – john cena Jun 04 '15 at 20:51