2

I am trying to query my cassandra database to return data from a list of names held on an array server side. This is held as an array. I know the data I am accessing is stored as a string in my database and so I have appended single quotes around it (I have tried with and without this but no luck).

Here is my query.

const arr = ["ukcust1","ukcust2","ukcust5"];
//Here I append single quotes before and after to each string if needed

const query = "SELECT * FROM table_name WHERE name = ?";
client.execute(query, arr, { prepare:true }, function (err, result) {
..//Code
};

What am I missing here? I want the query to be:

SELECT * FROM table_name WHERE name = each of the names in the array 'arr';
Joseph Clough
  • 33
  • 1
  • 6
  • 1
    What it's your table structure? If you want to provide a list you need to change your query condition, something like "where name in ()". But if you do this, you might start to query on several partitions which it's not the best. I recommend this to read https://lostechies.com/ryansvihla/2014/09/22/cassandra-query-patterns-not-using-the-in-query-for-multiple-partitions/ and this http://batey.info/cassandra-anti-pattern-distributed.html – Horia May 14 '18 at 13:28
  • The table has multiple columns but this column stores data as a string. It runs on one server and one key space (I have no control over the database, I just query it). I can succesfully search using one for example: const query = "SELECT * FROM column WHERE name = 'ukcust1' ALLOW FILTERING"; – Joseph Clough May 14 '18 at 13:39
  • In C*, tables are designed to serve certain queries. Since you need to use IN/ALLOW FILTERING, you should revise you data model. The name column is not the partition key. It's pretty important to query on the partition key: it would lower the response times and the load on nodes (because of how C* stores data on disk/nodes). ALLOW FILTERING allows to skip the partition key, but it's not recommended since it would result in very expensive queries, across multiple partitions. Using IN might result in slower performance on queries since it could correspond to non-contiguous areas in the row. – Horia May 14 '18 at 13:55
  • I'm not sure this answers or relates to my question, forgive me if I'm wrong. I am asking how to do a query with multiple params. I've revised my question to try make more sense. – Joseph Clough May 14 '18 at 15:21
  • If you want something like SELECT * FROM table_name WHERE name = 'ukcust2', then the name column has to be a partition key. Or, if you have some other partition key, then you must query after it and the name has to be a clustering column. This is why I asked in the begging what it's you table structure. In C*, queries are following a certain pattern: select from where = and .You could also use secondary indexes, but I'm not sure if they are recommended for production use. – Horia May 14 '18 at 18:26

2 Answers2

3

If name were a clustering key, then you could query with "in" and "allow filtering" like this:

select * from table_name where name in ('ukcust1','ukcust2','ukcust3') allow filtering

Assuming name is not a clustering key, you could use a clustering key (e.g., date_of_birth) if it made logical sense -- that is, if filtering by date made sense in relation to the name -- like this:

select * from table_name where date_of_birth in (1969, 1972) name in ('ukcust1','ukcust2','ukcust3') allow filtering

If you can't do either of those things, you will need to loop through the array with Javascript (e.g., foreach).

Valerie Parham-Thompson
  • 1,516
  • 1
  • 11
  • 21
  • 1
    Thanks for the clear answer explaining clustering keys. I thought that I would have to eventually loop it. – Joseph Clough May 14 '18 at 21:58
  • From a developer perspective, probably making the query to work it's good. But from C* perspective, using ALLOW FILTERING might not be the best solution. I exposed the reasons above. – Horia May 15 '18 at 07:42
  • Agreed, but he also explained the reasons he couldn't do it the "right" way. – Valerie Parham-Thompson May 15 '18 at 10:26
  • When querying C* you need to be aware of your data model. I really don't think that "hiding" behind "I just query it" should absolve the developer on doing the right queries. He should be aware that his "right" query, even if it works and returns some results in a test environment, could get him into real problems on real data. Problems that would impact the whole cluster. With my comments I tried to make him aware that it's not important just to make it work. – Horia May 15 '18 at 10:51
  • Your comments are totally fair. This is, unfortunately, not an ideal situation. – Joseph Clough May 15 '18 at 11:58
0

The correct input of the query parameters is an array of values. In this case, it would be an array of parameters containing a single item, that is an array of names.

const arr = ["ukcust1","ukcust2","ukcust5"];
const query = "SELECT * FROM table_name WHERE name = ?";

// Note the array containing a single item
const parameters = [ arr ];

client.execute(query, parameters, { prepare: true }, callback);

See more info in the documentation: https://docs.datastax.com/en/developer/nodejs-driver/3.5/faq/#how-can-i-use-a-list-of-values-with-the-in-operator-in-a-where-clause

jorgebg
  • 6,560
  • 1
  • 22
  • 31