I have a table where not every column has a value in every row, and I need to get only the rows where a particular column has a value. Example: my table has 3 columns: "id", "first_name" and "last_name".

| id       | first_name | last_name |
| -------- | ---------- | --------- |
| 001      | Josh       | Irakly    |
| 002      | Carl       | Bruklin   |
| 003      |            | Drishlya  |
| 004      | Bick       |           |
| 005      |            | Mulan     |

I want to get all rows where the "first_name" column has a value, as shown below:

| id       | first_name | last_name |
| -------- | ---------- | --------- |
| 001      | Josh       | Irakly    |
| 002      | Carl       | Bruklin   |
| 004      | Bick       |           |
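For reference, the table can be thought of as something like the following CQL sketch (the table name `tableName` matches the answer below, and the `text` column types are just an assumption; a column that is omitted stays unset/null in Cassandra):

```sql
-- Hypothetical schema matching the example above (types assumed)
CREATE TABLE tableName (
    id         text PRIMARY KEY,
    first_name text,
    last_name  text
);

-- Sample rows; an omitted column remains unset (null)
INSERT INTO tableName (id, first_name, last_name) VALUES ('001', 'Josh', 'Irakly');
INSERT INTO tableName (id, first_name, last_name) VALUES ('002', 'Carl', 'Bruklin');
INSERT INTO tableName (id, last_name)             VALUES ('003', 'Drishlya');
INSERT INTO tableName (id, first_name)            VALUES ('004', 'Bick');
INSERT INTO tableName (id, last_name)             VALUES ('005', 'Mulan');
```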
  • Does this answer your question? [SELECT in cassandra where id != null](https://stackoverflow.com/questions/24456188/select-in-cassandra-where-id-null) – Andrew Jul 26 '23 at 18:36

2 Answers


I figured it out; the solution is:

```sql
SELECT *
FROM tableName
WHERE first_name > ''
ALLOW FILTERING
```
  • Note that this solution will skip entries which have a first_name set (i.e., not null) but set to an empty string. If you're fine with that, and I guess you are, then indeed it's a good solution. – Nadav Har'El Jul 30 '23 at 14:26
  • thank you, I understood. – Yanick Aug 03 '23 at 16:54
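To make the caveat above concrete, here is a small sketch using the hypothetical schema from the question (behaviour assumed from Cassandra's text comparison: an empty string is not greater than `''`):

```sql
-- A row whose first_name is an empty string rather than unset/null
INSERT INTO tableName (id, first_name, last_name) VALUES ('006', '', 'Smith');

-- Returns rows 001, 002 and 004, but NOT 006, because '' > '' is false
SELECT *
FROM tableName
WHERE first_name > ''
ALLOW FILTERING;
```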

As much as you think you've found a "solution", it is not recommended to run this type of query.

Such an operation requires a full table scan where Cassandra has to read every single partition. It might work for very small datasets and/or very small clusters but it doesn't scale.

Imagine the scenario where the table contains millions or billions of records distributed across a large number of nodes -- the query would just time out.

If you need to perform this query, it indicates that you made a mistake modelling your data. The general recommendation is to design a table for each application query so the reads are optimised for the app. Cheers!
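For illustration only, here is a sketch of one possible query-specific table (the name `users_with_first_name` is hypothetical) that the application keeps in sync with the main table, so this read never needs `ALLOW FILTERING`:

```sql
-- Hypothetical companion table; the application writes to it only
-- when first_name is present, so every row here has a first_name
CREATE TABLE users_with_first_name (
    first_name text,
    id         text,
    last_name  text,
    PRIMARY KEY (first_name, id)
);

-- Kept in sync by the application alongside the main table
INSERT INTO users_with_first_name (first_name, id, last_name)
VALUES ('Josh', '001', 'Irakly');

-- Every row read here is one the app actually wants; LIMIT keeps it bounded
SELECT * FROM users_with_first_name LIMIT 100;
```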

  • Thank you for your response. May I ask: do you think "LIMIT 10" would help in situations like the one you described? I don't need all records from the DB, just some of them, like 50 or 100 records, no more. – Yanick Jul 29 '23 at 20:24
  • Actually, a full-table scan can work very efficiently if the filtered-out data isn't the majority of the data. E.g., imagine that you have 1 billion entries, but only 0.1 billion are missing a name. The scan of the entries with names will obviously involve reading a huge amount of entries, but that was the intent of this scan. You can also parallelize the full-table scan by scanning different token ranges in parallel (see this blog post: https://www.scylladb.com/2017/02/13/efficient-full-table-scans-with-scylla-1-6/) – Nadav Har'El Jul 30 '23 at 14:29
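For reference, a minimal sketch of the token-range approach mentioned in the comment above (this assumes the default Murmur3 partitioner, whose tokens span -2^63 to 2^63-1; a real driver would split the ring into many more slices and run them concurrently):

```sql
-- Worker 1: lower half of the token ring
SELECT * FROM tableName
WHERE token(id) >= -9223372036854775808 AND token(id) < 0
  AND first_name > ''
ALLOW FILTERING;

-- Worker 2: upper half of the token ring
SELECT * FROM tableName
WHERE token(id) >= 0 AND token(id) <= 9223372036854775807
  AND first_name > ''
ALLOW FILTERING;
```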