Queries on Beeline and Spark-SQL

Asked Jun 22 '17 at 09:59

Active Sep 15 '17 at 23:12

I'm new to running queries with beeline. Why does a simple query such as SELECT count(*) from table1 take so long when the table holds 7,000,000 records?

And when should we use beeline instead of Spark-SQL, and vice versa?

Thanks

asked Jun 22 '17 at 09:59

Ehiz Ize

"Select count(*)" isn't well suited to Cassandra. Under the hood you are requesting a count of data spread across all the Cassandra nodes in the cluster, so it's a bad query for Cassandra. – dilsingi Jun 22 '17 at 17:47
You're trying to read everything from every range in the cluster and send it to the coordinator to be merged and counted: https://stackoverflow.com/questions/29394382/operation-time-out-error-in-cqlsh-console-of-cassandra/29394935#29394935 – Chris Lohfink Jun 22 '17 at 19:24
Thanks for the input, I'm beginning to understand now. – Ehiz Ize Jul 05 '17 at 08:06
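
To make the point in the comments above concrete, here is a rough toy model in Python of why a full count is expensive. It is only a sketch under stated assumptions: it does not talk to Cassandra, and the three-node ring, the names `TOKEN_RANGES`, `token_for`, `build_cluster` and `coordinator_count`, and the 7,000-row sample are all made up for illustration. The only thing it shows is that a count with no partition key has to visit every token range before the coordinator can add the results up.

```python
from collections import defaultdict

# Hypothetical toy cluster: 3 nodes, each owning a slice of a 0..299 token ring.
TOKEN_RANGES = {
    "node-a": range(0, 100),
    "node-b": range(100, 200),
    "node-c": range(200, 300),
}


def token_for(key: str) -> int:
    """Toy stand-in for a partitioner: map a key to a token on the ring."""
    return sum(key.encode("utf-8")) % 300


def build_cluster(keys):
    """Place each row on the node whose token range covers its token."""
    nodes = defaultdict(list)
    for key in keys:
        token = token_for(key)
        for node, owned in TOKEN_RANGES.items():
            if token in owned:
                nodes[node].append(key)
                break
    return nodes


def coordinator_count(nodes):
    """Count all rows by visiting every range, then merging the partial counts."""
    ranges_scanned = 0
    total = 0
    for node in TOKEN_RANGES:
        ranges_scanned += 1                  # no range can be skipped
        total += len(nodes.get(node, []))    # partial count sent to the coordinator
    return total, ranges_scanned


keys = [f"row-{i}" for i in range(7000)]
cluster = build_cluster(keys)
total, scanned = coordinator_count(cluster)
print(total, scanned)  # prints: 7000 3
```

In this model the number of ranges scanned is always the full ring, no matter how small the answer is. A lookup by partition key would touch one range only, which is the access pattern Cassandra is built for.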

0 Answers