How to join multiple data sources in Cassandra

Question

I'm working with Cassandra for the first time and I have some doubts. My data sources are CSV files. I have three: flights, airplane and airport. Below is the structure of each CSV file, to give context for my problem.

Airport

Airplane

Flights

The Flights file is the main one and has millions of records. The other two hold supplemental data.

From what I have read about Cassandra, you should first define the queries you need and then create column families that serve them. However, Cassandra does not support JOINs. How can I relate data in one CSV file to data in another, in order to create a column family with fields from different CSV files?

For example, suppose I want to know which airplane model registers the most flight delays. In the relational model this is possible using JOINs, but in Cassandra I think it's impossible.

Is there any way to do this in Cassandra? How can I have a column family with fields from different CSV files?

You're right. Cassandra does not support JOINs. Therefore in the case you've described, if you know that this will be a very common query, you can either add the airplane model into the flight information as well (duplicated data is OK with NoSQL databases), or run several queries to get the information you need (thus, essentially, doing the JOIN on the client side) — uri2x, Aug 28 '15 at 12:06
The solution will gather all the information in the flights file. Perhaps a good tool for doing this is the Talend Open Studio for Big Data.... — Pedro Cunha, Aug 28 '15 at 13:51
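
The denormalization suggested in the comments could be sketched roughly as follows, in plain Python before anything is loaded into Cassandra. This is only an illustration under assumptions: the real CSV structures are not shown in the question, so the column names (`tail_num`, `model`, `dep_delay`), the sample rows, and the function names are all hypothetical. The idea is to keep the small airplane file in memory as a lookup table and copy the model into every flight row while streaming the large flights file.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data: the real column names are not shown in the question.
AIRPLANES_CSV = """tail_num,model
N101,737-800
N102,A320
N103,737-800
"""

FLIGHTS_CSV = """flight_id,tail_num,dep_delay
1,N101,15
2,N102,0
3,N103,30
4,N101,-5
5,N102,12
"""


def load_models(airplanes_file):
    """Build an in-memory lookup: tail number -> airplane model."""
    return {row["tail_num"]: row["model"] for row in csv.DictReader(airplanes_file)}


def denormalize(flights_file, models):
    """Yield flight rows with the airplane model copied in (the client-side 'join')."""
    for row in csv.DictReader(flights_file):
        row["model"] = models.get(row["tail_num"], "unknown")
        yield row


def delayed_flights_per_model(rows):
    """Count flights with a positive departure delay for each model."""
    counts = Counter()
    for row in rows:
        if int(row["dep_delay"]) > 0:
            counts[row["model"]] += 1
    return counts


# In practice these would be open() handles on the real CSV files.
models = load_models(io.StringIO(AIRPLANES_CSV))
rows = list(denormalize(io.StringIO(FLIGHTS_CSV), models))
delay_counts = delayed_flights_per_model(rows)
worst_model, worst_count = delay_counts.most_common(1)[0]
print(worst_model, worst_count)
```

Each denormalized row already carries the `model` field, so it could be written to a flights column family that includes the model, and a question like "which model has the most delays" then no longer needs a JOIN. With millions of flight records, the rows would be streamed straight to the loader rather than collected into a list as done here for brevity.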
