I'm working with Cassandra for the first time and I have some questions. My data sources are three CSV files: flights, airplane and airport. The structure of each file is shown below to give context to my problem.

Airport

ID_airport | airport | city | state | country | latitude | longitude

Airplane

ID_airplane | type | manufacturer | issue_date | model | engine_type | aircraft_type

Flights

ID_flight | date | Flight_Numb | ID_airplane | ID_airport_origin | ID_airport_dest | DepartureTime | Arrival_time | airline | distance | DepDelay | ArrivalDelay

The Flights file is the main one and has millions of records. The other two contain supplemental data.

From what I have read about Cassandra, you should first define the queries you need and then create column families that serve those queries. However, Cassandra does not support JOINs. How can I relate data in one CSV file to data in another, so that I can create a column family that contains fields from different CSV files?

For example, suppose I want to know which airplane model has the most flight delays. In the relational model this is possible using JOINs, but in Cassandra I think it is impossible.

Is there any way to do this in Cassandra? How can I have a column family with fields from different CSV files?

Pedro Cunha
  • You're right. Cassandra does not support JOINs. Therefore in the case you've described, if you know that this will be a very common query, you can either add the airplane model into the flight information as well (duplicated data is OK with NoSQL databases), or run several queries to get the information you need (thus, essentially, doing the JOIN on the client side) – uri2x Aug 28 '15 at 12:06
  • The solution will be to gather all the information into the flights file. Perhaps a good tool for doing this is Talend Open Studio for Big Data. – Pedro Cunha Aug 28 '15 at 13:51
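
The denormalization suggested in the comments can be sketched in plain Python before any data is loaded into Cassandra. This is only a minimal illustration, not a definitive implementation. It assumes comma-delimited files, uses made-up sample rows held in memory in place of the real files, and keeps only the columns needed for the "which model has the most delays" question. The function names are hypothetical, and counting a flight as delayed when `ArrivalDelay` is greater than zero is also an assumption.

```python
import csv
import io
from collections import Counter

# Made-up sample data standing in for the real CSV files (assumption:
# comma-delimited, with the header names used in the question).
AIRPLANES_CSV = """ID_airplane,model
1,A320
2,B737
"""

FLIGHTS_CSV = """ID_flight,ID_airplane,ArrivalDelay
10,1,15
11,2,0
12,1,30
13,2,5
14,3,20
"""


def load_airplane_models(airplane_file):
    """Build an ID_airplane -> model lookup from the small airplane file."""
    reader = csv.DictReader(airplane_file)
    return {row["ID_airplane"]: row["model"] for row in reader}


def denormalize_flights(flight_file, models):
    """Yield each flight row with the airplane model copied into it.

    This is the client-side equivalent of a JOIN: the large flights file
    is streamed once, and the small lookup table stays in memory.
    """
    for row in csv.DictReader(flight_file):
        enriched = dict(row)
        # Flights that reference an unknown airplane are kept, not dropped.
        enriched["model"] = models.get(row["ID_airplane"], "unknown")
        yield enriched


def delayed_flights_per_model(rows):
    """Count flights with a positive arrival delay, grouped by model."""
    counts = Counter()
    for row in rows:
        if int(row["ArrivalDelay"]) > 0:
            counts[row["model"]] += 1
    return counts


models = load_airplane_models(io.StringIO(AIRPLANES_CSV))
flights = list(denormalize_flights(io.StringIO(FLIGHTS_CSV), models))
delay_counts = delayed_flights_per_model(flights)
print(delay_counts.most_common(1))
```

With real data, the enriched rows would be written to a column family designed around the query (for example, one that stores the model alongside each flight) rather than collected into a list. The airport file can be merged in the same way by adding a second lookup keyed on `ID_airport`.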
