
I would like to design a system that

  • Will read CDR (call data record) files and insert them into a NoSQL database. For this, Spark Streaming with Cassandra as the NoSQL store looks promising, since the files keep arriving continuously.
  • Will calculate the price in real time by rating the call duration and called number (or just the kilobytes in the case of data) and store the total chargeable amount so far for the current bill cycle. I need a NoSQL store into which I will both insert rated CDRs and update the running chargeable total for the current bill cycle for the MSISDN in that CDR.
  • In case rate plans are updated for a specific subscription, all CDRs of the current bill cycle that use that price plan need to be re-rated, and the running total needs to be recalculated for all affected customers.

Notes:

  • MSISDNs are unique per subscription (one-to-one relation). Within a month, one MSISDN can have up to 100,000 CDRs.
  • I have been looking into NoSQL databases; so far I am leaning towards Cassandra, but I am still not sure how to design the data model to optimize for this business case.
  • Please also consider that while one CDR is being processed on one node, another CDR for the same MSISDN can be processed on another node at the same time, with both nodes applying the logic above.
fatih tekin
    As a matter of advice, your question is quite broad and ought to be closed. Please read [this post](http://stackoverflow.com/help/how-to-ask) to help you write your question better! – eliasah Nov 15 '15 at 22:07

1 Answer


The question is indeed very broad - Stack Overflow is meant to cover more specific technical questions, not to debate the architecture of an entire system.

Apart from this, let me attempt to address some aspects of your question:

a) Using streaming for CDR processing:

Spark Streaming is indeed a tool of choice for processing incoming CDRs, typically delivered over a message queueing system such as Kafka. It allows windowed operations, which come in handy when you need to calculate call charges over a set period (hours, days, etc.). You can very easily combine existing static records, such as price plans from other databases, with your incoming CDRs in windowed operations. All of this comes with a robust and expressive API.
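A minimal sketch of what such a streaming job could look like, assuming CDRs arrive as semicolon-separated lines in files dropped into a directory; the field layout, the `ratePlans` map, the HDFS path and all other names are made up for illustration:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical CDR layout: msisdn;calledNumber;durationSeconds;ratePlanId
case class Cdr(msisdn: String, called: String, durationSec: Long, ratePlanId: String)

object CdrRatingJob {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("cdr-rating")
    val ssc  = new StreamingContext(conf, Seconds(30))

    // Assumed static lookup: price per second, keyed by rate plan id
    val plans = ssc.sparkContext.broadcast(Map("basic" -> 0.002, "premium" -> 0.001))

    // New CDR files keep arriving in this directory
    val lines = ssc.textFileStream("hdfs:///landing/cdrs")

    val charges = lines
      .map(_.split(";"))
      .map(f => Cdr(f(0), f(1), f(2).toLong, f(3)))
      .map(c => (c.msisdn, c.durationSec * plans.value.getOrElse(c.ratePlanId, 0.0)))
      // Running charge per MSISDN over a sliding one-hour window, updated every 30 s
      .reduceByKeyAndWindow(_ + _, Seconds(3600), Seconds(30))

    charges.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```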

b) Using Cassandra as a store

Cassandra has excellent scaling capabilities with near-instantaneous row access - for that, it's an absolute killer. However, in a telco setting, I would seriously question using it for anything other than MSISDN lookups and credit checks. Cassandra is essentially a columnar key-value store, and trying to store multi-dimensional, essentially relational records such as price plans, contracts and the like will give you lots of headaches. I would suggest storing your data in different stores, depending on the use case. These could be:

  • Raw CDR records in HDFS -> CDRs can be plentiful, and if you need to reprocess them, reading them back from HDFS will be more efficient
  • Bill summaries in Cassandra -> the itemized bill summaries are the result of the CDRs as initially processed by Spark Streaming. These are essentially columnar and can be perfectly stored in Cassandra (see the sketch after this list)
  • MSISDN and credit information -> as mentioned above, this is also a perfect use case for Cassandra
  • Price plans -> these are multi-dimensional, more document-oriented, and should be stored in a database that supports such structures. You can perfectly use Postgres with JSON for that, as you wouldn't expect more than a handful of plans.
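As a rough illustration of what the bill-summary table and the write path from Spark Streaming could look like - the keyspace, table and column names are invented, and the spark-cassandra-connector is assumed to be on the classpath:

```scala
import com.datastax.spark.connector._
import com.datastax.spark.connector.streaming._
import org.apache.spark.streaming.dstream.DStream

// Assumed CQL schema, created beforehand:
//   CREATE TABLE billing.bill_summary (
//     msisdn       text,
//     bill_cycle   text,     -- e.g. "2015-11"
//     total_charge double,
//     PRIMARY KEY ((msisdn, bill_cycle))
//   );

// `charges` would be the (msisdn, amount) stream produced by the rating job above
def storeBillSummaries(charges: DStream[(String, Double)], billCycle: String): Unit = {
  charges
    .map { case (msisdn, amount) => (msisdn, billCycle, amount) }
    .saveToCassandra("billing", "bill_summary",
      SomeColumns("msisdn", "bill_cycle", "total_charge"))
}
```

Note that plain upserts like this simply overwrite the row, so two nodes rating CDRs for the same MSISDN at the same time could clobber each other's totals; a Cassandra counter column, or keeping the running total inside the stream (e.g. with updateStateByKey) and writing that out, are the usual ways to handle the concurrency concern you raised.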

To conclude, you're actually looking at a classic lambda-architecture use case, with Spark Streaming for immediate processing of incoming CDRs and batch processing with regular Spark on HDFS for post-processing, for instance when you're recalculating CDR costs after plan changes.
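A sketch of that batch leg, assuming the raw CDRs sit in HDFS in the same semicolon-separated layout as above and the updated plans have been loaded into a plain Map; all paths and names are illustrative:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._

object RecalculateBillCycle {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("recalc-bill-cycle"))

    // Updated price per second by rate plan id (would normally come from Postgres)
    val updatedPlans = sc.broadcast(Map("basic" -> 0.0015, "premium" -> 0.001))
    val billCycle = "2015-11"

    // Re-rate every raw CDR of the bill cycle and rebuild the per-MSISDN totals
    sc.textFile(s"hdfs:///landing/cdrs/$billCycle/*")
      .map(_.split(";"))                       // msisdn;called;durationSec;ratePlanId
      .map(f => (f(0), f(2).toLong * updatedPlans.value.getOrElse(f(3), 0.0)))
      .reduceByKey(_ + _)
      .map { case (msisdn, total) => (msisdn, billCycle, total) }
      .saveToCassandra("billing", "bill_summary",
        SomeColumns("msisdn", "bill_cycle", "total_charge"))

    sc.stop()
  }
}
```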

Erik Schmiegelow