
I need to implement many rules on data that I'll receive on a daily basis.

The data contains user actions, such as someone clicking on an advertisement. We want to ignore some of the clicks based on rules like:

- anyone clicking the same ad more than 4 times in a minute --> ignore all clicks from the 4th onwards
- anyone clicking the same ad more than 4 times in an hour --> ignore all clicks from the 4th onwards
- anyone clicking different ads more than 10 times in a minute --> ignore all clicks from that user
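To make the first two rules concrete, here is a minimal in-memory sketch of a sliding-window check, assuming the clicks are processed in time order. `SameAdWindowRule` and its parameters are hypothetical names, not part of any library. Because the rules say both "more than 4 times" and "4th onwards", the threshold is a parameter: `maxPriorClicks = 3` ignores the 4th and later clicks on the same ad inside the window. Counting ignored clicks toward the window is also an assumption.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sliding-window rule for "same user, same ad" limits.
final class SameAdWindowRule {
    private final Duration window;
    private final int maxPriorClicks;
    // "userId|adId" -> times of that user's recent clicks on that ad, oldest first
    private final Map<String, Deque<LocalDateTime>> history = new HashMap<>();

    SameAdWindowRule(Duration window, int maxPriorClicks) {
        this.window = window;
        this.maxPriorClicks = maxPriorClicks;
    }

    // Returns true if this click should be ignored. Assumes clicks arrive in time order.
    boolean shouldIgnore(String userId, String adId, LocalDateTime clickTime) {
        Deque<LocalDateTime> times =
            history.computeIfAbsent(userId + "|" + adId, k -> new ArrayDeque<>());
        LocalDateTime windowStart = clickTime.minus(window);
        // Evict clicks at or before the window start; they no longer count
        while (!times.isEmpty() && !times.peekFirst().isAfter(windowStart)) {
            times.pollFirst();
        }
        // Ignore this click if enough earlier clicks already fall inside the window
        boolean ignore = times.size() >= maxPriorClicks;
        times.addLast(clickTime); // ignored clicks still count (assumption)
        return ignore;
    }

    public static void main(String[] args) {
        SameAdWindowRule perMinute = new SameAdWindowRule(Duration.ofMinutes(1), 3);
        LocalDateTime start = LocalDateTime.of(2018, 9, 11, 11, 10, 0);
        // Replays the six sample clicks from the question
        for (int i = 0; i < 6; i++) {
            LocalDateTime t = start.plusSeconds(i);
            System.out.println(t + " ignored=" + perMinute.shouldIgnore("User1", "ad1", t));
        }
    }
}
```

With the six sample clicks, the first three are kept and the 4th through 6th are reported as ignored. The hourly rule would be a second instance built with `Duration.ofHours(1)`. Empty per-key histories are never evicted in this sketch, so for large user/ad volumes the state would need cleanup or partitioning.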

Each click will arrive as a separate record. Example:

User_ID AD_ID  CLICK_TIME
User1   ad1    2018-09-11 11:10:00
User1   ad1    2018-09-11 11:10:01
User1   ad1    2018-09-11 11:10:02
User1   ad1    2018-09-11 11:10:03
User1   ad1    2018-09-11 11:10:04
User1   ad1    2018-09-11 11:10:05

The data will be provided in a file. It will be huge, and each rule requires aggregating the data and then checking the counts.
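For a batch file, the third rule can be expressed as a group-and-count aggregation. The sketch below assumes whitespace-separated lines in the format of the sample above, with an optional `User_ID` header line. It interprets "clicking different ads more than 10 times" as more than 10 clicks in total across all ads, and it uses fixed calendar-minute buckets (tumbling windows) rather than a sliding 60-second window. `PerUserMinuteAggregation` and `flaggedUsers` are hypothetical names. Taking a `Stream<String>` lets the caller pass `Files.lines(...)` so the whole file is not loaded into memory at once.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

final class PerUserMinuteAggregation {
    // One parsed line such as "User1   ad1    2018-09-11 11:10:00"
    record Click(String userId, String adId, LocalDateTime clickTime) {
        static Click parse(String line) {
            String[] f = line.trim().split("\\s+");
            return new Click(f[0], f[1], LocalDateTime.parse(f[2] + "T" + f[3]));
        }
    }

    // Users with more than maxClicksPerMinute clicks (any ads) in some calendar minute
    static Set<String> flaggedUsers(Stream<String> lines, int maxClicksPerMinute) {
        Map<String, Map<LocalDateTime, Long>> counts = lines
            .filter(l -> !l.isBlank() && !l.startsWith("User_ID")) // skip blanks and header
            .map(Click::parse)
            .collect(Collectors.groupingBy(Click::userId,
                Collectors.groupingBy(c -> c.clickTime().truncatedTo(ChronoUnit.MINUTES),
                    Collectors.counting())));
        return counts.entrySet().stream()
            .filter(e -> e.getValue().values().stream().anyMatch(n -> n > maxClicksPerMinute))
            .map(Map.Entry::getKey)
            .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        // In practice, stream the file instead: Files.lines(Path.of("clicks.txt"))
        Stream<String> sample = Stream.concat(
            Stream.of("User_ID AD_ID  CLICK_TIME",
                      "User1   ad1    2018-09-11 11:10:00",
                      "User1   ad1    2018-09-11 11:10:01"),
            // Hypothetical user with 11 clicks on different ads within one minute
            IntStream.range(0, 11).mapToObj(i ->
                String.format("User2   ad%d    2018-09-11 11:10:%02d", i, i)));
        System.out.println(flaggedUsers(sample, 10)); // prints [User2]
    }
}
```

Every click from a returned user would then be ignored. Tumbling buckets can miss a burst that straddles a minute boundary (for example, 6 clicks at 11:10:55 and 6 more at 11:11:05), so a sliding window is needed if that case matters.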

What is the best approach to implement such rules in Java? Is there any open-source library that we can use?

Thanks

rupesh

1 Answer


It depends on the velocity of the incoming data and other factors, as described in "What is Big Data?".

Since you only need, at most, the last few hours of data in memory, I would suggest looking at Apache Spark. If the data is much larger and the computation doesn't need to be real-time, you can also look at Hadoop. Both Spark and Hadoop work well with files.

You can also stream the data and use Kafka Streams to perform all those manipulations.

Read more about Big Data. If you feel that your data is not so "big" and you could use a database instead, I would suggest keeping things simple: read the last 'x' hours of data from the database and do your computation.

As for a Java design pattern for your click validations, you can look at the Chain of Responsibility pattern.
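A minimal sketch of the Chain of Responsibility idea applied to click validation: each `ClickRule` either rejects a click with a reason or passes it on to the next rule. The `Click` record, the `ClickRule` class, the rule names, and the simple per-calendar-minute counter are all hypothetical and only for illustration; a real rule would more likely use a sliding window.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical click record for this sketch
record Click(String userId, String adId, LocalDateTime clickTime) {}

// One link in the chain: rejects a click or hands it to the next rule
final class ClickRule {
    private final String name;
    private final Predicate<Click> violates;
    private ClickRule next;

    ClickRule(String name, Predicate<Click> violates) {
        this.name = name;
        this.violates = violates;
    }

    // Links the next rule and returns it, so calls can be chained: a.then(b).then(c)
    ClickRule then(ClickRule next) {
        this.next = next;
        return next;
    }

    // Returns the name of the first rule that rejects the click, or empty if all accept it
    Optional<String> check(Click click) {
        if (violates.test(click)) {
            return Optional.of(name);
        }
        return next == null ? Optional.empty() : next.check(click);
    }
}

final class ClickChainDemo {
    public static void main(String[] args) {
        // Users flagged elsewhere, e.g. by a batch job (hypothetical value)
        Set<String> flaggedUsers = Set.of("User2");
        // Click counts per user, ad and calendar minute
        Map<String, Integer> perMinute = new HashMap<>();

        ClickRule chain = new ClickRule("flagged-user", c -> flaggedUsers.contains(c.userId()));
        chain.then(new ClickRule("same-ad-per-minute", c -> {
            String key = c.userId() + "|" + c.adId() + "|"
                + c.clickTime().truncatedTo(ChronoUnit.MINUTES);
            return perMinute.merge(key, 1, Integer::sum) >= 4; // 4th click onwards
        }));

        LocalDateTime start = LocalDateTime.of(2018, 9, 11, 11, 10, 0);
        for (int i = 0; i < 5; i++) {
            Click click = new Click("User1", "ad1", start.plusSeconds(i));
            System.out.println(click.clickTime() + " -> "
                + chain.check(click).map(r -> "ignored by " + r).orElse("accepted"));
        }
        System.out.println(chain.check(new Click("User2", "ad7", start)).orElse("accepted"));
    }
}
```

Adding a rule means appending another link with `then(...)`, without touching the existing rules. Because the chain stops at the first rejection, later rules neither see nor count clicks rejected earlier; whether that is desirable depends on how the rules are meant to interact.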

PS: I am not an architect, so you may want to look at other answers. This answer is only meant to give you some guidance on which technologies are available.

Kartik