This is an interview question I have been thinking about for two months, and I still can't find a suitable architecture.
The problem
We want to build a small analytics system for fraud detection on orders.
The system has the following requirements:
- Not allowed to use any technology from the market (MySql, Redis, Hadoop, S3, etc.)
- Needs to scale as the data volume grows
- Just a bunch of machines, with disks and a decent amount of memory
- 10M Writes/Day
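To put the write requirement in perspective, here is a rough back-of-envelope calculation. It is only a sketch under stated assumptions: writes arrive uniformly over the day (real peaks would be higher), 1 KB is taken as 1000 bytes, the 1-10 KB order size comes from the API description, and replication and index overhead are ignored.

```python
# Back-of-envelope load estimate for the stated requirements.
# Assumptions (not from the question): uniform arrival rate,
# 1 KB = 1000 bytes, no replication or index overhead.
WRITES_PER_DAY = 10_000_000
SECONDS_PER_DAY = 24 * 60 * 60
MIN_ORDER_BYTES = 1_000    # 1 KB order
MAX_ORDER_BYTES = 10_000   # 10 KB order

avg_writes_per_sec = WRITES_PER_DAY / SECONDS_PER_DAY
min_gb_per_day = WRITES_PER_DAY * MIN_ORDER_BYTES / 1e9
max_gb_per_day = WRITES_PER_DAY * MAX_ORDER_BYTES / 1e9

print(f"average writes/sec: {avg_writes_per_sec:.1f}")
print(f"raw data per day: {min_gb_per_day:.0f}-{max_gb_per_day:.0f} GB")
```

Under these assumptions the average write rate is modest (on the order of a hundred writes per second), while the raw data grows by tens of gigabytes per day, so storage growth rather than write throughput looks like the main reason to spread data across machines.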
The system needs to provide the following API:
/insertOrder(order): Order
Add an order to the storage. The order can be considered a blob of 1-10 KB in size, with an orderId, beginTime, and finishTime as distinguished fields.
/getLongestNOrdersByDuration(n: int, startTime: datetime, endTime: datetime): Order[]
Retrieve the longest N orders that started between startTime and endTime, as measured by duration finishTime - beginTime.
/getShortestNOrdersByDuration(n: int, startTime: datetime, endTime: datetime): Order[]
Retrieve the shortest N orders that started between startTime and endTime, as measured by duration finishTime - beginTime.
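To pin down what the three calls are expected to return, here is a naive single-process reference in Python. It is not a proposed architecture: it keeps everything in one in-memory list and scans it on every query. The class name `NaiveOrderStore`, the snake_case method names, the `payload` field, and the choice to treat both time bounds as inclusive are assumptions made for illustration; the question does not specify them.

```python
import heapq
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterator, List


@dataclass(frozen=True)
class Order:
    order_id: str
    begin_time: datetime
    finish_time: datetime
    payload: bytes = b""  # hypothetical field for the opaque 1-10 KB blob

    @property
    def duration(self) -> timedelta:
        return self.finish_time - self.begin_time


class NaiveOrderStore:
    """In-memory reference for the API semantics only; does not scale."""

    def __init__(self) -> None:
        self._orders: List[Order] = []

    def insert_order(self, order: Order) -> Order:
        self._orders.append(order)
        return order

    def _started_between(self, start_time: datetime,
                         end_time: datetime) -> Iterator[Order]:
        # Assumption: both bounds are inclusive.
        return (o for o in self._orders
                if start_time <= o.begin_time <= end_time)

    def get_longest_n_orders_by_duration(
            self, n: int, start_time: datetime,
            end_time: datetime) -> List[Order]:
        # Full scan of the time range, keeping the n largest durations.
        return heapq.nlargest(n, self._started_between(start_time, end_time),
                              key=lambda o: o.duration)

    def get_shortest_n_orders_by_duration(
            self, n: int, start_time: datetime,
            end_time: datetime) -> List[Order]:
        # Full scan of the time range, keeping the n smallest durations.
        return heapq.nsmallest(n, self._started_between(start_time, end_time),
                               key=lambda o: o.duration)
```

The point of the sketch is that each query filters on beginTime but ranks on a different quantity, finishTime - beginTime, which is what makes a single sorted index insufficient and the distributed version of the problem hard.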