I asked this on cs.stackexchange and got a downvote because I was not very clear, so I will try to be more specific here.
Q - Design a data structure that returns the number of connections to a web server in the last 1 minute.
Assumptions -
- The server has a heavy volume of incoming connections, like the Indian Railways reservation system or a social networking site.
- If this turns out to be a big data problem, I have the infrastructure to run a big data job.
I am looking for:
Efficiency - Is it possible to do this in O(1)? If we do it in O(n), the problem is that if computing the answer takes N milliseconds, more connections will have been enqueued during those N ms. How should I handle this? Or can I ignore the small delay and accept O(n) as OK performance?
Reasoning/Approach - Is something similar done in any of the myriad deployments we have in production? Is there a known similar problem?
Is this "big data"? Is storing the connections from the last N minutes (N on the order of 10) a big data problem?
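A rough back-of-envelope estimate helps with the big data question. The numbers below are purely hypothetical (10,000 connections per second, 16 bytes per record for an 8-byte timestamp plus an 8-byte id, 10 minutes of retention); plug in real traffic figures before drawing conclusions.

```python
def retained_bytes(conn_per_sec: int, window_sec: int, bytes_per_record: int) -> int:
    """Bytes needed to keep one record per connection for the whole window."""
    return conn_per_sec * window_sec * bytes_per_record


# Hypothetical load: 10,000 connections/s, kept for 10 minutes,
# 16 bytes per record (8-byte timestamp + 8-byte connection id).
size = retained_bytes(10_000, 10 * 60, 16)
print(f"{size / 1e6:.0f} MB")  # 96 MB
```

At those assumed rates, 10 minutes of records fit comfortably in the memory of a single machine, which suggests this is not a big data problem unless the real rate is orders of magnitude higher.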
My Effort: I know that -
- connections to a web server are put in a queue before being served by a thread
- each connection has a timestamp
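Because each connection carries a timestamp, one commonly used alternative to storing every connection is a fixed ring of per-second counters. This is a minimal sketch, not the method described in this post: it assumes one-second resolution is acceptable and a 60-second window, and the class name `PerSecondCounter` is hypothetical. Recording a connection is O(1), and a query sums 60 slots, so its cost does not grow with traffic. The trade-off is that the answer is only accurate to whole seconds.

```python
import time
from typing import Optional


class PerSecondCounter:
    """Approximate count of events in the last `window` seconds.

    One counter per second, reused in a ring. record() is O(1); count()
    sums `window` slots, so its cost does not depend on traffic volume.
    Not thread-safe: a real server would need a lock or per-thread counters.
    """

    def __init__(self, window: int = 60) -> None:
        self.window = window
        self.counts = [0] * window   # events seen in each slot's second
        self.stamps = [-1] * window  # absolute second each slot currently holds

    def record(self, now: Optional[float] = None) -> None:
        sec = int(time.time() if now is None else now)
        slot = sec % self.window
        if self.stamps[slot] != sec:  # slot still holds an older second: reset it
            self.stamps[slot] = sec
            self.counts[slot] = 0
        self.counts[slot] += 1

    def count(self, now: Optional[float] = None) -> int:
        sec = int(time.time() if now is None else now)
        # Only slots whose second lies in (sec - window, sec] are still current.
        return sum(c for c, s in zip(self.counts, self.stamps)
                   if sec - self.window < s <= sec)


counter = PerSecondCounter()
for t in (0, 0, 30, 59):  # made-up arrival times in seconds
    counter.record(t)
print(counter.count(59))  # 4
print(counter.count(60))  # 2: the two arrivals at second 0 have aged out
```

Memory stays at 60 counters no matter how many connections arrive, so the "connections enqueued while computing" problem shrinks to whatever arrives during a 60-element sum.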
Approach -
- As soon as a connection is put in the queue, write it to a file (at least its timestamp and a handle/unique identifier for the connection).
- As soon as a client requests "give me the number of connections in the last 1 minute", process the file to count the connections. We know the current time in milliseconds, so we need the connections whose timestamps fall between currentTime - 60 s and currentTime.
- This job can be run using MapReduce. I also know that the data in the file is sorted by timestamp.
I also run a daemon that removes entries/files older than 10 minutes, so that I don't store unwanted data.
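Since the log is already sorted by timestamp, counting the last minute does not strictly require scanning the whole file or running a MapReduce job: two binary searches find the window's boundaries in O(log n). A minimal in-memory sketch, assuming the timestamps have been loaded into an ascending Python list of epoch milliseconds; the function name `count_in_window` is hypothetical. On disk the same idea applies to a file of fixed-size records by seeking instead of indexing.

```python
import bisect
from typing import List


def count_in_window(sorted_ts_ms: List[int], now_ms: int,
                    window_ms: int = 60_000) -> int:
    """Count timestamps t with now_ms - window_ms < t <= now_ms.

    `sorted_ts_ms` must be in ascending order, like the append-only log.
    Two binary searches give O(log n) instead of an O(n) scan.
    """
    lo = bisect.bisect_right(sorted_ts_ms, now_ms - window_ms)  # first t inside
    hi = bisect.bisect_right(sorted_ts_ms, now_ms)              # first t after now
    return hi - lo


log = [1_000, 30_000, 61_000, 90_000, 120_000]  # made-up epoch ms values
print(count_in_window(log, 120_000))  # 3
```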
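The cleanup daemon's job can also be folded into the data structure itself. A minimal sketch, assuming timestamps arrive in non-decreasing order (as they would from a single queue) and that one window's worth of timestamps fits in memory; the class name `SlidingWindowLog` is hypothetical. Each timestamp is appended once and removed once, so the amortized cost per connection is O(1), and the answer is just the deque's length.

```python
import time
from collections import deque
from typing import Deque, Optional


class SlidingWindowLog:
    """Exact count of connections in the last `window_ms` milliseconds."""

    def __init__(self, window_ms: int = 60_000) -> None:
        self.window_ms = window_ms
        self.ts: Deque[int] = deque()  # arrival times, oldest on the left

    def _evict(self, now_ms: int) -> None:
        # Drop entries that have fallen out of (now - window, now];
        # each timestamp is popped at most once.
        while self.ts and self.ts[0] <= now_ms - self.window_ms:
            self.ts.popleft()

    def record(self, now_ms: Optional[int] = None) -> None:
        now_ms = int(time.time() * 1000) if now_ms is None else now_ms
        self.ts.append(now_ms)
        self._evict(now_ms)

    def count(self, now_ms: Optional[int] = None) -> int:
        now_ms = int(time.time() * 1000) if now_ms is None else now_ms
        self._evict(now_ms)
        return len(self.ts)


window = SlidingWindowLog()
for t in (0, 10_000, 59_999, 60_000):  # made-up arrival times in ms
    window.record(t)
print(window.count(60_000))  # 3
print(window.count(70_000))  # 2
```

This gives an exact count, but memory grows with the number of connections in the window; keeping only per-second counts instead trades exactness for constant memory.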