
I am new to this topic and am still learning, so my question may have been a bit confusing. What I am actually asking is this: I have a stream of data arriving as tuples of the form [Country, State, Person]. On this stream, I want to calculate the average number of people in a state. I thought of doing it by using [Country, State] as the key: for every tuple, a hash function maps the key to a bucket, and the count stored in that bucket is updated.

For example, if I have the tuple [USA, Ohio, person1], then when it arrives, bucket 2 is updated, and every time a tuple with USA and Ohio arrives, this count increases. This would give me the total number of people who are from USA-Ohio, but I am confused about how to find the average, i.e. the average number of people who belong to [USA, Ohio]. I hope this clears things up a bit.
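To make the bucket idea concrete, here is a minimal sketch of what I have in mind. The table size `NUM_BUCKETS`, the helper names `bucket_index` and `update`, and the choice of `zlib.crc32` as the hash function are all assumptions made for illustration. It also assumes each person appears in the stream only once; a repeated tuple would be counted again.

```python
import zlib

NUM_BUCKETS = 1024  # hypothetical table size, chosen only for this sketch


def bucket_index(country, state):
    """Map a (country, state) key to a bucket number."""
    key = f"{country}|{state}".encode("utf-8")
    # crc32 is used because it is deterministic across runs,
    # unlike Python's built-in hash() for strings.
    return zlib.crc32(key) % NUM_BUCKETS


buckets = [0] * NUM_BUCKETS


def update(country, state, person):
    """Increment the count in the bucket that the key hashes to."""
    buckets[bucket_index(country, state)] += 1


# Hypothetical sample stream.
stream = [
    ("USA", "Ohio", "person1"),
    ("USA", "Ohio", "person2"),
    ("USA", "Texas", "person3"),
]
for country, state, person in stream:
    update(country, state, person)

# Two different keys can hash to the same bucket, so this value is
# an upper bound on the true count rather than an exact count.
ohio_count = buckets[bucket_index("USA", "Ohio")]
```

Because of possible collisions, a bucket count can overestimate a key's true count but never underestimate it.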

  • Indeed, your question doesn't make sense. Average of what? The number of people in a state is a single quantity. There's nothing to average. If there were arrival times associated with the tuples, then you could compute an average arrival rate. If such tuples were also tagged as arrivals or departures, then you could compute the average population over time. Etc., etc. You must better define the problem you're trying to solve. – Gene Feb 27 '20 at 01:34
  • @Gene Yes, I'm sorry. Instead of the average, if I'm trying to find the fraction of people in the whole USA who belong to Ohio, how do I calculate this using the counts in the buckets? – deepthi sai Feb 27 '20 at 01:46
  • Okay. Now what do you mean by "stream of data"? Streaming algorithms generally produce a stream of outputs that reflect the input seen so far. So you need to describe how that should look. Or else say you need only one set of fractions as output. In that case, edit your question to drop the word "stream". Just call it a set of tuples. A precise question is often most of a good answer. – Gene Feb 27 '20 at 03:00
  • The input is a stream of data, by which I mean that tuples of the form [Country, State, PersonID] keep arriving continuously. But the output that I need is just the value of the fraction. – deepthi sai Feb 27 '20 at 04:29
  • For example, I learnt that the count_min algorithm takes a stream of items as input and returns the frequency of a particular item. I want to do something similar, but instead of the frequency, I want to calculate the fraction. – deepthi sai Feb 27 '20 at 04:31
  • But _when_ do you want to calculate the fraction? Just once after all data have arrived? Or do you want to produce a set of fractions each time a new value arrives? The same question would occur for computing frequency. You should give a concrete example. Show the input and expected output(s). – Gene Feb 28 '20 at 04:30
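
As a concrete illustration of the fraction discussed in the comments above, here is a rough sketch under stated assumptions, not a definitive answer. It keeps two exact running counts, one per [Country, State] and one per Country, so the fraction can be read either after every arrival or once at the end. The class name `StateFractionCounter`, its method names, and the sample tuples are hypothetical. It uses plain dictionaries rather than a sketch structure such as count-min, and it assumes each PersonID arrives only once.

```python
from collections import defaultdict


class StateFractionCounter:
    """Running counts per (country, state) and per country."""

    def __init__(self):
        self.state_counts = defaultdict(int)
        self.country_counts = defaultdict(int)

    def update(self, country, state, person_id):
        """Process one incoming tuple from the stream."""
        self.state_counts[(country, state)] += 1
        self.country_counts[country] += 1

    def fraction(self, country, state):
        """Fraction of the country's people seen so far who are in the state."""
        total = self.country_counts.get(country, 0)
        if total == 0:
            return 0.0  # nothing seen for this country yet
        return self.state_counts.get((country, state), 0) / total


# Hypothetical input stream.
counter = StateFractionCounter()
stream = [
    ("USA", "Ohio", "p1"),
    ("USA", "Texas", "p2"),
    ("India", "Kerala", "p3"),
    ("USA", "Ohio", "p4"),
]

running = []
for country, state, person_id in stream:
    counter.update(country, state, person_id)
    # One output per arrival; query only after the loop for a single value.
    running.append(counter.fraction("USA", "Ohio"))

final_fraction = counter.fraction("USA", "Ohio")  # 2 of 3 USA tuples
```

Here `running` holds the value after each arrival and `final_fraction` holds the single value after all tuples have been seen, which are the two cases the last comment distinguishes.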
