0

I'm a new Python user and learning how to manipulate/aggregate data.

I have some sample data of the format:

User Date      Price
A    20130101   50
A    20130102    20
A    20130103    30
B    20130201    40
B    20130202    20

and so on.

I'm looking for some aggregates around each user and expecting an output for mean spend like:

User  Mean_Spend
A      33
B      30

I could read line by line and get aggregates for one user but I'm struggling to read the data for different users.

Any suggestions highly appreciated around how to read the file for different users.

Thanks

dabhaid
  • 3,849
  • 22
  • 30
user1661244
  • 121
  • 1
  • 3
  • 6

2 Answers2

0

The collections have a Counter object (documentation) based off of Dictionary that's meant for this kind of quick summation. Naively, you could use one to accumulate the spend amounts, and another to tally the number of transactions, and then divide.

from collections import Counter
accumulator =  Counter()
transactions = Counter()

# assuming your input is exactly as shown...
with open('my_foo.txt', 'r') as f:
    f.readline() # skip header line
    for line in f.readlines():
        parts = line.split()
        transactions[parts[0]] += 1
        accumulator[parts[0]]+=int(parts[2])

result = dict((k, float(accumulator[k])/transactions[k]) for k in transactions)
dabhaid
  • 3,849
  • 22
  • 30
0

The pandas package does this (and much, much more). With a tiny dataset dft for an example:

dft = pandas.DataFrame([[1,2,'a'],[1,2,'b'],[3,1,'b'],[4,1,'a']])
dfg = dft.groupby(dft[2]) # Group by the third column
dfg.mean()

Output:

  0   1
a 2.5 1.5
b 2.0 1.5

which are the means for the 0th and 1th columns respectively.

I don't know what format your data is in; you can make a pandas dataframe by reading from a text file, or converting a python array, etc.

You can group by more than one column, or look at time-trends of spend per user, or... Lots of cookbook examples for pandas here and in their docs.

cphlewis
  • 15,759
  • 4
  • 46
  • 55