3

I have a Counter object produced by the following call:

Counter(df.email_address)

It returns each distinct email address together with the number of times it occurs.

Counter({nan: 1618, 'store@kiddicare.com': 265, 'testorders@worldstores.co.uk': 1})

What I want to do is use it as if it were a dictionary and create a pandas DataFrame out of it with two columns: one for the email addresses and one for the associated counts.

I tried with:

dfr = repeaters.from_dict(repeaters, orient='index')

but I got the following error:

AttributeError: 'Counter' object has no attribute 'from_dict'

This makes me think that Counter is not a dictionary, even though it looks like one. Any idea on how to append it to a df?

Blue Moon
  • 3
    `from_dict` is a class method of DataFrames, not dictionaries/Counters. You could try: `dfr = pd.DataFrame.from_dict(repeaters, orient='index')` – Alex Riley Aug 04 '15 at 11:22
  • http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.from_dict.html – Blue Moon Aug 04 '15 at 11:22
  • @ajcr , I was just going to answer that. – omri_saadon Aug 04 '15 at 11:28
  • @omri_saadon: do feel free to provide an answer if you'd like; comments are generally less useful so I'm happy to delete mine if an answer appears. – Alex Riley Aug 04 '15 at 11:31
  • 2
    A counter is a subclass of dict and can be turned into a regular dict with dict(counter), see https://docs.python.org/3/library/collections.html#collections.Counter –  Aug 04 '15 at 11:54
  • Why don't you just use `df.email_address.value_counts()`? – EdChum Aug 04 '15 at 12:00

4 Answers

22
d = {}
cnt = Counter(df.email_address)
for key, value in cnt.items():
    d[key] = value

EDIT

Or, as @Trif Nefzger suggested:

d = dict(Counter(df.email_address))
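
To get from that dict to the two-column DataFrame the question asks for, one option is to pass the key/value pairs as rows. This is only a sketch: the sample Counter stands in for `Counter(df.email_address)`, and the column names `email_address` and `count` are made up for illustration.

```python
from collections import Counter
import pandas as pd

# Hypothetical stand-in for Counter(df.email_address)
cnt = Counter({'store@kiddicare.com': 265, 'testorders@worldstores.co.uk': 1})
d = dict(cnt)

# Each (key, value) pair becomes one row; the column names are illustrative
dfr = pd.DataFrame(list(d.items()), columns=['email_address', 'count'])
print(dfr)
```

Since Counter is itself a dict, `cnt.items()` could be used directly and the `dict()` step skipped.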
doru
2

As ajcr wrote in the comments, from_dict is a class method of DataFrame, so you can write the following to achieve your goal:

from collections import Counter
import pandas as pd

repeaters = Counter({"nan": 1618, 'store@kiddicare.com': 265, 'testorders@worldstores.co.uk': 1})

dfr = pd.DataFrame.from_dict(repeaters, orient='index')
print(dfr)

Output:

testorders@worldstores.co.uk     1
nan                           1618
store@kiddicare.com            265
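
With `orient='index'` the email addresses end up in the index rather than in a column of their own. If two regular columns are wanted, a possible follow-up is `reset_index()` plus renaming; the column names below are assumptions, not something from the question.

```python
from collections import Counter
import pandas as pd

repeaters = Counter({'store@kiddicare.com': 265, 'testorders@worldstores.co.uk': 1})

# orient='index' puts the keys in the index and the counts in a column named 0
dfr = pd.DataFrame.from_dict(repeaters, orient='index')

# Move the index into a regular column, then name both columns
dfr = dfr.reset_index()
dfr.columns = ['email_address', 'count']  # illustrative names
print(dfr)
```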
omri_saadon
1

Alternatively you could use pd.Series.value_counts, which returns a Series object.

df.email_address.value_counts(dropna=False)

Sample output:

b@y.com    2
a@x.com    1
NaN        1
dtype: int64

This is not exactly what you asked for, but it looks like what you'd like to achieve.
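
If a two-column DataFrame is needed rather than a Series, the result of `value_counts` can probably be reshaped along these lines. The sample frame and the `count` column name are assumptions made for this sketch.

```python
import pandas as pd

# Hypothetical sample data standing in for the real df
df = pd.DataFrame({'email_address': ['a@x.com', 'b@y.com', 'b@y.com', None]})

counts = (
    df.email_address
      .value_counts(dropna=False)    # keep the missing addresses as their own row
      .rename_axis('email_address')  # name the index so it becomes a named column
      .reset_index(name='count')     # index -> column, counts -> 'count'
)
print(counts)
```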

ldirer
1

Not sure why there are many convoluted ways.

  1. Counter is a dict subclass, so you can pass it to anything that expects a parameter of type dict.
class Counter(dict):
    '''Dict subclass for counting hashable items...
  2. If you really, really want to convert a Counter to a dict:
>>> d1 = dict(cntr)
>>> d1
{nan: 1618, 'store@kiddicare.com': 265, 'testorders@worldstores.co.uk': 1}
>>> 
>>> 
>>> d2 = {k: v for k, v in cntr.items()}
>>> d2
{nan: 1618, 'store@kiddicare.com': 265, 'testorders@worldstores.co.uk': 1}
>>> 
  3. To create a pandas DataFrame from a Counter, use pandas.DataFrame.from_dict(). It takes a dict in one of two forms:
    • {'col_name1': [r1c1, r2c1, ...], 'col_name2': [r1c2, r2c2, ...], ...} (the default, orient='columns') OR
    • {'row_id1': [r1c1, r1c2, ...], 'row_id2': [r2c1, r2c2, ...], ...} (with orient='index')

where rNcM is the value in the Nth row and Mth column.

>>> from collections import Counter
>>> cntr = Counter({float('nan'): 1618, 'store@kiddicare.com': 265, 'testorders@worldstores.co.uk': 1})
>>> cntr
Counter({nan: 1618, 'store@kiddicare.com': 265, 'testorders@worldstores.co.uk': 1})
>>> 
>>> import pandas as pd
>>> pdf = pd.DataFrame.from_dict({'emails': cntr.keys(), 'repeatation_count': cntr.values()})
>>> print(pdf.to_string())
                         emails  repeatation_count
0                           NaN               1618
1           store@kiddicare.com                265
2  testorders@worldstores.co.uk                  1
>>> 
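
As a small illustration of the first point, the Counter can also be handed straight to `pd.Series`, which accepts a dict. This is a sketch rather than part of the original session; the `count` column name is an assumption.

```python
from collections import Counter
import pandas as pd

cntr = Counter({'store@kiddicare.com': 265, 'testorders@worldstores.co.uk': 1})

# Counter passes the isinstance check because it subclasses dict
is_dict = isinstance(cntr, dict)
print(is_dict)

# A dict-accepting constructor takes the Counter as-is:
# keys become the index, counts become the values
pdf = pd.Series(cntr).rename_axis('emails').reset_index(name='count')
print(pdf)
```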
Kashyap