I want to count the number of occurrences in a dataframe, and I need to do it using the following loop:

for x in homicides_prec.reset_index().DATE.drop_duplicates():
    count = homicides_prec.loc[x]['VICTIM_AGE'].count()
    print(count)

However, this only works when a given date is repeated. It does not work when a date appears only once, and I don't understand why. I get this error:

TypeError: count() takes at least 1 argument (0 given)

That said, it really doesn't make sense to me, because I get that error for this specific value (which only appears once on the dataframe):

for x in homicides_prec.reset_index().DATE[49:50].drop_duplicates():
    count = homicides_prec.loc[x]['VICTIM_AGE'].count()
    print(count)

However, I don't get the error if I run this:

homicides_prec.loc[homicides_prec.reset_index().DATE[49:50].drop_duplicates()]['VICTIM_AGE'].count()

Why does that happen??? I can't use the second option because I need to use the for loop.

More info, in case it helps: The problem seems to be that, when I run this (without counting), the output is just a number:

for x in homicides_prec.reset_index().DATE[49:50].drop_duplicates():
    count = homicides_prec.loc[x]['VICTIM_AGE']
    print(count)

Output: 33

So, when I add .count(), it will not accept that input. How can I fix this?

Nick ODell

1 Answer

There are a few issues with the code you shared, but the short answer is that when x appears only once, .loc[x] does not return a slice of rows. It returns that single row, so selecting a column from it gives you one scalar value.

If x == '2019-01-01' and that value appears twice, then

homicides_prec.loc[x]

will be a pd.DataFrame with two rows, and

homicides_prec.loc[x]['VICTIM_AGE']

will be a pd.Series with two elements, and calling .count() on it works as expected.

However, if x == '2019-01-02' and that date is unique, then

homicides_prec.loc[x]

will be a pd.Series representing the single row whose index is x.

From that we see that

homicides_prec.loc[x]['VICTIM_AGE']

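is a single value, not a pd.Series, so the pandas .count() does not apply. The error message you quoted is the one Python's own str.count() raises when called without an argument, which suggests the value is a string such as '33'.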
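
A minimal sketch of the difference, using a small made-up frame (the data and the name df are assumptions, not your actual homicides_prec). Passing a list of labels to .loc always returns a DataFrame, so .count() keeps working inside the loop even for a date that appears once:

```python
import pandas as pd

# Hypothetical stand-in for homicides_prec: DATE is the index,
# '2019-01-01' appears twice and '2019-01-02' appears once.
df = pd.DataFrame(
    {"VICTIM_AGE": ["25", "40", "33"]},
    index=pd.Index(["2019-01-01", "2019-01-01", "2019-01-02"], name="DATE"),
)

# Scalar label on a unique date: .loc returns that one row as a Series,
# so selecting a column from it yields a single value (a str in this frame).
single = df.loc["2019-01-02"]["VICTIM_AGE"]
print(type(single))

# List of labels: .loc returns a DataFrame regardless of how many rows match,
# so the selected column is always a Series and .count() is the pandas method.
counts = {}
for x in df.index.unique():
    counts[x] = df.loc[[x], "VICTIM_AGE"].count()
print(counts)
```

The only change to the original loop is writing .loc[[x]] instead of .loc[x].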

Myccha
  • I'd probably run `homicides_prec.reset_index()['DATE'].value_counts()` if all you want is the count of dates. Else: `homicides_prec.reset_index().groupby('DATE')['VICTIM_AGE'].count()` – Myccha Dec 09 '19 at 03:06
  • The explanation makes total sense. I'm just starting to code, so I'm still getting the hang of it, and I appreciate all the help. That said, the solution doesn't work for me because I need to look up a particular date and then run the count function. `value_counts` will count all values, not just the one date I need (correct me if I'm wrong!), and the groupby solution doesn't find the particular date either. – Alexander Dow Dec 09 '19 at 04:18
  • What are you aiming to do? – Myccha Dec 09 '19 at 04:34
  • I'm trying to build a new DataFrame taking information from another df. I'm building a series with the count of each row per date and a specific value in another column. I then append each series into the new dataframe. I know this is likely not the best way to do it, but I haven't been able to crack the apply function to work with my datasets. – Alexander Dow Dec 09 '19 at 04:57
  • Can you try: `df.reset_index().groupby('DATE')['VICTIM_AGE'].count().reset_index()` and let me know if that works for you. If you want columns: DATE, COUNT_OF_DATE_OCCURENCE, VALUE_OF_COLUMN_X, then try: `df.reset_index().groupby('DATE').apply(lambda x: pd.Series({'COUNT_OF_DATE_OCCURENCE': x['VICTIM_AGE'].count(), 'VALUE_OF_COLUMN_X': x['X'].values[0]}))` Though note that it's a bit hacky. – Myccha Dec 09 '19 at 11:23
  • The second one worked perfectly!! I needed columns and that one really did the trick, and it was really much simpler than what I was trying to do. Thanks!! – Alexander Dow Dec 10 '19 at 03:58
  • Thinking back... Would `df.reset_index().groupby(['DATE','X'])['VICTIM_AGE'].count().reset_index()` work? – Myccha Dec 12 '19 at 11:55
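
For readers following the comments, here is a rough sketch of the groupby approach on made-up data. The frame, the column X, and the output column name COUNT are assumptions for illustration only. It also shows that a particular date can still be looked up after grouping:

```python
import pandas as pd

# Hypothetical data: DATE is the index, X is some other column of interest.
df = pd.DataFrame(
    {"VICTIM_AGE": [25, 40, 33], "X": ["a", "a", "b"]},
    index=pd.Index(["2019-01-01", "2019-01-01", "2019-01-02"], name="DATE"),
)

# One row per date, with the number of non-null VICTIM_AGE values.
per_date = (
    df.reset_index()
    .groupby("DATE")["VICTIM_AGE"]
    .count()
    .reset_index(name="COUNT")
)

# One row per (DATE, X) pair, as suggested in the last comment.
per_date_x = (
    df.reset_index()
    .groupby(["DATE", "X"])["VICTIM_AGE"]
    .count()
    .reset_index(name="COUNT")
)

# Looking up a single date in the grouped result.
one_date = per_date.loc[per_date["DATE"] == "2019-01-02", "COUNT"].iloc[0]
print(per_date)
print(per_date_x)
print(one_date)
```

Grouping once and filtering afterwards avoids the scalar-versus-Series problem entirely, because every date ends up as a row in the result no matter how often it occurs.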