
I have a dataframe with xenophobic and non-xenophobic tweets. For each day, I want to count the number of tweets that have a sentiment of 1.

This is the DataFrame df_unevaluated:

   sentiment            id                       date  text
0          0  9.820000e+17  2018-04-05 11:43:31+00:00  but if she had stated another fact like that I may have thought...
1          0  1.170000e+18  2019-09-03 22:53:30+00:00  the worst thing that dude has done this week is ramble about the...
2          0  1.140000e+18  2019-06-28 17:43:07+00:00  i think immigrants of all walks of life should be allowed into...
3          0  2.810000e+17  2012-12-18 00:43:57+00:00  why is america not treating the immigrants like normal people...
4          1  8.310000e+17  2017-02-14 01:42:26+00:00  who the hell wants to live in canada anyhow the people there...
...

This is what I've tried:

import pandas as pd

# Put all tweets with sentiment = 1 into a DataFrame
# (a boolean mask replaces the row-by-row loop; DataFrame.append was removed in pandas 2.0)
df_xenophobic = df_unevaluated[df_unevaluated['sentiment'] == 1]

# Store a copy of df_xenophobic in df_counts
df_counts = df_xenophobic.copy()

# Change df_counts to get counts for each date
df_counts = (pd.to_datetime(df_counts['date'])
       .dt.floor('d')
       .value_counts()
       .rename_axis('date')
       .reset_index(name='count'))

# Sort data and drop index column
df_counts = df_counts.sort_values('date')
df_counts = df_counts.reset_index(drop=True)

# Look at data
df_counts.head()

This was the output:

    date                        count
0   2012-03-14 00:00:00+00:00   1
1   2012-03-19 00:00:00+00:00   1
2   2012-04-07 00:00:00+00:00   1
3   2012-04-10 00:00:00+00:00   1
4   2012-04-19 00:00:00+00:00   1
...

This is what I expected:

    date                        count
0   2012-03-14 00:00:00+00:00   1
1   2012-03-15 00:00:00+00:00   0
2   2012-03-16 00:00:00+00:00   0
3   2012-03-17 00:00:00+00:00   0
4   2012-03-18 00:00:00+00:00   0
5   2012-03-19 00:00:00+00:00   1
6   2012-03-20 00:00:00+00:00   0
7   2012-03-21 00:00:00+00:00   0
...

These are some links I've read through:

Python & Pandas - Group by day and count for each day

Using value_counts in pandas with conditions

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.dt.floor.html

To be clear, each date has the format YYYY-MM-DD HH:MM:SS+00:00.

As seen in my attempt, I try to round each value in the date column down to its day. My goal is to count the number of tweets with sentiment = 1 on each day, including days with none (count 0), as in the expected output above.
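For illustration, the rounding step can be sketched on a small made-up sample. The rows and the column name day below are assumptions for the example, not part of df_unevaluated. The sketch shows that dt.floor('D') sets the time part to midnight while keeping a timezone-aware datetime dtype, and that summing a 0/1 sentiment column per day counts the tweets with sentiment = 1.

```python
import pandas as pd

# Hypothetical sample in the question's format (YYYY-MM-DD HH:MM:SS+00:00)
df = pd.DataFrame({
    "date": ["2018-04-05 11:43:31+00:00",
             "2018-04-05 23:10:02+00:00",
             "2018-04-06 00:00:01+00:00"],
    "sentiment": [1, 1, 0],
})

# Parse the strings into timezone-aware timestamps
df["date"] = pd.to_datetime(df["date"], utc=True)

# floor('D') resets the time part to midnight but keeps the datetime dtype
df["day"] = df["date"].dt.floor("D")

# Summing the 0/1 column counts the tweets with sentiment == 1 on each day
per_day = df.groupby("day")["sentiment"].sum()
print(per_day)
```

Note that this only produces rows for days that actually occur in the data; days without any tweets are still absent.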

1 Answer

If I understood your question correctly, then it should be as simple as follows:

import pandas as pd

# Data Load
df = pd.DataFrame(data={'Date': ['2022-11-28 11:43:31+00:00', '2022-11-28 22:53:30+00:00', '2022-11-29 17:43:07+00:00', '2022-12-01 01:42:26+00:00', '2022-12-01 02:40:26+00:00'],
                      'Sentiment': [ 0, 1, 0, 1, 1]})

df['Date'] = pd.to_datetime(df['Date']).dt.date

df_counts = df.groupby(by=['Date']).sum().reset_index()

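The df_counts data frame then contains one row per date present in the data, with the Sentiment column holding the sum of the sentiment values, i.e. the number of tweets with sentiment 1 on that date.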
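The expected output in the question also lists days on which no tweet was posted, with a count of 0, and a plain group-by only returns dates that occur in the data. One possible way to get those rows, shown here as a minimal sketch on made-up data rather than as the approach of the answer above, is to resample the sentiment column by calendar day: Series.resample('D').sum() creates a bin for every day between the first and last timestamp and yields 0 for empty bins. The sample rows are assumptions; the column names follow the question.

```python
import pandas as pd

# Hypothetical sample; the column names follow the question's df_unevaluated
df = pd.DataFrame({
    "date": ["2012-03-14 09:15:00+00:00",
             "2012-03-14 18:30:00+00:00",
             "2012-03-19 07:45:00+00:00"],
    "sentiment": [1, 0, 1],
})
df["date"] = pd.to_datetime(df["date"], utc=True)

# Resample by calendar day: every day in the range gets a bin,
# and the sum over an empty bin is 0
df_counts = (
    df.set_index("date")["sentiment"]
      .resample("D")
      .sum()
      .reset_index(name="count")
)
print(df_counts)
```

Because the sentiment values are 0 or 1, the daily sum equals the number of tweets with sentiment 1, so no separate filtering step is needed.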

Sadman Sakib
  • I've tried it and this does not work because the dates have times in them as well, which is why I'm trying to use dt.floor() to round each 'date' to the day it occurs. I'm currently trying to implement this into your solution but no luck. I'm afraid I'm quite new to python and pandas. – FrustratedSnake Dec 11 '22 at 06:35
  • I've just updated the answer. Here, the datetime is converted to a date only and then used for the group-by. I hope this is helpful for you! – Sadman Sakib Dec 11 '22 at 07:21