I am new to Python and am trying to plot data with date and time on the x-axis. The data is the number of tweets per hour over a span of a few days. Because there are so many data points, the x-axis labels become invisible. Below is a snippet of the data I want to plot:

> Date       Hour 
> 2017-06-01  0        9922287
>             1        8518504
>             2       11329880
>             3        8917199
>             4        2561618
>             5        5356574
>             6        9094935
>             7        5668480
>             8       10685864
>             9        4817401
>             10      13737030
>             11      13102746
>             12      36891729
>             13      28093150
>             14      13071736
>             15      26999175
>             16      25637322
>             17      24140113
>             18      12172451
>             19      27828496
>             20      14746762
>             21      30112348
>             22      25418125
>             23      15357580 
> 2017-06-02  0       11392671
>             1        5044931
>             2        4476793
>             3        2218296
>             4        1736378
>             5         838815
>                       ...    
> 2017-06-03  22      10569552
>             23       9315997

I have used the below code for my plot.

import matplotlib.pyplot as plt

# df holds the tweet counts shown above, indexed by (Date, Hour)
df.plot(marker='*')
plt.legend().set_visible(False)
plt.title("Number of tweets on hourly basis")

Enlarging the figure with plt.figure(figsize=(20,10)) did not help: the numbers on my x-axis are still invisible.

petezurich
Malathy

1 Answer

You probably don't have the latest pandas version installed. On my system with pandas 1.0.3, the x-ticks are displayed as [2017-06-01 00:00:00, 0]. Setting a label rotation with df.plot(marker='*', rot=30) keeps them from overlapping.

Even so, this isn't a very pleasing output. (I'm assuming the 'Date' column is in pandas datetime format. If it holds strings, the result would be similar, just without the 00:00:00.)

A better way is to combine the date and hour columns into a single datetime column. Here is a possible approach:

from matplotlib import pyplot as plt
import pandas as pd
import numpy as np

# first create a dataframe similar to the example
days = pd.date_range('2017-06-01', '2017-06-03', freq='D')
df = pd.DataFrame({'Date': np.repeat(days, 24),
                   'Hour': np.tile(np.arange(0, 24), len(days)),
                   'NumTweets': np.random.binomial(10000, 0.2, 24 * len(days))})
df.set_index(['Date', 'Hour'], drop=True, inplace=True)

# df.plot(marker='*', rot=30)  # this would be the plot from the question

df.reset_index(inplace=True) # remove the index, making 'Date' and 'Hour' regular columns
# create a new column combining 'Date' and 'Hour'
df['Time'] = pd.to_datetime(df['Date'].dt.strftime('%Y-%m-%d') + ' ' + df['Hour'].astype(str).str.zfill(2))
# use the new column as index
df.set_index('Time', drop=True, inplace=True)

# as the 'Date' and 'Hour' columns are still there, indicate we only want to plot the 'NumTweets' column
df.plot(y='NumTweets', marker='*', rot=20) # rot=0 would also work, depending on the figure width
plt.tight_layout() # make space to show the labels

plt.show()
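
As an alternative to building the timestamp through strings, the hour can be added to the date as a timedelta. This is only a sketch under assumptions: the dataframe below is a hypothetical stand-in for the question's data, and the names `flat` and `hourly` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's data:
# a 'NumTweets' column indexed by (Date, Hour), 3 days x 24 hours
days = pd.date_range('2017-06-01', '2017-06-03', freq='D')
index = pd.MultiIndex.from_product([days, range(24)], names=['Date', 'Hour'])
rng = np.random.default_rng(0)
df = pd.DataFrame({'NumTweets': rng.integers(1_000, 10_000, size=len(index))},
                  index=index)

# Turn the index levels into regular columns
flat = df.reset_index()

# Add the hour to the date as a timedelta; pd.to_datetime is a no-op for
# datetime columns and also covers the case where 'Date' holds strings
flat['Time'] = pd.to_datetime(flat['Date']) + pd.to_timedelta(flat['Hour'], unit='h')

# A Series with a single DatetimeIndex, ready for plotting
hourly = flat.set_index('Time')['NumTweets']
```

Calling `hourly.plot(marker='*')` on the result should then give datetime ticks in the same way as the approach above, without the string round trip.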

Note that pandas will adapt your x-axis depending on the number of days shown. With only 3 days, there will be 'major' ticks for the days at 00:00 h, and 'minor' ticks at 12:00. With more days, there will be no ticks for the hours.

resulting plot
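
If the automatic tick placement described in the note is not what you want, one option is to plot with matplotlib directly and set the date locators and formatters yourself. The following is a minimal sketch with made-up data; the variable names, the tick spacing, and the label formats are assumptions chosen for illustration, not something taken from the question.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical hourly data for 3 days (72 points)
times = pd.Timestamp('2017-06-01') + pd.to_timedelta(np.arange(72), unit='h')
rng = np.random.default_rng(0)
tweets = pd.Series(rng.integers(1_000, 10_000, size=len(times)), index=times)

fig, ax = plt.subplots(figsize=(12, 4))
# Plot through matplotlib rather than pandas, so matplotlib's own
# date handling controls the x-axis
ax.plot(tweets.index, tweets.values, marker='*')

# Major ticks: one per day, labelled with the date
ax.xaxis.set_major_locator(mdates.DayLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
# Minor ticks: 06:00, 12:00 and 18:00, labelled with the time
ax.xaxis.set_minor_locator(mdates.HourLocator(byhour=[6, 12, 18]))
ax.xaxis.set_minor_formatter(mdates.DateFormatter('%H:%M'))
# Push the date labels down so they don't collide with the hour labels
ax.tick_params(axis='x', which='major', pad=15)

ax.set_title('Number of tweets on hourly basis')
fig.tight_layout()
```

Finish with `plt.show()` or `fig.savefig(...)` as needed. With many more days, a larger `interval` for `DayLocator` and no minor labels would probably be more readable.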

JohanC
  • Hi, thanks for the quick turnaround. I am still getting the same graph as before, even after trying your steps. – Malathy Mar 23 '20 at 16:28
  • Do you get the same output if you run my test code? Do you have the latest pandas installed? Note that things can get confusing if you don't stay with the pandas datetime format. Pandas has a different idea than standard matplotlib about what a datetime should look like. – JohanC Mar 23 '20 at 16:43
  • Did you try with the latest pandas version (1.0.3)? – JohanC Mar 29 '20 at 13:57