I've got a set of datetime data in a pandas Series in the format 2020-10-02 18:48:21. I want to plot only the time of day contained in each datetime against a score value from another DataFrame.

Below is the closest that I've gotten using the following code:

[Image: closest scatter plot]

def timePop(oneDate, oneScore, twoDate, twoScore, title):
    myFmt = "%H:%M:%S"

    for index, value in oneDate.iteritems():
        new_time = datetime.fromtimestamp(oneDate[index]).time()
        oneDate[index] = datetime.strptime(str(new_time), myFmt)


    for index, value in twoDate.iteritems():
        new_time2 = datetime.fromtimestamp(twoDate[index]).time()
        twoDate[index] = datetime.strptime(str(new_time2), myFmt)


    fig, ax = pylt.subplots()
    ax.scatter(oneDate, oneScore, c='r', marker='*', label="Popular")
    ax.scatter(twoDate, twoScore, c='b', marker='o', label="Unpopular")
    pylt.xlabel('24 Hours')
    pylt.ylabel('Scores')
    pylt.xticks(rotation=45)
    # ax.format_xdata = mdates.DateFormatter(myFmt)
    # ax.set_xticks([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24])
    pylt.title('Time and Score Distribution of Posts in %s' % title)
    pylt.show()

I've worked through a variety of the possible solutions here on StackOverflow, as well as through the docs, but no luck. Ideally, I'd like the x-axis to go in increments of 1 hr. If anyone can help, I'd appreciate it.

Broski-AC
  • Do you want to have them ordered according to date + time but display only the hours at the ticks, or do you want them grouped by hours? – max Dec 07 '20 at 07:06
  • Grouped by hours. All of the data was taken across two days, so going strictly by date would be unhelpful. The other option I could think of was doing something like "evening", "morning", "afternoon", or maybe just changing it to a numerical value, rather than a datetime time – Broski-AC Dec 07 '20 at 22:02

1 Answer

Let's see. Assume you have a toy example like this:

import matplotlib.pyplot as plt
import datetime

dates = [datetime.date(2020,12,i) for i in range(1,24)]
num = [i for i in range(1,24)]

representing the days until Christmas. All you want to do is extract the weekday of each date in the list and plot it against num:

plt.plot([d.weekday() for d in dates],num,'.') # monday = 0

[Image: weekdays]

The week starts with Monday, which is represented as a zero. If you want to get strings with the names of the days, you can format the output with strftime:

plt.plot([d.strftime("%A") for d in dates],num,'.')

[Image: weekday names]
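
One detail worth noting: as far as I know, matplotlib places string categories on the axis in the order they first appear in the data, and 2020-12-01 is a Tuesday, so the plot above would start with Tuesday. A minimal sketch, assuming you want Monday first, is to sort the (date, value) pairs by weekday before formatting. The names pairs, labels and values are made up for this illustration:

```python
import datetime

dates = [datetime.date(2020, 12, i) for i in range(1, 24)]
num = list(range(1, 24))

# Sort the (date, value) pairs by weekday number (Monday = 0) so that the
# first label encountered is Monday, the second Tuesday, and so on.
pairs = sorted(zip(dates, num), key=lambda pair: pair[0].weekday())

labels = [d.strftime("%A") for d, _ in pairs]
values = [n for _, n in pairs]
```

The sorted lists can then be plotted exactly as before, e.g. plt.plot(labels, values, '.').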

Basically, you can group the data by strings, and any string will do. We can use this to sort the times of day into categories as follows:

dates2 = [datetime.datetime(2020,12,i,i,i,0) for i in range(1,24)]

def f(x):
    if (x > 4) and (x <= 8):
        return 'Early Morning'
    elif (x > 8) and (x <= 12):
        return 'Morning'
    elif (x > 12) and (x <= 16):
        return 'Noon'
    elif (x > 16) and (x <= 20):
        return 'Eve'
    elif (x > 20) and (x <= 24):
        return 'Night'
    elif (x <= 4):
        return 'Late Night'

plt.plot([f(d.hour) for d in dates2],num,'.')

First, we create new dummy data of type datetime.datetime rather than just datetime.date. Then I took a function with a switch-case-like structure, proposed here, to categorize the hours. The procedure is the same as before: a list comprehension extracts the .hour from each datetime object and feeds it to the function f, which maps it to a time-of-day category.
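
If you would rather have a numerical x-axis in 1-hour steps, as mentioned in the comments, one option is to convert each timestamp to fractional hours since midnight. This is only a sketch: hour_of_day is a hypothetical helper, not part of any library, and the second timestamp is made-up sample data.

```python
import datetime

def hour_of_day(moment):
    """Return the time of day of `moment` as fractional hours in [0, 24)."""
    return moment.hour + moment.minute / 60 + moment.second / 3600

# A timestamp in the format from the question, parsed with the standard library.
stamp = datetime.datetime.strptime("2020-10-02 18:48:21", "%Y-%m-%d %H:%M:%S")

# The date part is ignored, so posts from different days share one 24-hour axis.
hours = [hour_of_day(d) for d in [stamp, datetime.datetime(2020, 10, 3, 6, 30, 0)]]

# Tick positions for an axis in 1-hour increments: 0, 1, ..., 24.
hourly_ticks = list(range(0, 25))
```

The values in hours can be passed to ax.scatter in place of the datetimes, and ax.set_xticks(hourly_ticks) then gives one tick per hour. For a pandas Series of datetimes, the .dt.hour, .dt.minute and .dt.second accessors should give the same components without a loop.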

max