
I have the following dataframe:

    point               timestamp_local         0
0   A                   2019-07-20 00:00:00     1
1   A                   2019-07-20 01:00:00     3
2   B                   2019-07-20 02:00:00     158
3   A                   2019-07-20 02:30:00     324
4   B                   2019-07-20 03:00:00     502

The dataframe tells me how many connections I had at which point and at which time (timestamp_local). The column 0 holds the connection count.
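For anyone who wants to reproduce the sample, here is a minimal sketch that rebuilds it in pandas. It assumes the count column is named with the string "0" (as it would be after a CSV round trip) and converts timestamp_local to a real datetime dtype, which is what a datetime axis needs.

```python
import pandas as pd

# Sample rows copied from the table above.
# Assumption: the count column is the string "0", not the integer 0.
df = pd.DataFrame(
    {
        "point": ["A", "A", "B", "A", "B"],
        "timestamp_local": [
            "2019-07-20 00:00:00",
            "2019-07-20 01:00:00",
            "2019-07-20 02:00:00",
            "2019-07-20 02:30:00",
            "2019-07-20 03:00:00",
        ],
        "0": [1, 3, 158, 324, 502],
    }
)

# Convert the text timestamps to datetime64 so they are handled as times, not labels
df["timestamp_local"] = pd.to_datetime(df["timestamp_local"])

print(df.dtypes)
```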

I now want to plot this data using the plotnine library. I have already done this, and it works when I use timestamps without times, e.g. 2019-07-20. But when I use timestamps with times, e.g. 2019-07-20 00:00:00, it does not work.

This is my Python command to plot the data without times:

pn.ggplot(df, pn.aes(x="timestamp_local", y="0", group="point", color="point")) + pn.geom_line(stat="identity")

This returns a figure where I can see the counts per day, grouped by point.

I now have two questions:

  1. How can I plot the same result when using timestamps with times like 2019-07-20 01:00:00? (The data span several days, so I cannot just cut off the date!)
  2. How can I plot the same result grouped by month and year? (E.g. 2019-07, 2019-08, 2019-09 and so on...)

I would strongly prefer a solution with the plotnine library because there are more functions I want to use later on, e.g. smooth. If it's not possible with plotnine, I would like a single figure with one line per point, each in a different color, like in the figure above, where red is point A and blue is point B.

Kind regards

Jan

1 Answer


The data provided was stored in conn.csv; the code also includes some theme customization. The first case displays the full timestamp, as requested, using the date_format function from mizani (https://mizani.readthedocs.io/en/stable/formatters.html#mizani.formatters.date_format).

from plotnine import *
import pandas as pd
from mizani.formatters import date_format

# Parse the second column (timestamp_local) as datetime so the x axis is a real time axis
df = pd.read_csv('conn.csv', parse_dates=[1])

# Theme tweaks: small, rotated x tick labels so the full timestamps fit
custom_axis = theme(
    axis_text_x=element_text(color="grey", size=6, angle=90, hjust=.3),
    axis_text_y=element_text(color="grey", size=6),
    plot_title=element_text(size=25, face="bold"),
    axis_title=element_text(size=10),
)

(
    ggplot(data=df, mapping=aes(x='timestamp_local', y='0', group="point", color="point"))
    + geom_line(stat="identity")
    + custom_axis
    + ylab("Count") + xlab("TimeStamp") + labs(title="Count of the Connections")
    # Show both date and time on each x tick label
    + scale_x_datetime(labels=date_format("%Y-%m-%d %H:%M:%S"))
)

Full timestamp plot

For the second case, the to_period function is used to derive a month_year column from the timestamp, and the aggregation is performed on that column. geom_point is used instead of geom_line because of the limited data provided.
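A minimal sketch of that month-year aggregation could look like the following. It is not necessarily the exact code behind the figure. The inline sample data, the choice of sum as the aggregation, and the helper name plot_monthly are assumptions made for illustration.

```python
import pandas as pd

# Sample rows from the question; assumption: the count column is the string "0"
df = pd.DataFrame(
    {
        "point": ["A", "A", "B", "A", "B"],
        "timestamp_local": pd.to_datetime(
            [
                "2019-07-20 00:00:00",
                "2019-07-20 01:00:00",
                "2019-07-20 02:00:00",
                "2019-07-20 02:30:00",
                "2019-07-20 03:00:00",
            ]
        ),
        "0": [1, 3, 158, 324, 502],
    }
)

# Derive a "YYYY-MM" label from each timestamp
df["month_year"] = df["timestamp_local"].dt.to_period("M").astype(str)

# Aggregate the counts per point and month (sum is an assumption)
monthly = df.groupby(["point", "month_year"], as_index=False)["0"].sum()


def plot_monthly(data):
    """Hypothetical helper: one colored marker per point and month."""
    from plotnine import ggplot, aes, geom_point, xlab, ylab

    return (
        ggplot(data, aes(x="month_year", y="0", group="point", color="point"))
        + geom_point()
        + xlab("Month") + ylab("Count")
    )


print(monthly)
```

Calling plot_monthly(monthly) returns the plot object. Once the data cover several months, adding geom_line() to the plot should connect the points of each group.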

Year-Month aggregation

GERMAN RODRIGUEZ