I'm trying to make a scatter plot with Plotly Express. I have a dataset of jobs that has a column called ['Posting Updated']. I want to plot the year each job was posted against the count of postings, with some indication of which month it was posted in. Maybe through color or size? I can't seem to set my data up in a way that lets me do this.

Has anyone got any insight on how I could do this?

Many thanks in advance

Big Love

df['Posting Updated'] = pd.to_datetime(df['Posting Updated'])
years  = df.groupby(df['Posting Updated'].dt.year)['Job ID'].count()
months = df.groupby(df['Posting Updated'].dt.month)['Job ID'].count()
years_df = pd.DataFrame(years)
months_df = pd.DataFrame(months)
job_growth = px.scatter(years_df, x = years_df.index, 
                    size = 'Job ID', color = months_df.index)

1 Answer

Your goal is a little unclear, so I read the question as asking for a scatter plot that shows how often jobs were posted, with the year on the x-axis and the month on the y-axis. For sample data, I generated 200 random posting dates, then grouped them by year and month and counted the rows in each group. Marker size and color both encode that count. The axis ticks are limited to the years present in the data and to months 1 through 12.

import pandas as pd
import plotly.express as px
import numpy as np
import random

# Sample data: 200 job IDs, each with a random posting date from 2016-01-01 to 2021-01-01
df = pd.DataFrame({'Job ID': ['{}'.format(x) for x in np.arange(1000, 1200)],
                   'Posting Updated': random.choices(pd.date_range('2016-01-01', '2021-01-01', freq='D'), k=200),
                   'value': [1]*200})

df['year'] = df['Posting Updated'].dt.year
df['month'] = df['Posting Updated'].dt.month
dfs = df.groupby(['year','month']).size().to_frame('value')
dfs.reset_index(inplace=True)
    
fig = px.scatter(dfs, x='year', y='month',size='value', color='value')

fig.update_yaxes(tickvals=np.arange(1,13))
fig.update_xaxes(tickvals=dfs.year.unique())
fig.show()
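
If you would rather keep the count on the y-axis and use color for the month, as the question suggests, the same year/month grouping can be reused. This is only a sketch under assumptions: the helper names monthly_counts and plot_counts are made up for illustration, the data is synthetic again, and the column names are taken from the question.

```python
import numpy as np
import pandas as pd


def monthly_counts(df, date_col="Posting Updated"):
    """Count postings per (year, month) pair. Hypothetical helper."""
    dates = pd.to_datetime(df[date_col])
    counts = (
        df.groupby([dates.dt.year.rename("year"), dates.dt.month.rename("month")])
        .size()
        .reset_index(name="count")
    )
    # Store month as a string so plotly uses discrete colors, not a color scale.
    counts["month"] = counts["month"].astype(str)
    return counts


def plot_counts(counts):
    """Scatter of year vs. count, colored by month. Hypothetical helper."""
    import plotly.express as px  # imported lazily; only needed for plotting

    fig = px.scatter(counts, x="year", y="count", color="month", size="count")
    fig.update_xaxes(tickvals=sorted(counts["year"].unique()))
    return fig


# Synthetic stand-in data (an assumption; replace with the real DataFrame).
rng = np.random.default_rng(0)
days = pd.date_range("2016-01-01", "2020-12-31", freq="D")
df = pd.DataFrame({
    "Job ID": np.arange(1000, 1200).astype(str),
    "Posting Updated": days[rng.integers(0, len(days), size=200)],
})

counts = monthly_counts(df)
# plot_counts(counts).show()  # uncomment to render the figure
```

With the real data, passing the original DataFrame to monthly_counts should be enough, as long as 'Posting Updated' can be parsed by pd.to_datetime.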

r-beginners