
Below is my flex_df.head(10). I'm trying to create a bubble chart that has salary on the x-axis and the count of each role (Role, not OriginalTitle) on the y-axis. To wrap it all up, I need the bubbles to have different sizes to show the different source files that brought in this data.

I'm trying to use Plotly Express, but none of the code I've tried works, so I have nothing viable to post.

Role            IsRemote  Country  Salary  SourceFile   OriginalTitle
Data Engineer   TRUE      USA      56      Flex-Jobs    Data Engineer
Data Engineer   TRUE      USA      56      hired.com    Data Engineer
Data Engineer   TRUE      USA      56      simplyhired  Data Engineer
Data Scientist  TRUE      Poland   100     hired.com    Data Science Consultant
Data Scientist  TRUE      Wrocław  100     indeed       Data Science Consultant
Data Engineer   TRUE      USA      56      indeed       Data Engineer
Data Engineer   TRUE      USA      56      Flex-Jobs    Data Engineer
Data Scientist  TRUE      USA      15      Flex-Jobs    Data Science Engineer
Data Scientist  TRUE      USA      20      Flex-Jobs    Manager, Data Science
Data Analyst    TRUE      USA      56      Flex-Jobs    Senior Data Science Analyst
Gonzalo Odiard
  • I know you said the code you've tried doesn't work but you should still post it. This will show us that you have made a good faith effort and that will give us something to start with. When you say it doesn't work do you mean that you are getting an error or that it doesn't produce the chart you want? – Derek O Feb 21 '22 at 02:33

1 Answer


You have defined the x, y and size arguments of a scatter plot.

  • size needs to be numeric, so I have used https://pandas.pydata.org/docs/reference/api/pandas.factorize.html to convert SourceFile from categorical to numeric
  • you have not defined how to deal with roles that are contributed by multiple sources, so I have assumed a simple aggregation (a count of rows per Role and SourceFile)
  • with the data structured this way, it is very simple to generate a scatter plot
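To make the factorize step concrete, here is a minimal sketch of what it does on its own. The `sources` series is a hypothetical stand-in built from source names in the question's sample rows, not the full data.

```python
import pandas as pd

# Hypothetical stand-in: a few SourceFile values from the question's sample rows.
sources = pd.Series(["Flex-Jobs", "hired.com", "simplyhired", "hired.com", "indeed"])

# factorize returns integer codes (numbered in order of first appearance)
# together with the unique labels those codes refer to.
codes, uniques = pd.factorize(sources)

print(codes.tolist())  # [0, 1, 2, 1, 3]
print(list(uniques))   # ['Flex-Jobs', 'hired.com', 'simplyhired', 'indeed']

# Adding 1 keeps the first label from getting a zero-sized (invisible) bubble.
bubble_sizes = codes + 1
```

Note that the resulting sizes only distinguish sources from one another; a larger bubble does not mean "more" of anything.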
import io
import pandas as pd
import plotly.express as px

# Columns in the sample text are separated by two or more spaces,
# so single spaces inside values such as "Data Engineer" are preserved.
flex_df = pd.read_csv(
    io.StringIO(
        """Role            IsRemote  Country  Salary  SourceFile   OriginalTitle
Data Engineer   TRUE      USA      56      Flex-Jobs    Data Engineer
Data Engineer   TRUE      USA      56      hired.com    Data Engineer
Data Engineer   TRUE      USA      56      simplyhired  Data Engineer
Data Scientist  TRUE      Poland   100     hired.com    Data Science Consultant
Data Scientist  TRUE      Wrocław  100     indeed       Data Science Consultant
Data Engineer   TRUE      USA      56      indeed       Data Engineer
Data Engineer   TRUE      USA      56      Flex-Jobs    Data Engineer
Data Scientist  TRUE      USA      15      Flex-Jobs    Data Science Engineer
Data Scientist  TRUE      USA      20      Flex-Jobs    Manager, Data Science
Data Analyst    TRUE      USA      56      Flex-Jobs    Senior Data Science Analyst"""
    ),
    sep=r"\s\s+",  # raw string avoids an invalid-escape warning
    engine="python",
)

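# Count rows per (Role, SourceFile); groupby(...).size() names the count column "size".
counts = (
    flex_df.groupby(["Role", "SourceFile"], as_index=False)
    .size()
    # encode each source as an integer (1, 2, 3, ...) so it can drive the bubble size
    .assign(bubblesize=lambda df: pd.factorize(df["SourceFile"])[0] + 1)
)

fig = px.scatter(
    counts,
    x="size",
    y="Role",
    size="bubblesize",
    hover_data=["SourceFile"],
)
fig.show()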
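
The chart above puts the count on the x-axis and Role on the y-axis. If the question is read more literally (salary on x, count of the role on y), one possible alternative is sketched below. It rests on assumptions that are not in the question: bubble size is taken to be the number of distinct source files per (Role, Salary) pair, and the names `summary`, `postings`, `sources` and `make_figure` are made up for illustration.

```python
import pandas as pd

# Sample rows copied from the question (only the columns used here).
rows = [
    ("Data Engineer", 56, "Flex-Jobs"),
    ("Data Engineer", 56, "hired.com"),
    ("Data Engineer", 56, "simplyhired"),
    ("Data Scientist", 100, "hired.com"),
    ("Data Scientist", 100, "indeed"),
    ("Data Engineer", 56, "indeed"),
    ("Data Engineer", 56, "Flex-Jobs"),
    ("Data Scientist", 15, "Flex-Jobs"),
    ("Data Scientist", 20, "Flex-Jobs"),
    ("Data Analyst", 56, "Flex-Jobs"),
]
flex_df = pd.DataFrame(rows, columns=["Role", "Salary", "SourceFile"])

# One row per (Role, Salary): number of postings and number of distinct sources.
summary = flex_df.groupby(["Role", "Salary"], as_index=False).agg(
    postings=("SourceFile", "count"),
    sources=("SourceFile", "nunique"),
)


def make_figure(data):
    """Bubble chart: salary on x, posting count on y, size = distinct sources."""
    # Imported here so the aggregation above can run without plotly installed.
    import plotly.express as px

    return px.scatter(
        data,
        x="Salary",
        y="postings",
        size="sources",
        color="Role",
        hover_data=["sources"],
    )


# Usage (opens the chart in a browser or notebook):
# make_figure(summary).show()
```

Here the bubble size carries a real quantity (how many sources contributed), while color separates the roles. Whether that matches the intended chart depends on how multiple sources per role should be treated.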


Rob Raymond