
I want to create a scatter plot with matplotlib from a simple pandas DataFrame. I have tried almost everything and nothing works; honestly, I have just ordered a book on matplotlib.

The DataFrame looks like this:

           Time     Type    Price          Volume
0   03:03:26.936    B   1.61797     1000000
1   03:41:06.192    B   1.61812     1000000
2   05:59:12.799    B   1.62280     410000
3   05:59:12.814    B   1.62280     390000
4   06:43:33.607    B   1.62387     1000000
5   06:43:33.621    S   1.62389     500000
6   06:47:36.834    B   1.62412     1000000
7   08:15:13.903    B   1.62589     1000000
8   09:15:31.496    S   1.62296     500000
9   10:29:24.072    S   1.61876     500000
10  10:49:08.619    S   1.61911     1000000
11  11:07:01.213    S   1.61882     1000000
12  11:07:01.339    S   1.61880     200000
13  11:23:00.300    S   1.61717     1000000

Type B should be green and Type S blue, and the dots should vary in size depending on Volume. Any idea how to achieve this, or is there a guide somewhere?

1 Answer


A solution using just matplotlib:

import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.dates import DateFormatter

# Your Time column is stored as strings. Convert them to Timestamp
# so matplotlib can plot a proper timeline
times = pd.to_datetime(df['Time'])

# Set the marker's color: 'B' is green, 'S' is blue
colors = df['Type'].map({
    'B': 'green',
    'S': 'blue'
})

# Limit the x-axis from 0:00 to 24:00
xmin = pd.Timestamp('0:00')
xmax = xmin + pd.Timedelta(days=1)

# Make the plot
fig, ax = plt.subplots(figsize=(6,4))
ax.scatter(x=times, y=df['Price'], c=colors, s=df['Volume'] / 2000, alpha=0.2)
ax.set(
    xlabel='Time',
    xlim=(xmin, xmax),
    ylabel='Price'
)
ax.xaxis.set_major_formatter(DateFormatter('%H:%M'))

Result:

Scatter Plot
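For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It is not part of the original answer. It rebuilds a few rows of the frame from the question, and it makes two assumptions of its own: the date 2019-09-08 is an arbitrary anchor so the result does not depend on the day the script runs, and each Type is drawn in a separate scatter call so a legend can be added. The names color_map and sizes are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.dates import DateFormatter

# A few rows copied from the question's DataFrame
df = pd.DataFrame({
    "Time": ["03:03:26.936", "05:59:12.799", "06:43:33.621",
             "09:15:31.496", "11:07:01.339"],
    "Type": ["B", "B", "S", "S", "S"],
    "Price": [1.61797, 1.62280, 1.62389, 1.62296, 1.61880],
    "Volume": [1000000, 410000, 500000, 500000, 200000],
})

# Assumption: anchor every time to one arbitrary, explicit date
times = pd.to_datetime("2019-09-08 " + df["Time"],
                       format="%Y-%m-%d %H:%M:%S.%f")

# 'B' is green, 'S' is blue; marker area scales with Volume
color_map = {"B": "green", "S": "blue"}
sizes = df["Volume"] / 2000

fig, ax = plt.subplots(figsize=(6, 4))

# One scatter call per Type so each gets its own legend entry
for trade_type, color in color_map.items():
    mask = df["Type"] == trade_type
    ax.scatter(times[mask], df.loc[mask, "Price"], color=color,
               s=sizes[mask], alpha=0.2, label=trade_type)

# Limit the x-axis to the full day that contains the data
xmin = times.min().normalize()
xmax = xmin + pd.Timedelta(days=1)
ax.set(xlabel="Time", xlim=(xmin, xmax), ylabel="Price")
ax.xaxis.set_major_formatter(DateFormatter("%H:%M"))
ax.legend(title="Type")

fig.savefig("scatter.png")
```

If you do not need a legend, the single scatter call with c=df['Type'].map(...) from the answer above is shorter and does the same colouring.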

Code Different
  • What kind of sorcerer are you? =) Perfect! I will now try to understand it line by line. With humble greetings, I thank you! –  Sep 08 '19 at 17:55
  • @CodeDifferent: Please be aware that you are contributing to one of the main problems of Stack Overflow: useful information is spread over more and more questions, making it even harder for future readers to find the desired content. If you had instead given this answer to one of the already numerous other questions on that topic, it would be much more helpful for everyone. – ImportanceOfBeingErnest Sep 08 '19 at 18:05
  • @ImportanceOfBeingErnest: Believe me, I have looked. If you would be so kind, please link a post where this specific problem is shown with a solution. In this case my problem was getting 2 different colors based on 2 different values in a column, and CodeDifferent gave a very nice solution using .map. I don't use Stack Overflow very often, only when I am stuck and cannot find a solution after reading existing posts. Honestly, I don't see what the problem is... CodeDifferent made my day and lowered my blood pressure =) –  Sep 08 '19 at 19:08
  • [This](https://stackoverflow.com/questions/28033046/matplotlib-scatter-color-by-categorical-factors) or [this](https://stackoverflow.com/questions/26139423/plot-different-color-for-different-categorical-levels-using-matplotlib) for example. There are many others. Personally you can be thankful to anyone answering, sure, but the community and especially the next person with a similar question, will now need to look through yet another post, instead of finding all good solutions in one place. – ImportanceOfBeingErnest Sep 08 '19 at 19:16