
I have a dataframe that looks like this:

date                  number_of_books   ... (additional columns)
1997/06/01 23:15        3
1999/02/19 14:56        5
1999/10/22 18:20        7 
2001/11/04 19:13        19
...                     ...
2014/04/30 02:14        134

My goal is to create an empty scatter plot and then add each point separately, since the color of each point depends on other columns in the dataframe. However, I can't find a way to create an empty scatter plot before I start using the dataframe. Is there a way to do that (possibly by storing the plot in a variable)? I want the x-axis to show just the date (YYYY/MM/DD) and the y-axis the number of books.

My plan is to convert the date and number_of_books strings right before I add them to the plot. So the idea would be...

for index, row in df.iterrows():
    convert date to datetime and number_of_books to int
    if condition met (based on other columns):
         plot with color blue
    else:
         plot with color red
Sam
  • Why don't you just add a column in the dataframe to store the color you want for each data point, and then pass it as argument `color=` when plotting your data? – gcalmettes Mar 09 '17 at 05:56

1 Answer


You could add a column to your pd.DataFrame to store the color of each data point, and then pass that column to the scatter plot function so that every point is drawn in its own color in a single call.

See for example:

import pandas as pd
import matplotlib.pyplot as plt

# your dataframe
df = pd.DataFrame({"date": ["1997/06/01 23:15", "1999/02/19 14:56", "1999/10/22 18:20", "2001/11/04 19:13"],
                    "number_of_books": [3, 5, 7, 19]})

# add an empty column to store colors
# (None gives an object column that can hold strings; np was never imported)
df["color"] = None

# loop over each row and attribute a conditional color
for row in range(len(df)):
    if row<2: #put your condition here
        df.loc[row, "color"] = "r"
    else: #second condition here
        df.loc[row, "color"] = "b"

# convert the date column to Datetime
df.date = pd.to_datetime(df.date)

# plot the data
plt.scatter(df.date, df.number_of_books, c=df.color)
plt.show()
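
The loop that fills the color column can also be replaced by a single vectorized step. The following is only a sketch: the condition (5 or more books) is a placeholder I made up, and the sample values are stored as strings because the question says they need converting.

```python
import numpy as np
import pandas as pd

# Sample rows from the question; values kept as strings on purpose.
df = pd.DataFrame({
    "date": ["1997/06/01 23:15", "1999/02/19 14:56",
             "1999/10/22 18:20", "2001/11/04 19:13"],
    "number_of_books": ["3", "5", "7", "19"],
})

# Convert both columns once instead of row by row.
df["date"] = pd.to_datetime(df["date"], format="%Y/%m/%d %H:%M")
df["number_of_books"] = df["number_of_books"].astype(int)

# Placeholder condition (an assumption): blue for 5 or more books, red otherwise.
# Replace it with whatever test on the other columns applies.
condition = df["number_of_books"] >= 5
df["color"] = np.where(condition, "b", "r")

# The frame can then be plotted in one call, for example:
# plt.scatter(df["date"], df["number_of_books"], c=df["color"].tolist())
```

This avoids writing to the frame one cell at a time, which tends to matter once the dataframe has many rows.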

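
If you really do want an empty plot first and to add the points one at a time, as described in the question, holding the Axes in a variable should work. This is a minimal sketch under the same assumptions as above: the condition is a placeholder, and the sample values are strings that get converted inside the loop.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows from the question; values kept as strings on purpose.
df = pd.DataFrame({
    "date": ["1997/06/01 23:15", "1999/02/19 14:56",
             "1999/10/22 18:20", "2001/11/04 19:13"],
    "number_of_books": ["3", "5", "7", "19"],
})

# Empty figure and Axes; no data is needed at this point.
fig, ax = plt.subplots()

for index, row in df.iterrows():
    # Convert right before plotting; normalize() drops the time of day.
    date = pd.to_datetime(row["date"], format="%Y/%m/%d %H:%M").normalize()
    books = int(row["number_of_books"])
    # Placeholder condition (an assumption); use your own test here.
    color = "blue" if books >= 5 else "red"
    ax.scatter(date, books, color=color)

ax.set_xlabel("date")
ax.set_ylabel("number of books")
fig.autofmt_xdate()  # tilt the date labels so they do not overlap
```

Call plt.show() afterwards to display the figure. Each ax.scatter call creates its own collection, so for a large dataframe the color-column approach is likely to be faster.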

gcalmettes