I'm experimenting with pandas and non-matplotlib plotting. Good suggestions are here. This question is about yhat's ggplot, where I am running into two issues. First, plotting a series in pandas is easy:

frequ.plot()

I don't see how to do this in the ggplot docs. Instead I end up creating a dataframe:

cheese = DataFrame({'time': frequ.index, 'count' : frequ.values})
ggplot(cheese, aes(x='time', y='count')) + geom_line()

I would expect ggplot -- a project that has "tight integration with pandas" -- to have a way to plot a simple series.

The second issue is that I can't get stat_smooth() to display when the x axis is time of day. It seems like it could be related to this post, but I don't have the rep to post there. My code is:

frequ = values.sampler.resample("1Min", how="count")
cheese = DataFrame({'time': frequ.index, 'count' : frequ.values})
ggplot(cheese, aes(x='time', y='count')) + geom_line() + stat_smooth()

Any help regarding non-matplotlib plotting would be appreciated. Thanks! (I'm using ggplot 0.5.8)

zerovector
  • What do you mean there's got to be a better way? Shorter? That's just the syntax of ggplot. You create a figure with some aesthetics and add geometries to it. – TomAugspurger May 08 '14 at 12:41
  • @TomAugspurger I edited the post to clarify. – zerovector May 08 '14 at 14:07
  • Your cheese dataframe is just a copy of two columns of frequ. Just put frequ in without making a copy. As far as tight integration goes... that's a reference to being able to pass a dataframe to ggplot. The syntax you see is as tight as ggplot gets. – cwharland May 08 '14 at 15:27
  • There is currently no way to pass a series as data to the ggplot function (see https://github.com/yhat/ggplot/issues/285). – Jan Katins May 08 '14 at 18:44

2 Answers

I run into this problem frequently in Python's ggplot when working with multiple stock prices and economic time series. The key thing to remember with ggplot is that data is best organized in long format to avoid any issues. I use a quick two-step process as a workaround. First, let's grab some stock data:

import pandas.io.data as web
import pandas as pd
import time
from ggplot import *

stocks = [ 'GOOG', 'MSFT', 'LNKD', 'YHOO', 'FB', 'GOOGL','HPQ','AMZN'] # stock list

# get stock price function #
def get_px(stock, start, end):
    return web.get_data_yahoo(stock, start, end)['Adj Close']

# dataframe of equity prices
date_today = time.strftime('%m/%d/%Y')  # assumed definition: today's date as the end of the range
px = pd.DataFrame({n: get_px(n, '1/1/2014', date_today) for n in stocks})

px.head()
              AMZN     FB  GOOG   GOOGL    HPQ    LNKD   MSFT   YHOO
Date                                                                
2014-01-02  397.97  54.71   NaN  557.12  27.40  207.64  36.63  39.59
2014-01-03  396.44  54.56   NaN  553.05  28.07  207.42  36.38  40.12
2014-01-06  393.63  57.20   NaN  559.22  28.02  203.92  35.61  39.93
2014-01-07  398.03  57.92   NaN  570.00  27.91  209.64  35.89  40.92
2014-01-08  401.92  58.23   NaN  571.19  27.19  209.06  35.25  41.02

First, understand that ggplot needs the datetime index to be a column in the pandas dataframe in order to plot correctly when switching from wide to long format. I wrote a function to address this particular point. It simply creates a 'Date' column of dtype datetime64 from the dataframe's index.

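def dateConvert(df):
  df = df.copy()         # work on a copy so the caller's frame is untouched
  df['Date'] = df.index  # expose the DatetimeIndex as a regular 'Date' column
  # The index itself is left in place; pd.melt ignores it.
  return df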

From there, run the function on the df. Pass the result to pd.melt, using 'Date' as the id_vars. The returned df is then ready to be plotted using the standard ggplot() format.

px_returns = px.pct_change() # common stock transformation
cumRet = (1+px_returns).cumprod() - 1 # transform daily returns to cumulative 
cumRet_dateConverted = dateConvert(cumRet) # run the function here; see the result below

cumRet_dateConverted.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 118 entries, 2014-01-02 00:00:00 to 2014-06-20 00:00:00
Data columns (total 9 columns):
AMZN     117 non-null float64
FB       117 non-null float64
GOOG     59 non-null float64
GOOGL    117 non-null float64
HPQ      117 non-null float64
LNKD     117 non-null float64
MSFT     117 non-null float64
YHOO     117 non-null float64
Date     118 non-null datetime64[ns]
dtypes: datetime64[ns](1), float64(8)
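As a rough check of the two transformation lines above, here is a minimal sketch on a tiny frame. The ticker `AAA` and its prices are made up for illustration and are not real market data. It also shows why the first row has no value, which is consistent with 117 non-null entries out of 118 rows in the info() output.

```python
import pandas as pd

# Hypothetical prices for one made-up ticker; not real market data.
prices = pd.DataFrame(
    {"AAA": [100.0, 110.0, 121.0]},
    index=pd.to_datetime(["2014-01-02", "2014-01-03", "2014-01-06"]),
)

daily = prices.pct_change()             # first row is NaN: there is no prior price
cumulative = (1 + daily).cumprod() - 1  # compound the daily returns

print(cumulative)
```

Two consecutive 10% daily gains compound to a cumulative return of about 21%, not 20%, which is the reason for using cumprod() rather than a plain sum.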


# Melt to long format. Note the use of 'Date' as the id_vars.
data = pd.melt(cumRet_dateConverted, id_vars='Date').dropna()

# It is common to rename the melted columns.
data = data.rename(columns={'variable': 'Stocks', 'value': 'Returns'})
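For readers who want to try the wide-to-long step without downloading anything, the following is a minimal sketch on made-up data. It assumes a reasonably recent pandas (one that has `DataFrame.melt`); the tickers `AAA` and `BBB` and their values are hypothetical. It uses `reset_index()` in place of the helper function, which is an alternative and not the method described above.

```python
import pandas as pd

# Hypothetical wide frame: one column per ticker, dates in the index.
wide = pd.DataFrame(
    {"AAA": [0.01, 0.02], "BBB": [0.03, None]},
    index=pd.to_datetime(["2014-01-02", "2014-01-03"]),
)
wide.index.name = "Date"

long_df = (
    wide.reset_index()  # move the index into a 'Date' column
        .melt(id_vars="Date", var_name="Stocks", value_name="Returns")
        .dropna()       # drop missing observations
)

print(long_df)
```

Each row of the result is one (Date, Stocks, Returns) observation, which is the shape that a color='Stocks' aesthetic expects.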

From here you can now plot your data however you want. A common plot I use is the following:

key_Stock = 'FB'  # assumption: the stock named in the title, inferred from the plot caption
retPlot_YTD = ggplot(data, aes('Date','Returns',color='Stocks')) \
+ geom_line(size=2.) \
+ geom_hline(yintercept=0, color='black', size=1.7, linetype='-.') \
+ scale_y_continuous(labels='percent') \
+ scale_x_date(labels='%b %d %y',breaks=date_breaks('week') ) \
+ theme_seaborn(style='whitegrid') \
+ ggtitle(('%s Cumulative Daily Return vs Peers_YTD') % key_Stock) 

fig = retPlot_YTD.draw()
ax = fig.axes[0]
offbox = ax.artists[0]
offbox.set_bbox_to_anchor((1, 0.5), ax.transAxes)
fig.show()

[Figure: FB cumRet plot using ggplot]

BCR

This is more of a workaround, but you can use qplot for quick, shorthand plots of a series.

from ggplot import *
qplot(meat.beef)
Greg
  • Very cool. I didn't know about qplot. Ideally, I'd like the x axis to be the index of the series, but for a quick plot it works well. Any advice about issue #2? stat_smooth() doesn't plot anything when the x axis is time of day. – zerovector May 14 '14 at 09:23
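
One possible way to get both the series index on the x axis and a smoothed line, sketched in pandas only. This is a workaround under stated assumptions, not a fix for stat_smooth(): the helper `series_to_frame`, the `smooth` column, and the sample counts are all hypothetical, the moving average is a simple stand-in for whatever stat_smooth() would compute, and the code assumes a recent pandas with `Series.rolling`.

```python
import pandas as pd

def series_to_frame(s, x_name="time", y_name="count", window=3):
    """Return a frame with the series index as a column plus a rolling mean."""
    frame = s.rename(y_name).rename_axis(x_name).reset_index()
    # Centered moving average; a stand-in for smoothing, not stat_smooth itself.
    frame["smooth"] = (
        frame[y_name].rolling(window, center=True, min_periods=1).mean()
    )
    return frame

# Hypothetical per-minute counts.
idx = pd.date_range("2014-05-08 09:00", periods=5, freq="min")
frequ = pd.Series([1, 3, 2, 6, 4], index=idx)

cheese = series_to_frame(frequ)
print(cheese)
```

The resulting frame has `time`, `count`, and `smooth` columns, so the smoothed values can be plotted like any other column.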