I have daily crude oil prices downloaded from FRED: about 10k observations, some of which are blank (the code below cleans them). I don't think I can share Excel sheets here, so here is a screenshot of what the data looks like:

[image: screenshot of the data]

I clean up the data and calculate the differences and returns, but then I get stuck.

Here is what the code looks like to get you started:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# FRED marks missing observations with "."; read those (and blanks) as NaN
data = pd.read_csv("DCOILWTICO.csv", na_values=["", "."])
data['DCOILWTICO'] = data['DCOILWTICO'].astype(float)

# Price on the previous row
data['Previous'] = data['DCOILWTICO'].shift(1)

# Keep only rows where both today's and the previous row's price are present
data.dropna(subset=['DCOILWTICO', 'Previous'], inplace=True)

# Absolute change and simple return
data['Diff'] = data['DCOILWTICO'] - data['Previous']
data['Return'] = data['Diff'] / data['Previous']

Here is the question: I am trying to reproduce the graph below (which I believe was generated with Mathematica). The difficult part is creating the bins correctly. Judging by the graph, there appear to be around 200 bins. The x-axis shows the returns and the y-axis shows the binned frequencies.

[image: the graph to reproduce]

Dio

1 Answer


I think you are asking how to make bins that are equally spaced in log space. If so, use np.geomspace (geometric spacing) rather than np.linspace (linear spacing).

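plt.figure()
# 200 geometrically spaced bin edges from the smallest to the largest return.
# np.geomspace only gives valid edges when both endpoints are non-zero and share a sign.
bins = np.geomspace(data['Return'].min(), data['Return'].max(), 200)
plt.hist(data['Return'], bins=bins)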
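
As the comments below point out, np.geomspace gives NaN for the interior edges when the minimum return is negative and the maximum is positive. One possible workaround, sketched here under assumptions rather than as the method behind the original graph, is to build geometric edges on the positive side only and mirror them for the negative side. The helper name `symmetric_geom_edges`, the cutoff `smallest`, and the synthetic `returns` array are all made up for illustration; they are not part of the question's data.

```python
import numpy as np


def symmetric_geom_edges(values, bins_per_side=100, smallest=1e-4):
    """Bin edges that are geometrically spaced on each side of zero.

    `smallest` is an assumed cutoff: everything in (-smallest, smallest)
    lands in one central bin, because geometric spacing cannot reach zero.
    """
    values = np.asarray(values, dtype=float)
    values = values[np.isfinite(values)]
    largest = np.abs(values).max()
    if largest <= smallest:
        raise ValueError("`smallest` must be below the largest absolute value")
    # bins_per_side bins need bins_per_side + 1 edges on the positive side
    positive = np.geomspace(smallest, largest, bins_per_side + 1)
    # Mirror the positive edges to get the negative side, in ascending order
    return np.concatenate([-positive[::-1], positive])


# Synthetic stand-in for data['Return'] (made-up numbers, not the FRED series)
rng = np.random.default_rng(0)
returns = rng.standard_t(df=3, size=10_000) * 0.02

edges = symmetric_geom_edges(returns)
counts, _ = np.histogram(returns, bins=edges)
```

With the default of 100 bins per side plus the central bin this gives 201 bins, close to the roughly 200 in the question. The same edges can be passed straight to the plot with plt.hist(data['Return'], bins=edges); whether the target graph was built this way is a guess.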
rl4215
  • Useful function! But still not 100% clear about the procedure from this to the plot. – Dio Nov 27 '21 at 15:55
  • After the edit, the code returns an array where the first value is the min, the last value is the max, and the other 198 entries are NaN. – Dio Nov 27 '21 at 16:12
  • That is, this line: bins = np.geomspace(data['Return'].min(), data['Return'].max(), 200) – Dio Nov 27 '21 at 16:12
  • I don't have the data, sorry. You can put in any values you want for the min and max, so try 0.02 for the minimum and 0.2 for the maximum. – rl4215 Nov 27 '21 at 16:17
  • The max and min functions seem to work actually. They individually return: 0.5308641975308642 and -3.019661387220098 – Dio Nov 27 '21 at 16:21
  • Oh I think it just can't deal with negative numbers actually. – Dio Nov 27 '21 at 16:25