
I have a CSV file with data that look like this:

Time               Pressure
1/1/2017 0:00       5.8253
...                     ...
3/1/2017 0:10       4.2785
4/1/2017 0:20       5.20041
5/1/2017 0:30       4.40774
6/1/2017 0:40       4.03228
7/1/2017 0:50       5.011924
12/1/2017 1:00      3.9309888

I want to make a month-wise (normalized) histogram of the pressure data and finally write the plots to a PDF. I understand that I need to use groupby and NumPy's histogram function, but I'm not sure how to use them. (I'm a newbie to Python.) Please help!

CODE 1:

n = len(df) // 5
for tmp_df in (df[i:i+n] for i in range(0, len(df), n)):
    gb_tmp = tmp_df.groupby(pd.Grouper(freq='M'))
    ax = gb_tmp.hist()
    plt.setp(ax.xaxis.get_ticklabels(),rotation=90)
    plt.show()
    plt.close()

This gives me the following error message:

ValueError: range() arg 3 must not be zero

CODE 2:

df1 = df.groupby(pd.Grouper(freq='M'))
np.histogram(df1,bins=10,range=None,normed=True)

This returns another error message:

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I tried the code above but got these errors. I'm not sure if I'm using it right.

Mr. T
Unknown
  • It is too broad. You should be able to adapt the solution I provided to [Your other question](https://stackoverflow.com/q/50322704/2823755). You should try to formulate a solution then if there are specific problems you run into, come back and ask specifically about those. – wwii May 20 '18 at 15:41
  • [Why “Can someone help me?” is not an actual question?](https://meta.stackoverflow.com/questions/284236/why-is-can-someone-help-me-not-an-actual-question) – wwii May 20 '18 at 15:47
  • @wwii: Thanks for that. I'm new to Stack Overflow as well as Python. I have now edited my question with the approach I tried. – Unknown May 21 '18 at 02:44

1 Answer


A few simple steps. First, read your data file into a list of rows. Once you have the rows, collect all the observations for each month and take the average of each collection. Here I have implemented a simple buckets class that aggregates the pressures into groups by month and provides the mean for each group. Lastly, I plot the result with matplotlib.

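def readData(fn):
    # Read every line after the header and split it on the 7-space column gap.
    with open(fn) as fh:
        lines = fh.read().split("\n")
    # Skip blank lines (such as the one after a trailing newline) so that
    # every returned row has both columns.
    ret = [k.split("       ") for k in lines[1:] if k.strip()]
    return ret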

class buckets:
    """Collect values under a key and report the mean of each group."""
    def __init__(self):
        self.data = {}
    def add(self, key, value):
        # Start a new list the first time a key is seen.
        if key not in self.data:
            self.data[key] = []
        self.data[key].append(value)
    def getMean(self, key):
        nums = self.data[key]
        return sum(nums) / float(len(nums))
    def keys(self):
        return self.data.keys()

import matplotlib
import numpy as np

data = readData("data.txt")
container = buckets()

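for k in data:
    print(k)
    # The bucket key is the text before the first "/" in the date,
    # which is taken here to be the month.
    container.add(k[0].split("/")[0],float(k[1]))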

histoBars = []
histoTicks = [int(k) for k in list(container.keys())]
histoTicks.sort()
histoTicks = [str(k) for k in histoTicks]
x = np.arange(len(histoTicks))

for k in histoTicks:
    histoBars.append(container.getMean(k))

print(len(histoBars))
print(len(histoTicks))

import matplotlib.pyplot as plt
print(histoBars)
print(histoTicks)
fig, ax = plt.subplots()
plt.bar(x, histoBars)
plt.xticks( x, histoTicks )
plt.show()
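
The bar chart above plots one mean per month rather than a distribution. For the normalized month-wise histograms written to a PDF that the question asks for, a minimal sketch might look like the following. It assumes a comma-separated file with `Time` and `Pressure` header names and month/day/year dates; the function names `load_pressure` and `monthly_histograms_to_pdf` are hypothetical, and it uses pandas with matplotlib's `PdfPages` rather than the hand-written buckets class.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.backends.backend_pdf import PdfPages


def load_pressure(source):
    """Read a comma-separated file with 'Time' and 'Pressure' columns."""
    df = pd.read_csv(source)
    # Assumption: dates are month/day/year. Change the format string
    # (for example to "%d/%m/%Y %H:%M") if the file is day/month/year.
    df["Time"] = pd.to_datetime(df["Time"], format="%m/%d/%Y %H:%M")
    return df.set_index("Time")


def monthly_histograms_to_pdf(df, pdf_path, bins=10):
    """Write one normalized histogram page per month; return the page count."""
    pages = 0
    with PdfPages(pdf_path) as pdf:
        # Group the rows by calendar month (year and month together).
        for period, group in df.groupby(df.index.to_period("M")):
            fig, ax = plt.subplots()
            # density=True scales the bars so the histogram integrates to 1.
            ax.hist(group["Pressure"].to_numpy(), bins=bins, density=True)
            ax.set_title(str(period))
            ax.set_xlabel("Pressure")
            ax.set_ylabel("Density")
            pdf.savefig(fig)
            plt.close(fig)
            pages += 1
    return pages
```

Calling `monthly_histograms_to_pdf(load_pressure("data.csv"), "histograms.pdf")` would then produce one page per month found in the data; both file names here are placeholders.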

A last quick note: I'm not really sure what format your file is in. It looked like the two columns were separated by 7 spaces, but one of the samples only had 6, so you might have to change the delimiter or clean the table to make sure all the rows read without error.
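
One way to sidestep the uneven spacing is to split each row at its last run of whitespace instead of at a fixed number of spaces. This is only a sketch: the helper names `parse_row` and `read_rows` are hypothetical, and it assumes every data row ends with a numeric pressure value and that the first line is a header.

```python
def parse_row(line):
    """Split one data row into (timestamp_text, pressure)."""
    # rsplit with maxsplit=1 cuts at the last run of whitespace, so the space
    # inside the timestamp is kept and the column gap may be any width.
    timestamp, pressure = line.rsplit(None, 1)
    return timestamp.strip(), float(pressure)


def read_rows(path):
    """Read all non-blank rows after the header line."""
    with open(path) as handle:
        next(handle, None)  # skip the header
        return [parse_row(line) for line in handle if line.strip()]
```

A row that is not numeric in its last column (such as a literal `...` placeholder) would raise a `ValueError` here, so such rows need to be removed first.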

kpie