I have a CSV file (an Excel spreadsheet) with roughly a million numbers in column A. I want to make a histogram of this data, with the frequency on the y-axis and the values on the x-axis. I'm using pandas to do so. My code:

import pandas as pd

pd.read_csv('D1.csv', quoting=2)['A'].hist(bins=50)

Python isn't interpreting 'A' as the column name. I've tried various names to reference the column, but they all result in a KeyError. Am I missing a step where I have to assign that column a name in Python? If so, I don't know how to do that.

Code Man
Daniel Hodgkins
  • If you save it to a `DataFrame`, such as `df = pd.read_csv('D1.csv', quoting=2)`, then `print(df.head())` or `print(df.columns)` will show you the column names that pandas is discovering. If those seem wrong, you can try altering the `header` argument of `read_csv` to see whether a header row is being skipped. – ely Oct 11 '14 at 01:33
  • When I do `print(df.head())`, I get one column of 0, 1, 2, 3, etc. (the row numbers) and one column with the first few numbers of my actual data. When I do `print(df.columns)`, it says: `Index([u'2903.1'], dtype='object')`, where 2903.1 is the first number in my data. I honestly have no idea how to interpret this because I'm very new to programming. – Daniel Hodgkins Oct 11 '14 at 01:51
  • That suggests that either there is no header row in the spreadsheet, but it is still trying to interpret the first row (of data) as if it was a header; or that if there is a header row, it is being inadvertently skipped. If you open the raw file (or cat the file's first few rows of content) do you see a header row? If so, you can call the `read_csv` function with an argument `header=0`. If this doesn't work, it might mean your data file doesn't actually have a header line. In that case, you can pass a list of the names, like `names=['A', 'B', ...]` and it will use those names. – ely Oct 11 '14 at 01:57
  • So what I tried was giving my Excel sheet a header row and verifying that Excel recognized it. For my column of data, I labeled it in the header row as data. I then added `header=0` to my arguments and used the name 'Data' in my code, but I still get a KeyError. It is very possible I'm misinterpreting your advice, though, since I don't have experience with this. – Daniel Hodgkins Oct 11 '14 at 02:35
  • I tried using u'2903.1' as the name of the column, and no KeyError came up, but no graph appeared either. – Daniel Hodgkins Oct 11 '14 at 02:48
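
The symptom described in the comments above (the first data value showing up as the column name) can be reproduced with a tiny in-memory file. This is only a sketch: the three sample values and the use of `io.StringIO` in place of `D1.csv` are assumptions for illustration, and it assumes the real file has no header row.

```python
import io

import pandas as pd

# Hypothetical stand-in for a headerless, single-column CSV file.
raw = "2903.1\n3100.5\n2875.0\n"

# Default behaviour: the first line is consumed as the header,
# so the first data value becomes the column name.
default = pd.read_csv(io.StringIO(raw))
print(list(default.columns))

# Telling pandas there is no header row and supplying a name
# keeps every value as data and makes the column addressable as 'A'.
named = pd.read_csv(io.StringIO(raw), header=None, names=["A"])
print(named["A"].head())
```

With the second form, `named['A']` should no longer raise a KeyError, because 'A' is now a real column label.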

2 Answers

I need more rep to comment, so I'm posting this as an answer. You need a header row containing the names you want to use in pandas. Also, if you want to see the histogram when working from the Python shell or IPython, you need to import pyplot and call `plt.show()`:

import matplotlib.pyplot as plt
import pandas as pd

pd.read_csv('D1.csv', quoting=2)['A'].hist(bins=50)
plt.show()

Okay, I finally got something to work, with headings, titles, etc.

import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv('D1.csv', quoting=2)
data.hist(bins=50)
plt.xlim([0,115000])
plt.title("Data")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()

My first problem was that matplotlib is needed to actually show the graph, as stated by @Sauruxum. Also, I needed to assign the result of

pd.read_csv('D1.csv', quoting=2)

to data so I could plot its histogram with

data.hist

Basically, the problem wasn't finding the name of the header row. I needed to call .hist on the DataFrame itself rather than on a column selected by name. Thank you all for the help.
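
For anyone who would rather plot one column by name, here is a minimal sketch that combines both fixes. It assumes the file has no header row; the helper name `plot_column_histogram`, the label 'A', and the in-memory sample values are all made up for illustration and are not part of my actual script.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd


def plot_column_histogram(csv_source, column="A", bins=50):
    """Read a headerless single-column CSV and draw a histogram of it.

    `csv_source` can be a file path such as 'D1.csv' or a file-like object.
    """
    # header=None keeps the first row as data; names assigns the label.
    data = pd.read_csv(csv_source, header=None, names=[column])
    fig, ax = plt.subplots()
    data[column].hist(bins=bins, ax=ax)
    ax.set_title("Data")
    ax.set_xlabel("Value")
    ax.set_ylabel("Frequency")
    return ax


# Hypothetical sample values standing in for the real file.
sample = io.StringIO("2903.1\n3100.5\n2875.0\n2990.2\n")
ax = plot_column_histogram(sample, bins=4)
```

Calling `plt.show()` afterwards opens the plot window when running from a plain Python shell.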

Daniel Hodgkins