Print the count of every occurence of the text in each row and column from the dataset

Question

I have the following image: Please look at it for reference.

http://i58.tinypic.com/33219hh.png

What I am trying to get is for the code to read every value in each column, and tell me how many times that "number" and "letter" is there in the column. In other words, what is the occurrence of that "number" and "letter" in their respective columns?

Here is my code:

import xlrd,xlwt

ws = 'C://Users/Jack/Desktop

extract=[]
wb1 = xlrd.open_workbook(ws + 'try.xlsx')
sh1 = wb1.sheet_by_index(0)

for a in range(0,sh1.nrows):
    for b in range(0,sh1.ncols):
        extract.append(sh1.cell(a,b).value)
#print(extract)
print()
print('4:',extract.count('4'))

Output is 4: 0

I am only trying to count the number 4 from the first column because I do not know how to count everything from every column at once. The output is supposed to read 4: 3. But, I want to know how to read everything at once as mentioned earlier above.

You could possibly use [```zip```](https://docs.python.org/3/library/functions.html#zip) to transpose the rows and columns then use [```collections.Counter```](https://docs.python.org/3/library/collections.html#collections.Counter) to tally the results. Or just use ```collections.Counter``` with the ```col``` attribute of the ```Sheet``` object. — wwii, Sep 11 '15 at 02:32

georgeofallages · Answer 1 · 2015-09-11T02:32:09.230

0

Try using a dictionary for each column:

for col in range(0,sh1.ncols):
    counts = {}
    for row in range(1, sh1.nrows): #start with 1 to skip header
        val = sh1.cell(row,col).value
        if val not in counts:
            counts[val] = 0
        counts[val] += 1
    extract.append(counts)

# to get the total number of 4s in the 1st column (index 0) run:
print "4:%d" % extract[0][4]

edited Sep 11 '15 at 02:32

answered Sep 11 '15 at 02:27

georgeofallages

504
3
9

What do u want me to do with the old code. Delete what and keep what ? – Jack Sep 11 '15 at 02:35
Keep everything before your for loop. And then replace your for loop with mine. – georgeofallages Sep 11 '15 at 02:36
The answer comes out to 1 instead of 3 for the count of 4. Also, I want to do it for the complete dataset I have on that picture link. – Jack Sep 11 '15 at 02:39
Strange, are you sure you did "counts[val] += 1" instead of "counts[val] = 1"? I checked using your dataset and it worked for me. This will also work for the complete dataset. extract[0][] will show you the counts of any number and extract[1][] will show you the counts of any letter. – georgeofallages Sep 11 '15 at 02:44
I want to display the count for every number in "numbers" and for every letter in "letters", and that also at once. – Jack Sep 11 '15 at 02:49

score 0 · Answer 2 · answered Sep 11 '15 at 02:34

0

You can check out pandas. The solution can be something like this:

import pandas as pd
df = pd.io.excel.read_excel(your_file_name)
print df.icol(0).value_counts()

answered Sep 11 '15 at 02:34

hilberts_drinking_problem

11,322
3
22
51

galaxyan · Answer 3 · 2015-09-11T03:30:38.773

0

  import xlrd,xlwt

def printDate(res):
    for k,v in sorted(res.items(),key = lambda (k,v):(v,k),reverse= True):
        print('{} : {}'.format(k,v))
    print()

ws = 'C:\Users\galaxyan\Desktop\\'
wb1 = xlrd.open_workbook(ws + 'Book1.xlsx')
sh1 = wb1.sheet_by_index(0)
letterRes, numRes = {},{}
for a in range(1,sh1.nrows):
    numValue = sh1.cell(a,0).value
    letterValue = sh1.cell(a,1).value
    numRes[numValue] = numRes.get(numValue,0) + 1
    letterRes[letterValue] = letterRes.get(letterValue,0) + 1

printDate(letterRes)
printDate(numRes)

output:

B : 4
E : 3
D : 3
A : 3
C : 1

3.0 : 4
5.0 : 3
4.0 : 3
2.0 : 2
1.0 : 2

edited Sep 11 '15 at 03:30

answered Sep 11 '15 at 02:35

galaxyan

5,944
2
19
43

Yes. That is correct.That does fix my problem for printing the count for 4. But, my original question is to do the same thing for the whole dataset I have. How can I do that ? – Jack Sep 11 '15 at 02:45
res is dict and stores all the frequency – galaxyan Sep 11 '15 at 02:47
Awesome!! But, I am getting an error. --------------------------------------------------------------------------- AttributeError Traceback (most recent call last) in () 13 res[letterValue] = res.get(letterValue,0) + 1 14 ---> 15 for k,v in res.iteritems(): 16 print('{} : {}'.format(k,v)) AttributeError: 'dict' object has no attribute 'iteritems' – Jack Sep 11 '15 at 02:54
try again. I think you are using python 3 – galaxyan Sep 11 '15 at 02:56
Still an error. Yes. I am using python 3. Also, I would also want the output to be arranged. The "numbers" should be all together in the column when printed, and the "letters" should be all together. – Jack Sep 11 '15 at 02:56
you could sort the key of dict then use sorted key to find the value – galaxyan Sep 11 '15 at 03:01
Now a new error. --------------------------------------------------------------------------- TypeError Traceback (most recent call last) in () 9 res[letterValue] = res.get(letterValue,0) + 1 10 ---> 11 sortedKey= sorted(res.keys()) 12 for k in sortedKey: 13 print('{} : {}'.format(k,res[k])) TypeError: unorderable types: str() < float() – Jack Sep 11 '15 at 03:02
Idk why, but you could convert all the keys of dict to str then sort – galaxyan Sep 11 '15 at 03:05
I honestly do not know what that means. Please, help me get this to work. You have already a lot and appreciate that. – Jack Sep 11 '15 at 03:08
I got it to work. Last thing. I still want those titles to print. So, "numbers" should print above the numbers, and "letters" over letters with a space in between numbers and letters. – Jack Sep 11 '15 at 03:13
There should be the word "Numbers" above 1.0 : 2, 2.0 : 2, and so on. – Jack Sep 11 '15 at 03:20
Also, can the letter or number with the highest count be at the top, and then the smaller number than that goes below it, and so on , and then the lowest count number is at the bottom. So, in our case for "Numbers" , 3.0: 4 should be at the top, then 4.0 & 5.0, and at the end 1.0 and 2.0. And, if i add more columns then? – Jack Sep 11 '15 at 03:23
File "", line 2 for k,v in sorted(res.items(),key = lambda(k,v):(v,k),reverse= True): ^ SyntaxError: invalid syntax – Jack Sep 11 '15 at 03:33
it works for me, but i am using python 2. To sort the dict may be different in those two version. – galaxyan Sep 11 '15 at 03:40
Any alternative solution? – Jack Sep 11 '15 at 03:44
Please help me out here. This is the last thing to do. – Jack Sep 12 '15 at 02:59

score 0 · Answer 4 · answered Sep 11 '15 at 02:57

Feed each column to collections.Counter

import collections, xlrd
import xlrd
wb = xlrd.open_workbook('test.xls')
sh = wb.sheet_by_index(0)
columns = []
for i in xrange(sh.ncols):
    columns.append(collections.Counter(sh.col_values(i)))

format_str = 'column {}: {}'
for n, column in enumerate(columns):
    print(format_str.format(n, column))
>>> 
column 0: Counter({u'a': 3, u'b': 2, u'c': 1, u'd': 1})
column 1: Counter({u'c': 2, u'b': 2, u'd': 2, u'a': 1})
column 2: Counter({u'c': 4, u'a': 1, u'b': 1, u'd': 1})
>>>

Print the count of every occurence of the text in each row and column from the dataset

4 Answers4