Openpyxl - How to read only one column from Excel file in Python?

Question

I want to pull only column A from my spreadsheet. I have the below code, but it pulls from all columns.

from openpyxl import load_workbook

# read_only=True replaces the older use_iterators=True option
wb = load_workbook("/home/ilissa/Documents/AnacondaFiles/AZ_Palmetto_MUSC_searchterms.xlsx", read_only=True)
sheet_ranges = wb['PrivAlert Terms']

# min_row=2 skips the first row (older openpyxl versions used row_offset=1)
for row in sheet_ranges.iter_rows(min_row=2):
    for cell in row:
        print(cell.value)

ZLNK · score 26 · Answer 1 · 2018-02-20T13:11:09.623

This is an alternative to the previous answers in case you wish to read one or more columns using openpyxl.

import openpyxl

wb = openpyxl.load_workbook('origin.xlsx')
first_sheet = wb.sheetnames[0]  # get_sheet_names() is deprecated
worksheet = wb[first_sheet]     # get_sheet_by_name() is deprecated

# here you iterate over the rows in the specific columns
for row in range(2, worksheet.max_row + 1):
    for column in "ADEF":  # Here you can add or reduce the columns (single-letter names only)
        cell_name = "{}{}".format(column, row)
        value = worksheet[cell_name].value  # the value of the specific cell
        print(value)  # ... your tasks ...

I hope this is useful.
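As a variation on the loop above, here is a sketch that assumes openpyxl 2.6 or later is installed (needed for `values_only=True`). It converts the column letters to 1-based indexes once with `openpyxl.utils.column_index_from_string`, which also copes with multi-letter columns such as "AB", and then walks the sheet with `iter_rows` instead of building "A2"-style cell names. The helper name `read_columns_by_letter` and the sample rows are hypothetical.

```python
from openpyxl import Workbook
from openpyxl.utils import column_index_from_string


def read_columns_by_letter(ws, letters, first_row=2):
    """Hypothetical helper: map each column letter to the list of its values.

    Rows run from first_row (2 skips a header row) to ws.max_row.
    """
    indexes = {letter: column_index_from_string(letter) for letter in letters}
    result = {letter: [] for letter in letters}
    # Read only as many columns as the right-most requested column needs.
    max_col = max(indexes.values())
    for row in ws.iter_rows(min_row=first_row, max_col=max_col, values_only=True):
        for letter, idx in indexes.items():
            result[letter].append(row[idx - 1])  # row tuples are 0-based
    return result


# Hypothetical sample data built in memory instead of loading origin.xlsx.
wb = Workbook()
worksheet = wb.active
worksheet.append(["id", "x", "y", "name"])
worksheet.append([1, "a", "b", "alpha"])
worksheet.append([2, "c", "d", "beta"])

columns = read_columns_by_letter(worksheet, ["A", "D"])
print(columns)  # {'A': [1, 2], 'D': ['alpha', 'beta']}
```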

edited Feb 20 '18 at 13:11

answered Oct 13 '16 at 17:23

ZLNK

did you mean: `for row in range(2,worksheet.max_row+1):` instead of `for row in range(2,worksheet.max_row):`? – Vikas Prasad Feb 17 '18 at 15:41
Hi, no. Because worksheet.max_row returns the highest index with elements in it, if you add +1 the last row will be an empty line. You can find more examples here: [openpyxl](http://openpyxl.readthedocs.io/en/default/_modules/openpyxl/worksheet/worksheet.html) – ZLNK Feb 20 '18 at 12:15

yeah, but for the function `range`, the second param is exclusive. This way we would miss the data from the last row – Vikas Prasad Feb 20 '18 at 12:20
Aha, you are right. In my case the last row had a different value for the column, which is why I didn't notice the mistake. I will make the correction now, thanks! – ZLNK Feb 20 '18 at 13:10
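The off-by-one discussed in these comments comes from `range()` excluding its stop value. A plain-Python illustration, assuming a sheet whose last populated row is 5 (the number is made up for the example):

```python
# Assume ws.max_row reported 5, i.e. row 5 is the last row with data.
max_row = 5

rows_missing_last = list(range(2, max_row))    # stop value 5 is excluded
rows_with_last = list(range(2, max_row + 1))   # stop value 6 is excluded, so 5 is kept

print(rows_missing_last)  # [2, 3, 4]
print(rows_with_last)     # [2, 3, 4, 5]
```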

score 15 · Answer 2 · answered Mar 16 '17 at 02:53

Using openpyxl

from openpyxl import load_workbook

# The source xlsx file is named source.xlsx
wb = load_workbook("source.xlsx")

ws = wb.active
first_column = ws['A']

# Print the contents (iterate over the cells directly; xrange only exists in Python 2)
for cell in first_column:
    print(cell.value)

Harilal Remesan

Using ws['A'] to extract a column of the sheet does not work. It throws an AttributeError saying to iterate. See my answer further down in this thread for more: https://stackoverflow.com/questions/34754077/openpyxl-how-to-read-only-one-column-from-excel-file-in-python/70006361#70006361 – Priya Nov 17 '21 at 14:43
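A quick way to check what `ws['A']` returns, using a hypothetical in-memory workbook instead of source.xlsx (openpyxl assumed installed). On a normally loaded worksheet it is a tuple of `Cell` objects from row 1 to `ws.max_row`. As far as I can tell, worksheets opened with `read_only=True` do not support this column indexing because they lack `iter_cols`, which may be what the AttributeError in the comment above refers to; `iter_rows(max_col=1)` works in both modes.

```python
from openpyxl import Workbook

# Hypothetical sample data on a normal (not read-only) worksheet.
wb = Workbook()
ws = wb.active
for term in ["Term", "alpha", "beta"]:
    ws.append([term, "other column"])

first_column = ws["A"]  # tuple of Cell objects, rows 1..ws.max_row
values = [cell.value for cell in first_column]

print(type(first_column).__name__)  # tuple
print(values)                       # ['Term', 'alpha', 'beta']
```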

score 13 · Answer 3 · answered Dec 05 '18 at 15:15

In my opinion, this is much simpler:

from openpyxl import load_workbook

wb = load_workbook("your excel file")
source = wb["name of the sheet"]
# source['A'] is a tuple of every cell in column A
for cell in source['A']:
    print(cell.value)

Lorenzo

Using source['A'] to extract a column of the sheet does not work. It throws an AttributeError saying to iterate. See my answer further down in this thread for more: https://stackoverflow.com/questions/34754077/openpyxl-how-to-read-only-one-column-from-excel-file-in-python/70006361#70006361 – Priya Nov 17 '21 at 14:43
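Because `source['A']` is an ordinary tuple on a normally loaded worksheet, a header row like the one the question skips can be dropped by slicing. A small sketch with hypothetical in-memory data standing in for the real sheet (openpyxl assumed installed):

```python
from openpyxl import Workbook

# Hypothetical stand-in for wb["name of the sheet"].
wb = Workbook()
source = wb.active
for value in ["Search term", "alpha", "beta"]:
    source.append([value])

# Index 0 of the tuple is row 1 (the header), so [1:] starts at row 2.
terms = [cell.value for cell in source["A"][1:]]
print(terms)  # ['alpha', 'beta']
```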

score 2 · Answer 4 · answered Jan 12 '16 at 22:19

I would suggest using the pandas library.

import pandas as pd

# sheet_name and usecols replace the older sheetname and parse_cols arguments
dataFrame = pd.read_excel("/home/ilissa/Documents/AnacondaFiles/AZ_Palmetto_MUSC_searchterms.xlsx", sheet_name="PrivAlert Terms", usecols="A")

If you don't feel comfortable in pandas, or for whatever reason need to work with openpyxl, the error in your code is that you aren't selecting only the first column. You explicitly call for each cell in each row. If you only want the first column, then only get the first column in each row.

for row in sheet_ranges.iter_rows(min_row=2):  # row_offset=1 in older openpyxl versions
    print(row[0].value)

This will still return one row at a time. – Charlie Clark Jan 13 '16 at 08:26
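As the comment says, iterating rows still reads one row at a time; passing `max_col=1` at least keeps each row tuple down to the first cell, and `values_only=True` (openpyxl 2.6 or later, assumed here) yields plain values instead of cells. A minimal sketch with a hypothetical in-memory sheet in place of 'PrivAlert Terms':

```python
from openpyxl import Workbook

# Hypothetical stand-in for wb['PrivAlert Terms'].
wb = Workbook()
sheet_ranges = wb.active
sheet_ranges.append(["Term", "Category"])
sheet_ranges.append(["alpha", "x"])
sheet_ranges.append(["beta", "y"])

# min_row=2 skips the header row; max_col=1 makes every row a 1-tuple.
first_column = [row[0] for row in sheet_ranges.iter_rows(min_row=2, max_col=1, values_only=True)]
print(first_column)  # ['alpha', 'beta']
```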

score 2 · Answer 5 · answered Jan 13 '16 at 08:25

Use ws.get_squared_range() to control precisely the range of cells, such as a single column, that is returned.

Charlie Clark

get_squared_range() is deprecated. See this post: https://stackoverflow.com/questions/42532026/get-squared-range-to-list-of-lists-from-excel – Edward Gaere Jan 05 '22 at 20:54
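Since `get_squared_range()` is deprecated, the same rectangle can be requested from `iter_rows()` with explicit keyword bounds. The mapping in the comment assumes the old call order was `get_squared_range(min_col, min_row, max_col, max_row)`; unlike that method, `values_only=True` (openpyxl 2.6+) yields values rather than cells. The sample data is hypothetical.

```python
from openpyxl import Workbook

# Hypothetical sheet: column A holds 1..5, column B holds 10..50.
wb = Workbook()
ws = wb.active
for i in range(1, 6):
    ws.append([i, i * 10])

# Roughly what get_squared_range(1, 2, 1, 4) used to select:
# column A only (min_col=max_col=1), rows 2 through 4 inclusive.
block = ws.iter_rows(min_row=2, max_row=4, min_col=1, max_col=1, values_only=True)
column_a = [row[0] for row in block]
print(column_a)  # [2, 3, 4]
```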

Compadre · score 2 · Answer 6 · 2016-10-27T17:57:21.447

Here is a simple function:

import openpyxl

def return_column_from_excel(file_name, sheet_name, column_num, first_data_row=1):
    wb = openpyxl.load_workbook(filename=file_name)
    ws = wb[sheet_name]  # get_sheet_by_name() is deprecated
    # get_squared_range() is deprecated; iter_rows() takes the same bounds as keywords
    return ws.iter_rows(min_row=first_data_row, max_row=ws.max_row,
                        min_col=column_num, max_col=column_num)

edited Oct 27 '16 at 17:57

answered Jul 05 '16 at 15:03

Compadre

score 1 · Answer 7 · answered Mar 24 '17 at 08:49

Using the openpyxl library and a Python list comprehension:

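import openpyxl

sheet_name = "Sheet1"  # placeholder: the name of the sheet you want to read
book = openpyxl.load_workbook('testfile.xlsx')
user_data = book[str(sheet_name)]  # get_sheet_by_name() is deprecated
# user_data[x] is the tuple of cells in row x, so [0] picks column A;
# max_row + 1 is needed because range() excludes its stop value
print([str(user_data[x][0].value) for x in range(1, user_data.max_row + 1)])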

It is a pretty amazing approach and worth a try.
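One caveat with `str(... .value)` in the comprehension above: an empty cell has the value `None`, which `str()` turns into the string 'None'. A variant that filters those out first, sketched on hypothetical in-memory data (openpyxl 2.6+ assumed for `values_only=True`):

```python
from openpyxl import Workbook

# Hypothetical sheet whose column A has an empty cell in row 2.
wb = Workbook()
user_data = wb.active
for value in ["alpha", None, "gamma"]:
    user_data.append([value])

# Skip empty cells before converting, so no 'None' strings appear.
column_a = [str(row[0]) for row in user_data.iter_rows(max_col=1, values_only=True)
            if row[0] is not None]
print(column_a)  # ['alpha', 'gamma']
```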

score 0 · Answer 8 · answered Mar 04 '17 at 18:21

Using ZLNK's excellent response, I created this function that uses a list comprehension to achieve the same result in a single line:

def read_column(ws, begin, columns):
    # ws.rows is a generator in newer openpyxl (no len()), so ws.max_row bounds the rows
    return [ws["{}{}".format(column, row)].value for row in range(begin, ws.max_row + 1) for column in columns]

You can then call it by passing a worksheet, the row to begin on, and a string with the letter of each column you want to return:

column_a_values = read_column(worksheet, 2, 'A')

To return column A and column B (the values come back row by row: A2, B2, A3, B3, ...), the call changes to this:

column_ab_values = read_column(worksheet, 2, 'AB')

ewilan

ws.rows is a generator and I am getting "object of type 'generator' has no len()". Maybe use max_row? – Michal Gonda Jun 29 '18 at 13:30
What happens when your sheet has enough columns that there actually is a column 'AB'? – virtualxtc Jul 11 '18 at 21:15
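Regarding the question about a column named 'AB': a string is iterated one character at a time, so 'AB' means columns A and B. One way to avoid the ambiguity, sketched here as a hypothetical `read_columns` variant (openpyxl assumed installed), is to pass a list of column labels and bound the rows with `ws.max_row`, which also avoids `len(ws.rows)`.

```python
from openpyxl import Workbook


def read_columns(ws, begin, columns):
    """Hypothetical variant of read_column that takes a list of labels.

    Values come back row by row, in the order the labels are given.
    """
    return [ws["{}{}".format(column, row)].value
            for row in range(begin, ws.max_row + 1)
            for column in columns]


# Hypothetical sample sheet with data in columns A and AB.
wb = Workbook()
worksheet = wb.active
for row, (a, ab) in enumerate([("head A", "head AB"), (1, 2), (3, 4)], start=1):
    worksheet["A{}".format(row)] = a
    worksheet["AB{}".format(row)] = ab

print(read_columns(worksheet, 2, ["A", "AB"]))  # [1, 2, 3, 4]
print(read_columns(worksheet, 2, "AB"))         # [1, None, 3, None]  (columns A and B)
```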

Priya · Answer 9 · 2021-11-17T14:44:13.833

I know I might be late to this thread, but at least my answer might benefit someone else who is trying to solve this.

You have to iterate through the column values of the sheet. In my opinion, one could implement it like this:

from openpyxl import load_workbook

wb = load_workbook("/home/ilissa/Documents/AnacondaFiles/AZ_Palmetto_MUSC_searchterms.xlsx", read_only=True)
sheet = wb['PrivAlert Terms']
# max_col=1 limits each row tuple to the cell in column A
for val in sheet.iter_rows(max_col=1):
    print(val[0].value)

iter_rows loops through the rows of the specified columns. You can pass min_row and max_row to bound the rows, and min_col and max_col to bound the columns. Setting max_col=1 here makes it loop through all the rows of column A only (the columns run up to the maximum specified), so it pulls only the values of the first column of your spreadsheet.

Similarly, if you want to iterate through the columns, that is, in the horizontal direction, you can use iter_cols, specifying the row bounds (min_row, max_row) and the column bounds (min_col, max_col).
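To illustrate the column-wise direction: `iter_cols()` yields one tuple per column, so bounding it to column 1 returns all of column A in a single tuple. This is a sketch on a hypothetical in-memory sheet (openpyxl 2.6+ assumed for `values_only=True`); as far as I know `iter_cols()` is only available on normal worksheets, not on ones opened with `read_only=True`, so it would not replace the read-only loop above.

```python
from openpyxl import Workbook

# Hypothetical stand-in for the 'PrivAlert Terms' sheet.
wb = Workbook()
sheet = wb.active
sheet.append(["Term", "Category"])
sheet.append(["alpha", "x"])
sheet.append(["beta", "y"])

# iter_cols yields one tuple per column; limiting it to column 1 gives column A.
(column_a,) = sheet.iter_cols(min_col=1, max_col=1, min_row=2, values_only=True)
print(column_a)  # ('alpha', 'beta')
```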

score 0 · Answer 10 · answered Jan 04 '22 at 12:26

An updated version of ZLNK's answer:

import openpyxl

wb = openpyxl.load_workbook('file_name.xlsm')
sheet_names = wb.sheetnames
worksheet = wb[sheet_names[2]]  # index '2' is user input

values = []
for row in range(2, worksheet.max_row + 1):
    for column in "E":  # Here you can add or reduce the columns
        cell_name = "{}{}".format(column, row)
        values.append(worksheet[cell_name].value)
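If the sheet is picked by position anyway, `wb.worksheets` is a list of the worksheet objects in tab order, so `wb.worksheets[2]` is the same sheet as `wb[wb.sheetnames[2]]`. For a single column such as E, `iter_rows` with `min_col=max_col=5` avoids building cell names. The workbook and its values below are hypothetical, and openpyxl 2.6+ is assumed for `values_only=True`.

```python
from openpyxl import Workbook

# Hypothetical workbook with three sheets; data lives in column E of the third.
wb = Workbook()            # starts with one sheet named "Sheet"
wb.create_sheet("Second")
third = wb.create_sheet("Third")
third.append(["Header", None, None, None, "E header"])
third.append([None, None, None, None, "e2"])
third.append([None, None, None, None, "e3"])

worksheet = wb.worksheets[2]  # same sheet as wb[wb.sheetnames[2]]
# Column E is index 5; min_row=2 skips the header row.
values = [row[0] for row in worksheet.iter_rows(min_row=2, min_col=5, max_col=5, values_only=True)]
print(values)  # ['e2', 'e3']
```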
