2

There are great instructions in a number of places to import a Google Sheet into a Pandas DataFrame using gspread, eg:

# Open our new sheet and read some data.
worksheet = gc.open_by_key('...').sheet1

# get_all_values gives a list of rows.
rows = worksheet.get_all_values()

# Convert to a DataFrame and render.
import pandas as pd
df = pd.DataFrame.from_records(rows)
df.head()

The problem is that this import treats the first row as a value rather than as a header.

How can I import the DataFrame and treat the first row as column names instead of values?

emmby
  • 99,783
  • 65
  • 191
  • 249

2 Answers2

4

You can do

row=[[1,2,3,4]]*3
pd.DataFrame.from_records(row[1:],columns=row[0])
   1  2  3  4
0  1  2  3  4
1  1  2  3  4
BENY
  • 317,841
  • 20
  • 164
  • 234
  • This works, but when I randomize the data using `mydf = df.reindex(np.random.permutation(mydf.index))`, the column names get lost. This doesn't happen with data imported via read_csv(). What gives? – emmby Jun 05 '19 at 00:35
  • @emmby I am not sure this cause the problem or not `mydf = df.reindex(np.random.permutation(mydf.index))` here should `mydf.reindex` not `df.reindex` right ? – BENY Jun 05 '19 at 01:02
  • I just figured out the same thing, thank you for the help debugging! – emmby Jun 05 '19 at 01:05
  • @emmby also , if you would like permutation, I think we can do `df.sample(len(df))` – BENY Jun 05 '19 at 01:06
2

I had the same issue - instead of getting a list of lists via get_all_values(), I changed it to get_all_records() and the column headers populated just fine.

for sheet in sheet_names:
   worksheet = sh.worksheet(sheet)
   rows = pd.DataFrame(worksheet.get_all_records())
   database = pd.concat([database, rows])
Alicia
  • 121
  • 1
  • 12