
I need to push 1.18 MB, or approximately 10,000 rows, of data from a CSV file residing on my server to a Google Sheet for Tableau to read.

The data comes in from Google DFP as a CSV document. I have used the gspread library to update Google Sheets with CSV data before; however, with 10,000 records at ~30 seconds per record, that approach is not viable.

Is there a faster way to copy the contents of a CSV/TXT file to a Google spreadsheet than using the gspread library? Preferably in Python.

Update: I'm trying this approach of bulk-updating the cells.

import pandas as pd

raw_dfp = pd.read_csv('live_dfp_UTF.csv', error_bad_lines=False)
sample5 = raw_dfp.iloc[:3, :]
rows, col = sample5.shape

doc.resize(1, 7)
doc.clear()
doc.resize(rows + 1, col)

# Column letters for building the A1-style range; index 0 pads the list
# so that column_names[col] is the letter of the last column.
column_names = ['', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K']

cell_range = 'A1:' + column_names[col] + str(rows)

cells = doc.range(cell_range)

# To use this next line??
# flattened_data = np.flatten(sample5)

for x in range(rows):
    cells[x].value = sample5[x].decode('utf-8')

doc.update_cells(cells)
Artem Yevtushenko
    In case the suggested answer doesn't work: 30 seconds seems a long time for a single record. However, it's the API call that adds a lot of overhead and not so much what data you pass. Depending on the number of columns, pushing it in chunks of ~200 rows at a time used to take almost negligible additional time over pushing a single row for me. There is an upper limit in how much you can push in a single update, however. – roganjosh Mar 07 '17 at 15:37
  • I wrote an answer [here](http://stackoverflow.com/questions/34640175/can-i-write-whole-lines-in-google-spreadsheets-using-gspread-in-python/34837122#34837122) for doing just that which might help. – roganjosh Mar 07 '17 at 15:39
  • @roganjosh could you take a look at the update I put in the body of the question? I'm having trouble making sense of what to do after the flatten line. – Artem Yevtushenko Mar 08 '17 at 05:17
  • Which part specifically is causing problems? `cells` is a 1D structure, while the spreadsheet is 2D. So, you first need to flatten the spreadsheet data to 1D and then run the `for x in range(rows)`. It looks like `cells[x].value = sample5[x].decode('utf-8')` should be `cells[x].value = flattened_data[x].decode('utf-8')`. – roganjosh Mar 08 '17 at 07:11
  • Also, it would be `flattened_data = sample5.flatten()`, I missed that. You need to get a 2D numpy array out of your DataFrame first. My initial answer on the other question assumed Python lists. It might be that you need to call `as_matrix` on the DataFrame, then flatten it. Sorry, not hugely familiar with numpy/pandas. – roganjosh Mar 08 '17 at 07:18
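
Putting roganjosh's two suggestions together (flatten the DataFrame's 2D values so they line up with the 1D list of Cell objects, and push a few hundred rows per call), a minimal sketch reusing `doc`, `sample5`, `rows`, `col`, and `column_names` from the update above might look like the following. The chunk size of 200 is just roganjosh's rough figure, not a documented limit.

# Sketch: flatten the 2D DataFrame values to match the 1D list of Cell
# objects returned by doc.range(), and batch the upload ~200 rows at a time.
CHUNK = 200

values = sample5.values  # 2D numpy array of the DataFrame's contents
for start in range(0, rows, CHUNK):
    stop = min(start + CHUNK, rows)
    # A1-style range covering this chunk of rows, all columns.
    rng = 'A%d:%s%d' % (start + 1, column_names[col], stop)
    cells = doc.range(rng)
    flat = values[start:stop].flatten()
    for cell, value in zip(cells, flat):
        cell.value = str(value)
    doc.update_cells(cells)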

3 Answers


Why don't you load all the data from the CSV file into a pandas DataFrame and then push it to Google Sheets as a whole?

This library may be helpful: https://github.com/maybelinot/df2gspread
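
For example, a minimal sketch of that approach using df2gspread's upload function (the spreadsheet ID and worksheet name below are placeholders):

import pandas as pd
from df2gspread import df2gspread as d2g

# Load the whole CSV into a DataFrame, then push it in one call.
df = pd.read_csv('live_dfp_UTF.csv')
d2g.upload(df, gfile='your_spreadsheet_ID', wks_name='Sheet1')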

Shivam Gaur

You could try pygsheets. It uses the Sheets API v4, which is faster than v3, and it also supports pushing data from a pandas DataFrame.
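
For instance, a minimal sketch, assuming a service-account credentials file and a spreadsheet title (both placeholders):

import pandas as pd
import pygsheets

df = pd.read_csv('live_dfp_UTF.csv')

# Authorize with a service account and open the target spreadsheet
# ('creds.json' and 'DFP Data' are placeholders).
gc = pygsheets.authorize(service_file='creds.json')
wks = gc.open('DFP Data').sheet1

# Write the whole DataFrame starting at A1 in a single batch call.
wks.set_dataframe(df, (1, 1))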

Nithin

I agree with Shivam: the best way to transfer a Google spreadsheet into a pandas DataFrame, and vice versa, is with df2gspread: http://df2gspread.readthedocs.io/en/latest/overview.html#usage1

It takes 10 minutes to set up and only 3 lines of code to do the trick:

from df2gspread import gspread2df as g2d

# Download the spreadsheet's contents into a DataFrame.
df = g2d.download(gfile="your_spreadsheet_ID", col_names=True, row_names=True)

I just set this up so if you have any questions, feel free to ask.

Adam Schroeder