-1

I'm coding a new python script that need to extract data from google sheets, but there are many cells which are merged/combined, and only the top-left cell from this merge has the value. It's important to have that value on all the merged cells.

How can I do that?

Python 3.8.5 + gspread 3.6.0

Note: every comment "trying to get...", the code right below it should return the same value as the previous code.

Spreadsheet test: https://docs.google.com/spreadsheets/d/17Dyxufu1y1ouBCPkf5Y7Vt1UW70WroK0Moy_DD7bZKc/edit?usp=sharing

Code for reproducing the problem:

import gspread
from oauth2client.service_account import ServiceAccountCredentials
import os
import pprint

here = os.path.dirname(os.path.abspath(__file__))
secret = os.path.join(here, 'credentials.json')

scope = ['https://spreadsheets.google.com/feeds']
creds = ServiceAccountCredentials.from_json_keyfile_name(secret, scope)

client = gspread.authorize(creds)

sheet = client.open_by_key('17Dyxufu1y1ouBCPkf5Y7Vt1UW70WroK0Moy_DD7bZKc')
ws = sheet.sheet1
pp = pprint.PrettyPrinter()

#getting the FIRST text
result = ws.acell('A1')
pp.pprint('A1: '+result.value)

#trying to get the SAME text on the cell col+1
result = ws.acell('A2')
pp.pprint('A2: '+result.value)

#getting the 'simple_cell'
result = ws.acell('C2')
pp.pprint('C2: '+result.value)

#getting the 'row_merged'
result = ws.acell('D2')
pp.pprint('D2: '+result.value)

#trying to get 'row_merged' on row+1
result = ws.acell('E2')
pp.pprint('E2: '+result.value)

#getting the 'col_merged'
result = ws.acell('D6')
pp.pprint('D6: '+result.value)

#trying to get 'col_merged' on col+1
result = ws.acell('D7')
pp.pprint('D7: '+result.value)

The output is like this:

('A1: just to confirm, the value "row_merged" has been put to cell D2 originally. Value "col_merged" is in D6 initially. Since it's merged, the expected result should be found on all merged cells')
'A2: '
'C2: simple cell'
'D2: row_merged'
'E2: '
'D6: col_merged'
'D7: '
PS C:\Users\joaov\Desktop>

The point is: A2 must be equals to A1... E2=D2, D7==D6... But it seems there's no way of handling merged cells with gspread.

Tanaike
  • 181,128
  • 11
  • 97
  • 165
João Casarin
  • 674
  • 2
  • 9
  • 27
  • 2
    Is there any chance you could post a [mre]? – Random Davis Nov 18 '20 at 21:04
  • @RandomDavis sure the test spreadsheet is: https://docs.google.com/spreadsheets/d/17Dyxufu1y1ouBCPkf5Y7Vt1UW70WroK0Moy_DD7bZKc/edit#gid=0 the code is: https://hastebin.com/umuvecajir.apache output: https://hastebin.com/wuliqihasa.rust Note: every comment "trying to get...", the code right below it should return the same value as the previous code. – João Casarin Nov 19 '20 at 00:53
  • I should clarify that your example should be self-contained, i.e. not hosted on an external site like you have. Same goes for the output, it has to be in your question itself. It should be some Python code that we can paste into an IDE, run it, and see the issue. If you have to fake or "mock" any incoming data, that's fine, as long as it shows us the same issue. I recommend taking the [tour] to get a better idea of the format this site wants questions to be in. – Random Davis Nov 19 '20 at 18:43
  • 1
    I'm sorry for that, @RandomDavis , I didn't think of editing my main question with the codes and etc.. I'm just gonna edit it right now, I hope you can help me with the issue :) – João Casarin Nov 19 '20 at 21:52
  • That's good, since I or someone else might be able to help in that case. – Random Davis Nov 19 '20 at 21:58
  • Just finished editing, now it really seems really better to understand.. My apologies. – João Casarin Nov 19 '20 at 22:03
  • Can I ask you about your question? 1. Can I ask you about the output you expect? Unfortunately, I cannot understand about `every comment "trying to get...", the code right below it should return the same value as the previous code.`. 2. I cannot open your shared Spreadsheet. Can you confirm it again? – Tanaike Nov 19 '20 at 23:52
  • You can always ask anything :D 1. Well, you can see that there are "blocks" in the code, and each block has a "pp.pprint" to show in the console the output. On the blocks that have the comment starting with "trying to get...", these blocks needs to return the same thing as the previous block, or it should. That's the merge cell problem. 2. I'm sorry for that, I forgot to share the spreadsheet with everyone, you can try again, it should open. thanks :) – João Casarin Nov 20 '20 at 00:14

1 Answers1

1

I believe your goal as follows.

  • You want to retrieve the values from the merged cells using gspread of python.

In your sample Spreadsheet, for example, the cells "A1:L12" are merged. Under this condition, when the values are retrieved from the merged cells of "A1:L12" using Sheets API, just to confirm, the value "row_merged" has been put to cell D2 originally. Value "col_merged" is in D6 initially. Since it's merged, the expected result should be found on all merged cells is retrieved only cell "A1". So in order to retrieve the same values from all cells of "A1:L12", it is required to put the values to the cells "A2:L12". Unfortunately, it seems that there are no methods for directly achieving this situation in Sheets API. So, in this answer, I would like to propose this using a script.

By the way, about result = ws.acell('A2') in your script, this cell is not merged and it's empty. So in this case, I think that the empty value is correct. I thought that you might have wanted to check the cell "B1". In this answer, this is also considered.

Sample script:

spreadsheet_id = '###'  # Please set the Spreadsheet ID.
sheet_name = 'Sheet1'  # Please set the sheet name.

client = gspread.authorize(credentials)
access_token = client.auth.token
url = "https://sheets.googleapis.com/v4/spreadsheets/" + \
    spreadsheet_id + "?fields=sheets&ranges=" + sheet_name
res = requests.get(url, headers={"Authorization": "Bearer " + access_token})
obj = res.json()
# print(obj['sheets'][0]['merges'])

sheet = client.open_by_key(spreadsheet_id)
ws = sheet.worksheet(sheet_name)

# 1. All values are retrieved.
values = ws.get_all_values()

# 2. Put the values to the merged cells.
if 'merges' in obj['sheets'][0].keys():
    for e in obj['sheets'][0]['merges']:
        value = values[e['startRowIndex']][e['startColumnIndex']]
        rows = len(values)
        if rows < e['endRowIndex']:
            for i in range(0, e['endRowIndex'] - rows):
                values.append([''])
        for r in range(e['startRowIndex'], e['endRowIndex']):
            cols = len(values[r])
            if cols < e['endColumnIndex']:
                values[r].extend([''] * (e['endColumnIndex'] - cols))
            for c in range(e['startColumnIndex'], e['endColumnIndex']):
                values[r][c] = value

# For A1
print('A1: '+values[0][0])

# For B1
# output: just to confirm, the value "row_merged" has been put to cell D2 originally. Value "col_merged" is in D6 initially. Since it's merged, the expected result should be found on all merged cells
print('B1: '+values[0][1])

# For C2
# output: simple cell
print('C2: '+values[1][2])

# For D2
# output: row_merged
print('D2: '+values[1][3])

# For E2
# output: row_merged
print('E2: '+values[1][4])

# For D6
# output: col_merged
print('D6: '+values[5][3])

# For D7
# output: col_merged
print('D7: '+values[6][3])

Note:

  • In this sample script, the method of "spreadsheets.get" in Sheets API is used with requests using the access token retrieved from client = gspread.authorize(credentials) of gspread.
  • In this sample script, the values are processed in the list. So when you retrieved the values from the merged cells, please retrieve them from the list values.

Reference:

Tanaike
  • 181,128
  • 11
  • 97
  • 165
  • Thanks in advance for your help! But could I ask something? On your explanation, you meant `A1:L1` instead of `A1:A12`, right? And also, analyzing the script, it seems I would need to get the merge range URL right? It's not viable needing to get this url everytime :/ – João Casarin Nov 20 '20 at 12:53
  • Also, this feature will be implemented with the other issue you helped me, to open the worksheet by its GID – João Casarin Nov 20 '20 at 17:27
  • @João Casarin Thank you for replying. I have to apologize for my poor English skill. Unfortunately, I cannot understand about your replying. If my answer was not useful for your situation, could you please tell me about the detail of it? By this, I would like to confirm it. – Tanaike Nov 20 '20 at 23:18
  • Don't worry about the english, it's perfectly fine, I also don't speak english natively. You said that in my spreadsheet there is a merge in the range `A1:A12`, but it's actually `A1:L1`, right? And analyzing your script, I think I'll need to get the merged cells URL, will I? Just like this: https://docs.google.com/spreadsheets/d/17Dyxufu1y1ouBCPkf5Y7Vt1UW70WroK0Moy_DD7bZKc/edit#gid=0&range=D6:D8 If yes, this would be difficult, since it should not be an input, got it? – João Casarin Nov 21 '20 at 00:41
  • @João Casarin Thank you for replying. Yes. I misunderstood it. I apologize for this. That is "A1:L12". So I updated my answer. And, I have to apologize for my poor English skill. Unfortunately, I cannot understand about `And analyzing your script, I think I'll need to get the merged cells URL, will I?`. In above sample script, the URL includes the Spreadsheet ID and sheet name. When the merged cells are found, the values are retrieved. – Tanaike Nov 21 '20 at 01:10
  • @João Casarin For the sheet which doesn't include the merged cells, I updated my sample script. Could you please confirm it? – Tanaike Nov 21 '20 at 01:18
  • Hello, your sample script really works, and again, thanks for that! However, as you helped me on other post, we need to assume the only input I have is the worksheet ID... The user theoretically does not know the worksheet name (in your sample script that would be the `sheet_name` variable. – João Casarin Nov 21 '20 at 02:19
  • I tried to implement your sample script of getting the worksheet by its ID with this sample script, but the variable `url` seems to use the worksheet NAME. Can it be changed? – João Casarin Nov 21 '20 at 02:20
  • Well, I figured out that a worksheet object has the attribute `title`, so with that I can use both your codes. It seems to be working perfectly! For now, I can't confirm with my official code with the official spreadsheet because I can't access it right now. So, I won't finish this post yet, I'll need to do that by monday, okay? – João Casarin Nov 21 '20 at 02:33
  • @João Casarin Thank you for replying. If my answer didn't resolve your issue, I apologize. – Tanaike Nov 21 '20 at 05:37
  • yeah, apparently it's working fine! I appreciate your effort for helping me, thanks! Have a good day! – João Casarin Nov 23 '20 at 16:25
  • @João Casarin Thank you for replying. I'm glad your issue was resolved. – Tanaike Nov 23 '20 at 22:14