3

I have some values in a Google Sheet, some of which are hyperlinked, like the third one here:

enter image description here

I want to retrieve the text value of each cell, and the hyperlink if present.

I am able to access the sheet easily enough with gspread:

import gspread
from oauth2client.service_account import ServiceAccountCredentials

scope = ['https://spreadsheets.google.com/feeds']
credentials = ServiceAccountCredentials.from_json_keyfile_name(
    './credentials.json', scope)
gc = gspread.authorize(credentials)
key = 'xxxxx'
wks = gc.open_by_key(key).worksheets()
for wk in wks:
    links = wk.col_values(3)
    for l in links:
       print l.value

However, this only prints the string value of the links, not the actual href that the link is pointing to.

Does anyone know if it's possible to retrieve this programmatically, using gspread or another library?

Richard
  • 62,943
  • 126
  • 334
  • 542

1 Answers1

2

In gspread, a Cell instance has an undocumented attribute input_value that gives you access to the formula.

>>> formula = mycell.input_value
>>> formula
'=HYPERLINK("https://url.com","Link Text")'
>>> lst = formula.split('"')
>>> lst[1], lst[3]
('https://url.com', 'Link Text')

From there you just need to split the string to remove the undesired parts.

In your case, you may want to subclass gspread.Worksheet like so:

class CustomWorksheet(gspread.Worksheet):
    def get_links(self, col):
        """Returns a list of all links urls in column `col`.
           Empty cells in this list will be rendered as :const:`None`.
        """
        start_cell = self.get_addr_int(1, col)
        end_cell = self.get_addr_int(self.row_count, col)

        row_cells = self.range('%s:%s' % (start_cell, end_cell))
        return [cell.input_value.split('"')[1] for cell in row_cells if cell.input_value.startswith('=HYPERLINK')]
Jacques Gaudin
  • 15,779
  • 10
  • 54
  • 75
  • 1
    I have used input_value but I am getting None not even the text. I think input_value is deprecated now. – Chaithanya Krishna Jul 03 '20 at 20:14
  • @ChaithanyaKrishna You are absolutely correct, this is deprecated. The correct way to do this now is using the `value_render_option` keyword argument, e.g. `worksheet.cell(1, 1, value_render_option='FORMULA')` – Jacques Gaudin Jul 03 '20 at 22:21
  • The docs for `value_render_option` are here: https://developers.google.com/sheets/api/reference/rest/v4/ValueRenderOption – Jacques Gaudin Jul 03 '20 at 22:33
  • Thanks for the response Jacques. I have tried value_render_option as well with all the three possible but it didn't worked – Chaithanya Krishna Jul 05 '20 at 20:19
  • @ChaithanyaKrishna I haven't tried it but it's the replacement stated in the source code. You should probably ask a new question showing what you have tried and didn't work. – Jacques Gaudin Jul 05 '20 at 20:26