
I am trying to download a Google Doc as a PDF using Selenium in Python. Unfortunately, my HTML knowledge is quite minimal, so I don't know which elements I need to target to have Selenium click File and then Download as > PDF. I realize I can use the browser's web developer tools to inspect the HTML, but that isn't working very well for me.

Here is what I have tried so far:

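from selenium import webdriver

# URL of the Google Doc to open (no whitespace inside the quotes)
url = 'https://docs.google.com/document/d/1Y1n-RR5j_FQ9WFMG8E_ajO0OpWLNANRu4lQCxTw9T5g/edit?pli=1'

# Start Firefox and load the document
browser = webdriver.Firefox()
browser.get(url)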

Any help would be appreciated; thanks!

hiqbal
  • Hello, welcome to SO. There is probably a `button` of some kind that you need to `click()` in the html with Selenium. Are you able to post what you've written in Python so far? – Daniel Jul 16 '15 at 20:45
  • Looks like the link to that file is broken, or the URL is incomplete. – Daniel Jul 16 '15 at 20:58
  • url = ' https://docs.google.com/document/d/1Y1n-RR5j_FQ9WFMG8E_ajO0OpWLNANRu4lQCxTw9T5g/edit ' from selenium import webdriver browser = webdriver.Firefox() browser.get(url) – hiqbal Jul 16 '15 at 21:00
  • maybe you just need [requests](http://docs.python-requests.org/en/latest/), not selenium – Pynchia Jul 16 '15 at 21:38
  • I already tried that, but the problem is that Google Docs doesn't like being scraped and uses JavaScript to prevent conventional scraping, which is why I need Selenium. Thanks, though. – hiqbal Jul 16 '15 at 21:46

2 Answers


As you mention in your comment, Google Drive doesn't like being scraped.

The drive command looks like the right tool for this sort of job. It'll do what you're trying to do, just not the way you want to do it. According to its docs (i.e. I haven't tested it), this command should download your file:

drive pull --export pdf --id 1Y1n-RR5j_FQ9WFMG8E_ajO0OpWLNANRu4lQCxTw9T5g
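Since the question is written in Python, here is a minimal sketch of calling that command from a script with `subprocess`. It assumes the `drive` binary is installed, on your `PATH`, and run from a folder already initialised for it; the flags are copied from the command in this answer and are not verified here. The helper names `build_drive_pull_cmd` and `pull_export` are hypothetical, and the default export format is `pdf` because that is what the question asks for (assuming drive accepts `pdf` for that flag).

```python
import subprocess


def build_drive_pull_cmd(doc_id, export_format="pdf"):
    # Mirrors `drive pull --export <format> --id <doc_id>` from the answer;
    # the flags are taken from that command, not checked against the tool here.
    return ["drive", "pull", "--export", export_format, "--id", doc_id]


def pull_export(doc_id, export_format="pdf"):
    # Assumes the `drive` binary is on PATH and the current working directory
    # is a folder initialised for drive; raises CalledProcessError on failure.
    subprocess.run(build_drive_pull_cmd(doc_id, export_format), check=True)
```

Usage would be `pull_export("1Y1n-RR5j_FQ9WFMG8E_ajO0OpWLNANRu4lQCxTw9T5g")`. Passing the arguments as a list avoids any shell quoting issues with the document ID.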

(Also, in general, I find the easiest way to use Selenium is to use the Selenium IDE to tell Selenium what you want to do, then export the resulting test case by going to File > Export Test Case As... > Python 2 / unittest / Web Driver.)

Hope that helps.

Travis

I have a working solution, though Google may update its pages and break it. It's in C#, but the Selenium calls are basically the same in Python. The idea: use JavaScript to make the File menu items visible (except the "Download as" submenu itself) and return the "Download as" WebElement. Click it with Selenium, then run a second script that finds the menu item for the format you want and returns that WebElement, and click it as well. I couldn't trigger the click with JavaScript alone (I was unable to figure out how they trigger it), but clicking through the Selenium driver worked just fine.

Make most of the menus visible and return the "Download as" WebElement:

  // Open the File menu button and mark it as expanded
  document.querySelector(`#docs-file-menu`).className = 'menu-button goog-control goog-inline-block goog-control-open docs-menu-button-open-below';
  document.querySelector(`#docs-file-menu`).setAttribute('aria-expanded', 'true');
  // Make the File menu dropdown itself visible
  document.querySelectorAll(`.goog-menu:not(.goog-menu-noaccel)`)[0].className = 'goog-menu goog-menu-vertical docs-material docs-menu-hide-mnemonics docs-menu-attached-button-above';
  document.querySelectorAll(`.goog-menu:not(.goog-menu-noaccel)`)[0].setAttribute('style', 'user-select: none; visibility: visible; left: 64px; top: 64px;');
  // "Download as": the clickable menu item is 2 parents above the labelled node
  document.querySelector(`[aria-label='Download as d']`).parentElement.parentElement.className = 'goog-menuitem apps-menuitem goog-submenu goog-submenu-open goog-menuitem-highlight';
  return document.querySelector(`[aria-label='Download as d']`).parentElement.parentElement;

Click the "Download as" button:

// btnClickJs holds the JavaScript above
IWebElement btn = (IWebElement)((IJavaScriptExecutor)driver).ExecuteScript(btnClickJs);
btn.Click();
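A rough Python counterpart of these two C# lines, since the question uses Python. It assumes the menu-opening JavaScript is stored in a string, here called `menu_js` (a hypothetical name), and `click_download_as` is likewise a hypothetical helper. In the Python bindings, `execute_script` returns a `WebElement` when the script returns a DOM node, so the result can be clicked directly; the `None` check is an addition so a changed page layout fails with a clear message.

```python
def click_download_as(driver, menu_js):
    # Run the script that opens the File menu and returns the "Download as" item.
    btn = driver.execute_script(menu_js)
    if btn is None:
        # The script found nothing: the Docs markup has probably changed.
        raise RuntimeError("'Download as' menu item not found")
    # Clicking through Selenium (not JavaScript) is what opens the submenu.
    btn.click()
    return btn
```

With the question's code, this would be called as `click_download_as(browser, menu_js)`.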

Select format:

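// The [6] index (the "Download as" submenu) is hard-coded and may change
var formatCss = document.querySelectorAll(`.goog-menu.goog-menu-noaccel`)[6].querySelectorAll(`.goog-menuitem.apps-menuitem`);
// 'injectformathere' is a placeholder for the wanted format (e.g. '.pdf');
// if it is replaced with an empty string, the script falls back to '.html'
var format = 'injectformathere' ? 'injectformathere' : '.html';

for (let i = 0; i < formatCss.length; i++) {
    if (formatCss[i].innerText.indexOf(format) != -1)
        return formatCss[i];
}
return null;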

Click the format:

// formatClickJs (name assumed) holds the format-selection JavaScript above
btn = (IWebElement)((IJavaScriptExecutor)driver).ExecuteScript(formatClickJs);
if (btn != null)
    btn.Click();
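A Python sketch of the format step, for readers following along from the question. The answer's JavaScript contains the literal placeholder `injectformathere`; I assume the C# code substitutes the wanted extension (for example `.pdf`) into it with a string replace before executing it, so this sketch does the same. `FORMAT_JS` is the answer's script lightly reformatted, `build_format_js` and `click_format` are hypothetical helper names, and the hard-coded `[6]` index is copied from the answer and may break whenever Google changes its menus.

```python
# Format-selection script from the answer; 'injectformathere' is a placeholder.
FORMAT_JS = """
var formatCss = document.querySelectorAll('.goog-menu.goog-menu-noaccel')[6]
    .querySelectorAll('.goog-menuitem.apps-menuitem');
var format = 'injectformathere' ? 'injectformathere' : '.html';
for (let i = 0; i < formatCss.length; i++) {
    if (formatCss[i].innerText.indexOf(format) != -1)
        return formatCss[i];
}
return null;
"""


def build_format_js(extension):
    # Substitute the wanted extension; an empty string makes the JS fall back
    # to '.html' because '' is falsy in JavaScript.
    return FORMAT_JS.replace("injectformathere", extension)


def click_format(driver, extension=".pdf"):
    # execute_script returns a WebElement for a DOM node, or None for null.
    item = driver.execute_script(build_format_js(extension))
    if item is not None:
        item.click()
    return item
```

This would be called as `click_format(browser, ".pdf")` once the "Download as" submenu is open; a `None` result means no menu item contained that text.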
lastlink