Using the following code:

    with open('newim','wb') as f:
        f.write(requests.get(repr(url)))

where the url is:

    url = ''

I get the following error:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Python33\lib\site-packages\requests\api.py", line 69, in get
        return request('get', url, params=params, **kwargs)
      File "C:\Python33\lib\site-packages\requests\api.py", line 50, in request
        response = session.request(method=method, url=url, **kwargs)
      File "C:\Python33\lib\site-packages\requests\sessions.py", line 465, in request
        resp = self.send(prep, **send_kwargs)
      File "C:\Python33\lib\site-packages\requests\sessions.py", line 567, in send
        adapter = self.get_adapter(url=request.url)
      File "C:\Python33\lib\site-packages\requests\sessions.py", line 641, in get_adapter
        raise InvalidSchema("No connection adapters were found for '%s'" % url)

I have seen other posts with what, at first glance, appears to be a similar problem, but I haven't had any luck just adding 'https://' or anything like that. I seriously want to avoid doing this with webdriver + AutoIt or something similar, because I have to repeat the exercise for thousands of images.

2 Answers

There seems to be a misunderstanding about embedded images. The URL you posted is what your browser returns when you select 'View Image' or 'Copy Image Location' (or something similar, depending on the browser) from the context menu; formally, it is called a data URI.

It is not an HTTP URL pointing to an image, and you cannot use it to retrieve an image from any server: the image data is already contained in the URI itself. This is exactly what requests points out in the error message.
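To make the distinction concrete, here is a minimal sketch that checks the scheme of a URL before handing it to requests. The two URLs and the helper name `is_data_uri` are made up for illustration, and the base64 payload is a placeholder, not a real image.

```python
from urllib.parse import urlsplit

# Hypothetical values, for illustration only.
page_url = "http://server/dir/images.html"
data_uri = "data:image/png;base64,aGVsbG8="  # placeholder payload, not a real image


def is_data_uri(url):
    """Return True when the URL uses the 'data' scheme instead of http/https."""
    return urlsplit(url).scheme == "data"


print(is_data_uri(page_url))
print(is_data_uri(data_uri))
```

Anything for which this check is true carries its own content and has to be decoded locally rather than requested from a server.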


So, how do we get these images? The following script will handle this task:

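    import binascii as ba

    import requests
    from lxml import html

    url = "<Page URL goes here>"  # Ex: http://server/dir/images.html
    page = requests.get(url)
    struct = html.fromstring(page.text)
    # src attribute of every <img> element in the page
    images = struct.xpath('//img/@src')

    i = 0
    for img in images:
        # Skip anything that is not a base64-encoded embedded image
        if not img.startswith('data:image/') or 'base64,' not in img:
            continue
        i += 1
        # 'data:image/png;base64,...' -> 'png'
        ext = img.partition('data:image/')[2].split(';')[0]
        with open('newim' + str(i) + '.' + ext, 'wb') as f:
            f.write(ba.a2b_base64(img.partition('base64,')[2]))

    print("Done")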

To run it you will need to install, along with requests, the lxml library (for example with pip install lxml).


Here follows a short description of how the script functions:

First it requests the url from the server and, after it gets the server's response, it stores it in a Response object (page).

Then it utilizes html.fromstring() from lxml to transform the "textified" content of page into a tree-structure which can be processed by commands utilizing XPath syntax, like this one: images = struct.xpath('//img/@src').

The result is a list containing the contents of the src attribute of every image in the page. In this case (embedded images) these are the data URIs.

Then, for every image in the list, it first gets the image type (which will be used as the newim's extension), using partition() and split() and stores it in ext. Then it converts the base64 encoded data to binary (using a2b_base64() from binascii module) and writes the output to the file.
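The last step can be tried in isolation, without any network access. The sketch below applies the same partition()/split()/a2b_base64() calls to a single string; the function name `split_data_uri` and the sample URI are assumptions for illustration, and the payload is a placeholder rather than real image data.

```python
import binascii


def split_data_uri(src):
    """Split a base64 image data URI into (extension, raw bytes)."""
    header, _, payload = src.partition("base64,")
    # header looks like 'data:image/png;' -> keep only 'png'
    ext = header.partition("data:image/")[2].split(";")[0]
    return ext, binascii.a2b_base64(payload)


# Placeholder payload (the base64 encoding of b"hello"), not a real image.
sample = "data:image/png;base64,aGVsbG8="
ext, data = split_data_uri(sample)
print(ext, data)
```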


As a small demo, save this HTML code (as, e.g., images.html) somewhere on your server

<h1>Images</h1>
<img src="" />  
<br />
<img src=""></img>
<br />
<img src=""/>

and point to it in the script: requests.get("http://yourserver/somedir/images.html").

When you run the script you will get 3 images, named newim1.png, newim2.png and newim3.jpg respectively.


As a reminder, note that this script (in its current form) will only handle embedded images. If you also want to process ordinary linked images, you have to modify it accordingly (but this is not difficult).
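One possible shape for that modification is sketched below. It only decides what to do with each src value: embedded images are decoded on the spot, and linked ones are resolved against the page URL so they could then be downloaded, for instance with requests.get(absolute_url).content. The helper name `resolve_src` and the sample values are hypothetical.

```python
import binascii
from urllib.parse import urljoin


def resolve_src(page_url, src):
    """Classify an <img> src as embedded or linked.

    Returns ('embedded', raw_bytes) for base64 data URIs and
    ('linked', absolute_url) for everything else.
    """
    if src.startswith("data:") and "base64," in src:
        return "embedded", binascii.a2b_base64(src.partition("base64,")[2])
    # Relative paths are resolved against the page they came from.
    return "linked", urljoin(page_url, src)


page_url = "http://server/dir/images.html"  # hypothetical page
print(resolve_src(page_url, "data:image/png;base64,aGVsbG8="))  # placeholder payload
print(resolve_src(page_url, "pics/a.png"))
```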

sokin
  • I seem to be getting rerouted to a login with the requests method...this is a bit mystifying because I should already be logged in on the webdriver object...the site i'm working on is rbauction.com – SciPyInTheHole Oct 12 '15 at 18:05
  • My answer covers the general way to handle _data uris_ with `requests`. The problem you mention is related to the fact that you have not made the session info -which is known to the `webdriver` since you are logged in- available for the `requests`. [Here](http://stackoverflow.com/a/11435762/4711309) is a possible way to do this. – sokin Oct 12 '15 at 18:24

This is an image encoded in base64. Quoting the URL below: "base64 equals to text (string) representation of the image itself".

Read this for a detailed explanation: http://www.stoimen.com/blog/2009/04/23/when-you-should-use-base64-for-images/

In order to use them you'll have to implement a base64 decoder. Luckily SO already provides you with the answer on how to do it:

Python base64 data decode
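In Python nothing has to be implemented by hand, since the standard library ships a decoder in the base64 module. A minimal sketch follows, assuming the data URI is base64-encoded; the sample URI is a placeholder rather than a real image, and the output file name is made up. Note that only the part after the first comma is base64 data, so the 'data:image/...;base64,' prefix is stripped before decoding.

```python
import base64
import os
import tempfile

# Placeholder data URI (payload is the base64 encoding of b"hello").
data_uri = "data:image/png;base64,aGVsbG8="

# Everything after the first comma is the encoded payload.
payload = data_uri.split(",", 1)[1]
raw = base64.b64decode(payload)

# Write the decoded bytes to a file in the temp directory (hypothetical name).
out_path = os.path.join(tempfile.gettempdir(), "newim_demo.png")
with open(out_path, "wb") as f:
    f.write(raw)
```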

Ricardo Cid
  • This is the right answer. The correct function is base64.b64decode(datauri); just write the decoded string to an image file and voilà, you have your image. – SciPyInTheHole Oct 12 '15 at 18:14