I am a beginner in Python and am working on a small project: a dynamic script for patent research against the patentsview.org API.

Here is my code:

import urllib.parse
import urllib.request

# http://www.patentsview.org/api/patents/query?q={"_and":
# [{"inventor_last_name":author},{"_text_any":{"patent_title":[title]}}]}&o=
# {"matched_subentities_only": "true"}
author = "Jobs"
andreq = "_and"
invln = "inventor_last_name"
text = "_text_any"
patent = "patent_title"
match = "matched_subentities_only"
true = "true"
title = "computer"
urlbasic = "http://www.patentsview.org/api/patents/query"
patentall = {patent:title}
textall = {text:patentall}
invall = {invln:author}
andall = invall.copy()
andall.update(textall)
valuesq = {andreq:andall}
valuesqand = {andreq:andall}
valuesq = {andreq:valuesqand}
valueso = {match:true}

#########
url = "http://www.patentsview.org/api/patents/query"
values = {"q":valuesq,
          "o":valueso}
print(values)


data = urllib.parse.urlencode(values)
print(data)
############
data = data.encode("UTF-8")
print(data)
req = urllib.request.Request(url,data)
resp = urllib.request.urlopen(req)
respData = resp.read()
saveFile = open("patents.txt", "w")
saveFile.write(str(respData))
saveFile.close()

I think I have the right start for the dynamic URL, but the encoding seems to give me an HTTP Error 400: Bad Request. If I don't encode, the URL looks like www.somethingsomething.org/o:{....}, which obviously produces an error. Here is the error:

Traceback (most recent call last):
  File "C:/Users/Max/PycharmProjects/KlayerValter/testen.py", line 38, in <module>
    resp = urllib.request.urlopen(req)
  File "C:\Python34\lib\urllib\request.py", line 161, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python34\lib\urllib\request.py", line 469, in open
    response = meth(req, response)
  File "C:\Python34\lib\urllib\request.py", line 579, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python34\lib\urllib\request.py", line 507, in error
    return self._call_chain(*args)
  File "C:\Python34\lib\urllib\request.py", line 441, in _call_chain
    result = func(*args)
  File "C:\Python34\lib\urllib\request.py", line 587, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request

Process finished with exit code 1

If I encode, I get the same error, since all the brackets get converted. The PatentsView API works as follows:

http://www.patentsview.org/api/patents/query?q={"_or":[{"_and":[{"inventor_last_name":"Whitney"},{"_text_phrase":{"patent_title":"cotton gin"}}]},{"_and":[{"inventor_last_name":"Hopper"},{"_text_all":{"patent_title":"COBOL"}}]}]}

To build the query dynamically, I gave every key name and every sub-dictionary its own variable. If there is a better solution, please help.

Best Regards.

General Grievance
Max
    Suggestion: use the [requests](http://docs.python-requests.org/en/master/) library instead of fighting with urllib -- it'll make your life a lot easier. – Christian Ternus Sep 05 '17 at 15:59

2 Answers

The API accepts and returns JSON, so you should use json.dumps to encode your POST data. Then use json.loads on the response if you want a dictionary, or just write the raw response to a file.

from urllib.request import Request, urlopen
import json

url = "http://www.patentsview.org/api/patents/query"
author = "Jobs"
title = "computer"
data = {
    'q':{
        "_and":[
            {"inventor_last_name":author},
            {"_text_any":{"patent_title":title}}
        ]
    }, 
    'o':{"matched_subentities_only": "true"}
}
resp = urlopen(Request(url, json.dumps(data).encode()))
data = resp.read()
#data = json.loads(data)
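
A small sketch of why json.dumps matters here, using only the standard library. urllib.parse.urlencode calls str() on any value that is not already a string, so the nested dictionary from the question goes out as its Python repr (single quotes) rather than as JSON. Whether that is the only reason for the 400 is an assumption; the helper name is_json is made up for this illustration.

```python
import json
from urllib.parse import urlencode, parse_qs

query = {"_and": [{"inventor_last_name": "Jobs"},
                  {"_text_any": {"patent_title": "computer"}}]}

# urlencode() applies str() to the dict, so the server would receive the
# Python repr with single quotes, which is not valid JSON.
as_repr = parse_qs(urlencode({"q": query}))["q"][0]

# Serialising with json.dumps first produces double-quoted, valid JSON.
as_json = parse_qs(urlencode({"q": json.dumps(query)}))["q"][0]


def is_json(text):
    """Return True if text parses as JSON (hypothetical helper)."""
    try:
        json.loads(text)
        return True
    except ValueError:
        return False


print(as_repr)
print(as_json)
print(is_json(as_repr), is_json(as_json))
```

The first printed line uses single quotes and fails to parse as JSON; the second round-trips back to the original dictionary.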

As Christian suggested, you could simply use requests; it's much better than urllib.

data = requests.post(url, json=data).json()

As for all those variables in your code, they compose a dictionary like the one below:

values = {"q":{andreq:{andreq:{invln:author, text:{patent:title}}}}, "o":{match:true}}

I don't see why you would go through all that trouble to build a dictionary, but I could be wrong. However, you could wrap your code in a function with author and title as arguments.
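
One way the "function with author and title as arguments" idea might look. This is only a sketch: build_query and build_get_url are hypothetical names, and the GET form assumes the API accepts JSON-encoded q and o parameters in the query string, as the example URL in the question suggests.

```python
import json
from urllib.parse import urlencode

# Endpoint taken from the question.
API_URL = "http://www.patentsview.org/api/patents/query"


def build_query(author, title):
    """Return the request payload as a plain dictionary."""
    return {
        "q": {"_and": [{"inventor_last_name": author},
                        {"_text_any": {"patent_title": title}}]},
        "o": {"matched_subentities_only": "true"},
    }


def build_get_url(author, title):
    """Return a GET URL with each top-level value JSON-encoded, then percent-encoded."""
    payload = build_query(author, title)
    params = {key: json.dumps(value) for key, value in payload.items()}
    return API_URL + "?" + urlencode(params)


print(build_get_url("Jobs", "computer"))
```

The dictionary from build_query can be passed as it is to requests.post(url, json=...), while build_get_url covers the case where you would rather issue a plain GET.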


With requests you don't have to call json.dumps on your data; just pass it through the json parameter. If you want to save the response content to a file, use the content or text attribute.

import requests

title = "computer" 
author = "Jobs" 
url = "http://www.patentsview.org/api/patents/query" 
data = { 
    "q":{ "_and":[ {"inventor_last_name":author}, {"_text_any":{"patent_title":title}}] }, 
    "o":{"matched_subentities_only":"true"} 
} 
resp = requests.post(url, json=data) 
with open("patents.txt", "w") as f:
    f.write(resp.text)
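
Calling .json() on a response whose body is not JSON (an error page, for example) raises a ValueError, so it can help to check before decoding. A minimal sketch using only the standard library; decode_json_body is a hypothetical helper, and treating every non-200 status as unusable is an assumption made for this illustration.

```python
import json


def decode_json_body(status_code, body_text):
    """Return the parsed JSON body, or None if the response is not usable JSON."""
    if status_code != 200:
        # Error responses often carry HTML or plain text rather than JSON.
        return None
    try:
        return json.loads(body_text)
    except ValueError:
        # json.JSONDecodeError is a subclass of ValueError.
        return None


print(decode_json_body(200, '{"patents": []}'))
print(decode_json_body(500, "<html>Server Error</html>"))
```

With requests this could be called as decode_json_body(resp.status_code, resp.text).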
t.m.adam
  • `data = requests.post(url, json=data).json()` does not work. `resp = requests.post(url, json=data).json()` followed by `saveFile = open("patents.txt", "w")`, `saveFile.write(str(resp))`, `saveFile.close()` gives me this: `ValueError: Expecting value: line 1 column 1 (char 0)` – Max Sep 06 '17 at 18:00
  • The `ValueError` exception means that `.json()` failed to decode the response. Please make sure to pass valid data in the `json` parameter. If you update your post with your current code, I could take a look at it. – t.m.adam Sep 07 '17 at 00:32
  • import requests import json title = "computer" author = "Jobs" url = "http://www.patentsview.org/api/patents/query" data = { "q":{ "and":[ {"inventor_last_name":author}, {"_text_any":{"patent_title":title}}] }, "o":{"matched_subentities_only":"true"} } var = json.dumps(data) resp = requests.post(url,data=var) saveFile = open("patents.txt", "w") saveFile.write(str(resp)) saveFile.close() saveFile = open("patents.txt", "w") saveFile.write(str(resp)) saveFile.close() – Max Sep 07 '17 at 05:18
  • You've mistyped `"_and"`. That causes a 500 error response, which is obviously not JSON, so `.json()` fails. I've updated my post; give it a try. – t.m.adam Sep 07 '17 at 05:53
  • Thanks. After rewriting, I overlooked this typo. OK, I get response 200 now, so everything is fine according to the server, but where or how do I get the data (e.g. the patents)? The text file only reads "response 200". – Max Sep 08 '17 at 18:12
  • You can get the [response content](http://docs.python-requests.org/en/master/user/quickstart/#response-content) with `resp.text`. Why don't you use the last snippet in my answer? – t.m.adam Sep 09 '17 at 02:29
  • Thanks a lot! It was a long night. Case closed! – Max Sep 09 '17 at 08:27

As an alternative to PatentsView, take a look at patent_client! It's a Python module that searches the live USPTO and EPO databases using a Django-style API. This includes the Patent Examination Data Set that backs the PatentsView API. The results of any query can then be cast into pandas DataFrames or Series with a simple .to_pandas() call.

from patent_client import USApplication

result = USApplication.objects.filter(first_named_inventor="<Name>")

# Returns an iterator of application objects matching the value.
# You can also go directly to a Pandas dataframe with:

result.to_pandas()

A great place to start is the User Guide Introduction

PyPI | GitHub | Docs

(Full disclosure - I'm the author and maintainer of patent_client)

Parker Hancock