9

For my work I have to find potential customers in biomedical research and industry.

I wrote some pretty handy programs using the module biopython, which has a nice interface for searching NCBI. I have also used the clinical_trials module, to search clinicaltrials.gov.

I now want to search patent databases, like EPO or USPTO, but I haven't been able to find even the slightest trace of a Python module. But maybe I'm missing something obvious?

Since Google has a patent search option, I was wondering if there might be a Python module for searching Google which could be adapted to search only patents?

JasonMArcher
Misconstruction
  • IP Street offers a RESTful API for searching the US and European databases. It's more up to date and more robust than other offerings. Here is their developer page: http://docs.ipstreet.com/ – Reed Jessen Aug 11 '16 at 17:15

3 Answers

13

You can parse at least the USPTO data using any XML parsing tool, such as the lxml Python module.

There is a great paper on doing just this by Gabe Fierro, available here: Extracting and Formatting Patent Data from USPTO XML (no paywall)

Gabe also participated in some useful discussion on doing this here on this google group.

Finally, if you know what you're looking for and have plenty of disk space you can also get the bulk data stored locally for processing. USPTO bulk downloads here.
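As a rough illustration of the XML parsing idea, here is a minimal sketch that pulls a few fields out of a single patent record. It uses the standard library's xml.etree.ElementTree so it runs without extra installs; lxml.etree exposes a largely compatible API. The sample record and its element names are assumptions made up for illustration, so check them against the schema of the files you actually download.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed record. Real USPTO files are much larger
# and their element names can differ between schema versions.
SAMPLE_XML = """<us-patent-grant>
  <us-bibliographic-data-grant>
    <publication-reference>
      <document-id>
        <country>US</country>
        <doc-number>0000000</doc-number>
        <date>20150106</date>
      </document-id>
    </publication-reference>
    <invention-title>Example widget</invention-title>
  </us-bibliographic-data-grant>
</us-patent-grant>"""


def extract_summary(xml_text):
    """Return a small dict of fields from one patent XML document."""
    root = ET.fromstring(xml_text)
    doc_id = ".//publication-reference/document-id/"
    return {
        "doc_number": root.findtext(doc_id + "doc-number"),
        "date": root.findtext(doc_id + "date"),
        "title": root.findtext(".//invention-title"),
    }


summary = extract_summary(SAMPLE_XML)
print(summary)
```

findtext returns None when a path is missing, so a record with a different layout shows up as empty fields rather than an exception.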

Any more specific questions please let me know! I've trod some of this ground before :)

Also, the Google Patent Search API is deprecated, but you can now run the same searches through the main Google search API using URL parameters (I don't have them handy, but you can find them by running a search on Google Patents, which is now answered by google.com).

UPDATE: At home now. The flag that makes the Google Custom Search API search patents is &tbm=pts. Note that signing up for a Google Custom Search Engine and getting an API key for it is hugely beneficial for patent searching, because the JSON it returns has a nice data structure with patent-specific fields.

Example Code:

import json
import time
import urllib.parse

import requests

# Placeholders: get your own values by signing up for the Google Custom Search Engine API
access_token = "YOUR_API_KEY"
cse_id = "YOUR_CSE_ID"

# Build the URL
start = 1
search_text = '+(inassignee:"Altera" | "Owner name: Altera") site:www.google.com/patents/'
# &tbm=pts sets you on the patent search
url = ('https://www.googleapis.com/customsearch/v1?key=' + access_token
       + '&cx=' + cse_id + '&start=' + str(start)
       + '&num=10&tbm=pts&q=' + urllib.parse.quote(search_text))

response = requests.get(url)

# Save the JSON response to a timestamped text file
with open('Sample_patent_data' + str(int(time.time())) + '.txt', 'w') as f:
    f.write(json.dumps(response.json(), indent=4))

This will (once you add the free API access info) grab the first ten entries of patents owned by Altera (as an example) and save the resulting JSON to a text file. Pull up your favorite web JSON editor and take a look at the JSON file. In particular I recommend looking in ['items'][] and the nested ['pagemap']. Just by parsing this JSON you can get titles, thumbnails, snippets, links, and even citations (when relevant).
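To make the JSON walk concrete, here is a small sketch that loops over ['items'] and collects the basic fields. The sample_response dict is invented for illustration and only mimics the general shape of a response; in particular, the keys inside ['pagemap'] are an assumption and vary by result, so the sketch only lists which ones are present.

```python
def summarize_items(search_response):
    """Collect title, link, snippet and pagemap keys for each result."""
    summaries = []
    for item in search_response.get("items", []):
        pagemap = item.get("pagemap", {})
        summaries.append({
            "title": item.get("title"),
            "link": item.get("link"),
            "snippet": item.get("snippet"),
            # Which pagemap sections are present differs between results.
            "pagemap_keys": sorted(pagemap),
        })
    return summaries


# Hypothetical response, trimmed down to the fields used above.
sample_response = {
    "items": [
        {
            "title": "Example patent title",
            "link": "https://www.google.com/patents/US0000000",
            "snippet": "An example snippet of the patent text.",
            "pagemap": {"metatags": [{}], "cse_thumbnail": [{}]},
        }
    ]
}

results = summarize_items(sample_response)
for entry in results:
    print(entry["title"], entry["link"], entry["pagemap_keys"])
```

With a real response you would load the saved text file with json.load and pass the resulting dict to summarize_items.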

Community
Ezekiel Kruglick
  • Great answer, but you can simplify this code a lot by using [requests](http://docs.python-requests.org/en/latest/). – Burhan Khalid Jan 26 '15 at 04:32
  • @Burhan - Since my example uses requests I figure you mean using requests to build the query string or do the file saving operation? Sure, it could be a few one liners but that wouldn't be as useful to demonstrate clearly what's going on, which is the main goal here in a stackoverflow answer, right? – Ezekiel Kruglick Jan 26 '15 at 20:10
  • I see your point, but using requests to build the query string will automatically escape the parameters, and saving the file with requests will ensure that a large response is streamed and saved correctly. I find both of these extremely useful. – Burhan Khalid Jan 27 '15 at 07:50
  • @Burhan - I was aware of the parameter escaping and chose to show it explicitly instead, but I was unaware that requests streamed large responses in a special way. Thank you for teaching me something today! I'm off to go read more about how requests handles files :) – Ezekiel Kruglick Jan 27 '15 at 21:16
  • The referenced paper may have been moved to http://funginstitute.berkeley.edu/wp-content/uploads/2013/06/Extracting_and_Formatting.pdf – Teepeemm Nov 04 '15 at 20:05
  • @Teepeemm - yup, you're right. Not sure why somebody thought it was worth a downvote, link rot happens and the title and author were given to help figure it out. I updated the link in the main post, thank you very much Teepeemm. – Ezekiel Kruglick Nov 04 '15 at 20:11
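For anyone curious about the variant discussed in these comments, here is a sketch of what it might look like: the query goes in as a dict so the escaping is automatic, and the response is streamed to disk in chunks. The helper names build_params and fetch_to_file are made up for this example, and the key and engine ID are placeholders. The printed URL uses the standard library's urlencode purely to show the escaping without making a network call.

```python
from urllib.parse import urlencode

API_URL = "https://www.googleapis.com/customsearch/v1"


def build_params(api_key, cse_id, query, start=1, num=10):
    """Return the query parameters as a dict; no manual escaping needed."""
    return {
        "key": api_key,
        "cx": cse_id,
        "start": start,
        "num": num,
        "tbm": "pts",  # patent search flag from the answer above
        "q": query,
    }


def fetch_to_file(params, path):
    """Stream the response to disk in chunks (needs the requests package)."""
    import requests  # third-party; imported here so the rest runs without it

    with requests.get(API_URL, params=params, stream=True, timeout=30) as response:
        response.raise_for_status()
        with open(path, "wb") as f:
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)


params = build_params("YOUR_API_KEY", "YOUR_CSE_ID", 'inassignee:"Altera"')
query_string = urlencode(params)
print(API_URL + "?" + query_string)
```

Calling fetch_to_file(params, "patents.json") with real credentials would perform the actual request; it is left uncalled here because it needs network access and a valid key.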
0

You should take a look at patent_client! It's a Python module that searches the live USPTO and EPO databases using a Django-style API. The results from any query can then be cast into pandas DataFrames or Series with a simple .to_pandas() call.

from patent_client import USApplication, Inpadoc, Patent, PublishedApplication

# USPTO databases
USApplication.objects.filter("filter criteria here")
Patent.objects.filter("filter criteria here")
PublishedApplication.objects.filter("filter criteria here")

# EPO databases
Inpadoc.objects.filter("filter criteria here")

A great place to start is the User Guide Introduction


PyPI | GitHub | Docs

(Full disclosure - I'm the author and maintainer of patent_client)

Parker Hancock
-2

I don't know about a ready-made python module, but you could build your own. For both USPTO and EPO there are APIs, found at http://www.epo.org/searching/free/ops.html and http://tsdr.uspto.gov/ .

I can't say how easy the documents from there are to use, but you could try writing a simple querier that retrieves and parses the results. Of course, the more extensive the data you're after, the more work it will be to write a module.

glormph
  • The TSDR site is for trademarks, not for patents. It stands for "Trademark Status & Document Retrieval." If anyone is interested in using TSDR for trademark access (as distinguished from patent access), I have a Python module for doing that at https://github.com/codingatty/Plumage-py . Most of the documentation is done, except for the dictionary of what is returned, but the examples are pretty self-explanatory. – codingatty Feb 16 '16 at 22:46