See also https://github.com/sciunto-org/python-bibtexparser/issues/352
Currently I am doing:
doi=DOI(self.doi)
meta_bibtex=doi.fetchBibtexMeta()
bd=bibtexparser.loads(meta_bibtex)
btex=bd.entries[0]
using the DOI helper class below. I was hoping to simplify my life, since the citeproc result looks quite complicated and I'd love to have some cleanup in e.g. authors and titles.
bibtexparser does a great job, but I don't want a LaTeX result, just plain text.
E.g. for 10.1145/800001.811672 I get
The structure of the {\\textquotedblleft}the{\\textquotedblright}-multiprogramming system
While the plain text
The structure of the "the"-multiprogramming system
would be better for my use case. Is this already possible with the current bibtexparser, or should this be a feature request?
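To make the wish concrete, here is a minimal sketch of the kind of cleanup I have in mind. It is not part of bibtexparser: `latex_to_plain` and `PLAIN_MACROS` are hypothetical names, the macro table is deliberately incomplete, and it assumes the doubled backslashes shown above are just escaping.

```python
import re

# Hypothetical and incomplete: LaTeX text macros mapped to plain characters.
PLAIN_MACROS = {
    "textquotedblleft": '"',
    "textquotedblright": '"',
    "textquoteleft": "'",
    "textquoteright": "'",
    "textendash": "-",
}

# Matches a braced macro such as {\textquotedblleft} or a bare \textquotedblleft.
_MACRO = re.compile(r"\{\\([A-Za-z]+)\}|\\([A-Za-z]+)")


def latex_to_plain(text: str) -> str:
    """Replace known LaTeX text macros with plain characters."""
    # Assumption: doubled backslashes are escaping artifacts, not LaTeX line breaks.
    text = text.replace("\\\\", "\\")

    def substitute(match: re.Match) -> str:
        name = match.group(1) or match.group(2)
        # Unknown macros are left untouched.
        return PLAIN_MACROS.get(name, match.group(0))

    text = _MACRO.sub(substitute, text)
    # Drop remaining protective braces; this also strips braces around unknown macros.
    return text.replace("{", "").replace("}", "")


title = r"The structure of the {\\textquotedblleft}the{\\textquotedblright}-multiprogramming system"
print(latex_to_plain(title))
# The structure of the "the"-multiprogramming system
```

A table like this only covers the macros one happens to list, which is why a built-in option in the parser would be preferable.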
doi.py
'''
Created on 2023-02-12

@author: wf
'''
import json
import urllib.request
from dataclasses import dataclass


@dataclass
class DOI:
    """
    get DOI data
    """
    doi: str

    def fetchMeta(self, headers: dict) -> str:
        """
        get the metadata for my doi

        Args:
            headers(dict): the headers to use

        Returns:
            str: the response text according to the given headers
        """
        url = f"https://doi.org/{self.doi}"
        req = urllib.request.Request(url, headers=headers)
        # close the connection once the body has been read
        with urllib.request.urlopen(req) as response:
            encoding = response.headers.get_content_charset('utf-8')
            content = response.read()
        text = content.decode(encoding)
        return text

    def fetchBibtexMeta(self) -> str:
        """
        get the meta data for my doi by getting the BibTeX
        result for the doi

        Returns:
            str: the BibTeX record as text
        """
        headers = {
            'Accept': 'application/x-bibtex; charset=utf-8'
        }
        text = self.fetchMeta(headers)
        return text

    def fetchCiteprocMeta(self) -> dict:
        """
        get the meta data for my doi by getting the Citeproc JSON
        result for the doi

        see https://citeproc-js.readthedocs.io/en/latest/csl-json/markup.html

        Returns:
            dict: metadata
        """
        headers = {
            'Accept': 'application/vnd.citationstyles.csl+json; charset=utf-8'
        }
        text = self.fetchMeta(headers)
        json_data = json.loads(text)
        return json_data
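
For comparison, a rough sketch of how plain-text authors and a title could be pulled out of the Citeproc result. `plain_authors` and `plain_title` are hypothetical helpers; the sketch assumes the usual CSL-JSON layout with an `author` list of `given`/`family` (or `literal`) entries, and the sample dict is hand-written for illustration, not an actual response from doi.org.

```python
def plain_authors(csl: dict) -> list:
    """Return author names as 'Given Family' strings."""
    names = []
    for author in csl.get("author", []):
        if "literal" in author:
            # Institutional or unparsed names carry a single literal string.
            names.append(author["literal"])
        else:
            parts = [author.get("given", ""), author.get("family", "")]
            names.append(" ".join(part for part in parts if part))
    return names


def plain_title(csl: dict) -> str:
    """Return the title with runs of whitespace collapsed."""
    title = csl.get("title", "")
    # Assumption: some sources deliver the title as a list of strings.
    if isinstance(title, list):
        title = title[0] if title else ""
    return " ".join(title.split())


# Hand-written sample in CSL-JSON shape (illustration only).
sample = {
    "title": 'The structure of the  "the"-multiprogramming system',
    "author": [{"given": "Edsger W.", "family": "Dijkstra"}],
}
print(plain_authors(sample))
print(plain_title(sample))
```

With the class above, the input would come from `DOI(...).fetchCiteprocMeta()` instead of the sample dict.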