How to get readable unicode string from single bibtex entry field in python script

Question

Suppose I have a .bib file containing BibTeX-formatted entries. I want to extract the "title" field from an entry and format it as a readable Unicode string.

For example, if the entry was:

@article{mypaper,
    author = {myself},
    title = {A very nice {title} with annoying {symbols} like {\^{a}}}
}

what I want to extract is the string:

A very nice title with annoying symbols like â

I am currently trying to use the pybtex package, but I cannot figure out how to do it. The command-line utility pybtex-format does a good job of converting full .bib files, but I need to do this inside a script and for the title field of a single entry.

score 0 · Accepted Answer · answered Aug 06 '20 at 16:02

Figured it out:

from pybtex.database.input.bibtex import Parser
from pybtex.plugin import find_plugin

def load_bib(filename):
    # Parse the .bib file into a pybtex bibliography database.
    return Parser().parse_file(filename)

def get_title(entry):
    # Look up the "plain" formatting style and the plain-text backend.
    style = find_plugin('pybtex.style.formatting', 'plain')()
    backend = find_plugin('pybtex.backends', 'plaintext')()
    # Build the title template, evaluate it for this entry, then render it.
    # The style formats the title as a sentence, so the result
    # typically ends with a period.
    sentence = style.format_title(entry, 'title')
    data = {'entry': entry,
            'style': style,
            'bib_data': None}
    T = sentence.f(sentence.children, data)
    return T.render(backend)

DB = load_bib("bibliography.bib")
print(get_title(DB.entries["entry_label"]))

where entry_label must match the citation key you use in LaTeX to cite the bibliography entry (mypaper in the question's example).
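To make it clearer what the rendering step has to do with a title like the one in the question, here is a minimal hand-rolled sketch that uses only the standard library. It is not how pybtex does it, and the names ACCENTS, ACCENT_RE and latex_title_to_unicode are made up for this illustration. It assumes the title only contains grouping braces and a handful of one-letter accent commands; anything else (math, other macros) is left untouched.

```python
import re
import unicodedata

# Unicode combining marks for a few LaTeX accent commands.
# Deliberately incomplete: this is an illustration, not a LaTeX parser.
ACCENTS = {
    "^": "\u0302",  # circumflex
    "'": "\u0301",  # acute
    "`": "\u0300",  # grave
    '"': "\u0308",  # diaeresis
    "~": "\u0303",  # tilde
}

# Matches an accent command applied to one letter: \^{a} as well as \^a.
ACCENT_RE = re.compile(r"""\\([\^'`"~])\s*(?:\{(\w)\}|(\w))""")

def latex_title_to_unicode(title):
    """Convert simple LaTeX accents to Unicode and drop grouping braces."""
    def replace_accent(match):
        letter = match.group(2) or match.group(3)
        # Put the combining mark after the base letter.
        return letter + ACCENTS[match.group(1)]

    text = ACCENT_RE.sub(replace_accent, title)
    # Remaining braces only protect capitalization in BibTeX; remove them.
    text = text.replace("{", "").replace("}", "")
    # Compose base letter + combining mark into a single code point.
    return unicodedata.normalize("NFC", text)

print(latex_title_to_unicode(
    r"A very nice {title} with annoying {symbols} like {\^{a}}"))
```

For real bibliographies the pybtex route above is the safer choice, since it handles far more LaTeX than this sketch does.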

score 0 · Answer 2 · answered May 20 '21 at 14:16

Building upon the answer by Daniele, I wrote this function that lets one render fields without having to use a file.


from io import StringIO
from pybtex.database.input.bibtex import Parser
from pybtex.plugin import find_plugin

def render_fields(author="", title=""):
    r"""The arguments are in BibTeX format. For example, they may contain
    things like \'{i}. The output is a dictionary with these fields
    rendered in plain text.

    If you run tests by defining a string in Python, use r'''string''' to
    avoid issues with escape characters.
    """
    
    parser = Parser()
    istr = r'''
    @article{foo,
            Author = {''' + author + r'''},
            Title = {''' + title + '''},
            }
     '''
    bib_data = parser.parse_stream(StringIO(istr))

    style = find_plugin('pybtex.style.formatting', 'plain')()

    backend = find_plugin('pybtex.backends', 'plaintext')()
    entry = bib_data.entries["foo"]
    data = {'entry': entry, 'style': style, 'bib_data': None}

    sentence = style.format_author_or_editor(entry)
    T = sentence.f(sentence.children, data)
    rendered_author = T.render(backend)[0:-1] # exclude period

    sentence = style.format_title(entry, 'title')
    T = sentence.f(sentence.children, data)
    rendered_title = T.render(backend)[0:-1] # exclude period

    return {'title': rendered_title, 'author': rendered_author}
