I'm writing a script with Python's borb package that sets the 'Keywords' entry of a PDF's document information from a list of phrases.

from borb.io.read.types import Name, String

...

info = doc["XRef"]["Trailer"]["Info"]

metadata = ['foo bar', 'baz', 'qux']
keywords: list[str] = []

for phrase in metadata:
    # do_remove_stopwords returns the remaining words of the phrase as a list
    clean: list[str] = do_remove_stopwords(phrase)
    keywords.append(" ".join(clean))

# Phrases are joined with ", ", which matches the Keywords value shown below
info[Name("Keywords")] = String(", ".join(keywords))
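
do_remove_stopwords is not shown in the snippet. To reproduce the issue, a minimal stand-in could look like the sketch below, assuming it takes a phrase and returns the remaining words as a list. The STOPWORDS set and its contents are hypothetical and only there for illustration.

```python
# Hypothetical stop-word list, for illustration only
STOPWORDS = {"a", "an", "and", "of", "the"}


def do_remove_stopwords(phrase: str) -> list[str]:
    """Split a phrase on whitespace and drop words found in STOPWORDS."""
    return [word for word in phrase.split() if word.lower() not in STOPWORDS]
```

With the sample metadata above, no word is a stop word, so each phrase comes back unchanged as a list of its words.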

The resulting XMP metadata, however, is:

<rdf:Description
  ...
  pdf:Keywords="foo bar, baz, qux">
     <dc:subject>
        <rdf:Bag>
           <rdf:li>foo</rdf:li>
           <rdf:li>bar</rdf:li>
           <rdf:li>baz</rdf:li>
           <rdf:li>qux</rdf:li>
        </rdf:Bag>
     </dc:subject>

and the Info dictionary object is:

2 0 obj
<</Keywords (foo bar, baz, qux)
endobj

The Document Properties dialog in Acrobat DC displays the keywords as:

Keywords: "foo bar, baz"; "foo,"; bar; baz

How do I get the bag to keep each phrase as a single entry instead of splitting it into words, like this:

     <dc:subject>
        <rdf:Bag>
           <rdf:li>foo bar</rdf:li>
           <rdf:li>baz</rdf:li>
           <rdf:li>qux</rdf:li>
        </rdf:Bag>
     </dc:subject>

and the keywords to propagate appropriately?
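
The difference between the two bags can be reproduced outside borb. The sketch below is not borb's implementation and uses only the standard library. It assumes the observed bag comes from splitting the keyword string on commas and whitespace, while the desired bag comes from splitting on commas only. The function names are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"


def split_observed(keywords: str) -> list[str]:
    # Splitting on commas AND whitespace reproduces the bag shown in the result
    return [part for part in re.split(r"[,\s]+", keywords) if part]


def split_desired(keywords: str) -> list[str]:
    # Splitting on commas only keeps multi-word phrases together
    return [part.strip() for part in keywords.split(",") if part.strip()]


def build_subject_bag(entries: list[str]) -> str:
    # Build <dc:subject><rdf:Bag><rdf:li>...</rdf:li></rdf:Bag></dc:subject>
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("dc", DC_NS)
    subject = ET.Element(f"{{{DC_NS}}}subject")
    bag = ET.SubElement(subject, f"{{{RDF_NS}}}Bag")
    for entry in entries:
        ET.SubElement(bag, f"{{{RDF_NS}}}li").text = entry
    return ET.tostring(subject, encoding="unicode")


keyword_string = "foo bar, baz, qux"
print(split_observed(keyword_string))
print(split_desired(keyword_string))
print(build_subject_bag(split_desired(keyword_string)))
```

The first list matches the four entries in the bag borb writes, and the second matches the three entries I am after.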

Ian