I'm writing a script that uses Python's borb
package to set the 'Keywords' information of a PDF from a list:
from borb.io.read.types import Name, String
...
info = doc["XRef"]["Trailer"]["Info"]
metadata = ['foo bar', 'baz', 'qux']
keywords: list = []
for phrase in metadata:
    clean: list[str] = do_remove_stopwords(phrase)
    keywords.append(" ".join(clean))
info[Name("Keywords")] = String(", ".join(keywords))
The result, however, is
<rdf:Description
...
pdf:Keywords="foo bar, baz, qux">
<dc:subject>
  <rdf:Bag>
    <rdf:li>foo</rdf:li>
    <rdf:li>bar</rdf:li>
    <rdf:li>baz</rdf:li>
    <rdf:li>qux</rdf:li>
  </rdf:Bag>
</dc:subject>
and the Info dictionary object is
2 0 obj
<</Keywords (foo bar, baz, qux)
endobj
The Document Properties dialog in Acrobat DC displays the 'Keywords' as:
Keywords: "foo bar, baz"; "foo,"; bar; baz
How do I get the bag
to keep each phrase as a single entry instead of splitting it into words, like this:
<dc:subject>
  <rdf:Bag>
    <rdf:li>foo bar</rdf:li>
    <rdf:li>baz</rdf:li>
    <rdf:li>qux</rdf:li>
  </rdf:Bag>
</dc:subject>
and the keywords to propagate appropriately?
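
To make the target concrete, here is a minimal sketch of the structure I'm after. It is built with the standard library's xml.etree.ElementTree rather than borb, so it says nothing about how borb generates its XMP. The helper name build_subject_bag is hypothetical; the point is only that each phrase ends up in exactly one rdf:li.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for RDF and Dublin Core.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Use the conventional prefixes when serializing.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def build_subject_bag(phrases):
    """Hypothetical helper: emit one <rdf:li> per phrase, without splitting on spaces."""
    subject = ET.Element(f"{{{DC_NS}}}subject")
    bag = ET.SubElement(subject, f"{{{RDF_NS}}}Bag")
    for phrase in phrases:
        item = ET.SubElement(bag, f"{{{RDF_NS}}}li")
        item.text = phrase  # the whole phrase, e.g. "foo bar", stays together
    return subject


keywords = ["foo bar", "baz", "qux"]
subject = build_subject_bag(keywords)

# Collect the bag entries back out to confirm nothing was split.
items = [li.text for li in subject.iter(f"{{{RDF_NS}}}li")]
print(items)
print(ET.tostring(subject, encoding="unicode"))
```

What I can't work out is how to get borb to produce this bag from the same list that feeds the 'Keywords' entry.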