(Please bear with me for a moment while I tell a little story before asking my question!)

I have an SVG element generated by MathJax that looks like this in the element inspector:

<svg xmlns:xlink="http://www.w3.org/1999/xlink" width="6.768ex" height="2.468ex" viewBox="0 -825.2 2914.1 1062.4">
    <defs>...</defs>
    <g>...</g>
</svg>

When I try to display this SVG on its own in Chrome or Safari, the browser shows the following message instead of rendering it:

This XML file does not appear to have any style information associated with it. The document tree is shown below. [...]

After some experimentation, I found that the culprit is a missing 'xmlns' attribute. (My guess is that MathJax puts another SVG carrying this attribute higher up in the page, so that it doesn't need to be repeated inside the web page.) Changing the opening <svg> tag as follows allows the browser to display the SVG on its own:

<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="6.768ex" height="2.468ex" viewBox="0 -825.2 2914.1 1062.4">
    <defs>...</defs>
    <g>...</g>
</svg>

(Note the new xmlns attribute.)

OK. Good.

Now I want to automate adding the missing xmlns attribute, and I want to use the Python lxml library for this.

Unfortunately (finally coming to my question!), lxml seems to hide all attributes that start with 'xmlns', and I don't know why. It does let me add the 'xmlns' attribute, e.g., by doing

root.attrib['xmlns'] = "http://www.w3.org/2000/svg"

where root is the root <svg> element of the document. However, I cannot test whether the 'xmlns' attribute is already there. If I run the script twice on the same file, two separate xmlns attributes are added, which in turn causes lxml to complain and crash.

So: (i) why is lxml hiding certain attributes from me, and (ii) regardless of that, how can I add the xmlns attribute only if it isn't there already? (Of course I could parse the file manually, but I'd like a self-contained solution using lxml.)

Labrador
  • `xmlns="..."` is not really an attribute; it is a namespace declaration. Therefore it is not found in `attrib`. – mzjn Apr 06 '18 at 05:38
  • Perhaps the problem is that you are copying this XML subtree from the MathJax output using a textual copy/paste operation, and it is this copying action that is losing the in-scope namespaces? In which case the answer might be to find an XML-aware way of extracting the SVG subtree? – Michael Kay Apr 06 '18 at 08:22
  • The problem is that MathJax does not generate valid SVG, at least valid from a browser POV. Without the namespace that qualifies all tag and attribute names, the browser does not know how to handle the document. The tag `svg` without the SVG namespaces can mean anything. So the question really is: How can MathJax be made to produce SVG that the browser understands? –  Apr 06 '18 at 09:51
  • @MichaelKay The copy-paste operation starts with the opening `<svg>` tag and ends with the closing `</svg>` tag. Info is not lost there. The problem is: when I use `lxml` to load an XML document, I have found no way of discovering whether the root element already contains an `xmlns` attribute or not... – Labrador Apr 07 '18 at 02:35
  • @LutzHorn (Funny thing is, the browser doesn't complain about MathJax's SVG when it's embedded in the larger HTML document. It only complains when the SVG is on its own. But this is a mystery some MathJax people should undoubtedly be able to explain.) – Labrador Apr 07 '18 at 02:56
  • You can use `root.nsmap` to see what namespaces are in scope (http://lxml.de/api/lxml.etree._Element-class.html#nsmap). – mzjn Apr 07 '18 at 06:22
  • @Labrador, if you are using copy-paste on a lexical XML subtree, then you are indeed losing information: you are losing the namespace declarations (plus other significant things in the containing tree such as xml:lang and xml:space attributes). – Michael Kay Apr 07 '18 at 08:50
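
The `nsmap` suggestion in the comments above can be sketched as follows. This is a minimal illustration rather than anyone's posted code: the helper name `has_default_namespace` and the two sample strings are made up for the example. It relies on lxml reporting an `xmlns="..."` declaration under the key `None` in `nsmap` instead of listing it in `attrib`.

```python
from lxml import etree

SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical sample inputs: the same root tag with and without xmlns.
without_ns = ('<svg xmlns:xlink="http://www.w3.org/1999/xlink" '
              'viewBox="0 0 10 10"><g/></svg>')
with_ns = ('<svg xmlns="http://www.w3.org/2000/svg" '
           'xmlns:xlink="http://www.w3.org/1999/xlink" '
           'viewBox="0 0 10 10"><g/></svg>')


def has_default_namespace(svg_text):
    """Return True if the root element declares a default namespace."""
    root = etree.fromstring(svg_text)
    # The default namespace is stored under the key None in nsmap.
    return root.nsmap.get(None) is not None


root = etree.fromstring(with_ns)
print("xmlns" in root.attrib)             # False: declarations are not attributes
print(root.nsmap.get(None) == SVG_NS)     # True: the declaration shows up in nsmap
print(has_default_namespace(without_ns))  # False
print(has_default_namespace(with_ns))     # True
```

A check like this answers the "is it already there?" part of the question without touching `attrib` at all.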

1 Answer

I have combined two previous answers about namespaces: one from "lxml: add namespace to input file" and another from "Adding xml prefix declaration with lxml in python". The first answer does not copy the attributes, so I borrowed that part from the second one.

from io import StringIO

from lxml import etree

# excerpt from https://commons.wikimedia.org/wiki/File:SVG_logo.svg
# note that xmlns is omitted
xml = '<svg xmlns:xlink="http://www.w3.org/1999/xlink" viewBox="-50 -50 100 100"><rect id="background" x="-50" y="-50" width="100" height="100" rx="4" fill="#f90"/>  <g id="c">      <circle id="n" cy="-31.6" r="7.1" fill="#fff"/> </g></svg>'
tree = etree.parse(StringIO(xml))
root = tree.getroot()

# root.nsmap lists the namespace declarations in scope; the key None is the default namespace
nsmap = dict(root.nsmap)
nsmap[None] = 'http://www.w3.org/2000/svg'

# nsmap is read-only on an existing element, so build a new root and move everything over
root2 = etree.Element(root.tag, nsmap=nsmap)
root2[:] = root[:]
for name, value in root.attrib.items():
    root2.set(name, value)

# re-parse the serialized output so that the element objects are bound to the namespace
tree2 = etree.parse(StringIO(etree.tostring(root2, encoding="unicode")))
root3 = tree2.getroot()
print(root3)
# <Element {http://www.w3.org/2000/svg}svg at 0x58778f0>

print(root3.nsmap)
# {'xlink': 'http://www.w3.org/1999/xlink', None: 'http://www.w3.org/2000/svg'}

This should work for you, though I believe MathJax itself may offer a way to handle this kind of task.
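
To cover the "only if it isn't there already" part of the question, the same rebuild can be wrapped in a guard on `nsmap`. The following is a sketch under assumptions, not a tested drop-in: the function name `ensure_svg_namespace` and the `SAMPLE` string are invented for illustration, and the input is assumed to be a `str` without an XML encoding declaration.

```python
from lxml import etree

SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical sample input, modeled on the excerpt used in this answer.
SAMPLE = ('<svg xmlns:xlink="http://www.w3.org/1999/xlink" '
          'viewBox="-50 -50 100 100">'
          '<rect id="background" x="-50" y="-50" width="100" height="100"/>'
          '<g id="c"><circle id="n" cy="-31.6" r="7.1" fill="#fff"/></g>'
          '</svg>')


def ensure_svg_namespace(svg_text):
    """Return svg_text with a default namespace on the root, adding it only if absent."""
    root = etree.fromstring(svg_text)
    if None in root.nsmap:
        # A default namespace is already declared: leave the input untouched.
        return svg_text
    nsmap = dict(root.nsmap)
    nsmap[None] = SVG_NS
    # Build a replacement root that carries the extra declaration.
    new_root = etree.Element(root.tag, nsmap=nsmap)
    new_root.text = root.text
    for name, value in root.attrib.items():
        new_root.set(name, value)
    new_root[:] = root[:]  # moves the children over to the new root
    return etree.tostring(new_root, encoding="unicode")


once = ensure_svg_namespace(SAMPLE)
twice = ensure_svg_namespace(once)
print(once == twice)  # the second run finds the declaration and changes nothing
```

Because the guard returns the input unchanged when a default namespace is present, running the script repeatedly on the same file should not accumulate declarations.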

Sangbok Lee
  • I'm not sure I understand. I can add an `xmlns` attribute already by doing `root.attrib['xmlns'] = 'http://...'`. (Though lxml will later crash if the 'xmlns' attribute was already there.) Are you saying your approach will not crash lxml in either case? What I really wanted was a way to be able to test if the 'xmlns' attribute is already in the root tag or not from within lxml... – Labrador Apr 07 '18 at 02:49
  • `etree.fromstring` etc will always strip `xmlns`. `root.attrib['xmlns'] = 'http://...'` will NOT set a namespace. Try `root.nsmap` after setting that way to verify this. Use `nsmap` instead of `attrib`. – Sangbok Lee Apr 09 '18 at 05:57
  • The output from `tostring(root2)` is good, but the element objects are still not bound to a namespace. `print(root2)` outputs `<Element svg at 0x...>` and not `<Element {http://www.w3.org/2000/svg}svg at 0x...>`. The result of `tostring(root2)` must be parsed to fix this. – mzjn Apr 09 '18 at 11:24
  • Good point. I modified the code. But I'm worried OP is not catching up. – Sangbok Lee Apr 09 '18 at 17:01
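
The point about re-parsing raised in the comments can be seen in a small sketch. The elements here are built from scratch purely for illustration and are not taken from the MathJax output.

```python
from lxml import etree

SVG_NS = "http://www.w3.org/2000/svg"

# An element created in memory with a default namespace in its nsmap.
built = etree.Element("svg", nsmap={None: SVG_NS})
etree.SubElement(built, "g")

# The tag was given without a namespace, so the object stays unqualified.
print(built.tag)

# Serializing writes the xmlns declaration; parsing that text binds the names.
reparsed = etree.fromstring(etree.tostring(built))
print(reparsed.tag)
print(reparsed[0].tag)
```

After the round trip, both the root and its child report tags qualified with the SVG namespace, which is why the answer's code parses the serialized result again.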