1

The xml-conduit tutorial (the only one in existence, and perhaps the only Haskell XML library with a tutorial) shows how to create or read an XML document, but not how to modify one. The only way I know to perform such operations is with lxml/ElementTree (Python), which, as far as I'm aware, works only through side effects. I suspect a very different approach is needed here.

Say that I have a simple document:

<html>
    <head>
        <title>My <b>Title</b></title>
    </head>
    <body>
        <p>Paragraph 1.</p>
        <p>Paragraph 2.</p>
    </body>
</html>

How to:
- Modify the title?
- Delete the first paragraph in this document?
- Append the body of this document to the body of another document?

Feel free to propose and contribute a solution using other Haskell libraries. The community could use many more examples.

pbarill
  • Although the question is slightly different, the accepted answer in [this Q&A](https://stackoverflow.com/questions/37269316/haskell-xml-update-text-using-hxt-library) answers this question. – MikaelF Apr 02 '20 at 20:05
  • It's interesting (except the `lens` part, if only I was able to understand that thing and work with it), but it only partially answers this question. So I can perhaps modify a value. As for how to insert or delete data, I don't have the slightest idea what to do. – pbarill Apr 02 '20 at 20:32
  • Deleting, say, the first `<p>` element, including the text node within, is a mere modification of the contents of `<body>`. – Bjartur Thorlacius Apr 03 '20 at 11:44

2 Answers

1

By reading the XML document and writing a new one, keeping the similarities you want but differing in the respects you desire.

Say you have a `document :: Document`. If you prefer record syntax over lenses, you might wind up with a solution that looks somewhat like the following. To be fair, refactoring it into small functions with descriptive names can make it somewhat more readable. Alternatively, you can use lenses, a library of small, generic functions with nondescript names that are useful for exactly this kind of DOM tree manipulation.

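-- Needs (&) from Data.Function and the record fields from Text.XML.
-- Assumes the root holds exactly two nodes, <head> and <body>; the pattern
-- match fails on any other shape (for example, when whitespace text nodes
-- are present). Keeps only the last node of <body>.
document{ documentRoot=
    (documentRoot document){ elementNodes=
        (documentRoot document
        & elementNodes
        & (\[head,NodeElement body]->
            [head,NodeElement body{elementNodes=
                [elementNodes body & last]
        }]))
    }
}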
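For comparison, here is a minimal sketch of the same rebuild-the-tree idea split into small named functions, covering the three operations from the question. It assumes xml-conduit's `Text.XML` module; the helpers `overChild`, `dropFirst`, `isElementNamed`, `setTitle`, `deleteFirstParagraph` and `appendBodyOf` are hypothetical names made up for illustration, not part of the library. Unlike a fixed list pattern, it walks the child nodes and leaves whitespace text nodes alone.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import qualified Data.Text.Lazy.IO as TL
import Text.XML (Document (..), Element (..), Name, Node (..),
                 def, parseText_, renderText)

-- | Apply a transformation to every direct child element with the given
--   name; all other nodes (text, comments, other elements) are kept as-is.
overChild :: Name -> (Element -> Element) -> Element -> Element
overChild name f parent =
  parent { elementNodes = map step (elementNodes parent) }
  where
    step (NodeElement e)
      | elementName e == name = NodeElement (f e)
    step n = n

-- | Remove the first list item that satisfies the predicate, if any.
dropFirst :: (a -> Bool) -> [a] -> [a]
dropFirst p xs = case break p xs of
  (before, _ : after) -> before ++ after
  (before, [])        -> before

-- | True for an element node with the given name.
isElementNamed :: Name -> Node -> Bool
isElementNamed name (NodeElement e) = elementName e == name
isElementNamed _ _                  = False

-- | Replace the contents of <head><title> with the given nodes.
setTitle :: [Node] -> Element -> Element
setTitle new =
  overChild "head" (overChild "title" (\t -> t { elementNodes = new }))

-- | Delete the first <p> that is a direct child of <body>.
deleteFirstParagraph :: Element -> Element
deleteFirstParagraph =
  overChild "body" (\b ->
    b { elementNodes = dropFirst (isElementNamed "p") (elementNodes b) })

-- | Append the <body> contents of the source root to the target's <body>.
appendBodyOf :: Element -> Element -> Element
appendBodyOf source =
  overChild "body" (\b -> b { elementNodes = elementNodes b ++ extra })
  where
    extra = concat [ elementNodes e
                   | NodeElement e <- elementNodes source
                   , elementName e == "body" ]

main :: IO ()
main = do
  let doc   = parseText_ def
        "<html><head><title>My <b>Title</b></title></head>\
        \<body><p>Paragraph 1.</p><p>Paragraph 2.</p></body></html>"
      other = parseText_ def "<html><body><p>Other.</p></body></html>"
      -- Edit this document: new title, first paragraph removed.
      edited = deleteFirstParagraph
             . setTitle [NodeContent "New title"]
             $ documentRoot doc
      -- Append the edited body to the body of the other document.
      merged = appendBodyOf edited (documentRoot other)
  TL.putStrLn (renderText def doc   { documentRoot = edited })
  TL.putStrLn (renderText def other { documentRoot = merged })
```

Every function returns a new `Element` and the original `Document` values are never changed, so "modifying" here means building an updated copy and rendering or writing that instead.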
  • This doesn't compile: (1) unbalanced parens; (2) `body` is a `Node` and does not have an `elementNodes` field; (3) `documentRoot` is a function, maybe you meant `documentRoot document` in lines 3 and 4? (4) `Document` and `Element` do not have a `Functor` instance, so you can't use `<&>` on them, maybe you meant `&`? (5) `Node` does not pattern-match to `[a, b]`, maybe you meant `&` here, too? (6) same comment, with `last` on line 4. I would be interested in seeing the "small functions with descriptive names" you mention, maybe it would add clarity to this answer. – MikaelF Apr 04 '20 at 00:12
  • (1-4) You're right. (5-6) I've edited the snippet to pattern match on NodeElement. Feel free to continue editing it. – Bjartur Thorlacius Apr 04 '20 at 07:38
  • But the crux is that you can insert and delete elements the same way you insert and delete characters in text nodes: by replacing a subtree (a list of NodeElements) with a modified subtree (the list of NodeElements after dropping or adding as you wish). – Bjartur Thorlacius Apr 04 '20 at 07:44
0

Another method.

from simplified_scrapy import SimplifiedDoc 
html = '''<html>
    <head>
        <title>My <b>Title</b></title>
    </head>
    <body>
        <p>Paragraph 1.</p>
        <p>Paragraph 2.</p>
    </body>
</html>'''
doc = SimplifiedDoc(html)
title = doc.title
title.setContent('Modify <b>Title</b>')
firstP = doc.body.p
firstP.repleaceSelf("")
p = doc.p
p.insertAfter(p.outerHtml)
print (doc.html)

Result:

<html>
    <head>
        <title>Modify <b>Title</b></title>
    </head>
    <body>

        <p>Paragraph 2.</p><p>Paragraph 2.</p>
    </body>
</html>

Here are more examples: https://github.com/yiyedata/simplified-scrapy-demo/tree/master/doc_examples

dabingsou