This question is about parsing XML content that uses xmlns namespace attributes. I wrote code to parse it, and it works, but I would appreciate pointers on whether it can be done better.
I have an XML file, test.xml, as below:
<?xml version="1.0" encoding="utf-8"?><soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><soap:Body>
<SomeResponse xmlns="https://testsomestuff.org/API/WS/">
<SomeResult>
&lt;html&gt;
&lt;head&gt;
&lt;title&gt;My &lt;b&gt;Title&lt;/b&gt;&lt;/title&gt;
&lt;/head&gt;
&lt;body&gt;
&lt;p&gt;Foo bar baz&lt;/p&gt;
&lt;/body&gt;
&lt;/html&gt;
</SomeResult>
</SomeResponse>
</soap:Body></soap:Envelope>
I wrote the following code to parse the "SomeResult" content using xml-conduit:
{-# LANGUAGE OverloadedStrings #-}
import Prelude hiding (readFile)
import Text.XML
import Text.XML.Cursor
import qualified Data.Text as T
import Data.Text.Lazy.Builder (toLazyText)
import Data.Text.Lazy (fromStrict)
main :: IO ()
main = do
    doc <- readFile def "test.xml"
    let cursor = fromDocument doc
        -- Collect the escaped markup inside SomeResult as a single lazy Text.
        res = fromStrict $ T.concat $
            child cursor >>= laxElement "Body"
                >>= child >>= laxElement "SomeResponse"
                >>= child >>= laxElement "SomeResult"
                >>= descendant >>= content
        -- Parse that text as a second XML document.
        pres = parseText_ def res
        cursor2 = fromDocument pres
        res2 = child cursor2 >>= element "head"
            >>= child >>= element "title"
            >>= descendant >>= content
    print res2
Output in GHCi, which parses correctly:
*Main> main
["My ","Title"]
Is the laxElement approach a good way to locate the SomeResult content? If there is a better way, I would very much appreciate pointers.
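For comparison, one alternative I am considering is matching on fully qualified names instead of using laxElement. This is only a sketch under assumptions: it uses the namespace URIs from test.xml above, relies on the "{namespace}local" string form of Name that OverloadedStrings allows, and the function names someResultText and someResultFromText are ones I made up for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T
import qualified Data.Text.Lazy as TL
import Text.XML (def, parseText_)
import Text.XML.Cursor (Cursor, content, element, fromDocument, ($/), (&/), (&//))

-- Hypothetical helper: walk Envelope -> Body -> SomeResponse -> SomeResult
-- using namespace-qualified names, then concatenate all text below it.
-- SomeResult inherits the default namespace declared on SomeResponse.
someResultText :: Cursor -> T.Text
someResultText cursor = T.concat $
    cursor $/ element "{http://schemas.xmlsoap.org/soap/envelope/}Body"
           &/ element "{https://testsomestuff.org/API/WS/}SomeResponse"
           &/ element "{https://testsomestuff.org/API/WS/}SomeResult"
           &// content

-- Hypothetical convenience wrapper for trying this on an in-memory document.
someResultFromText :: TL.Text -> T.Text
someResultFromText = someResultText . fromDocument . parseText_ def
```

Unlike laxElement, this should not match a Body or SomeResult element that happens to live in a different namespace, at the cost of spelling out the URIs.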
Also, I need to do the escaping in the reverse direction (when building a request for the response above), where the inner body is HTML-escaped (like the content under SomeResult in test.xml). Is that taken care of by default when building the request using Text.XML, or do I have to escape the inner body explicitly using something like html-entities?
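To make this second question concrete, here is a minimal sketch of how I would build such a request. The element names SomeRequest and SomeInput are hypothetical (I am only guessing at the request shape), and the helper names are mine. My understanding, which I would like confirmed, is that renderText escapes markup characters in NodeContent text on its own, so the inner markup can be passed in as plain Text.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Map as Map
import qualified Data.Text as T
import qualified Data.Text.Lazy as TL
import Text.XML
import Text.XML.Cursor (content, fromDocument, ($//))

-- Sample inner markup, unescaped; an assumption for illustration.
innerHtml :: T.Text
innerHtml = "<html><head><title>My <b>Title</b></title></head></html>"

-- Build a SOAP envelope whose innermost element holds the markup as text.
-- SomeRequest and SomeInput are hypothetical element names.
requestDoc :: T.Text -> Document
requestDoc inner = Document (Prologue [] Nothing []) envelope []
  where
    soapNs, apiNs :: T.Text
    soapNs = "http://schemas.xmlsoap.org/soap/envelope/"
    apiNs  = "https://testsomestuff.org/API/WS/"
    soap local = Name local (Just soapNs) (Just "soap")
    api local  = Name local (Just apiNs) Nothing
    envelope = Element (soap "Envelope") Map.empty
        [ NodeElement $ Element (soap "Body") Map.empty
            [ NodeElement $ Element (api "SomeRequest") Map.empty
                [ NodeElement $ Element (api "SomeInput") Map.empty
                    [ NodeContent inner ] ] ] ]

-- Render the request; text content is escaped by the renderer.
renderRequest :: T.Text -> TL.Text
renderRequest = renderText def . requestDoc

-- Parse the rendered request back and collect its text content,
-- to check that the inner markup survives the round trip.
roundTrip :: T.Text -> T.Text
roundTrip inner =
    T.concat (fromDocument (parseText_ def (renderRequest inner)) $// content)
```

If that understanding is right, roundTrip innerHtml should give back innerHtml unchanged, and no separate entity-escaping library would be needed for this step.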