
I have to handle rather big XML files, and I want to use the streaming API of xml-conduit to go through them and extract the info I need. Streaming is especially appealing in my case because I need only a small part of the data in these files and only perform simple aggregations on it, so conduits are a perfect fit.

However, I don't always know the exact structure of a file. The files are generated by different versions of (sometimes buggy) software around the world, so I can't impose a schema.

I do know which elements I am interested in, and their shapes. But, as I said, these elements can appear in any order, mixed with other elements.

What I need, I guess, is to skip all the elements I am not interested in and consider only the ones I want.

I initially wanted to write something like this:

tagName "person" (requireAttr "age" <* ignoreAttrs) <|> ignoreTag (const True)

but it wouldn't compile, because ignoreTag returns Maybe ().

How can I skip all the "unknown" tags when using the xml-conduit streaming API?

Alexey Raga
  • `fmap (maybe Nothing (const Nothing)) (ignoreTag ..) :: Maybe b` for any type `b`. – user2407038 Feb 16 '17 at 05:01
  • Yes, I tried something like that too, still isn't what I need because if I return `Nothing` the whole parser fails. I need something like a filter I guess. – Alexey Raga Feb 16 '17 at 05:59
  • There is nothing special about `Maybe` - it doesn't indicate failure of the 'parser'; in particular, failure is indicated with the `MonadThrow m` constraint on `ConduitM i o m r`. So if your parser is failing, it is the logic you wrote yourself which causes the failure. You've used something like `many` to repeat the parser, in which case you need to use something other than `Maybe` to indicate logical failure in your code, as `xml-conduit` has already 'taken' `Maybe` for its own uses (`Maybe (Maybe X)` will work, then change `Nothing` -> `Just Nothing` and `catMaybes` on the result). – user2407038 Feb 16 '17 at 06:54
  • Could you post (relevant parts of) your code and some example input? – unhammer Feb 20 '17 at 09:42

1 Answer

As proposed here

λ> runConduit $ Text.XML.Stream.Parse.parseLBS def "<foo>bar</foo><person age=\"25\">Michael</person><person age=\"2\">Eliezer</person>" .| many_ (choose [takeTree "person" ignoreAttrs, ignoreAnyTreeContent]) .| manyYield parsePerson .| Data.Conduit.List.consume
[Person 25 "Michael",Person 2 "Eliezer"]
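
The GHCi line above relies on a `Person` type and a `parsePerson` parser that are not shown. A minimal self-contained sketch of how the pieces could fit together follows. It assumes a reasonably recent xml-conduit (one that provides `takeTree` and `ignoreAnyTreeContent`) and conduit 1.3 or later for `ConduitT`. The definitions of `Person` and `parsePerson` are assumptions made for illustration, not taken from the answer, and the monad is fixed to `IO` only to keep the signatures short.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import           Data.Conduit          (ConduitT, runConduit, (.|))
import qualified Data.Conduit.List     as CL
import           Data.Text             (Text, unpack)
import           Data.XML.Types        (Event)
import           Text.XML.Stream.Parse

-- Hypothetical record for the data we want to keep: age and name.
data Person = Person Int Text deriving Show

-- Assumed shape of parsePerson: match a <person age="..."> element and
-- read its text content. Returns Nothing if the next element is not <person>.
-- Note: 'read' is partial and will crash on a non-numeric age.
parsePerson :: ConduitT Event o IO (Maybe Person)
parsePerson = tag' "person" (requireAttr "age") $ \age -> do
  name <- content
  return (Person (read (unpack age)) name)

main :: IO ()
main = do
  people <- runConduit $
       parseLBS def "<foo>bar</foo><person age=\"25\">Michael</person><person age=\"2\">Eliezer</person>"
       -- Filter stage: pass <person> subtrees downstream untouched,
       -- silently drop every other subtree.
    .| many_ (choose [takeTree "person" ignoreAttrs, ignoreAnyTreeContent])
       -- Parse stage: only <person> events arrive here.
    .| manyYield parsePerson
    .| CL.consume
  print people
```

The design point is that filtering and parsing are separate pipeline stages: the `many_ (choose [...])` stage acts as the filter asked for in the question, so `parsePerson` never sees unknown elements and does not need a fallback branch. Run as a program, this should print the same list as the GHCi session above. Note that the filter works on sibling elements at the level where it runs, so for `<person>` elements nested under a root element the same combination would have to be applied inside a parser for that root.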
palik