From an XML string, I produced a Pandoc document using

readHtml :: (PandocMonad m, ToSources a) => ReaderOptions -> a -> m Pandoc

I then rendered it to m Text (where m is an instance of PandocMonad) using

writeMarkdown :: PandocMonad m => WriterOptions -> Pandoc -> m Text

Haskell source code:

{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE OverloadedStrings #-}

module Main where

import Control.Monad
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as LB
import Data.Text.Encoding
import qualified Data.Text.IO as TIO
import Text.Pandoc

xmlString :: LB.ByteString
xmlString = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<html>\n  <head>\n    <meta charset=\"utf-8\"/>\n  </head>\n  <body>\n    <ul id=\"a8_boYDC\">\n      <li id=\"rp\">\n        <p>\"Alpha\"</p>\n        <ul>\n          <li id=\"cT\">\n            <p>\"Beta\"</p>\n            <ul>\n              <li id=\"wy\">\n                <p>\"Gamma\"</p>\n              </li>\n              <li id=\"Be\">\n                <p>\"Delta\"</p>\n              </li>\n              <li id=\"Ep\">\n                <p>\"Epsilon\"</p>\n              </li>\n            </ul>\n          </li>\n          <li id=\"Ko\">\n            <p>\"Zeta\"</p>\n            <ul>\n              <li id=\"AI\">\n                <p>Eta</p>\n              </li>\n            </ul>\n          </li>\n          <li id=\"kw\">\n            <p>\"Theta\"</p>\n            <ul>\n              <li id=\"sx\">\n                <p>\"Iota\"</p>\n              </li>\n              <li id=\"82\">\n                <p>\"Kappa\"</p>\n              </li>\n              <li id=\"o_\">\n                <p>\"Lambda\"</p>\n              </li>\n            </ul>\n          </li>\n        </ul>\n      </li>\n    </ul>\n  </body>\n</html>\n"

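-- Decode the lazy ByteString to strict Text, parse it as HTML,
-- render the resulting Pandoc document as Markdown, and print it.
main :: IO ()
main =
  TIO.putStr
    <=< handleError
    <=< runIO
      . ( writeMarkdown def
            <=< readHtml def
              . decodeUtf8
              . B.toStrict
        )
    $ xmlString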

Output:

-   "Alpha"

    -   "Beta"

        -   "Gamma"

        -   "Delta"

        -   "Epsilon"

    -   "Zeta"

        -   "Eta"

    -   "Theta"

        -   "Iota"

        -   "Kappa"

        -   "Lambda"

My question is: how can I adjust the options in order to

  • remove whitespace between the dash and the actual text, and

  • remove the dashes from the output.

Would it be possible, for example, to pass options to writeMarkdown?
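
For reference, a minimal sketch of how options are passed to writeMarkdown, assuming pandoc 3.x: WriterOptions is a record, so individual fields can be overridden on `def`. The name `customOptions` is only illustrative. The two fields shown control line wrapping and the enabled Markdown extensions; as far as I can tell, neither of them affects the list markers.

```haskell
{-# LANGUAGE OverloadedStrings #-}

module Main where

import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import Text.Pandoc

-- Hypothetical option set, built by record update on the defaults.
-- These fields change wrapping and extensions only, not list markers.
customOptions :: WriterOptions
customOptions =
  def
    { writerWrapText = WrapNone
    , writerExtensions = pandocExtensions
    }

main :: IO ()
main = do
  result <- runIO $ do
    -- Parse a small HTML fragment and render it with the custom options.
    doc <- readHtml def ("<ul><li><p>Alpha</p></li></ul>" :: T.Text)
    writeMarkdown customOptions doc
  TIO.putStr =<< handleError result
```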

F. Zer
  • This kind of change shouldn't be dealt with by `writeMarkdown`, as its role is accurately rendering as Markdown whatever it is given (so the dashes correspond to the `li`s, and the whitespace within each item is preserved as is). What you probably want to do instead is modifying the `Pandoc` intermediate representation generated by `readHtml` (to get started, see the [*pandoc-types* package](https://hackage.haskell.org/package/pandoc-types-1.23)). – duplode Mar 05 '23 at 01:14
  • I have read that package documentation. Very useful. @duplode, I couldn't figure out how to transform the `BulletList` type, in order to render an indented list (without the dashes). Could you give me a hint ? – F. Zer Mar 05 '23 at 09:31
  • I can't see a suitable type. Once I find one, I can use `walk` to traverse the structure and perform the conversion. – F. Zer Mar 05 '23 at 09:31
  • Markdown doesn't have indented lists without markers. Your best bet is probably `walk`ing through the structure removing the lists, changing the list items into paragraphs, and adding indentation as non-breaking spaces (as otherwise [they will be trimmed](https://stackoverflow.com/q/40023013/2751851) in the Markdown output). That said, if Markdown output isn't a requirement, you might prefer writing a custom writer that renders plain text exactly as you want. – duplode Mar 05 '23 at 12:48
  • Thank you so much, @duplode. How can I add indentation to each of the paragraphs ? I can't see a viable option for doing that using the provided `walk` function. – F. Zer Mar 05 '23 at 15:03
  • In principle, it should be a matter of manipulating [the `Inline` contents](https://hackage.haskell.org/package/pandoc-types-1.23/docs/Text-Pandoc-Definition.html#t:Inline) of [the `Para` constructor](https://hackage.haskell.org/package/pandoc-types-1.23/docs/Text-Pandoc-Definition.html#t:Block) as you create the paragraphs. – duplode Mar 05 '23 at 17:37
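
A minimal sketch of the approach outlined in the comments above, assuming pandoc 3.x with pandoc-types 1.23: each bullet list is replaced by the paragraphs of its items, and paragraphs from nested lists are prefixed with non-breaking spaces (U+00A0). The names `flatten`, `indent`, and `flattenLists` are hypothetical helpers, not part of pandoc. The sketch uses explicit recursion rather than `walk` because it has to track the nesting depth, and it only handles bullet lists at the top level of the document (not ordered lists, and not lists inside block quotes or divs).

```haskell
{-# LANGUAGE OverloadedStrings #-}

module Main where

import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import Text.Pandoc

-- | Flatten one block. 'level' is the number of bullet lists enclosing it:
-- 0 outside any list, 1 for items of a top-level list, and so on.
flatten :: Int -> Block -> [Block]
flatten level block = case block of
  BulletList items -> concatMap (flatten (level + 1)) (concat items)
  Para inlines     -> [Para (indent level inlines)]
  Plain inlines    -> [Para (indent level inlines)]
  other            -> [other]

-- | Prefix the inlines with four non-breaking spaces per nesting level
-- below the first; top-level items are left unindented.
indent :: Int -> [Inline] -> [Inline]
indent level inlines
  | level <= 1 = inlines
  | otherwise  = Str (T.replicate (4 * (level - 1)) "\160") : inlines

-- | Apply the flattening to every top-level block of a document.
flattenLists :: Pandoc -> Pandoc
flattenLists (Pandoc meta blocks) = Pandoc meta (concatMap (flatten 0) blocks)

main :: IO ()
main = do
  result <- runIO $ do
    doc <- readHtml def
      ("<ul><li><p>Alpha</p><ul><li><p>Beta</p></li></ul></li></ul>" :: T.Text)
    writeMarkdown def (flattenLists doc)
  TIO.putStr =<< handleError result
```

In the program from the question, `flattenLists` would sit between `readHtml def` and `writeMarkdown def`. How the non-breaking spaces appear in the rendered text depends on the writer, so the output is worth checking before relying on it.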
