Yesterday I tried to write a simple RSS downloader in Haskell with the help of the Network.HTTP and Feed libraries. I want to download the link from each RSS item and name the downloaded file after the item's title.

Here is my short code:

import Control.Monad
import Control.Applicative
import Network.HTTP
import Text.Feed.Import
import Text.Feed.Query
import Text.Feed.Types
import Data.Maybe
import qualified Data.ByteString as B
import Network.URI (parseURI, uriToString)

getTitleAndUrl :: Item -> (Maybe String, Maybe String)
getTitleAndUrl item = (getItemTitle item, getItemLink item)

downloadUri :: (String,String) -> IO ()
downloadUri (title,link) = do
  file <- get link
  B.writeFile title file
    where
      get url = let uri = case parseURI url of
                      Nothing -> error $ "invalid uri: " ++ url
                      Just u -> u in
                simpleHTTP (defaultGETRequest_ uri) >>= getResponseBody

getTuples :: IO (Maybe [(Maybe String, Maybe String)])
getTuples = fmap (map getTitleAndUrl) <$> fmap (feedItems) <$> parseFeedString <$> (simpleHTTP (getRequest "http://index.hu/24ora/rss/") >>= getResponseBody)

I have reached a state where I get a list of tuples, each containing a name and the corresponding link. I also have a downloadUri function which properly downloads the given link to a file named after the RSS item title.

I already tried to modify downloadUri to work on (Maybe String, Maybe String) by fmap-ing over get and writeFile, but failed horribly.

  • How can I apply my downloadUri function to the result of the getTuples function? I want to implement the following main function:

    main :: IO ()
    main = some magic incantation downloadUri more incantation getTuples

  • The character encoding of the result of getItemTitle is broken: it puts code points in place of the accented characters. The feed is UTF-8 encoded, and I thought that all Haskell string manipulation functions defaulted to UTF-8. How can I fix this?

Edit:

Thanks for your help, I successfully implemented my main and helper functions. Here is the code:

downloadUri :: (Maybe String,Maybe String) -> IO ()
downloadUri (Just title,Just link) = do
  item <- get link
  B.writeFile title item
    where
      get url = let uri = case parseURI url of
                      Nothing -> error $ "invalid uri: " ++ url
                      Just u -> u in
                simpleHTTP (defaultGETRequest_ uri) >>= getResponseBody
downloadUri _ = print "Somewhere something went Nothing"

getTuples :: IO (Maybe [(Maybe String, Maybe String)])
getTuples = fmap (map getTitleAndUrl) <$> fmap (feedItems) <$> parseFeedString <$> decodeString <$> (simpleHTTP (getRequest "http://index.hu/24ora/rss/") >>= getResponseBody)

downloadAllItems :: Maybe [(Maybe String, Maybe String)] -> IO ()
downloadAllItems (Just feedlist) = mapM_ downloadUri $ feedlist
downloadAllItems _ = error "feed does not get parsed"

main = getTuples >>= downloadAllItems

The character encoding issue has been partially solved: I put decodeString before the feed parsing, so the files get named properly. But if I want to print the titles, the issue still happens. Minimal working example:

main = getTuples
pasja
  • I tried running your code, but I'm encountering some trouble with the proxy here at work, so I won't be able to help with the decoding issue until I get home. And even then, it may require someone else's expertise. The UTF-8 issue is very different from the `Maybe` issue. What I suggest is that you start a new question for the character encoding issue, and post your minimal working example. That will help to get the right people looking at the problem. Also, you might try invoking `simpleHTTP (getRequest "http://index.hu/24ora/rss/")` in GHCi to see how the characters are coming in. – mhwombat Jun 13 '13 at 12:14

2 Answers

It sounds like it's the Maybes that are giving you trouble. There are many ways to deal with Maybe values, and some useful library functions like fromMaybe and fromJust. However, the simplest way is to do pattern matching on the Maybe value. We can tweak your downloadUri function to work with the Maybe values. Here's an example:

downloadUri :: (Maybe String, Maybe String) -> IO ()
downloadUri (Just title, Just link) = do
  file <- get link
  B.writeFile title file
    where
      get url = let uri = case parseURI url of
                      Nothing -> error $ "invalid uri: " ++ url
                      Just u -> u in
                simpleHTTP (defaultGETRequest_ uri) >>= getResponseBody
downloadUri _ = error "One of my parameters was Nothing"

Or maybe you can let the title default to blank, in which case you could insert this just before the last line in the previous example:

downloadUri (Nothing, Just link) = downloadUri (Just "", Just link)
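The fromMaybe function mentioned above can express the same defaulting without an extra equation. The following is only a minimal sketch: the helper names `fileNameFor` and `normalise` and the placeholder name "untitled" are made up for illustration and are not part of the original code. A non-empty placeholder is used because writing to a file named "" is not expected to succeed.

```haskell
import Data.Maybe (fromMaybe)

-- | Hypothetical helper: pick a file name, falling back to a
-- placeholder when the feed item has no title.
fileNameFor :: Maybe String -> String
fileNameFor = fromMaybe "untitled"

-- | Hypothetical helper: turn a (title, link) pair of Maybes into a
-- plain pair. A missing title is defaulted; a missing link yields
-- Nothing, because there is nothing to download.
normalise :: (Maybe String, Maybe String) -> Maybe (String, String)
normalise (mTitle, mLink) = fmap (\link -> (fileNameFor mTitle, link)) mLink

main :: IO ()
main = do
  print (normalise (Nothing, Just "http://example.com/a"))
  print (normalise (Just "News", Nothing))
```

With the pair normalised this way, the original (String, String) version of downloadUri could be applied to the Just results unchanged.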

Now the only Maybe you need to work with is the outer one, applied to the list of tuples. Again, we can pattern match. It might be clearest to write a helper function like this:

downloadAllItems (Just ts) = ??? -- hint: try a `mapM`
downloadAllItems Nothing = ??? -- don't do anything, or report an error, or...

As for your encoding issue, my guesses are:

  1. You're reading the information from a file that isn't UTF-8 encoded, or your system doesn't realise that it's UTF-8 encoded.
  2. You are reading the information correctly, but it gets messed up when you output it.

In order to help you with this problem, I need to see a full code example, which shows how you're reading the information and how you output it.
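One possible explanation for the second guess, offered as an assumption rather than a diagnosis: `print` (and GHCi, when it displays a result) goes through `show`, which escapes every non-ASCII character as a numeric code, whereas `putStrLn` writes the characters themselves. A minimal sketch using only the base library; the sample title is made up:

```haskell
import System.IO (hSetEncoding, stdout, utf8)

-- Hypothetical item title with accented characters
-- (é is code point 233, ő is code point 337).
sampleTitle :: String
sampleTitle = "\233l\337"

main :: IO ()
main = do
  -- Encode output as UTF-8 regardless of the system locale.
  hSetEncoding stdout utf8
  -- 'print' uses 'show', so the accented characters come out as escapes.
  print sampleTitle
  -- 'putStrLn' writes the characters themselves.
  putStrLn sampleTitle
```

If this is what is happening, the String is already decoded correctly and only the way it is displayed differs.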

pasja
mhwombat

Your main could be something like the one shown below. There may be a more concise way to compose these two operations, though:

main :: IO ()
main = getTuples >>= process
       where
           process (Just lst) = foldl (\s v -> do {t <- s; download v}) (return ()) lst 
           process Nothing = return ()
           download (Just t, Just l) = downloadUri (t,l)
           download _ = return ()
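As one candidate for a more concise form: the foldl above runs the actions in order and discards their results, which is what mapM_ does, and the Nothing case can be folded in with maybe. The sketch below is not the original code. It takes the download action as a parameter (the name `processWith` is made up) so that it can be tried without network access, with an IORef standing in for the real download:

```haskell
import Data.IORef (modifyIORef, newIORef, readIORef)

type Entry = (Maybe String, Maybe String)

-- | Run an action on every complete (title, link) pair, in order.
-- Incomplete pairs and an unparsed feed (Nothing) are skipped.
processWith :: ((String, String) -> IO ()) -> Maybe [Entry] -> IO ()
processWith act = maybe (return ()) (mapM_ step)
  where
    step (Just t, Just l) = act (t, l)
    step _                = return ()

main :: IO ()
main = do
  seen <- newIORef []
  -- Stand-in for the real download: just remember what was requested.
  let record pair = modifyIORef seen (++ [pair])
  processWith record
    (Just [(Just "a", Just "u1"), (Nothing, Just "u2"), (Just "c", Just "u3")])
  readIORef seen >>= print  -- [("a","u1"),("c","u3")]
```

With the real function, this would presumably be used as `main = getTuples >>= processWith downloadUri`, where downloadUri is the original (String, String) version.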
Ankur