I wrote up a small RSS feed downloader in Haskell, and I'm running into a problem with this story. The RSS item for this is:

<item>
    <title>Defense lawyer says gov’t hid NSA role in California terrorism case</title>
    <link>http://feeds.arstechnica.com/~r/arstechnica/index/~3/hh41K3S-dug/</link>
    <comments>http://arstechnica.com/tech-policy/2013/06/defense-lawyer-says-govt-hid-nsa-role-in-california-terrorism-case/#comments</comments>
    <pubDate>Wed, 19 Jun 2013 17:04:16 +0000</pubDate>
    <dc:creator>Cyrus Farivar</dc:creator>
    <category><![CDATA[Law & Disorder]]></category>
    <category><![CDATA[FISA]]></category>
    <category><![CDATA[FISC]]></category>
    <category><![CDATA[NSA]]></category>
    <category><![CDATA[san diego]]></category>
    <category><![CDATA[Terrorism]]></category>

    <guid isPermaLink="false">http://arstechnica.com/?p=292287</guid>
    <description><![CDATA["We're going to evaluate our options as to what to do now," attorney says.]]></description>
    <content:encoded><![CDATA[<div id="rss-wrap"> <p>Now that the <a href="http://arstechnica.com/tech-policy/2013/06/nsa-head-says-digital-spying-has-disrupted-a-little-over-10-plots-domestically/">National Security Agency (NSA) and other law enforcement institutions have begun to pull </a><a style="font-size: 14px; line-height: 19px;" href="http://arstechnica.com/tech-policy/2013/06/nsa-head-says-digital-spying-has-disrupted-a-little-over-10-plots-domestically/">back </a><a style="font-size: 14px; line-height: 19px;" href="http://arstechnica.com/tech-policy/2013/06/nsa-head-says-digital-spying-has-disrupted-a-little-over-10-plots-domestically/">the veil on surveillance tactics</a><span style="font-size: 14px; line-height: 19px;"> and their newly disclosed relationship in suspected terrorism cases, at least one defense attorney is starting to challenge previously closed cases.</span></p>
            <p>Among the cases officials cited where NSA surveillance proved useful in securing a conviction was that of <a href="https://www.fbi.gov/sandiego/press-releases/2013/san-diego-jury-convicts-four-somali-immigrants-of-providing-support-to-foreign-terrorists">Basaaly Saeed Moalin</a>, a San Diego cab driver. Moalin was convicted in February 2013 on five counts, including conspiracy to provide material support to a foreign terrorist organization, Somali terrorist group Al Shabaab.</p>
            <p>"We're going to evaluate our options as to what to do now to get to the bottom of this," Joshua Dratel, a New York-based defense attorney representing Moalin, told <em><a href="http://www.wired.com/threatlevel/2013/06/nsa-defense-lawyers/">Wired</a></em> on Tuesday. "We can't learn about it until it's to the government's tactical advantage politically to disclose it. National security is about keeping illegal conduct concealed from the American public until you're forced to justify it because someone ratted you out."</p>
            </div><p><a href="http://arstechnica.com/tech-policy/2013/06/defense-lawyer-says-govt-hid-nsa-role-in-california-terrorism-case/#p3">Read 5 remaining paragraphs</a> | <a href="http://arstechnica.com/tech-policy/2013/06/defense-lawyer-says-govt-hid-nsa-role-in-california-terrorism-case/?comments=1">Comments</a></p><div class="feedflare">
            <a href="http://feeds.arstechnica.com/~ff/arstechnica/index?a=hh41K3S-dug:QhYtCojMxzM:V_sGLiPBpWU"><img src="http://feeds.feedburner.com/~ff/arstechnica/index?i=hh41K3S-dug:QhYtCojMxzM:V_sGLiPBpWU" border="0"></img></a> <a href="http://feeds.arstechnica.com/~ff/arstechnica/index?a=hh41K3S-dug:QhYtCojMxzM:F7zBnMyn0Lo"><img src="http://feeds.feedburner.com/~ff/arstechnica/index?i=hh41K3S-dug:QhYtCojMxzM:F7zBnMyn0Lo" border="0"></img></a> <a href="http://feeds.arstechnica.com/~ff/arstechnica/index?a=hh41K3S-dug:QhYtCojMxzM:qj6IDK7rITs"><img src="http://feeds.feedburner.com/~ff/arstechnica/index?d=qj6IDK7rITs" border="0"></img></a> <a href="http://feeds.arstechnica.com/~ff/arstechnica/index?a=hh41K3S-dug:QhYtCojMxzM:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/arstechnica/index?d=yIl2AUoC8zA" border="0"></img></a>
    </div><img src="http://feeds.feedburner.com/~r/arstechnica/index/~4/hh41K3S-dug" height="1" width="1"/>]]></content:encoded>
    <wfw:commentRss>http://arstechnica.com/tech-policy/2013/06/defense-lawyer-says-govt-hid-nsa-role-in-california-terrorism-case/feed/</wfw:commentRss>
    <slash:comments>0</slash:comments>
    <feedburner:origLink>http://arstechnica.com/tech-policy/2013/06/defense-lawyer-says-govt-hid-nsa-role-in-california-terrorism-case/</feedburner:origLink>
</item>

Haskell doesn't seem to like the apostrophe used in the title.

  1. My first attempt ran into an invalid character error.
  2. After explicitly setting UTF-8 on stdout, it got a little better.
  3. If I save it to a local file (copy-paste into Vim), I get a slightly different result.

None of these, however, results in the apostrophe being interpreted and printed correctly. I should note that I'm using Text.XML.Light for parsing, and the results appear to be the same whether I write to a file or print to the console.

Any idea why this isn't working? For reference, my code is here.

hugomg
damien

1 Answer

The HTTP package doesn't seem to decode the response bytes correctly. Here is a version that does the decoding manually, using decodeUtf8 from the text package (Data.Text.Encoding), which works correctly. I'm not sure if there is a better way.
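
To illustrate the manual decoding step, here is a minimal sketch rather than the linked version itself. It assumes the response body is available as a strict ByteString; `rawTitle` and `decodeBody` are hypothetical names, and the bytes are hard-coded (the UTF-8 encoding of "gov’t") so the example runs without any network access.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import qualified Data.Text.IO as TIO
import System.IO (hSetEncoding, stdout, utf8)

-- Hypothetical stand-in for the downloaded bytes: "gov’t" in UTF-8.
-- The right single quotation mark U+2019 is the three bytes E2 80 99.
rawTitle :: B.ByteString
rawTitle = B.pack [0x67, 0x6F, 0x76, 0xE2, 0x80, 0x99, 0x74]

-- Decode the raw bytes as UTF-8 instead of treating each byte as a Char.
-- Note: decodeUtf8 throws an exception on invalid input.
decodeBody :: B.ByteString -> T.Text
decodeBody = TE.decodeUtf8

main :: IO ()
main = do
  hSetEncoding stdout utf8          -- make sure stdout is written as UTF-8
  TIO.putStrLn (decodeBody rawTitle)
```

If the parser needs a String, `T.unpack (decodeBody bytes)` gives one with the apostrophe as a single character instead of three separate bytes.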

bennofs
  • 11,873
  • 1
  • 38
  • 62
  • [Here](https://gist.github.com/dradtke/5824422)'s my second attempt, and the [final result](http://i.imgur.com/FZHWGNL.png). Doesn't appear to look much better. – damien Jun 20 '13 at 16:48
  • Okay, it looks like it does work, the problem now is with cmd.exe; if I pipe the output to a file and open it in Vim, then it appears to be correct. – damien Jun 20 '13 at 16:59
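
The file-based check from the last comment can be sketched as follows, assuming the decoded title is already a String. `writeUtf8` and the file name `title.txt` are hypothetical; the point is only that the handle's encoding is set explicitly, so the output does not depend on the console's code page.

```haskell
import System.IO

-- The title from the feed item, with U+2019 written as a numeric escape.
title :: String
title = "Defense lawyer says gov\8217t hid NSA role in California terrorism case"

-- Write a string to a file, forcing UTF-8 regardless of the system locale.
writeUtf8 :: FilePath -> String -> IO ()
writeUtf8 path s = withFile path WriteMode $ \h -> do
  hSetEncoding h utf8
  hPutStrLn h s

main :: IO ()
main = writeUtf8 "title.txt" title
```

Opening `title.txt` in an editor that reads UTF-8 (Vim, as in the comment above) should then show the apostrophe correctly, even when cmd.exe does not.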