I was wondering if I can write a Haskell program to check updates of some novels on demand, and the website I am using as an example is this. And I got a problem when displaying the contents of it (on a mac el capitan). The simple codes follow:
import Network.HTTP
openURL :: String -> IO String
openURL = (>>= getResponseBody) . simpleHTTP . getRequest
display :: String -> IO ()
display = (>>= putStrLn) . openURL
Then, when I run display "http://www.piaotian.net/html/7/7430/"
on ghci, some strange characters appear; the first lines look like this:
<title>×ß½øÐÞÏÉ×îÐÂÕ½Ú,×ß½øÐÞÏÉÎÞµ¯´°È«ÎÄÔĶÁ_Æ®ÌìÎÄѧ</title>
<meta http-equiv="Content-Type" content="text/html; charset=gbk" />
<meta name="keywords" content="×ß½øÐÞÏÉ,×ß½øÐÞÏÉ×îÐÂÕ½Ú,×ß½øÐÞÏÉÎÞµ¯´° Æ®ÌìÎÄѧ" />
<meta name="description" content="Æ®ÌìÎÄÑ§ÍøÌṩ×ß½øÐÞÏÉ×îÐÂÕ½ÚÃâ·ÑÔĶÁ£¬Ç뽫×ß½øÐÞÏÉÕ½ÚĿ¼¼ÓÈëÊղط½±ãÏ´ÎÔĶÁ,Æ®ÌìÎÄѧС˵ÔĶÁÍø¾¡Á¦ÔÚµÚһʱ¼ä¸üÐÂС˵×ß½øÐÞÏÉ£¬Èç·¢ÏÖδ¼°Ê±¸üУ¬ÇëÁªÏµÎÒÃÇ¡£" />
<meta name="copyright" content="×ß½øÐÞÏɰæÈ¨ÊôÓÚ×÷ÕßÎáµÀ³¤²»¹Â" />
<meta name="author" content="ÎáµÀ³¤²»¹Â" />
<link rel="stylesheet" href="/scripts/read/list.css" type="text/css" media="all" />
<script type="text/javascript">
I also tried to download as a file as follows:
import Network.HTTP
openURL :: String -> IO String
openURL = (>>= getResponseBody) . simpleHTTP . getRequest
downloading :: String -> IO ()
downloading = (>>= writeFile fileName) . openURL
But after downloading the file, it is like in the photo:
If I download the page by python (using urllib for example) the characters are displayed normally. Also, if I write a Chinese html and parse it, then there seems to be no problem. Thus it seems that the problem is on the website. However, I don't see any difference between the characters of the site and those I write.
Any help on the reason behind this is well appreciated.
P.S.
The python code is as follows:
import urllib
urllib.urlretrieve('http://www.piaotian.net/html/7/7430/', theFic)
theFic = file_path
And the file is all fine and good.