1

I'm trying to create a Metro application that shows the schedule of subjects for my university. I use HAP (HtmlAgilityPack) + Fizzler to parse the page and extract the data.

The schedule link gives me a "Too many automatic redirections" error. I found out that CookieContainer can help, but I don't know how to implement it.

        CookieContainer cc = new CookieContainer();
        request.CookieContainer = cc;

My code:

    public static HttpWebRequest request;
    public string Url = "http://cist.kture.kharkov.ua/ias/app/tt/f?p=778:201:9421608126858:::201:P201_FIRST_DATE,P201_LAST_DATE,P201_GROUP,P201_POTOK:01.09.2012,31.01.2013,2423447,0:";

    public SampleDataSource()
    {
        HtmlDocument html = new HtmlDocument();
        request = (HttpWebRequest)WebRequest.Create(Url);
        request.Proxy = null;
        request.UseDefaultCredentials = true;
        CookieContainer cc = new CookieContainer();
        request.CookieContainer = cc;
        html.LoadHtml(request.RequestUri.ToString());
        var page = html.DocumentNode;

        String ITEM_CONTENT = null;
        foreach (var item in page.QuerySelectorAll(".MainTT"))
        {
            ITEM_CONTENT = item.InnerHtml;
        }
    }

With CookieContainer I don't get the error, but DocumentNode.InnerHtml for some reason contains my URI, not the page HTML.

Maks Martynov

3 Answers

1

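You just need to change one line. LoadHtml parses the string you pass it as HTML markup; it does not download anything, so passing the URI only gives you a document that contains the URI text.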

Replace

 html.LoadHtml(request.RequestUri.ToString());

with

 html.LoadHtml(new StreamReader(request.GetResponse().GetResponseStream()).ReadToEnd());

EDIT

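First mark your method as async (a constructor cannot be async, so this code has to live in a regular method), then await the response:

    request.CookieContainer = cc;
    var resp = await request.GetResponseAsync();
    html.LoadHtml(new StreamReader(resp.GetResponseStream()).ReadToEnd());
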
L.B
  • Check, maybe you have some `GetResponseAsync` etc. I can't test it in a Metro app right now. I tested the above code on Win7 and it works. The key point here is that you have to get the response stream and read it. – L.B Nov 28 '12 at 18:19
  • I dealt with it. But GetResponseAsync doesn't have GetResponseStream() or anything similar. – Maks Martynov Nov 28 '12 at 18:22
  • @MaksMartynov `Async` methods return `Task`s. I guess what you see are the methods of the `Task`. See the edit. – L.B Nov 28 '12 at 18:33
0

If you want to download the HTML of a web page, try this method (it uses HttpClient):

    public async Task<string> DownloadHtmlCode(string url)
    {
        HttpClientHandler handler = new HttpClientHandler { UseDefaultCredentials = true, AllowAutoRedirect = true };
        HttpClient client = new HttpClient(handler);
        HttpResponseMessage response = await client.GetAsync(url);
        response.EnsureSuccessStatusCode();
        string responseBody = await response.Content.ReadAsStringAsync();
        return responseBody;
    }
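
Since the question is about redirects and cookies, here is a minimal sketch of the same HttpClient approach with a CookieContainer attached to the handler, so cookies set during a redirect chain are sent on the following requests. It assumes the redirect loop really is cookie-related, which is what the question suggests but is not confirmed here. The class and method names (PageDownloader, CreateHandler, DownloadHtmlAsync) are made up for illustration.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public static class PageDownloader
{
    // Hypothetical helper: builds a handler that stores cookies in the
    // supplied container and follows redirects automatically.
    public static HttpClientHandler CreateHandler(CookieContainer cookies)
    {
        return new HttpClientHandler
        {
            CookieContainer = cookies,
            UseCookies = true,
            AllowAutoRedirect = true,
            UseDefaultCredentials = true
        };
    }

    // Hypothetical helper: downloads the page body as a string.
    // Throws HttpRequestException if the final status code is not a success.
    public static async Task<string> DownloadHtmlAsync(string url, CookieContainer cookies)
    {
        using (var handler = CreateHandler(cookies))
        using (var client = new HttpClient(handler))
        {
            HttpResponseMessage response = await client.GetAsync(url);
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

The returned string is what you would pass to html.LoadHtml(...). Reusing the same CookieContainer across calls keeps the session between requests.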
0

If you want to parse the downloaded HTML, you can use Regex or LINQ. Here is an example that uses LINQ, but first you need to load the HTML into an HtmlDocument from the HtmlAgilityPack library: html.LoadHtml(temphtml); After that, you can query the HtmlDocument:

// Example: iterate over all <img> elements and read their links
IEnumerable<HtmlNode> imghrefNodes = html.DocumentNode.Descendants().Where(n => n.Name == "img");
foreach (HtmlNode img in imghrefNodes)
{
   HtmlAttribute att = img.Attributes["src"];
   // att is null when the <img> has no src attribute; otherwise att.Value holds the image URL
   // Here you can do whatever you want with each link by reading or editing att.Value
}
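
The same idea applied to the schedule page from the question could look like the sketch below. It assumes the HtmlAgilityPack and Fizzler packages are referenced and that the schedule blocks carry the MainTT class, as in the question's selector. ScheduleParser and ExtractBlocks are hypothetical names. Unlike the loop in the question, which keeps only the last match, it collects every matching block.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;
using Fizzler.Systems.HtmlAgilityPack;

public static class ScheduleParser
{
    // Hypothetical helper: parses already-downloaded HTML text and returns
    // the inner HTML of every element that matches the CSS selector.
    public static List<string> ExtractBlocks(string htmlText, string cssSelector)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(htmlText); // expects markup, not a URL

        return doc.DocumentNode
                  .QuerySelectorAll(cssSelector) // Fizzler extension method
                  .Select(node => node.InnerHtml)
                  .ToList();
    }
}
```

For example, ScheduleParser.ExtractBlocks(pageHtml, ".MainTT") would return one string per schedule block, where pageHtml is the text read from the response.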