Is it possible to make a web request that retrieves only the text-based data from a site? If so, how should I do it?

The only approach I can think of is to search the response string and remove all the image tags, but that seems like a very poor way to do it.

EDIT: this is my code snippet:

            string baseUrl = kvPair.Value[0];
            string loginUrl = kvPair.Value[1];
            string notifyUrl = kvPair.Value[2];
            cc = new CookieContainer();
            string loginDetails = DataCollector.GetLoginDetails(baseUrl, ref cc);
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(loginUrl);
            request.Method = "POST";
            request.Accept = "text/*";
            request.ContentType = "application/x-www-form-urlencoded; charset=UTF-8";
            request.CookieContainer = cc;
            request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36";
            // Encode with UTF-8 to match the charset declared in ContentType above.
            byte[] data = Encoding.UTF8.GetBytes(loginDetails);
            request.ContentLength = data.Length;
            using (Stream s = request.GetRequestStream())
            {
                s.Write(data, 0, data.Length);
            }
            HttpWebResponse res = (HttpWebResponse)request.GetResponse();
            request = (HttpWebRequest)WebRequest.Create(notifyUrl);
            request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36";
            request.CookieContainer = cc;
            res = (HttpWebResponse)request.GetResponse();
            Stream streamResponse = res.GetResponseStream();
            using (StreamReader sr = new StreamReader(streamResponse))
            {
                ViewData["data"] += "<div style=\"float: left; margin-bottom: 50px;\">" + sr.ReadToEnd() + "</div>";
            }
Farhan Nasim
Snickbrack

2 Answers

I found a workable solution in code myself:

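// Requires: using System.Text.RegularExpressions;
public static string StripImages(string input)
{
   // [^>]* also matches newlines, so tags that span several lines are removed too.
   return Regex.Replace(input, @"<img\b[^>]*>", String.Empty, RegexOptions.IgnoreCase);
}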

This removes all image tags, but only after the whole HTML document has been downloaded, so it does not reduce the amount of data transferred by the request itself.

Snickbrack

Section 14.1 of the HTTP/1.1 Header Field Definitions (RFC 2616) defines the Accept header. It states the following:

... If an Accept header field is present, and if the server cannot send a response which is acceptable according to the combined Accept field value, then the server SHOULD send a 406 (not acceptable) response.

Because this is a SHOULD rather than a MUST, it is up to the server whether it honors the client's Accept header.
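
Since the server may ignore Accept, one option on the client side is to look at the Content-Type of the response before reading the body. The following is only a minimal sketch of that idea; the class ContentTypeCheck and the method IsTextContentType are hypothetical names that do not appear in the question's code.

```csharp
using System;

public static class ContentTypeCheck
{
    // Hypothetical helper: returns true when a Content-Type header value
    // such as "text/html; charset=utf-8" has the top-level media type "text".
    public static bool IsTextContentType(string contentType)
    {
        if (string.IsNullOrWhiteSpace(contentType))
        {
            return false;
        }

        // Drop parameters such as "; charset=utf-8" before comparing.
        string mediaType = contentType.Split(';')[0].Trim();
        return mediaType.StartsWith("text/", StringComparison.OrdinalIgnoreCase);
    }
}
```

With the code from the question, this could be called as ContentTypeCheck.IsTextContentType(res.ContentType) before the StreamReader is created, so that a non-text body is skipped instead of read.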

In my experience, most servers ignore the Accept header. So far I have found only one exception: GitHub. I requested the GitHub homepage with audio/* as the Accept value, and it responded appropriately with status code 406.

Try the following snippet as a demo; you should get System.Net.WebException: The remote server returned an error: (406) Not Acceptable.

HttpWebRequest request = (HttpWebRequest) WebRequest.Create("https://github.com/");
request.Method = "GET";
request.Accept = "audio/*";

var response = request.GetResponse();
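
If you would rather detect the 406 in code than let the exception propagate, the status code can be read from the WebException. This is a rough sketch under the same assumptions as the demo above; AcceptProbe, GetStatusCode and Probe are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Net;

public static class AcceptProbe
{
    // Returns the HTTP status code carried by a WebException, or null when
    // the failure happened before any HTTP response arrived (DNS error, timeout, ...).
    public static HttpStatusCode? GetStatusCode(WebException ex)
    {
        var response = ex.Response as HttpWebResponse;
        return response != null ? response.StatusCode : (HttpStatusCode?)null;
    }

    // Sends a GET request with the given Accept value and reports how the server reacted.
    public static string Probe(string url, string accept)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        request.Accept = accept;

        try
        {
            using (var response = (HttpWebResponse)request.GetResponse())
            {
                // The server answered normally; it may or may not have honored Accept.
                return "Response received, Content-Type: " + response.ContentType;
            }
        }
        catch (WebException ex)
        {
            if (GetStatusCode(ex) == HttpStatusCode.NotAcceptable)
            {
                return "406 Not Acceptable";
            }

            // Any other failure is not related to content negotiation; rethrow it.
            throw;
        }
    }
}
```

Calling AcceptProbe.Probe("https://github.com/", "audio/*") would return the 406 message only if GitHub still behaves as described above, which may have changed since this was written.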
Farhan Nasim