I just posted this question, which was answered right away. That answer, in turn, raises the following new question:

If my understanding is correct, the StreamContent object of an HttpResponseMessage is created when an HTTP request is made via HttpClient.GetAsync. Its Headers property, or part of it, is set according to meta tags included in the HTML source file.

For instance, a meta tag can tell the response object which charset the file's contents are encoded in.

<meta http-equiv='Content-Type' content='text/html; charset=utf-8' />

A request for a resource that contains such a line will produce an HttpResponseMessage.Content.Headers with this setting.

In the other question referenced at the top of this one, I mentioned a response object being created without the correct encoding. Since the HTML source behind that incorrectly encoded response does contain the setting that is supposed to produce properly encoded responses:

<meta HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1255">

what is the reason that responses from that site are not given the charset setting included in the meta tag, and are therefore rendered in an incorrect charset?

Here's a pictorial description of the question: both sites contain the meta tag with charset setting, but one, for some reason, misses it...

[image not reproduced]

Fiddler's header details for both requests:

Working one (cookie header removed):

Request:

GET http://www.ynet.co.il/home/0,7340,L-8,00.html HTTP/1.1
Host: www.ynet.co.il
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
If-Modified-Since: Thu, 31 Mar 2016 10:04:39 GMT

Response:

HTTP/1.1 200 OK
vg_id: 1
X-me: 06
Content-Type: text/html; charset=UTF-8
Last-Modified: Thu, 31 Mar 2016 10:38:57 GMT
Accept-Ranges: bytes
VX-Cache: HIT
WAI: 01
V-TTL: 0
backend-cache-control: 
Content-Length: 410685
Vary: Accept-Encoding
Date: Thu, 31 Mar 2016 10:38:48 GMT
Connection: keep-alive

Problematic one:

Request:

GET http://winedepot.co.il/ HTTP/1.1
Host: winedepot.co.il
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Cookie: __utma=201832727.725995063.1458660502.1459413977.1459418530.8; __utmz=201832727.1458660502.1.1.utmccn=(direct)|utmcsr=(direct)|utmcmd=(none); __utmc=201832727; ASPSESSIONIDCQTRQCAQ=FEOHEBFCBGABBKOBAHOGKBGB
Connection: keep-alive

Response:

HTTP/1.1 200 OK
Cache-Control: private
Content-Length: 118225
Content-Type: text/html
Server: Microsoft-IIS/7.5
X-Powered-By: ASP.NET
Date: Thu, 31 Mar 2016 10:36:21 GMT
Veverke
  • I'm fairly sure that the `HttpResponseMessage` class does _not_ parse the response HTML to read any meta tags. I might be wrong though. Are you very sure that the behavior you're seeing stems from those tags, and if so, how did you verify this? – CodeCaster Mar 31 '16 at 09:42
  • It is an assumption, but one based on analyzing the results of the excerpt above. – Veverke Mar 31 '16 at 09:50
  • Yeah but you don't show the entire HTTP response, so there is no way for us to verify that the character set doesn't actually come from a response header. – CodeCaster Mar 31 '16 at 09:54
  • Which request header do you think could have an influence here? Don't forget that Content-Type is a response header only. I will add it to the screenshot, but I see nothing that seems related. – Veverke Mar 31 '16 at 09:57
  • I'm not talking about request headers anywhere. Don't add screenshots, add it as text. Use Fiddler to obtain the request and response headers. Also, content-type _can_ be used as a request header. – CodeCaster Mar 31 '16 at 09:57
  • Am confused, but will add data from fiddler in a minute... – Veverke Mar 31 '16 at 10:00
  • I said: don't add screenshots. Click 'Raw' and copy the text. – CodeCaster Mar 31 '16 at 10:18

2 Answers

0

As you can see from your Fiddler screenshots, HttpResponseMessage.Content.Headers.ContentType contains exactly what was specified in the Content-Type header of the response.

The HttpResponseMessage will not parse the response HTML and search for any <meta /> tags.
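
To make this concrete, here is a minimal sketch that parses the two Content-Type header values from the Fiddler traces in the question. It is written in Python rather than .NET, purely to illustrate the HTTP mechanics, and the helper name `charset_from_header` is hypothetical. It relies only on the standard library's `email.message.Message` for parsing header parameters.

```python
from email.message import Message


def charset_from_header(content_type):
    """Return the charset parameter of a Content-Type header value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    # Only the header parameters are inspected here; the body is never read,
    # so a <meta> tag inside the HTML cannot influence the result.
    return msg.get_content_charset()


# Header values copied from the two responses shown in the question.
print(charset_from_header("text/html; charset=UTF-8"))  # the working site
print(charset_from_header("text/html"))                 # the problematic site
```

The first call yields a charset and the second yields nothing, which mirrors what the two servers actually sent: one included a charset parameter in its header, the other did not.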

CodeCaster
  • Thanks, but I do not see how this answers the question. I noticed the difference in the response headers in Fiddler. Why does one response header get a charset setting and the other not, when this *parameter* is defined in a meta tag in the HTML source, and both URIs' HTML sources contain it? – Veverke Mar 31 '16 at 10:33
  • @Veverke my answer answers your question _"Why do I see these content-type headers while I expect something else?"_. Your expectation is wrong. That this answer doesn't solve the underlying problem is not something I can change. – CodeCaster Mar 31 '16 at 10:42
  • *The HttpResponseMessage will not parse the response HTML*. Fine, this means these tags have no influence on the creation of the response object. Still... here we go again: which other setting is then responsible for one response being created with UTF-8 and the other with none (default)? – Veverke Mar 31 '16 at 10:42
  • I am sorry buddy but you will not tell me what my question is :-) – Veverke Mar 31 '16 at 10:43
  • I _am_ telling you the answer to what you're asking, you're simply not understanding it, which is not my problem. The `HttpResponseMessage.Content.Headers.ContentType` will contain exactly the value that the server sends in its `Content-Type` response header, and I have run out of ways to tell you that. You have no influence over that, and if that content-type header is actually wrong (i.e. the response body actually is encoded differently), then there's nothing you can do but go and detect or guess the actual encoding. – CodeCaster Mar 31 '16 at 10:44
  • So you, sir, want another question saying "I make 2 HTTP requests. Both responses are created, but one with a specific charset, one without. What causes this?" (and this is all that was asked from the beginning...)? – Veverke Mar 31 '16 at 10:46
  • *You have no influence over that, and if that content-type header is actually wrong (i.e. the response body actually is encoded differently), then there's nothing you can do but go and detect or guess the actual encoding.* Now you have finally touched on the problem, and are expressing your opinion. I appreciate that. – Veverke Mar 31 '16 at 10:47
  • Well yeah, that is an entirely different question, and that is exactly what your [previous question](http://stackoverflow.com/q/36327747/1219280) is about, right? In **this** question you're posing a misunderstanding, which I tried to explain in my answer. The fixing of this understanding does not fix the underlying problem: HTTP servers can send headers that actually do not properly describe the content they're sending, and **that** is the problem you should solve. – CodeCaster Mar 31 '16 at 10:47
  • That question is not the same as this "new" (third) one. The original question asks: "why is the response here not properly encoded?" Answer: "oh, because your response is missing the charset windows-1255." Thank you. Now you say the meta charset setting has no influence on the creation of the response object. So I ask: are you telling me that the HttpClient framework is flawed in that it will not guarantee that your response is properly encoded, when it could do so by simply referring to the meta charset setting? – Veverke Mar 31 '16 at 10:55
  • I understand what you are saying (and can agree with that), and I then reply as above. – Veverke Mar 31 '16 at 10:56
  • One cannot "simply" read the meta tag. I'm writing an answer to your previous question right now. – CodeCaster Mar 31 '16 at 10:56
  • Thanks for that. I honestly think it fits better here, though. (argh... will we agree with something :) ?) – Veverke Mar 31 '16 at 11:27
-1

The content type comes from the HTTP header:

https://en.wikipedia.org/wiki/List_of_HTTP_header_fields

<meta http-equiv='Content-Type' content='text/html; charset=utf-8' />

is part of the content and not part of the headers.
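
Because the tag lives in the body, a client that wants to honour it has to look for it itself. The following is a rough sketch of that idea, assuming a simple precedence of header charset first, then a meta tag found in the first 1024 bytes, then a default. It is written in Python rather than .NET, the names `header_charset`, `decode_body` and `META_CHARSET` are hypothetical, and the regular expression is a heuristic, not the full encoding-sniffing algorithm that browsers implement.

```python
import re
from email.message import Message

# Heuristic: find charset=... inside a <meta> tag. Not a real HTML parser.
META_CHARSET = re.compile(rb"""<meta[^>]+charset\s*=\s*["']?([\w-]+)""", re.IGNORECASE)


def header_charset(content_type):
    """Return the charset parameter of a Content-Type header value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def decode_body(body, content_type, default="utf-8"):
    """Decode bytes using the header charset, else a <meta> charset, else default."""
    charset = header_charset(content_type)
    if charset is None:
        # The header gave no charset, so peek at the start of the body.
        match = META_CHARSET.search(body[:1024])
        if match:
            charset = match.group(1).decode("ascii")
    return body.decode(charset or default, errors="replace")


# A page like the problematic one: no charset in the header, one in the meta tag.
# "\u05e9\u05dc\u05d5\u05dd" is a short Hebrew word used as sample content.
html = ('<html><head><meta HTTP-EQUIV="Content-Type" '
        'CONTENT="text/html; charset=windows-1255"></head>'
        '<body>\u05e9\u05dc\u05d5\u05dd</body></html>')
raw = html.encode("windows-1255")
text = decode_body(raw, "text/html")
```

Note that an unrecognised charset name would still raise a `LookupError` in this sketch, and a page whose meta tag is itself wrong would be decoded incorrectly; real clients need more defensive handling.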

I suggest you install Fiddler to better understand what those requests actually do. Set Fiddler as your proxy and use the inspectors to see what is actually sent and received when you make HTTP requests.

A fuller explanation is beyond the scope of this answer.

Nahum
  • I did not get your point, Nahum. I am trying to figure out why one site is able to create properly encoded HTTP responses and others are not. I gave examples of both cases. What is the reason for responses not being properly encoded? You say this has nothing to do with the meta tag? What is the reason then? – Veverke Mar 31 '16 at 09:53
  • By the way I was aware from the beginning that Content-Type is part of the Content headers (see code sample). – Veverke Mar 31 '16 at 11:51
  • Why do some people write bad code? Your browser is built to cope with people who don't follow standards and write bad code. That is simply what the sites return; you have no control over it. You'll have to work around it. – Nahum Mar 31 '16 at 14:58