
I use HttpClient to fetch web pages and ran into some weird behavior.

Some sites load perfectly, but some requests fail with a timeout, even though the same links work perfectly in the browser. For example, I can open https://www.luisaviaroma.com/en-gb/shop/women/shoes?lvrid=_gw_i4 in the browser, but my code doesn't work:

var httpClient = new HttpClient();
var response = await httpClient.GetAsync("https://www.luisaviaroma.com/en-us/sw/women?lvrid=_gw");

What could be causing this? Could the issue be the _ character in the query string? If so, how should I fix it?

I also tried third-party libraries such as RestSharp, but got the same result.

The exception is:

System.Threading.Tasks.TaskCanceledException: The request was canceled due to the configured HttpClient.Timeout of 100 seconds elapsing.
 ---> System.TimeoutException: The operation was canceled.
 ---> System.Threading.Tasks.TaskCanceledException: The operation was canceled.
 ---> System.IO.IOException: Unable to read data from the transport connection: The I/O operation has been aborted because of either a thread exit or an application request..
 ---> System.Net.Sockets.SocketException (995): The I/O operation has been aborted because of either a thread exit or an application request.
   --- End of inner exception stack trace ---
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken)
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource<System.Int32>.GetResult(Int16 token)
   at System.Net.Security.SslStream.EnsureFullTlsFrameAsync[TIOAdapter](TIOAdapter adapter)
   at System.Net.Security.SslStream.ReadAsyncInternal[TIOAdapter](TIOAdapter adapter, Memory`1 buffer)
   at System.Net.Http.HttpConnection.InitialFillAsync(Boolean async)
   at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
   at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
   --- End of inner exception stack trace ---
   --- End of inner exception stack trace ---
   at System.Net.Http.HttpClient.HandleFailure(Exception e, Boolean telemetryStarted, HttpResponseMessage response, CancellationTokenSource cts, CancellationToken cancellationToken, CancellationTokenSource pendingRequestsCts)
   at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
   at Suits.Scheduler.SchedulerHostedService.UpdateClothesLinksAsync(SuitsDbContext dbContext) in C:\Repo\server\src\Suits.Scheduler\SchedulerHostedService.cs:line 98
Grigory Zhadko
  • Normally in cases like this, it's because you're missing one or more headers that the server is expecting, so the trick is to look at a request which works in your browser and copy the headers into your request. But I tried that here with no luck. – canton7 Apr 04 '22 at 08:35
  • Your url inside the code and your browser url are not the same. Which one do you actually wanna fetch? – TheTanic Apr 04 '22 at 09:02
  • @TheTanic I wanted to check if the page exists or not. So, I check the response code and analyze the content of the HTML page for keywords. – Grigory Zhadko Apr 04 '22 at 11:24
  • So check out my answer. This should do the job. – TheTanic Apr 04 '22 at 11:24
  • Are you able to load the page in your browser straight after getting a timeout? I wonder whether you're being firewalled for sending too many invalid requests... – canton7 Apr 04 '22 at 12:00
  • I was able to open the page before, during, and after the request sent by the code. I also thought about a firewall or captcha. But it seems that the problem is with something else. – Grigory Zhadko Apr 04 '22 at 12:10

2 Answers


I got a 403 error when the Accept header was not set to an appropriate value.
The response is text/html, so you need to add the matching header:

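// Build the GET request explicitly so that headers can be set on it.
HttpRequestMessage msg = new HttpRequestMessage(
  HttpMethod.Get,
  "https://www.luisaviaroma.com/en-gb/shop/women/shoes?lvrid=_gw_i4"
);
// Tell the server that an HTML response is expected.
msg.Headers.Add("Accept", "text/html");
HttpClient client = new HttpClient();
// Await the call instead of blocking on .Result (requires an async method).
var response = await client.SendAsync(msg);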

EDIT:
In this case, OP needs to add the Accept-Encoding header too, as D A's answer points out. Code to add the field:

msg.Headers.Add("Accept-Encoding", "br");
TheTanic

This request returns a 200 OK status code:

HttpRequestMessage msg = new HttpRequestMessage(HttpMethod.Get, "https://www.luisaviaroma.com/en-us/sw/women?lvrid=_gw");
msg.Headers.Add("Accept", "text/html");
// Advertise the compression formats the client is willing to receive.
msg.Headers.Add("accept-encoding", "gzip, deflate, br");
HttpClient client = new HttpClient();
// Await the call instead of blocking on .Result (requires an async method).
var response1 = await client.SendAsync(msg);

The response is sent compressed, which is why you have issues with it.

D A
  • `msg.Headers.Add("accept-encoding", "gzip, deflate, br")` -- don't do this, you're inviting a compressed response, but you won't try and decompress it. [Use this instead](https://stackoverflow.com/a/27327208/1086121) – canton7 Apr 04 '22 at 10:27
  • In addition, I think setting the 'accept-encoding' field may be best practice, but it is not really needed to solve OP's problem. – TheTanic Apr 04 '22 at 10:39
  • My answer is just meant to give him a hint as to why he gets the exception, not to solve the future problems he will have with compression. – D A Apr 04 '22 at 10:42
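
The alternative that canton7's comment points to can be sketched roughly as follows: instead of adding Accept-Encoding by hand, let the handler negotiate and undo compression through HttpClientHandler.AutomaticDecompression. This is only a minimal sketch, assuming .NET Core 3.0 or later (where DecompressionMethods.All, which includes Brotli, is available). The names PageFetcher, CreateClient and FetchAsync are hypothetical, and the code has not been verified against the site in the question.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public static class PageFetcher
{
    // Hypothetical factory. With AutomaticDecompression set, the handler sends
    // the Accept-Encoding header itself and transparently decompresses
    // gzip, deflate and Brotli response bodies.
    public static HttpClient CreateClient()
    {
        var handler = new HttpClientHandler
        {
            AutomaticDecompression = DecompressionMethods.All
        };
        var client = new HttpClient(handler);
        // Same Accept value as in the answers above.
        client.DefaultRequestHeaders.Add("Accept", "text/html");
        return client;
    }

    // Hypothetical helper: returns the status code and the already
    // decompressed body as a string.
    public static async Task<(HttpStatusCode Status, string Body)> FetchAsync(
        HttpClient client, string url)
    {
        using var response = await client.GetAsync(url);
        var body = await response.Content.ReadAsStringAsync();
        return (response.StatusCode, body);
    }
}
```

A caller could create the client once with PageFetcher.CreateClient(), reuse it for all requests, and then inspect the returned status code and body for the keywords it cares about.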