
I'd like to scrape real-time data from a website, and I decided to use the websocket-sharp library. My problem is that the same code parses the data from one website but fails on another.

The program throws this exception: WebSocket.connect:0|WebSocketSharp.WebSocketException: Not a WebSocket handshake response.

using (var wss = new WebSocket("wss://..."))
{
    wss.SslConfiguration.EnabledSslProtocols = System.Security.Authentication.SslProtocols.Tls12;
    wss.Origin = "https://www.blabla.com";
           
    wss.CustomHeaders = new Dictionary<string, string>
    {
        { "Accept-Encoding", "gzip, deflate, br" },
        { "Accept-Language", "el-GR,el;q=0.9,en;q=0.8" },
        { "Cache-Control", "no-cache" },
        { "Connection", "Upgrade" },
        { "Host", "blabla.com" },
        { "Origin", "https://www.bla.com" },
        { "Pragma", "no-cache" },
        //{ "Sec-WebSocket-Key", secWebSocketKey },
        //{ "Sec-WebSocket-Protocol", "zap-protocol-v1" },
        { "Sec-WebSocket-Extensions", "permessage-deflate; client_max_window_bits" },
        { "Sec-WebSocket-Version", "13" },
        { "Upgrade", "websocket" },
        { "User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36" }
    };

    //wss.OnOpen += Ws_OnOpen;
    wss.OnMessage += (sender, e) => Console.WriteLine($"Server: {e.Data}");
    wss.OnError += (sender, e) => Console.WriteLine($"Error: {e.Message}");

    wss.Connect();

    Console.ReadKey();
}

I tried both with and without the custom headers.

What do I have to do to make a valid handshake?

(P.S.: I can parse the data from the first website without any custom headers.)
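Some background on the exception. A client only accepts the server's reply to the upgrade request if it passes the checks in RFC 6455, section 4.1:

* The status must be 101 Switching Protocols.
* The response must carry "Upgrade: websocket" and "Connection: Upgrade".
* "Sec-WebSocket-Accept" must equal the base64-encoded SHA-1 hash of the request's "Sec-WebSocket-Key" concatenated with a fixed GUID.

The server may reject the request, for instance with a 403 or 400 because a cookie or a valid uid is missing. In that case it answers with an ordinary HTTP response. As far as I can tell from websocket-sharp's source, that is what produces "Not a WebSocket handshake response".

The sketch below reproduces those checks, so a raw response captured with Fiddler or Wireshark can be tested offline. HandshakeCheck is a hypothetical helper, not part of websocket-sharp, and the library's own checks may differ in detail.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper (not part of websocket-sharp): the client-side
// handshake checks from RFC 6455, section 4.1.
public static class HandshakeCheck
{
    // Fixed GUID defined by RFC 6455.
    private const string WebSocketGuid = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

    // Sec-WebSocket-Accept = base64(SHA-1(Sec-WebSocket-Key + GUID))
    public static string ComputeAccept(string secWebSocketKey)
    {
        using (SHA1 sha1 = SHA1.Create())
        {
            byte[] hash = sha1.ComputeHash(Encoding.ASCII.GetBytes(secWebSocketKey + WebSocketGuid));
            return Convert.ToBase64String(hash);
        }
    }

    // Returns null if the raw response passes the checks, otherwise the reason it fails.
    public static string Validate(string rawResponse, string secWebSocketKey)
    {
        string[] lines = rawResponse.Split(new[] { "\r\n" }, StringSplitOptions.None);

        // Status line must look like "HTTP/1.1 101 Switching Protocols".
        string[] status = lines[0].Split(' ');
        if (status.Length < 2 || status[1] != "101")
            return "Expected status 101, got: " + lines[0];

        // Collect headers (case-insensitive names) until the blank line.
        var headers = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        for (int i = 1; i < lines.Length && lines[i].Length > 0; i++)
        {
            int colon = lines[i].IndexOf(':');
            if (colon > 0)
                headers[lines[i].Substring(0, colon).Trim()] = lines[i].Substring(colon + 1).Trim();
        }

        if (!headers.TryGetValue("Upgrade", out string upgrade) ||
            !upgrade.Equals("websocket", StringComparison.OrdinalIgnoreCase))
            return "Missing or wrong Upgrade header";

        if (!headers.TryGetValue("Connection", out string connection) ||
            connection.IndexOf("Upgrade", StringComparison.OrdinalIgnoreCase) < 0)
            return "Missing or wrong Connection header";

        if (!headers.TryGetValue("Sec-WebSocket-Accept", out string accept) ||
            accept != ComputeAccept(secWebSocketKey))
            return "Sec-WebSocket-Accept does not match the key";

        return null;
    }
}
```

With the sample key from RFC 6455 (dGhlIHNhbXBsZSBub25jZQ==), ComputeAccept returns s3pPLMBiTxaQ9kYGzzhZRbK+xOo=. That is a handy way to confirm the hashing is right before testing a real capture.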

UPDATE

The URL contains a uid parameter: wss://blabla.com/zap/?uid=5829062969032768

This uid changes every time the web page is refreshed, and I think the handshake requires it. Is there any way to reproduce it?

ggeorge
  • Does the second website support websocket connections? You can't use a websocket to any random page on any website - the server _also_ needs to want that connection to be a websocket, rather than a normal web request. – James Thorpe Sep 25 '20 at 08:47
  • @JamesThorpe Yes, it does. I can see the stream in Chrome: both the data the client sends and the data received from the server. – ggeorge Sep 25 '20 at 08:49
  • OK - in that case it'll be down to a mismatch in your request in some fashion. Are you able to see the actual response the server is sending to your code? It might tell you why it's refusing it. Or dig into the WebSocket request in Chrome and see if it's sending other headers (perhaps a needed cookie?). Or, worst case, use Fiddler/Wireshark etc. to compare your request to the one the website itself uses. – James Thorpe Sep 25 '20 at 08:53
  • @JamesThorpe I used all the request headers as they appear in Google Chrome's inspector. – ggeorge Sep 25 '20 at 09:01
  • So are there any more details available in the exception when it happens? It ought to show what the server's response actually was somewhere, I think. If not, it's off to Fiddler to compare... – James Thorpe Sep 25 '20 at 09:03

1 Answer


This uid changes every time the page loads. The site obfuscates its JavaScript, so it was too difficult for me to work out how the uid is generated. Instead, I used the Chrome DevTools support in Selenium 4, which finally let me scrape the real-time data.

First, initialize Chrome DevTools:

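// Works with Selenium.WebDriver 4.0.0-alpha05 (the DevTools API changed in alpha07, see the comments)
public static async Task<DevToolsSession> InitializeChromeDevTools(IWebDriver driver)
{
    // ChromeDriver implements IDevTools; fail clearly if this driver does not
    var devTools = driver as IDevTools
        ?? throw new ArgumentException("The driver does not support DevTools.", nameof(driver));
    var output = devTools.CreateDevToolsSession();

    // Enable the Network domain so network events (including WebSocket frames) are reported
    await output.Network.Enable(new OpenQA.Selenium.DevTools.Network.EnableCommandSettings());

    return output;
}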

Then subscribe to the WebSocketFrameReceived event:

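// Inside an async method, after creating the ChromeDriver instance
var session = await ChromeDriverSettings.InitializeChromeDevTools(driver);
session.Network.WebSocketFrameReceived += Network_WebSocketFrameReceived;

// Called for every WebSocket frame the page receives
private static void Network_WebSocketFrameReceived(object sender, OpenQA.Selenium.DevTools.Network.WebSocketFrameReceivedEventArgs e)
{
    // Frame payload: UTF-8 text for text frames, base64 for binary frames
    var message = e.Response.PayloadData;
    Console.WriteLine($"Server: {message}");
}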
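One thing to watch for with this approach is binary frames. According to the Chrome DevTools Protocol, a frame's payloadData is plain UTF-8 text only when the opcode is 1 (a text frame). For any other opcode it is a base64-encoded string, so binary messages have to be decoded before parsing.

A minimal sketch follows, assuming you pass in the frame's opcode and payload. In the Selenium bindings these should be e.Response.Opcode and e.Response.PayloadData, but check the version you use. FramePayload is a hypothetical helper.

```csharp
using System;
using System.Text;

// Hypothetical helper: decodes a DevTools WebSocket frame payload.
// Per the Chrome DevTools Protocol, payloadData is UTF-8 text when the
// opcode is 1 (text frame) and base64 for any other opcode (e.g. 2 = binary).
public static class FramePayload
{
    // Raw bytes of the frame, whatever its type.
    public static byte[] ToBytes(double opcode, string payloadData)
    {
        return opcode == 1
            ? Encoding.UTF8.GetBytes(payloadData)
            : Convert.FromBase64String(payloadData);
    }

    // Convenience for text-based protocols: binary frames are also decoded as UTF-8.
    public static string ToText(double opcode, string payloadData)
    {
        return opcode == 1
            ? payloadData
            : Encoding.UTF8.GetString(Convert.FromBase64String(payloadData));
    }
}
```

If the site compresses its messages, the bytes from ToBytes are what you would feed into a decompressor. Whether that is needed depends on the site.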
ggeorge
  • I am unable to use your InitializeChromeDevTools method. The output.Network property is not available. I have installed the nuget package Selenium.WebDriver v4.0.0-alpha07. Any hints? – Petter T Jan 07 '21 at 09:12
  • @PetterT Try -alpha05. I just updated to alpha07 and have the same problem. Something probably changed in the new version. – ggeorge Jan 07 '21 at 09:31
  • Excellent @ggeorge, that made the difference. By the way: are you also able to send Websocket messages by using devtools? (Not much documentation for these alpha releases around :-) ) – Petter T Jan 07 '21 at 09:40
  • By the way, the output.Network.Enable method is async, so you should await it. – Petter T Jan 07 '21 at 09:44
  • @PetterT Thanks for the tip. I will update my answer. As for sending WebSocket messages, I don't know, because I only used DevTools as a proxy to capture the network traffic. – ggeorge Jan 07 '21 at 09:50
  • @PetterT I did some quick research on sending messages via DevTools and found this: https://gist.github.com/sahajamit/c2d6827736f2b267a8d08c41e559ad24#gistcomment-3221979 . It's in Java, but you could try something similar. – ggeorge Jan 07 '21 at 10:28
  • Thanks @ggeorge! I decided to not rely upon alpha software in the end, and opened the websocket connection using a websocket library instead. – Petter T Jan 07 '21 at 13:59