23

I want to capture the websocket data on this page: https://upbit.com/exchange?code=CRIX.UPBIT.KRW-BTC. Its websocket URL is dynamic and only valid for the first connection; the second time you connect to it, it no longer sends data.

So I wonder whether headless Chrome can help me monitor the websocket data.

Any ideas? Thanks!

soulmachine
  • 1
    See also [Capturing Websocket Messages From Network Tab](https://stackoverflow.com/questions/57893407/capturing-websocket-messages-from-network-tab) – ggorlen Oct 07 '22 at 14:03

3 Answers

36

You don't actually need anything complex for this. Although the URL looks dynamic, it works fine from code as well. To see why your attempt doesn't work, you need to understand what is happening in the background.

First let's look at the Network Tab.

[Screenshot: websocket URL]

The cookies and the Origin header may matter for connecting, so we note them down.

Now let us look at the data exchanged on the socket:

[Screenshot: starting frames]

[Screenshot: middle frames]

If you look at the frames, the first frame received carries o as its data, which may indicate that the connection has opened. The website then sends some data to the socket, which is probably the query for the data we want. When the connection stays idle for some time, the socket receives h as its data, which may indicate a hold or a heartbeat of some kind (as shown in the second image).
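The single-letter frames described above match the framing used by SockJS, which the /sockjs/ segment of the websocket URL also hints at. The sketch below assumes that is the protocol in use: o opens the session, h is a heartbeat, a carries a JSON array of messages, and c carries a close code and reason. `parseSockJsFrame` is a hypothetical helper name, not something from the page or from a library.

```javascript
// Sketch only: decodes one raw frame, assuming SockJS-style framing.
// parseSockJsFrame is a hypothetical helper, not part of ws or SockJS.
function parseSockJsFrame(raw) {
  // ws may hand over a Buffer, so normalise to a string first
  const text = raw.toString();
  const type = text[0];
  const body = text.slice(1);

  switch (type) {
    case 'o': // session opened
      return { type: 'open' };
    case 'h': // heartbeat sent while the connection is idle
      return { type: 'heartbeat' };
    case 'a': // JSON array of message strings
      return { type: 'messages', messages: JSON.parse(body) };
    case 'c': // JSON array: [code, reason]
      return { type: 'close', reason: JSON.parse(body) };
    default:  // anything else is passed through untouched
      return { type: 'unknown', raw: text };
  }
}
```

With a helper like this, the message handler can branch on `frame.type` instead of comparing raw strings.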

To get the exact data being sent, we put a breakpoint in the code

[Screenshot: breakpoint]

and then print the value in the console

[Screenshot: data sent]

Now we have enough information to start coding. I found the following to be a good websocket library for this:

https://github.com/websockets/ws

So we install it with yarn or npm:

yarn add ws
# or
npm install ws --save

Now we write our code

const WebSocket = require("ws")

// Connect with the headers the browser sent.
// The options object can be passed directly as the second argument.
const ws = new WebSocket("wss://example.com/sockjs/299/enavklnl/websocket", {
    headers: {
        "Cookie": "<cookie data noted earlier>",
        "User-Agent": "<Your browser agent>"
    },
    origin: "https://example.com",
})
const opening_message = '["[{\\"ticket\\":\\"ram macbook\\"},{\\"type\\":\\"recentCrix\\",\\"codes\\":[\\"CRIX.UPBIT.KRW-BTC\\",\\"CRIX.BITFINEX.USD-BTC\\",\\"CRIX.BITFLYER.JPY-BTC\\",\\"CRIX.OKCOIN.CNY-BTC\\",\\"CRIX.KRAKEN.EUR-BTC\\",\\"CRIX.UPBIT.KRW-DASH\\",\\"CRIX.UPBIT.KRW-ETH\\",\\"CRIX.UPBIT.KRW-NEO\\",\\"CRIX.UPBIT.KRW-BCC\\",\\"CRIX.UPBIT.KRW-MTL\\",\\"CRIX.UPBIT.KRW-LTC\\",\\"CRIX.UPBIT.KRW-STRAT\\",\\"CRIX.UPBIT.KRW-XRP\\",\\"CRIX.UPBIT.KRW-ETC\\",\\"CRIX.UPBIT.KRW-OMG\\",\\"CRIX.UPBIT.KRW-SNT\\",\\"CRIX.UPBIT.KRW-WAVES\\",\\"CRIX.UPBIT.KRW-PIVX\\",\\"CRIX.UPBIT.KRW-XEM\\",\\"CRIX.UPBIT.KRW-ZEC\\",\\"CRIX.UPBIT.KRW-XMR\\",\\"CRIX.UPBIT.KRW-QTUM\\",\\"CRIX.UPBIT.KRW-LSK\\",\\"CRIX.UPBIT.KRW-STEEM\\",\\"CRIX.UPBIT.KRW-XLM\\",\\"CRIX.UPBIT.KRW-ARDR\\",\\"CRIX.UPBIT.KRW-KMD\\",\\"CRIX.UPBIT.KRW-ARK\\",\\"CRIX.UPBIT.KRW-STORJ\\",\\"CRIX.UPBIT.KRW-GRS\\",\\"CRIX.UPBIT.KRW-VTC\\",\\"CRIX.UPBIT.KRW-REP\\",\\"CRIX.UPBIT.KRW-EMC2\\",\\"CRIX.UPBIT.KRW-ADA\\",\\"CRIX.UPBIT.KRW-SBD\\",\\"CRIX.UPBIT.KRW-TIX\\",\\"CRIX.UPBIT.KRW-POWR\\",\\"CRIX.UPBIT.KRW-MER\\",\\"CRIX.UPBIT.KRW-BTG\\",\\"CRIX.COINMARKETCAP.KRW-USDT\\"]},{\\"type\\":\\"crixTrade\\",\\"codes\\":[\\"CRIX.UPBIT.KRW-BTC\\"]},{\\"type\\":\\"crixOrderbook\\",\\"codes\\":[\\"CRIX.UPBIT.KRW-BTC\\"]}]"]'
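ws.on('open', function open() {
    console.log("opened");
});

ws.on('message', function incoming(data) {
    // ws may deliver text frames as a Buffer, so normalise to a string first
    const text = data.toString()
    // "o" and "h" are control frames; answer them with the subscription request
    if (text === "o" || text === "h") {
        console.log("sending opening message")
        ws.send(opening_message)
    }
    else {
        console.log("Received", text)
    }
});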
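The hand-escaped opening_message string is hard to edit safely. As a rough sketch, the same shape can be produced with two nested `JSON.stringify` calls: the inner call serialises the request, and the outer call wraps that text as the single string element of an array. `buildOpeningMessage` and its parameter names are hypothetical, and the field layout is simply copied from the captured frame rather than from any documented API.

```javascript
// Sketch only: rebuilds the double-encoded frame captured in the browser.
// buildOpeningMessage is a hypothetical helper; the fields mirror the
// captured message and are not taken from documentation.
function buildOpeningMessage(ticket, recentCodes, focusCode) {
  const request = [
    { ticket },
    { type: 'recentCrix', codes: recentCodes },
    { type: 'crixTrade', codes: [focusCode] },
    { type: 'crixOrderbook', codes: [focusCode] },
  ];
  // Inner stringify: the request as JSON text.
  // Outer stringify: an array holding that text as one escaped string.
  return JSON.stringify([JSON.stringify(request)]);
}
```

For example, `ws.send(buildOpeningMessage('ram macbook', ['CRIX.UPBIT.KRW-BTC'], 'CRIX.UPBIT.KRW-BTC'))` would subscribe to a single market instead of the full list.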

And running the code we get

[Screenshot: working code]

Now if I replace

const ws = new WebSocket("wss://example.com/sockjs/299/enavklnl/websocket", {
    headers: {
        "Cookie": "<cookie data noted earlier>",
        "User-Agent": "<Your browser agent>"
    },
    origin: "https://example.com",
})

with

const ws = new WebSocket("wss://example.com/sockjs/299/enavklnl/websocket")

the connection still works, which means the cookies and origin were never strictly needed. I would still recommend sending them.

Tarun Lalwani
  • 3
    If you just want to do this using puppeteer only, then this may not be the answer you were looking for – Tarun Lalwani Jan 26 '18 at 08:11
  • 6
    Amazing answer, very detailed and enlightening analysis. I'll adopt your answer under another question https://stackoverflow.com/q/48364820/381712, for this question I still want to know how to dump data via Puppeteer or headless Chrome – soulmachine Jan 27 '18 at 19:42
  • Is there a way to do this with `puppeteer` only? – four-eyes Mar 08 '19 at 09:24
  • @Stophface, puppeteer is automating the browser, so as long as you get the data on the page to capture, you can do it. If not then you will have to resort to such ways – Tarun Lalwani Mar 10 '19 at 02:52
  • How do you understand where to place your breakpoint? I can't find '.send(' phrase in js files. – ALalavi Nov 10 '20 at 22:38
36
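// Open a raw DevTools Protocol (CDP) session for the page.
// For old puppeteer use instead:
// const client = page._client
const client = await page.target().createCDPSession()

// Network.* events are only emitted once the Network domain is enabled
await client.send('Network.enable')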
    
client.on('Network.webSocketCreated', ({requestId, url}) => {
  console.log('Network.webSocketCreated', requestId, url)
})
    
client.on('Network.webSocketClosed', ({requestId, timestamp}) => {
  console.log('Network.webSocketClosed', requestId, timestamp)
})
    
client.on('Network.webSocketFrameSent', ({requestId, timestamp, response}) => {
  console.log('Network.webSocketFrameSent', requestId, timestamp, response.payloadData)
})

client.on('Network.webSocketFrameReceived', ({requestId, timestamp, response}) => {
  console.log('Network.webSocketFrameReceived', requestId, timestamp, response.payloadData)
})

This uses the DevTools protocol directly: https://chromedevtools.github.io/devtools-protocol/tot/Network#event-webSocketClosed
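If the frames are to be processed rather than just logged, the listeners above can be wrapped in a small helper. This is a minimal sketch: `recordWebSocketFrames` and the shape of the object passed to `onFrame` are hypothetical, and `client` is assumed to be a CDP session such as the one returned by `page.target().createCDPSession()`. Only the `on` and `send` calls already shown in this answer are used.

```javascript
// Sketch only: forwards every WebSocket frame seen by a CDP session to a callback.
// recordWebSocketFrames is a hypothetical helper, not part of Puppeteer.
async function recordWebSocketFrames(client, onFrame) {
  // Register the listeners first so no early frame is missed.
  client.on('Network.webSocketFrameSent', ({ requestId, response }) => {
    onFrame({ direction: 'sent', requestId, payload: response.payloadData });
  });
  client.on('Network.webSocketFrameReceived', ({ requestId, response }) => {
    onFrame({ direction: 'received', requestId, payload: response.payloadData });
  });
  // Without this call the Network.* events are never emitted.
  await client.send('Network.enable');
}
```

A typical call would be `await recordWebSocketFrames(client, frame => console.log(frame))` before navigating with `page.goto(...)`, so that the page's websocket is created after the listeners are in place.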

zag2art
  • how to respond with mock data though? – Tibebes. M Jan 09 '21 at 07:59
  • 2
    Looks like the only way for now is to intercept the websocket connection request and redirect it to your own websocket server. Maybe this can be helpful https://forum.katalon.com/t/intercepting-request-with-chrome-devtools-protocol/36081 – zag2art Mar 15 '21 at 07:28
  • @zag2art, Sorry to bring up a 2 year old question, but please explain how to call Network.webSocketFrameReceived ? Should I specify all three parameters when calling this function? Or just two RequestId and MonotonicTime ? – Optimus1 Apr 08 '22 at 17:39
  • @Optimus1, just copy the whole example and take a look. these three are not parameters you need to specify. they are callback parameters, so you will get them. – zag2art Apr 10 '22 at 17:39
  • I am afraid this no longer works in Puppeteer version 15 (maybe even earlier): `TypeError: client.on is not a function`. Any solutions? My proposal: change `page._client` to `page._client()`. – rriemann Jul 01 '22 at 09:56
  • probably const client = await page.target().createCDPSession() – zag2art Jul 06 '22 at 14:50
  • This worked for me only after I added `await client.send('Network.enable');` – Anthony Ter Apr 29 '23 at 04:42
6

I don't think puppeteer has support for this yet, but the lower-level protocol does: see https://chromedevtools.github.io/devtools-protocol/tot/Network/#event-webSocketFrameSent and https://chromedevtools.github.io/devtools-protocol/tot/Network#type-WebSocketResponse. This means that you could implement this yourself in a library if you wanted to.

browserless