
I want to scrape Shopee with Node.js.

For example: https://shopee.co.th/Erb-Eastern-treat-Body-Oil-240-ml.-%E0%B8%AD%E0%B8%AD%E0%B8%A2%E0%B8%A5%E0%B9%8C%E0%B8%97%E0%B8%B2%E0%B8%9C%E0%B8%B4%E0%B8%A7-%E0%B8%81%E0%B8%A5%E0%B8%B4%E0%B9%88%E0%B8%99%E0%B8%A1%E0%B8%B0%E0%B8%A5%E0%B8%B4%E0%B8%88%E0%B8%B1%E0%B8%AA%E0%B8%A1%E0%B8%B4%E0%B8%99%E0%B8%A1%E0%B8%B4%E0%B9%89%E0%B8%99%E0%B8%97%E0%B9%8C-Relax%E0%B9%81%E0%B8%A5%E0%B8%B0%E0%B8%9B%E0%B8%A5%E0%B8%AD%E0%B8%9A%E0%B8%9B%E0%B8%B0%E0%B9%82%E0%B8%A5%E0%B8%A1%E0%B8%9C%E0%B8%B4%E0%B8%A7%E0%B9%80%E0%B8%AA%E0%B8%B5%E0%B8%A2-%E0%B9%80%E0%B8%95%E0%B8%B4%E0%B8%A1%E0%B8%84%E0%B8%A7%E0%B8%B2%E0%B8%A1%E0%B8%8A%E0%B8%B8%E0%B9%88%E0%B8%A1%E0%B8%8A%E0%B8%B7%E0%B9%89%E0%B8%99-%E0%B8%8B%E0%B8%B6%E0%B8%A1%E0%B9%84%E0%B8%A7-%E0%B9%80%E0%B8%AD%E0%B8%B4%E0%B8%9A-i.84822794.1420642134

On that page, I found that 'https://shopee.co.th/api/v4/item/get?itemid=1420642134&shopid=84822794' returns all the information I want.
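The item and shop IDs in that API URL come straight from the '-i.84822794.1420642134' suffix of the product URL. A minimal sketch of deriving one from the other, assuming every product URL ends in '-i.<shopid>.<itemid>' (inferred from this single example; 'itemApiUrl' is a hypothetical name and the slug below is shortened):

```typescript
// Hypothetical helper: build the item API URL from a Shopee product URL.
// Assumes the product path ends in "-i.<shopid>.<itemid>", as in the example URL.
function itemApiUrl(productUrl: string): string {
  const { origin, pathname } = new URL(productUrl);
  const match = /-i\.(\d+)\.(\d+)$/.exec(pathname);
  if (!match) {
    throw new Error(`no "-i.<shopid>.<itemid>" suffix in ${productUrl}`);
  }
  const shopid = match[1];
  const itemid = match[2];
  // Same endpoint and parameter order as the request observed in the browser.
  const api = new URL("/api/v4/item/get", origin);
  api.searchParams.set("itemid", itemid);
  api.searchParams.set("shopid", shopid);
  return api.toString();
}

console.log(itemApiUrl("https://shopee.co.th/Erb-Eastern-treat-Body-Oil-240-ml.-i.84822794.1420642134"));
// https://shopee.co.th/api/v4/item/get?itemid=1420642134&shopid=84822794
```

Parsing with the WHATWG URL class rather than string slicing keeps the origin (shopee.co.th here) intact, so the same helper would work for other regional domains if they use the same path pattern, which is itself an assumption.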

Next, I removed as many headers as I could. This is what was left:

curl 'https://shopee.co.th/api/v4/item/get?itemid=1420642134&shopid=84822794' \
  -H 'af-ac-enc-dat: AAcyLjcuMS0yAAABhwLjUjEAAAxaAnAAAAAAAAAAAhqvdkk2/tP8OrUXnaHOA3/k0kq4eVtks1BST6eLSS4Vm6Mzx0gD9O1aYGgjmjuYbekv+r+GWty1WPBG9Kro2KSX3nUa9+IpVUwRoi4gKhKMNqyn5szoUhKAg6vJXYPkXE5ho+pc0XG8frvsLhQK78fs7www8WJ6JV771K7M3S4Ty2Ncm8vFui5C+Cokhc47s9IEFIsGDUtpEpNSSI0oOt3tagOTHkenEhnNI10zaevKlBjvvsvFui5C+Cokhc47s9IEFIsGDUtpEpNSSI0oOt3tagOTUWJy02xIVh23BkGRTuyWQfap+xOCy0qf6FceYCNc1JRGrbfJWuRYlFp+J0tcUk6vRQg7nCQX6c8aLMHSU1udIlj4f3pW05can3G4luWGcOFGrbfJWuRYlFp+J0tcUk6vgCDxiBpIlUwbzBjpEgESiJf/NhMnw8f21UXD2mkb2JtI4KZeS5XG6Vg+Sy6faGwv0CNfz1xrNOs4FGq3WJKpGQcb2+nJz69gWGNMbg621M/uFFgR46ErQ/2L6Gs4QB0cokRJMASgBP5xFgtls592gxZGy/dvZo1WISPUzsZZQ7oyXw4MgaypduOZgzYi8ExEKPdqfXm2ERcslapCFVPJr5zi9kuOGNhwFVdgRQ0UFr3RHJJ2HKErNOh5euwKv/tW581pD60drdkNvNef4VqxkBf0XvK44D0BGiOcMV4KIKk+V7f48kwlCpS/9Zn1v+vyw74kW69xNHreBu7kIyz+ndGNJzn0cR0KRBiB9HBK+fKX/zYTJ8PH9tVFw9ppG9ibl/82EyfDx/bVRcPaaRvYm6/+8/hXle9RT/jsf/z8TjQ=' \
  -H 'cookie: __LOCALE__null=TH; csrftoken=VAyoXJ79ttVloHqgLxxKWQqEx4ibqByd; _gcl_au=1.1.1060540446.1679214255; _med=refer; SPC_SI=gzMQZAAAAABpTExJYnZnb+a4VgAAAAAAWmdHc1VjVEg=; SPC_F=EGfFY2araCb12cnF0CZQrBcpGlb3qu0R; REC_T_ID=6f2afd0e-c62f-11ed-9f6e-b47af14a4d88; SPC_R_T_ID=Hw+c+5w8Ks+n3gsrQPdgSvyxlHVgSyMxXHtmpKL4XskhpkD4G7hIjckxYJhBwza3wEPIEvXTMlv8wrykdAEJ+8u3FkIWhm5shk5ZXb3O30PVtZc7hyAYK+8zKVaOr9ObArahuFoamrOUlWEEkXAO2KE/mncYRYhx4RKsN+/DaVY=; SPC_R_T_IV=NUZ2MVJOZ3hXVGZuMXZ0VQ==; SPC_T_ID=Hw+c+5w8Ks+n3gsrQPdgSvyxlHVgSyMxXHtmpKL4XskhpkD4G7hIjckxYJhBwza3wEPIEvXTMlv8wrykdAEJ+8u3FkIWhm5shk5ZXb3O30PVtZc7hyAYK+8zKVaOr9ObArahuFoamrOUlWEEkXAO2KE/mncYRYhx4RKsN+/DaVY=; SPC_T_IV=NUZ2MVJOZ3hXVGZuMXZ0VQ==; _QPWSDCXHZQA=e18d16ae-f265-4008-bf99-a6129bbdb7a3; _gid=GA1.3.1588758656.1679214255; language=en; AMP_TOKEN=%24NOT_FOUND; _ga_L4QXS6R7YG=GS1.1.1679380382.6.1.1679380665.59.0.0; _ga=GA1.1.812985200.1679214255; _dc_gtm_UA-61914165-6=1; shopee_webUnique_ccd=A5%2FaHWKtD7fQgBzzgAu%2Big%3D%3D%7CB12tGuvi%2FEmGDn8xyX7RZjqzPYrXJvetVKHxRtWiFzZLOS%2B3l%2FPbAmh%2BWY2v8foPSy8FVAYMTrSZwzKv7y5R8eonJ%2FHmi1p1GZw%3D%7Cg4aeTyLcipb7GsGB%7C06%7C3; ds=d2f693fcde2f5f684f75295e07e66e77' \
  -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36' \
  --compressed

Running that command returns JSON data that starts with {"error":null,"error_msg":null,"data":{"itemid":1420642134,"shopid":84822794,"userid":0,"price_max_before_discount":143000000,...

But when I convert it to node-fetch, it doesn't work:

import fetch from 'node-fetch';

main()

async function main(): Promise<void> {
const res = await fetch('https://shopee.co.th/api/v4/item/get?itemid=1420642134&shopid=84822794', {
  headers: {
    'af-ac-enc-dat': 'AAcyLjcuMS0yAAABhwLjUjEAAAxaAnAAAAAAAAAAAhqvdkk2/tP8OrUXnaHOA3/k0kq4eVtks1BST6eLSS4Vm6Mzx0gD9O1aYGgjmjuYbekv+r+GWty1WPBG9Kro2KSX3nUa9+IpVUwRoi4gKhKMNqyn5szoUhKAg6vJXYPkXE5ho+pc0XG8frvsLhQK78fs7www8WJ6JV771K7M3S4Ty2Ncm8vFui5C+Cokhc47s9IEFIsGDUtpEpNSSI0oOt3tagOTHkenEhnNI10zaevKlBjvvsvFui5C+Cokhc47s9IEFIsGDUtpEpNSSI0oOt3tagOTUWJy02xIVh23BkGRTuyWQfap+xOCy0qf6FceYCNc1JRGrbfJWuRYlFp+J0tcUk6vRQg7nCQX6c8aLMHSU1udIlj4f3pW05can3G4luWGcOFGrbfJWuRYlFp+J0tcUk6vgCDxiBpIlUwbzBjpEgESiJf/NhMnw8f21UXD2mkb2JtI4KZeS5XG6Vg+Sy6faGwv0CNfz1xrNOs4FGq3WJKpGQcb2+nJz69gWGNMbg621M/uFFgR46ErQ/2L6Gs4QB0cokRJMASgBP5xFgtls592gxZGy/dvZo1WISPUzsZZQ7oyXw4MgaypduOZgzYi8ExEKPdqfXm2ERcslapCFVPJr5zi9kuOGNhwFVdgRQ0UFr3RHJJ2HKErNOh5euwKv/tW581pD60drdkNvNef4VqxkBf0XvK44D0BGiOcMV4KIKk+V7f48kwlCpS/9Zn1v+vyw74kW69xNHreBu7kIyz+ndGNJzn0cR0KRBiB9HBK+fKX/zYTJ8PH9tVFw9ppG9ibl/82EyfDx/bVRcPaaRvYm6/+8/hXle9RT/jsf/z8TjQ=',
    'cookie': '__LOCALE__null=TH; csrftoken=VAyoXJ79ttVloHqgLxxKWQqEx4ibqByd; _gcl_au=1.1.1060540446.1679214255; _med=refer; SPC_SI=gzMQZAAAAABpTExJYnZnb+a4VgAAAAAAWmdHc1VjVEg=; SPC_F=EGfFY2araCb12cnF0CZQrBcpGlb3qu0R; REC_T_ID=6f2afd0e-c62f-11ed-9f6e-b47af14a4d88; SPC_R_T_ID=Hw+c+5w8Ks+n3gsrQPdgSvyxlHVgSyMxXHtmpKL4XskhpkD4G7hIjckxYJhBwza3wEPIEvXTMlv8wrykdAEJ+8u3FkIWhm5shk5ZXb3O30PVtZc7hyAYK+8zKVaOr9ObArahuFoamrOUlWEEkXAO2KE/mncYRYhx4RKsN+/DaVY=; SPC_R_T_IV=NUZ2MVJOZ3hXVGZuMXZ0VQ==; SPC_T_ID=Hw+c+5w8Ks+n3gsrQPdgSvyxlHVgSyMxXHtmpKL4XskhpkD4G7hIjckxYJhBwza3wEPIEvXTMlv8wrykdAEJ+8u3FkIWhm5shk5ZXb3O30PVtZc7hyAYK+8zKVaOr9ObArahuFoamrOUlWEEkXAO2KE/mncYRYhx4RKsN+/DaVY=; SPC_T_IV=NUZ2MVJOZ3hXVGZuMXZ0VQ==; _QPWSDCXHZQA=e18d16ae-f265-4008-bf99-a6129bbdb7a3; _gid=GA1.3.1588758656.1679214255; language=en; AMP_TOKEN=%24NOT_FOUND; _ga_L4QXS6R7YG=GS1.1.1679380382.6.1.1679380665.59.0.0; _ga=GA1.1.812985200.1679214255; _dc_gtm_UA-61914165-6=1; shopee_webUnique_ccd=A5%2FaHWKtD7fQgBzzgAu%2Big%3D%3D%7CB12tGuvi%2FEmGDn8xyX7RZjqzPYrXJvetVKHxRtWiFzZLOS%2B3l%2FPbAmh%2BWY2v8foPSy8FVAYMTrSZwzKv7y5R8eonJ%2FHmi1p1GZw%3D%7Cg4aeTyLcipb7GsGB%7C06%7C3; ds=d2f693fcde2f5f684f75295e07e66e77',
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36'
  }
});
  console.log(await res.text())
}

It returns:

{"tracking_id":"a28dd572-cb0f-43da-832f-00c2f6c509b1","action_type":2,"error":90309999,"is_customized":false,"is_login":false,"platform":0,"report_extra_info":""}

This is not the expected value.
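To tell the two bodies apart in code instead of by eye, a minimal sketch could look like the following. 'parseItemResponse' is a hypothetical helper, the field names come only from the two sample responses above, and what error 90309999 actually means is not documented here.

```typescript
// Loose shape covering both sample bodies: the success body (error: null, data: {...})
// and the blocked body (error: 90309999, tracking_id, ...). Only fields used here are typed.
interface RawItemResponse {
  error: number | null;
  error_msg?: string | null;
  data?: { itemid: number; shopid: number; [key: string]: unknown } | null;
  tracking_id?: string;
}

type ItemResult =
  | { ok: true; data: NonNullable<RawItemResponse["data"]> }
  | { ok: false; error: number | null; trackingId?: string };

// Hypothetical helper: decide whether a body from /api/v4/item/get carries item data.
function parseItemResponse(body: string): ItemResult {
  const json = JSON.parse(body) as RawItemResponse;
  if (json.error === null && json.data) {
    return { ok: true, data: json.data };
  }
  // The meaning of individual error codes is unknown; 90309999 is just the value observed here.
  return { ok: false, error: json.error ?? null, trackingId: json.tracking_id };
}
```

In the node-fetch script, console.log(await res.text()) could then become console.log(parseItemResponse(await res.text())), which makes a blocked request obvious from the ok flag.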

I tried to find what differs between the two requests, but I couldn't. My guess is that Shopee detects scraping through 'af-ac-enc-dat', which seems to encode information from the cookie and the user-agent: if any of those three values changes, even curl gets the same error response as fetch. What am I doing wrong that makes the two return different results?

I'm using "node-fetch": "^2.6.1" and Node v19.8.1.

While troubleshooting, I was told that node-fetch does not support HTTP/2. However, curl still works when forced to HTTP/1.1, so that doesn't seem to be the cause:

curl 'https://shopee.co.th/api/v4/item/get?itemid=1420642134&shopid=84822794' --http1.1 ...

Since node-fetch sets its own User-Agent by default (User-Agent: node-fetch), I tried changing the case of the header name, but it made no difference.

My guess is that these three headers, af-ac-enc-dat, cookie and user-agent, are not being sent by node-fetch with the values I intended.
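One way to test that guess is to send both requests to a local server that prints the headers it receives, instead of to shopee.co.th. A minimal sketch using Node's built-in http module ('startHeaderEchoServer' is a hypothetical name, not part of node-fetch):

```typescript
import * as http from "node:http";
import type { AddressInfo } from "node:net";

// Hypothetical debugging helper: a local server that replies with the request headers it
// received, so the same request can be sent from curl and from node-fetch and compared.
async function startHeaderEchoServer(port = 0): Promise<{ server: http.Server; port: number }> {
  const server = http.createServer((req, res) => {
    // req.headers has lowercased names; req.rawHeaders keeps the original case and order
    // as a flat [name, value, name, value, ...] array.
    const raw: [string, string][] = [];
    for (let i = 0; i < req.rawHeaders.length; i += 2) {
      raw.push([req.rawHeaders[i], req.rawHeaders[i + 1]]);
    }
    console.log(req.headers);
    res.setHeader("content-type", "application/json");
    res.end(JSON.stringify({ headers: req.headers, rawHeaders: raw }));
  });
  await new Promise<void>((resolve, reject) => {
    server.once("error", reject);
    server.listen(port, () => resolve());
  });
  // With port 0 the OS picks a free port; read back the one actually bound.
  return { server, port: (server.address() as AddressInfo).port };
}
```

For example, start it with port 4000, then rerun the curl command and the node-fetch script with the host changed to http://localhost:4000 and compare what each one prints. Because rawHeaders preserves the original casing, this also shows whether changing the case of User-Agent reaches the wire at all.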

Giuk Kim

1 Answer


I checked the request headers each client actually sends, by pointing both at a local server (localhost:4000).

curl:

  {
    host: 'localhost:4000',
    accept: '*/*',
    'af-ac-enc-dat': '~~',
    cookie: '~~',
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36',
    'accept-encoding': 'deflate, gzip'
  }

fetch:

  {
    host: 'localhost:4000',
    accept: '*/*',
    'af-ac-enc-dat': '~~',
    cookie: '~~',
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36',
    connection: 'close',
    'accept-language': '*',
    'sec-fetch-mode': 'cors',
    'accept-encoding': 'gzip, deflate'
  }
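Comparing two header dumps by eye is error-prone. A small sketch that lists headers present in only one request or sent with different values ('diffHeaders' is a hypothetical helper, not part of curl or node-fetch; the "~~" placeholders are kept as in the dumps):

```typescript
type HeaderMap = Record<string, string>;

interface HeaderDiff {
  onlyInA: string[];
  onlyInB: string[];
  differentValue: string[];
}

// Hypothetical helper: compare two header dumps. Names are compared case-insensitively,
// since HTTP header names are case-insensitive; values are compared as exact strings.
function diffHeaders(a: HeaderMap, b: HeaderMap): HeaderDiff {
  const normalize = (h: HeaderMap): Map<string, string> =>
    new Map(Object.entries(h).map(([k, v]): [string, string] => [k.toLowerCase(), v]));
  const na = normalize(a);
  const nb = normalize(b);
  const diff: HeaderDiff = { onlyInA: [], onlyInB: [], differentValue: [] };
  na.forEach((value, name) => {
    if (!nb.has(name)) diff.onlyInA.push(name);
    else if (nb.get(name) !== value) diff.differentValue.push(name);
  });
  nb.forEach((_value, name) => {
    if (!na.has(name)) diff.onlyInB.push(name);
  });
  return diff;
}

const ua =
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36";
const curlHeaders: HeaderMap = {
  host: "localhost:4000", accept: "*/*", "af-ac-enc-dat": "~~", cookie: "~~",
  "user-agent": ua, "accept-encoding": "deflate, gzip",
};
const fetchHeaders: HeaderMap = {
  host: "localhost:4000", accept: "*/*", "af-ac-enc-dat": "~~", cookie: "~~",
  "user-agent": ua, connection: "close", "accept-language": "*",
  "sec-fetch-mode": "cors", "accept-encoding": "gzip, deflate",
};
// onlyInB: connection, accept-language, sec-fetch-mode; differentValue: accept-encoding
console.log(diffHeaders(curlHeaders, fetchHeaders));
```

Besides the three extra headers, this also flags accept-encoding: both clients advertise gzip and deflate, but in a different order, which an exact-match comparison treats as a difference.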

fetch adds three headers that curl doesn't send:

    connection: 'close',
    'accept-language': '*',
    'sec-fetch-mode': 'cors',

connection: 'close' is one of node-fetch's documented default headers: https://www.npmjs.com/package/node-fetch#default-headers

HTTP/2 also forbids the "connection" header, so a request that carries it is rejected:

https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Connection
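Note that node-fetch v2 only speaks HTTP/1.1, where the Connection header is legal, so this restriction matters only when the same headers are sent through an HTTP/2 client. For that case, a sketch of stripping the connection-specific fields listed in RFC 9113 section 8.2.2 before sending ('toHttp2Headers' is a hypothetical helper, not a node-fetch option):

```typescript
// Connection-specific header fields that RFC 9113 (HTTP/2), section 8.2.2, forbids.
const CONNECTION_SPECIFIC = new Set([
  "connection",
  "proxy-connection",
  "keep-alive",
  "transfer-encoding",
  "upgrade",
]);

// Hypothetical helper: return a copy of `headers` that can legally be sent over HTTP/2.
function toHttp2Headers(headers: Record<string, string>): Record<string, string> {
  // Any field named inside the Connection header is connection-specific too.
  const listed = new Set<string>();
  for (const [name, value] of Object.entries(headers)) {
    if (name.toLowerCase() === "connection") {
      for (const token of value.split(",")) listed.add(token.trim().toLowerCase());
    }
  }
  const out: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    const key = name.toLowerCase(); // HTTP/2 field names must be lowercase
    if (CONNECTION_SPECIFIC.has(key) || listed.has(key)) continue;
    // TE is the one exception: it is allowed, but only with the value "trailers".
    if (key === "te" && value.trim().toLowerCase() !== "trailers") continue;
    out[key] = value;
  }
  return out;
}
```

Applied to the fetch dump, this would drop only connection: 'close'; accept-language and sec-fetch-mode are ordinary end-to-end headers and pass through unchanged.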

But I couldn't find a way to remove the connection header.

The bottom line is that one works and the other doesn't because the two requests are not exactly identical.

Giuk Kim
  • Doesn't seem to work when I pass those headers via Python Scrapy, or even with plain requests. – Ice Bear Aug 19 '23 at 13:20
  • Hi! How did you manage to get it working? I'm still getting {"is_customized":false,"is_login":true,"platform":0,"action_type":2,"error":90309999,"tracking_id":"dd4fb169-2404-4a4d-a966-cae8e5b27318","report_extra_info":""} – Ice Bear Aug 19 '23 at 15:57
  • @IceBear I only found the cause of the different responses; I never got scraping to work with node-fetch. Also, Shopee changed how it generates the security token in early July, so now I can't even scrape with curl. – Giuk Kim Aug 21 '23 at 01:39
  • Such a bummer tbh. I actually had it working before, so I think something has changed. I raised a question [here](https://stackoverflow.com/q/76936341/14425271) – Ice Bear Aug 21 '23 at 03:47