I'm not able to crawl https://www.chictr.org.cn/ because of its cookies. It is a Chinese website. There is a CAPTCHA which is actually fairly simple (you only need to slide a slider), but the site detects whether you are a bot. Every new request carries a TraceID. If I don't send the right cookies I get an infinite CAPTCHA loop; Selenium and Puppeteer do not work either.
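To illustrate, a bare request without that cookie never reaches the results. The check for regno= links below is my own heuristic (based on the regex I use later), not anything documented by the site:

import requests

# Without the acw_sc__v3 cookie the server answers with the slide-CAPTCHA page
# instead of the search results.
r = requests.get('https://www.chictr.org.cn/searchprojEN.html', params={'page': 1})
print('regno=' in r.text)  # expected False: no trial links, only the challenge page

With the cookie pasted in from a browser session, this is what I currently run: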
import requests
import re

# acw_sc__v3 value copied from a browser session; without it the site only
# returns the CAPTCHA page.
cookies = {
    'acw_sc__v3': '64e468537406656dd8a684aa0d2c1c278125a95b',
}

for x in range(2, 3):
    params = {
        'page': x,
    }
    response = requests.get('https://www.chictr.org.cn/searchprojEN.html',
                            params=params, cookies=cookies)
    # The results page links each trial via regno=...; when the cookie is
    # rejected the CAPTCHA page comes back and the pattern is absent.
    match = re.search(r'regno=(.*?)"', response.text, flags=re.S)
    print(match.group(1) if match else 'got the CAPTCHA page instead of results')
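The single re.search above only prints the first registration number on each page. Here is a sketch of how I would collect all of them per page and stop as soon as the CAPTCHA page comes back; the Session, the delay, and the findall are my own additions, assuming the cookie is still accepted:

import re
import time
import requests

cookies = {'acw_sc__v3': '64e468537406656dd8a684aa0d2c1c278125a95b'}

with requests.Session() as session:
    session.cookies.update(cookies)
    for page in range(1, 4):  # first three result pages
        resp = session.get('https://www.chictr.org.cn/searchprojEN.html',
                           params={'page': page})
        regnos = re.findall(r'regno=(.*?)"', resp.text)
        if not regnos:  # no regno= links means the CAPTCHA page, i.e. cookie rejected
            print(f'page {page}: got the CAPTCHA page, stopping')
            break
        print(f'page {page}: {regnos}')
        time.sleep(1)  # be polite between requests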
The cookie in question is acw_sc. There are actually three versions of it, acw_sc__v1, acw_sc__v2 and acw_sc__v3, and it is acw_sc__v3 that this site requires.
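To confirm which version actually matters, I replay the same request with each cookie on its own; the v1/v2 values here are placeholders to be pasted from the browser's DevTools:

import requests

browser_cookies = {
    'acw_sc__v1': '<value from browser>',
    'acw_sc__v2': '<value from browser>',
    'acw_sc__v3': '64e468537406656dd8a684aa0d2c1c278125a95b',
}

for name, value in browser_cookies.items():
    r = requests.get('https://www.chictr.org.cn/searchprojEN.html',
                     params={'page': 1}, cookies={name: value})
    has_results = 'regno=' in r.text  # real results page links trials via regno=
    print(name, 'results' if has_results else 'CAPTCHA')

Only the request sent with acw_sc__v3 gets past the CAPTCHA, so that is the cookie I need a way to generate or refresh programmatically.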