I am trying to crawl a website with the got library.

I wrote some simple code:

import got from 'got';

async function test(){
  const data = await got('https://dhlottery.co.kr/store.do?method=topStore&pageGubun=L645', { encoding: 'utf8'});
  console.log(data.body);

}
test();

It works, but the Korean text is not displayed properly.

Part of the output is shown here:

<div class="foot_txt2">
  <p>Copyright (c) 2018 ��������ȸ&amp;���ູ��. All rights reserved</p>
  <p>�� Ȩ�������� �Խõ� �̸��� �ּҰ� �ڵ� �����Ǵ� ���� �ź��ϸ�, �̸� ���ݽ� ������Ÿ����� ���� ó������ �����Ͽ� �ֽñ� �ٶ��ϴ�.</p>
  <p class="f_blue2">û�ҳ��� ������ �����ϰų� ��÷���� ������ �� �����ϴ�.</p>
</div>

All of the garbled words are Korean.

I just want to know why this happens and how I can solve it.

kyun

1 Answer

I haven't used this package before or tested the code below, but hopefully it solves your issue.

In your example you're specifying utf8 encoding; however, the website uses EUC-KR encoding.

So if you update the encoding property on your request, that might fix the problem.

import got from 'got';

async function test(){
  const url = 'https://dhlottery.co.kr/store.do?method=topStore&pageGubun=L645';

  const data = await got(url, {
    encoding: 'EUC-KR'
  });

  console.log(data.body);

}
test();
Levi Cole
  • I've found that the page is encoded as `euc-kr`, but `got` doesn't support the `EUC-KR` encoding type. – kyun Jan 29 '20 at 11:59
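
Following up on the comment above: the replacement characters appear because the EUC-KR bytes are being decoded as UTF-8, and most of them are not valid UTF-8 sequences. One possible workaround, sketched below and not tested against the site in the question, is to ask got for the raw bytes and decode them separately. This assumes got v10 or later (where `responseType: 'buffer'` is available) and a Node.js build with full ICU, so that the built-in `TextDecoder` recognizes the `euc-kr` label. The helper names `decodeEucKr` and `fetchEucKrPage` are hypothetical and only for illustration.

```javascript
// Sketch: decode an EUC-KR response body by hand instead of
// relying on got's `encoding` option.

// Decode raw EUC-KR bytes into a JavaScript string.
// Assumption: this Node.js build ships full ICU, so 'euc-kr' is a known label.
function decodeEucKr(bytes) {
  // With the default options, invalid byte sequences become U+FFFD
  // rather than throwing an error.
  return new TextDecoder('euc-kr').decode(bytes);
}

// Fetch a page as raw bytes and decode it as EUC-KR.
// Assumption: got v10 or later, where `responseType: 'buffer'`
// makes `response.body` a Buffer instead of a string.
async function fetchEucKrPage(url) {
  // Loaded lazily so that decodeEucKr can be used without got installed.
  const { default: got } = await import('got');
  const response = await got(url, { responseType: 'buffer' });
  return decodeEucKr(response.body);
}

// Small offline check: these four bytes are the EUC-KR encoding of "한글".
console.log(decodeEucKr(Buffer.from([0xc7, 0xd1, 0xb1, 0xdb])));
```

With this in place, the original call would become something like `fetchEucKrPage('https://dhlottery.co.kr/store.do?method=topStore&pageGubun=L645').then(console.log)`. If `TextDecoder` rejects the `euc-kr` label in a given environment, the `iconv-lite` package's `iconv.decode(buffer, 'euc-kr')` could be substituted for the body of `decodeEucKr`.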