I'm trying to automate an email service that sends each person the details of their bus station.

In order to do so I need to pull some data from a Hebrew website, but all I get is a file with gibberish in it.

I have tried encoding to utf8, but all I get is more gibberish.

import requests
import pandas as pd

url = 'http://yit.maya-tour.co.il/yit-pass/Drop_Report.aspx?client_code=2660&coordinator_code=2669'
html = requests.get(url).content
df_list = pd.read_html(html)
df = df_list[-1]
print(df)
df.to_csv('my data.csv')

I expected the following:

רשימת פיזורים

שם הנהג סוג הרכב הערות תאור שעה

מוניות הקניון מונית A35 פיזור-שדרות 06:30

but got:

               ×©× ×× ×× ×¡×× ×ר××  ...               ת××ר שע×
0  ××× ××ת ××§× ×××      ××× ×ת  ...  פ×××ר-ש×ר×ת  06:30
matanslook

1 Answer

A response object's .content property gives you the raw bytes of the body. Try .text instead, which returns the body decoded to a string:

html = requests.get(url).text

More detail here: What is the difference between 'content' and 'text'
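
To see why the output looks the way it does, here is a minimal sketch of the bytes-versus-text distinction. It assumes the page is served as UTF-8 (the question does not confirm this, but the "×" characters in the output are what UTF-8 Hebrew looks like when read as Latin-1). The variable names are only illustrative. If .text still comes back garbled, requests may have guessed the wrong encoding; in that case setting the response's encoding attribute to 'utf-8' before reading .text is worth trying.

```python
# Sample text copied from the expected output in the question.
expected = 'רשימת פיזורים'

# What a server would send: the text as bytes (assumed here to be UTF-8).
raw = expected.encode('utf-8')

# Decoding with the wrong codec turns each two-byte Hebrew letter
# into two unrelated Latin-1 characters, i.e. the gibberish.
garbled = raw.decode('latin-1')

# Decoding with the matching codec restores the original text.
restored = raw.decode('utf-8')

print(garbled)
print(restored)
```

The same bytes produce either result; only the codec used to decode them differs.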

Alex