Python is still a difficult language for me.. T_T I really need your help.

I'm trying to crawl some website. The website URL has four digits at the end as shown below.

URL → http://www.boanjob.co.kr/work/employ_detail.html?no=**2196**

So I composed the following code.

import pandas as pd
import datetime

df_list = [pd.read_html(f'http://www.boanjob.co.kr/work/employ_detail.html?no={number}')[25] for number in range(2196, 2300)]

df = pd.concat(df_list).reset_index(drop=True)

df = df.transpose() #I have to change rows and columns.

df = df.dropna(axis=0, how='all').dropna(axis=1, how='all')
# df.columns = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']

print(df)

It works for 2196, 2198, 2199, 2200, and so on.

However, 2197 is a non-existent page, so the site sends an error message and goes back to the main screen. (The for loop ends at 2197.)

Is there a way to skip the page where the error message occurs and go on to the next number, 2198?

I'm so confused about Python.

Plz help me once again...T_T

MarkPact
  • Can you add the traceback error message? – tdelaney Sep 17 '20 at 01:03
  • You seem to be requesting a custom tutorial on exception handling in Python. Tutorials already exist in abundance. Perhaps you could learn about `try` and `except` by working through a tutorial, try to incorporate it into your code, and ask a question about that code if you run into trouble. – John Coleman Sep 17 '20 at 01:04
  • It's not in English. "잘못된 요청입니다." means "bad request." – MarkPact Sep 17 '20 at 01:12
  • I didn't ask for a tutorial, but I'm sorry for asking this question. I didn't think the usual approach would apply, because the error message was in a different language from the common messages. If possible, I'll look for a tutorial and check it out. Thank you for giving me a clue. – MarkPact Sep 17 '20 at 01:19

2 Answers

Try this:

df_list = []
for number in range(2196, 2300):
    try:
        data = pd.read_html(f'http://www.boanjob.co.kr/work/employ_detail.html?no={number}')[25]
        df_list.append(data)
    except Exception as e:  # avoid a bare except, which would also swallow Ctrl+C
        print(f"Skipping {number}: {e}")
Eric Hua

You are using a list comprehension to build df_list. That's great, unless there is an error. If an exception is raised, the partially built list is discarded and it's difficult to restart from the next value. This is a good thing in many cases, but bad for you. Instead, you should use a for loop that lets you recover from errors as you go.

import pandas as pd
import datetime

df_list = []
for number in range(2196, 2300):
    try:
        url = f'http://www.boanjob.co.kr/work/employ_detail.html?no={number}'
        df_list.append(pd.read_html(url)[25])  # read_html returns a list of tables; keep table 25 as in the question
    except (OSError, ValueError) as e: # note: there may be others
        print(f"Number {number} failed: {e}")

df = pd.concat(df_list).reset_index(drop=True)
del df_list # may as well get rid of the memory
df = df.transpose() #I have to change rows and columns.
df = df.dropna(axis=0, how='all').dropna(axis=1, how='all')
# df.columns = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
print(df)
tdelaney
  • Really thank you! but it shows "ValueError : No tables found"...... – MarkPact Sep 17 '20 at 01:30
  • You'll have to add to the list of exceptions as you find out what they are. Unfortunately, it doesn't seem well documented. That's better than catching every exception, which will mask any errors on your part that may creep into that for loop over time. – tdelaney Sep 17 '20 at 02:42
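
To make the last comment concrete, here is a minimal sketch of the "skip and record" loop with a narrow exception list. The names `collect_tables`, `fake_fetch`, and the `expected` tuple are hypothetical and not from the answers above. `fake_fetch` stands in for `pd.read_html(url)` so the example runs without network access, and `IndexError` is included on the assumption that a page might return fewer than 26 tables.

```python
from collections import Counter


def collect_tables(numbers, fetch, table_index=25,
                   expected=(OSError, ValueError, IndexError)):
    """Fetch one table per page number, skipping pages that raise an expected error."""
    tables = []
    failures = {}  # page number -> the exception that was raised
    for number in numbers:
        try:
            tables.append(fetch(number)[table_index])
        except expected as exc:
            # Record the failure and move on to the next number.
            failures[number] = exc
    return tables, failures


def fake_fetch(number):
    # Hypothetical stand-in for pd.read_html(url): returns a list of "tables".
    if number == 2197:
        raise ValueError("No tables found")  # page does not exist
    if number == 2198:
        return ["only one table"]  # too few tables, so [25] raises IndexError
    return [f"table {i} of page {number}" for i in range(26)]


tables, failures = collect_tables(range(2196, 2200), fake_fetch)

# Summarize which exception types occurred, to decide what belongs in `expected`.
summary = Counter(type(exc).__name__ for exc in failures.values())
print(len(tables), sorted(failures), dict(summary))
```

With pandas, the same function could presumably be called with `lambda n: pd.read_html(f'http://www.boanjob.co.kr/work/employ_detail.html?no={n}')` as `fetch`. Anything not listed in `expected` (a `TypeError` from a typo, for example) still propagates, which is the point of not catching everything.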