The home page (BASE_URL) itself contains the total number of pages. First scrape the maximum page number, then iterate up to it to collect the data from every available page.
Here's the implementation:
import json
import warnings

import requests
from bs4 import BeautifulSoup

# verify=False below skips TLS certificate verification; this filter
# silences the InsecureRequestWarning that requests would otherwise emit.
warnings.filterwarnings("ignore")

BASE_URL = 'https://hd8.4lordserials.xyz/anime-serialy'

session = requests.Session()
session.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:100.0) Gecko/20100101 Firefox/100.0'

items = []


def scrape_page(url):
    """Collect the title and link of every item on one listing page."""
    rs = session.get(url, verify=False)
    rs.raise_for_status()

    soup = BeautifulSoup(rs.content, 'html.parser')
    for item in soup.select('.th-item'):
        title = item.select_one('.th-title').text
        link = item.a['href']  # separate name so the `url` argument is not shadowed
        items.append({
            'title': title,
            'url': link,
        })


def scrape_all_pages(base_url):
    """Read the last page number from the pagination bar, then scrape each page."""
    rs = session.get(base_url, verify=False)
    rs.raise_for_status()
    soup = BeautifulSoup(rs.text, 'html.parser')

    # The last link inside div.navigation holds the highest page number.
    max_page = int(soup.select('div.navigation>a')[-1].text)
    print(f"maximum pages: {max_page}")

    for page in range(1, max_page + 1):
        page_url = f'{base_url}/page/{page}/'
        print(f"page url: {page_url}")
        scrape_page(page_url)


scrape_all_pages(BASE_URL)
print(f"total items: {len(items)}")

with open('out.json', 'w', encoding='utf-8') as f:
    json.dump(items, f, indent=4, ensure_ascii=False)
Output:
maximum pages: 21
page url: https://hd8.4lordserials.xyz/anime-serialy/page/1/
page url: https://hd8.4lordserials.xyz/anime-serialy/page/2/
page url: https://hd8.4lordserials.xyz/anime-serialy/page/3/
page url: https://hd8.4lordserials.xyz/anime-serialy/page/4/
page url: https://hd8.4lordserials.xyz/anime-serialy/page/5/
page url: https://hd8.4lordserials.xyz/anime-serialy/page/6/
page url: https://hd8.4lordserials.xyz/anime-serialy/page/7/
page url: https://hd8.4lordserials.xyz/anime-serialy/page/8/
page url: https://hd8.4lordserials.xyz/anime-serialy/page/9/
page url: https://hd8.4lordserials.xyz/anime-serialy/page/10/
page url: https://hd8.4lordserials.xyz/anime-serialy/page/11/
page url: https://hd8.4lordserials.xyz/anime-serialy/page/12/
page url: https://hd8.4lordserials.xyz/anime-serialy/page/13/
page url: https://hd8.4lordserials.xyz/anime-serialy/page/14/
page url: https://hd8.4lordserials.xyz/anime-serialy/page/15/
page url: https://hd8.4lordserials.xyz/anime-serialy/page/16/
page url: https://hd8.4lordserials.xyz/anime-serialy/page/17/
page url: https://hd8.4lordserials.xyz/anime-serialy/page/18/
page url: https://hd8.4lordserials.xyz/anime-serialy/page/19/
page url: https://hd8.4lordserials.xyz/anime-serialy/page/20/
page url: https://hd8.4lordserials.xyz/anime-serialy/page/21/
total items: 497
The file out.json:
[
{
"title": "Принесённая в жертву Принцесса и Царь зверей",
"url": "https://hd8.4lordserials.xyz/13694-prinesyonnaya-v-zhertvu-princessa-i-car-zverei.html"
},
{
"title": "Магическая битва",
"url": "https://hd8.4lordserials.xyz/6707-magicheskaya-bitva.html"
},
{
"title": "В бегах: Великая миссия",
"url": "https://hd8.4lordserials.xyz/13702-v-begah-velikaya-missiya.html"
},
{
"title": "Маги: Волшебный лабиринт",
"url": "https://hd8.4lordserials.xyz/13796-magi-volshebnyi-labirint.html"
},
{
"title": "Бессонница после школы",
"url": "https://hd8.4lordserials.xyz/13587-bessonnica-posle-shkoly.html"
},
{
"title": "История о мононокэ",
"url": "https://hd8.4lordserials.xyz/8370-istoriya-o-mononoke.html"
},
{
"title": "Йохане из паргелия: Солнечный свет в зеркале",
"url": "https://hd8.4lordserials.xyz/13826-iohane-iz-pargeliya-solnechnyi-svet-v-zerkale.html"
},
{
"title": "Боевой континент 2: Непревзойдённый клан Та",
"url": "https://hd8.4lordserials.xyz/13824-boevoi-kontinent-2-neprevzoidyonnyi-klan-ta.html"
},
{
"title": "Синий оркестр",
"url": "https://hd8.4lordserials.xyz/13643-sinii-orkestr.html"
},
{
"title": "Нулевой Эдем",
"url": "https://hd8.4lordserials.xyz/6597-nulevoi-edem.html"
},
{
"title": "Мобильный воин Гандам: Ведьма с Меркурия",
"url": "https://hd8.4lordserials.xyz/7154-mobilnyi-voin-gandam-vedma-s-merkuriya.html"
},
{
"title": "Адский рай",
"url": "https://hd8.4lordserials.xyz/13561-adskii-rai.html"
},
{
"title": "Неповторимый еженедельник боевых искусств",
"url": "https://hd8.4lordserials.xyz/13825-nepovtorimyi-ezhenedelnik-boevyh-iskusstv.html"
},
{
"title": "Магия и мускулы",
"url": "https://hd8.4lordserials.xyz/13768-magiya-i-muskuly.html"
},
{
"title": "Единорог: Вечные воины",
"url": "https://hd8.4lordserials.xyz/13689-edinorog-vechnye-voiny-w1.html"
},
{
"title": "Причина полюбить её",
"url": "https://hd8.4lordserials.xyz/13642-prichina-polyubit-eyo-w5.html"
},
{
"title": "Я получил читерские способности в другом мире и стал экстраординарным в реальном мире: История о том, как повышение уровня изменило мою жизнь",
"url": "https://hd8.4lordserials.xyz/13697-ya-poluchil-chiterskie-sposobnosti-v-drugom-mire-i-stal-ekstraordinarnym-v-realnom-mire-istoriya-o-tom-kak-povyshenie-urovnya-izmenilo-moyu-zhizn.html"
},
.
.
.
.
{
"title": "Рейтинг короля",
"url": "https://hd8.4lordserials.xyz/6604-reiting-korolya.html"
},
{
"title": "Усио и Тора",
"url": "https://hd8.4lordserials.xyz/6603-usio-i-tora.html"
},
{
"title": "Эхо террора",
"url": "https://hd8.4lordserials.xyz/6602-eho-terrora.html"
},
{
"title": "Невеста чародея: В ожидании путеводной звезды",
"url": "https://hd8.4lordserials.xyz/6600-nevesta-charodeya-v-ozhidanii-putevodnoi-zvezdy.html"
},
{
"title": "Юру Юри",
"url": "https://hd8.4lordserials.xyz/6601-yuru-yuri.html"
},
{
"title": "Ди. Грэй-мен: Святые",
"url": "https://hd8.4lordserials.xyz/6598-di-grei-men-svyatye-w1.html"
},
{
"title": "Принцессы-полудемоны",
"url": "https://hd8.4lordserials.xyz/6599-princessy-poludemony.html"
},
{
"title": "Мыши-рокеры с Марса",
"url": "https://hd8.4lordserials.xyz/6596-myshi-rokery-s-marsa.html"
},
{
"title": "Суперзлодеи",
"url": "https://hd8.4lordserials.xyz/6595-superzlodei.html"
},
{
"title": "Связанные небом",
"url": "https://hd8.4lordserials.xyz/6593-svyazannye-nebom.html"
},
{
"title": "Ди.Грэй-мен",
"url": "https://hd8.4lordserials.xyz/6594-digrei-men.html"
},
{
"title": "Механическая планета",
"url": "https://hd8.4lordserials.xyz/6592-mehanicheskaya-planeta.html"
},
{
"title": "WIXOSS: Заражённый селектор",
"url": "https://hd8.4lordserials.xyz/6591-wixoss-zarazhyonnyi-selektor.html"
},
{
"title": "Платиновый предел",
"url": "https://hd8.4lordserials.xyz/6590-platinovyi-predel.html"
},
{
"title": "Манкацу",
"url": "https://hd8.4lordserials.xyz/6589-mankacu.html"
},
{
"title": "Школа мертвецов",
"url": "https://hd8.4lordserials.xyz/6588-shkola-mertvecov.html"
},
{
"title": "Розарио + Вампир",
"url": "https://hd8.4lordserials.xyz/6586-rozario-vampir.html"
},
{
"title": "Паладин издалека",
"url": "https://hd8.4lordserials.xyz/6587-paladin-izdaleka.html"
},
{
"title": "Атака титанов: Потерянные девушки",
"url": "https://hd8.4lordserials.xyz/6585-ataka-titanov-poteryannye-devushki-w3.html"
},
{
"title": "Битвы маленьких гигантов",
"url": "https://hd8.4lordserials.xyz/6584-bitvy-malenkih-gigantov.html"
},
{
"title": "В другом мире с мужчиной, обратившимся красоткой",
"url": "https://hd8.4lordserials.xyz/6583-v-drugom-mire-s-muzhchinoi-obrativshimsya-krasotkoi.html"
},
{
"title": "Девушки на линии фронта",
"url": "https://hd8.4lordserials.xyz/6582-devushki-na-linii-fronta.html"
},
{
"title": "Истории Коёми",
"url": "https://hd8.4lordserials.xyz/6581-istorii-koyomi.html"
},
{
"title": "История цветов",
"url": "https://hd8.4lordserials.xyz/6580-istoriya-cvetov.html"
},
{
"title": "Сильнейший мудрец со слабейшей меткой",
"url": "https://hd8.4lordserials.xyz/6578-silneishii-mudrec-so-slabeishei-metkoi.html"
},
{
"title": "Ярость Бахамута: Генезис",
"url": "https://hd8.4lordserials.xyz/6576-yarost-bahamuta-genezis.html"
},
{
"title": "Саюки: Перезарядка — Зероин",
"url": "https://hd8.4lordserials.xyz/6574-sayuki-perezaryadka-zeroin.html"
}
]
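One caveat: reading the last pagination link with `[-1]` only works while that link's text is a number. If the site ever appends a "next" link after the numbers, `int()` would raise. A minimal sketch of a more defensive variant follows. The helper names `max_page_from_labels` and `build_page_urls` are hypothetical, not part of the code above, and the fallback to a single page is an assumption about what you would want when no numeric link is found.

```python
def max_page_from_labels(labels):
    """Return the largest page number among pagination link texts.

    Non-numeric labels (ellipses, "next" links) are ignored. Falls back to 1
    when no numeric label is present (an assumption, adjust as needed).
    """
    numbers = [int(text.strip()) for text in labels if text.strip().isdecimal()]
    return max(numbers, default=1)


def build_page_urls(base_url, max_page):
    """Build the /page/N/ URLs in the same shape the loop above requests."""
    base = base_url.rstrip('/')
    return [f'{base}/page/{page}/' for page in range(1, max_page + 1)]


# Example with made-up link texts, including a trailing non-numeric label.
labels = ['1', '2', '3', '...', '21', 'Далее']
print(max_page_from_labels(labels))  # → 21
```

In the scraper itself, the labels could come from `[a.text for a in soup.select('div.navigation>a')]`, with the result passed to the existing `range(1, max_page + 1)` loop.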
I hope it solves your problem.