
I'm trying to remove one string from another with the expression content1.string.replace(str(remove1), ''). The complete code is:

from bs4 import BeautifulSoup
import urllib3
http = urllib3.PoolManager()

url = 'https://www.collinsdictionary.com/dictionary/french-english/aimer'
r1 = http.request('get', url)
r2 = BeautifulSoup(r1.data, 'html.parser')

entry_name1 = r2.find('span', {'class' : 'orth'})
print(type(entry_name1))

entry_name2 = entry_name1.string.replace('<span class="orth">', '').replace('</span>', '')
print(type(entry_name2))

content1 = r2.find('div', {'class' : 'res_cell_center'})
print(type(content1))

remove1 = content1.find('div', {'class' : 'cB cB-hook'})
print(type(str(remove1)))

content2 = content1.string.replace(str(remove1), '')

The result is

<class 'bs4.element.Tag'>
<class 'str'>
<class 'bs4.element.Tag'>
<class 'str'>
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-14-58c961c32cdb> in <module>
     19 print(type(str(remove1)))
     20 
---> 21 content2 = content1.string.replace(str(remove1), '')

AttributeError: 'NoneType' object has no attribute 'replace'

The objects in the two expressions entry_name1.string.replace('<span class="orth">', '').replace('</span>', '') and content1.string.replace(str(remove1), '') have the same type.

Could you please explain why the latter raises the error?


Update: As requested by @Andrej Kesely: I am trying to scrape the main content of that URL. First, I select the content inside div class = "res_cell_center"; then I want to remove from it the content inside div class = "cB cB-hook".

Akira
  • As _remove1_ seems to be `None` you might want to check what `content1.find('div', {'class' : 'cB cB-hook'})` returns. – Gregor Jul 26 '20 at 21:59
  • Can you explain what are you trying to do? What information are you trying to get from the page? – Andrej Kesely Jul 26 '20 at 22:00
  • @Gregor I've just checked and it is not an empty string. Actually, `remove1` is a long string. – Akira Jul 26 '20 at 22:05
  • @AndrejKesely Please see my edit. – Akira Jul 26 '20 at 22:09
  • this might help: https://stackoverflow.com/questions/25327693/difference-between-string-and-text-beautifulsoup. Can you use `.text` instead of `.string`? This should work. – Gregor Jul 26 '20 at 22:10

1 Answer


To get the content of the page, you can use the .get_text() method; there is no need to replace sections of the soup with empty strings:

import requests
from bs4 import BeautifulSoup

url = 'https://www.collinsdictionary.com/dictionary/french-english/aimer'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0'}
soup = BeautifulSoup(requests.get(url, headers=headers).content, 'html.parser')

print(soup.h2.text)
print(soup.select_one('.hom').get_text(strip=True, separator=' '))

Prints:

aimer
Full verb table transitive verb 1. ( d’amour ) to love Elle aime ses enfants. She loves her children. 2. ( d’amitié, par affection ) to like bien aimer qn to like sb J’aime bien Paul, on peut vraiment compter sur lui. I really like Paul, he’s so reliable. Je n’aime pas beaucoup Marie. I don’t like Marie very much. 3. ( par goût ) [ aliment, divertissement, auteur ] to like Tu aimes le chocolat ? Do you like chocolate? bien aimer qch to like sth J’aime bien jouer au tennis. I like playing tennis. aimer faire qch to like doing sth J’aime assez aller au cinéma. I quite like going to the cinema. 4. ( préférence ) aimer mieux qn que qn to prefer sb to sb aimer mieux qch que qch to prefer sth to sth J’aime mieux Paul. I prefer Paul. J’aime mieux Paul que Pierre. I prefer Paul to Pierre. Il aime mieux faire la cuisine qu’aller au restaurant. He’d rather cook than go to a restaurant. j’aime mieux vous dire que , j’aime autant vous dire que I may as well tell you that 5. ( conditionnel : souhait ) j’aimerais ... I would like ... J’aimerais aller en Écosse. I’d like to go to Scotland. Aimeriez-vous que je vous accompagne ? Would you like me to come with you? j’aimerais bien ... I would like ... J’aimerais bien m’en aller. I’d like to go. j’aimerais mieux faire ... I’d rather do ... J’aimerais mieux ne pas y aller. I’d rather not go. J’aimerais mieux y aller maintenant. I’d rather go now. J’aimerais autant y aller maintenant. I’d rather go now.
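
As for why the original code raised the AttributeError: in Beautiful Soup, a tag's .string is only set when the tag has a single child (or a single chain of single children); when a tag has several children, .string is None. The span with class "orth" holds one text node, while the div with class "res_cell_center" holds many nested elements, so content1.string is None even though content1 itself is a Tag. Below is a minimal sketch of that behaviour. The HTML snippet is a made-up stand-in for the real page, not its actual markup:

```python
from bs4 import BeautifulSoup

# Hypothetical, heavily simplified stand-in for the dictionary page.
html = """
<div class="res_cell_center">
  <span class="orth">aimer</span>
  <div class="cB cB-hook">unwanted block</div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

entry = soup.find("span", {"class": "orth"})
content = soup.find("div", {"class": "res_cell_center"})

# The span has exactly one child (a text node), so .string returns that text.
entry_string = entry.string

# The div has several children (whitespace, a span, a div), so .string is None.
content_string = content.string

# get_text() collects all nested text, however many children there are.
content_text = content.get_text(strip=True, separator=" ")

print(repr(entry_string))
print(repr(content_string))
print(repr(content_text))
```

Calling .replace() on content_string here would fail in the same way as in the question, because the value is None rather than a string.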

EDIT: To get the HTML markup instead of plain text, you can do this:

import requests
from bs4 import BeautifulSoup

url = 'https://www.collinsdictionary.com/dictionary/french-english/aimer'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0'}
soup = BeautifulSoup(requests.get(url, headers=headers).content, 'html.parser')

for script in soup.select('script, .hcdcrt, #ad_contentslot_1, #ad_contentslot_2'):
    script.extract()

print(soup.h2.text)
print(''.join(map(str, soup.select_one('.hom').contents)))

Prints:

aimer
<span class="gramGrp"><span class="xr"> <a class="link-right verbtable" href="https://www.collinsdictionary.com/dictionary/french-english/conjugation/aimer">Full verb table</a>
</span><span class="hi rend-sc pos">transitive verb</span></span><div class="sense"><span class="sensenum bluebold">1. </span> <span class="lbl type-misc"><span class="punctuation">(</span>d’amour<span class="punctuation">)</span></span> <span class="cit type-translation quote">to love</span><div class="cit type-example"><span class="quote">Elle aime ses enfants.</span> <span class="cit type-translation quote">She loves her children.</span></div></div><div class="sense"><span class="sensenum bluebold">2. </span> <span class="lbl type-misc"><span class="punctuation">(</span>d’amitié, par affection<span class="punctuation">)</span></span> <span class="cit type-translation quote">to like</span><div class="re type-phr"><span class="form type-phr orth">bien aimer qn</span> <span class="cit type-translation quote">to like sb</span><div class="cit type-example"><span class="quote">J’aime bien Paul, on peut vraiment compter sur lui.</span> <span class="cit type-translation quote">I really like Paul, he’s so reliable.</span></div><div class="cit type-example"><span class="quote">Je n’aime pas beaucoup Marie.</span> <span class="cit type-translation quote">I don’t like Marie very much.</span></div></div> </div><div class="sense"><span class="sensenum bluebold">3. 
</span> <span class="lbl type-misc"><span class="punctuation">(</span>par goût<span class="gramGrp colloc"><span class="punctuation">) </span><span class="punctuation">[</span>aliment, divertissement, auteur<span class="punctuation">]</span></span></span> <span class="cit type-translation quote">to like</span><div class="cit type-example"><span class="quote">Tu aimes le chocolat ?</span> <span class="cit type-translation quote">Do you like chocolate?</span></div><div class="re type-phr"><span class="form type-phr orth">bien aimer qch</span> <span class="cit type-translation quote">to like sth</span><div class="cit type-example"><span class="quote">J’aime bien jouer au tennis.</span> <span class="cit type-translation quote">I like playing tennis.</span></div></div><div class="re type-phr"><span class="form type-phr orth">aimer faire qch</span> <span class="cit type-translation quote">to like doing sth</span><div class="cit type-example"><span class="quote">J’aime assez aller au cinéma.</span> <span class="cit type-translation quote">I quite like going to the cinema.</span></div></div> <div class="mpuslot_b-container"> 


</div> </div><div class="sense"><span class="sensenum bluebold">4. </span> <span class="lbl type-misc"><span class="punctuation">(</span>préférence<span class="punctuation">)</span></span><div class="re type-phr"><span class="form type-phr orth">aimer mieux qn que qn</span> <span class="cit type-translation quote">to prefer sb to sb</span></div><div class="re type-phr"><span class="form type-phr orth">aimer mieux qch que qch</span> <span class="cit type-translation quote">to prefer sth to sth</span><div class="cit type-example"><span class="quote">J’aime mieux Paul.</span> <span class="cit type-translation quote">I prefer Paul.</span></div><div class="cit type-example"><span class="quote">J’aime mieux Paul que Pierre.</span> <span class="cit type-translation quote">I prefer Paul to Pierre.</span></div><div class="cit type-example"><span class="quote">Il aime mieux faire la cuisine qu’aller au restaurant.</span> <span class="cit type-translation quote">He’d rather cook than go to a restaurant.</span></div></div><div class="re type-phr"><span class="form type-phr orth">j’aime mieux vous dire que</span><span class="form type-phr"><span class="punctuation">, </span><span class="orth">j’aime autant vous dire que</span></span> <span class="cit type-translation quote">I may as well tell you that</span></div> </div><div class="sense"><span class="sensenum bluebold">5. 
</span> <span class="lbl type-misc"><span class="punctuation">(</span>conditionnel : souhait<span class="punctuation">)</span></span><div class="re type-phr"><span class="form type-phr orth">j’aimerais ...</span> <span class="cit type-translation quote">I would like ...</span><div class="cit type-example"><span class="quote">J’aimerais aller en Écosse.</span> <span class="cit type-translation quote">I’d like to go to Scotland.</span></div><div class="cit type-example"><span class="quote">Aimeriez-vous que je vous accompagne ?</span> <span class="cit type-translation quote">Would you like me to come with you?</span></div></div><div class="re type-phr"><span class="form type-phr orth">j’aimerais bien ...</span> <span class="cit type-translation quote">I would like ...</span><div class="cit type-example"><span class="quote">J’aimerais bien m’en aller.</span> <span class="cit type-translation quote">I’d like to go.</span></div></div><div class="re type-phr"><span class="form type-phr orth">j’aimerais mieux faire ...</span> <span class="cit type-translation quote">I’d rather do ...</span><div class="cit type-example"><span class="quote">J’aimerais mieux ne pas y aller.</span> <span class="cit type-translation quote">I’d rather not go.</span></div><div class="cit type-example"><span class="quote">J’aimerais mieux y aller maintenant.</span> <span class="cit type-translation quote">I’d rather go now.</span></div><div class="cit type-example"><span class="quote">J’aimerais autant y aller maintenant.</span> <span class="cit type-translation quote">I’d rather go now.</span></div></div> <div class="mpuslot_b-container"> 


</div> </div>
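
The two idioms used above can be tried in isolation. The loop calls extract() on every element matched by the CSS selector, which detaches it from the tree, and ''.join(map(str, tag.contents)) serializes the children of a tag back to HTML without the enclosing tag itself. The sketch below applies the same idea to the goal stated in the question (dropping the "cB cB-hook" block while keeping the markup). The HTML string and the track() call inside it are invented for illustration and are not the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the real page is far larger.
html = (
    '<div class="res_cell_center">'
    '<span class="orth">aimer</span>'
    '<script>track()</script>'
    '<div class="cB cB-hook">unwanted block</div>'
    '<div class="hom"><span class="quote">to love</span></div>'
    '</div>'
)
soup = BeautifulSoup(html, "html.parser")

# Detach every <script> tag and every element carrying the class cB-hook.
for tag in soup.select("script, .cB-hook"):
    tag.extract()

# Serialize the remaining children of the container back to an HTML string.
content = soup.select_one(".res_cell_center")
inner_html = "".join(map(str, content.contents))

print(inner_html)
```

Working on the parsed tree this way avoids string replacement on the serialized HTML, which depends on the exact text produced by str(tag).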
Andrej Kesely
  • I'm sorry for not being clear enough. Your answer perfectly solves the problem of extracting the *content*. For my purpose, I need to retain the HTML markup so that I can render the content later. – Akira Jul 26 '20 at 22:26
  • The update is great. Could you please edit your code to also extract `Verb conjugations for aimer` and `Examples of 'aimer' in a sentence`? – Akira Jul 26 '20 at 23:06
  • Could you please explain the purpose of `for script in soup.select('script, .hcdcrt, #ad_contentslot_1, #ad_contentslot_2'): script.extract()` and `''.join(map(str, soup.select_one('.hom').contents))` so that I can google them to understand your code? – Akira Jul 26 '20 at 23:19
  • Your code does a marvelous job of cleaning out unnecessary things. – Akira Jul 26 '20 at 23:20
  • If you don't mind, please help me answer [this question](https://stackoverflow.com/questions/63109765/how-to-determine-these-elements-of-html). – Akira Jul 27 '20 at 06:49