How do you use the BeautifulSoup .replace_with()
without having something like sharp brackets being converted to >
thing after a str()
string conversion find-and-replace process?
Python code
from bs4 import BeautifulSoup
with open("../dicttest.txt", "r", encoding="utf-8") as f:
full_text = f.read()
parse_1 = BeautifulSoup(full_text, "html.parser")
for line in parse_1.find_all("grace", "AllExamples"):
match = str(line).replace(";</i> <b>", ";</i><br> <b>")
line.replace_with(match)
print(parse_1)
dicttest.txt
all
<link rel="stylesheet" type="text/css" href="stylesheet.css"><font size="-2">Duden-Oxford Deutsch-Englisch</font><br><grace class="SglMngArticle"><span class="WordHead"><b>all</b></span> <grace class="IPA">/al/</grace> <i>Indefinitpron.</i> <i>u. unbest. Zahlw.</i> </grace><br><br><grace class="NumArticle"><span class="Number">1.</span> <i>attr.</i> (<i>ganz, gesamt...</i>) all; </grace><grace class="AllExamples"><grace class="BoldExamples"><b>in aller Deutlichkeit</b></grace> in all clarity;<br> <grace class="BoldExamples"><b>alle Freude, die sie empfunden hat</b></grace> all the joy she felt;<br> <grace class="BoldExamples"><b>alles Geld, das ich noch habe</b></grace> all the money I have left;<br> <grace class="BoldExamples"><b>aller Eifer nützte ihm nichts</b></grace> all his zeal was to no avail;<br> <grace class="BoldExamples"><b>ich kann diese Leute alle nicht leiden</b></grace> I can't stand any of these people;<br> <grace class="BoldExamples"><b>ich will euch alle nicht mehr sehen</b></grace> I don't want to see any of you again;<br> <grace class="BoldExamples"><b>die Ärzte verdienen alle sehr viel</b></grace> doctors all earn a great deal;<br> <grace class="BoldExamples"><b>alles Geld spendete sie dem Roten Kreuz</b></grace> she donated all her money to the Red Cross;<br> <grace class="BoldExamples"><b>alles Leid der Welt</b></grace> all the suffering in the world;<br> <grace class="BoldExamples"><b>all unser/mein </b><i>usw.</i> <b>...</b> all our/my <i>etc. ...;</i> <b>alles andere/Weitere/Übrige</b></grace> everything else;<br> <grace class="BoldExamples"><b>alles Übrige hat sich nicht geändert</b></grace> nothing else has changed;<br> <grace class="BoldExamples"><b>alles Schöne/Neue/Fremde</b></grace> everything <i>or</i> all that is beautiful/new/strange;<br> <grace class="BoldExamples"><b>alles Gute!</b></grace> all the best!;<br> <grace class="BoldExamples"><b>alle Fenster schließen</b></grace> close all the windows;<br> <grace class="BoldExamples"><b>sie gaben alle Waffen ab</b></grace> they handed in all their weapons;<br> <grace class="BoldExamples"><b>wir/ihr/sie alle</b></grace> all of us/you/them; we/you/they all;<br> <grace class="BoldExamples"><b>das sagen sie alle</b></grace> (<i>ugs.</i>) that's what they all say;<br> <grace class="BoldExamples"><b>alle Beteiligten/Anwesenden</b></grace> all those involved/present;<br> <grace class="BoldExamples"><b>trotz aller Vorbehalte werde ich ...</b></grace> in spite of all my reservations I shall ...;<br> <grace class="BoldExamples"><b>alle beide/alle zehn</b></grace> both of them/all ten of them;<br> <grace class="BoldExamples"><b>alle Männer/Frauen/Kinder</b></grace> all men/women/children;<br> <grace class="BoldExamples"><b>alle Mädchen über zwölf Jahre</b></grace> all girls over twelve;<br> <grace class="BoldExamples"><b>alle Mädchen in der Schule</b></grace> all the girls in the school;<br> <grace class="BoldExamples"><b>alle Bewohner der Stadt</b></grace> all the inhabitants of the town;<br> <grace class="BoldExamples"><b>ohne allen Anlass</b></grace> for no reason [at all]; without any reason [at all];<br> <grace class="BoldExamples"><b>gegen alle Erwartungen</b></grace> contrary to all expectations;<br> <grace class="BoldExamples"><b>alle Jahre wieder</b></grace> every year;<br> <grace class="BoldExamples"><b>alle fünf Minuten/Meter</b></grace> every five minutes/metres;<br> <grace class="BoldExamples"><b>Bücher aller Art</b></grace> books of all kinds; all kinds of books;<br> <grace class="BoldExamples"><b>in aller Eile</b></grace> with all haste;<br> <grace class="BoldExamples"><b>in aller Ruhe</b></grace> in peace and quiet;<br> <grace class="BoldExamples"><b>trotz aller Versuche/Anstrengungen</b></grace> despite all [his/her/their/<i>etc.</i>] attempts/efforts. </grace><br><br><grace class="NumArticle"><span class="Number">2.</span> <i>allein stehend</i> </grace><br><br><grace class="LetterArticle"><span class="Letter">a) </span>(<i>gesamt..., sämtlich</i>) everything; </grace><grace class="AllExamples"><grace class="BoldExamples"><b>alles geht vorüber</b></grace> everything passes [in time];<br> <grace class="BoldExamples"><b>alles für die Braut/den Bastler</b></grace> everything for the bride/handicraft enthusiast;<br> <grace class="BoldExamples"><b>das alles</b></grace> all that;<br> <grace class="BoldExamples"><b>ich weiß nicht, was das alles soll</b></grace> I don't know what all that is supposed to mean;<br> <grace class="BoldExamples"><b>das ist alles Unsinn</b></grace> that is all nonsense;<br> <grace class="BoldExamples"><b>von allem etwas verstehen/wissen</b></grace> understand/know a bit about everything;<br> <grace class="BoldExamples"><b>wer alles war </b><i>od.</i> <b>wer war alles dort</b></grace> who was there?;<br> <grace class="BoldExamples"><b>wen alles habt ihr getroffen?</b></grace> who did you meet?;<br> <grace class="BoldExamples"><b>das sind alles Gauner</b></grace> they're all scoundrels;<br> <grace class="BoldExamples"><b>was gab es dort alles zu sehen?</b></grace> what was there to see?;<br> <grace class="BoldExamples"><b>was es nicht alles gibt!</b></grace> well, would you believe it!; well, I never!;<br> <grace class="BoldExamples"><b>all[es] und jedes</b></grace> everything; (<i>wahllos</i>) anything and everything;<br> <grace class="BoldExamples"><b>trotz allem</b></grace> in spite of <i>or</i> despite everything;<br> <grace class="BoldExamples"><b>sie liebt ihren Hund über alles</b></grace> she loves her dog more than anything else;<br> <grace class="BoldExamples"><b>zu allem fähig sein</b></grace> (<i>fig.</i>) be capable of anything;<br> <grace class="BoldExamples"><b>alles schon mal da gewesen</b></grace> (<i>ugs.</i>) it's all happened before;<br> <grace class="BoldExamples"><b>das kenne ich alles schon</b></grace> I've heard it all before;<br> <grace class="BoldExamples"><b>alles in allem</b></grace> all in all;<br> <grace class="BoldExamples"><b>vor allem</b></grace> above all;<br> <grace class="BoldExamples"><b>alles klar </b><i>od.</i> <b>in Ordnung</b></grace> (<i>ugs.</i>) everything's fine <i>or</i> (<i>coll.</i>) OK;<br> <grace class="BoldExamples"><b>alles klar?</b></grace> everything all right <i>or</i> (<i>coll.</i>) OK?;<br> <grace class="BoldExamples"><b>dann treffen wir uns um 5<sup>00</sup> Uhr, alles klar?</b></grace> we'll meet at 5 o'clock then, all right <i>or</i> (<i>coll.</i>) OK?;<br> <grace class="BoldExamples"><b>das ist alles</b></grace> that's all <i>or</i> (<i>coll.</i>) it;<br> <grace class="BoldExamples"><b>ist das alles?</b></grace> is that all <i>or</i> (<i>coll.</i>) it?;<br> <grace class="BoldExamples"><b>nach allem, was man hört/weiß</b></grace> to judge from everything <i>or</i> all one hears/knows; </grace><br><grace class="LetterArticle"><span class="Letter">b) </span>(<i>jeder einzelne</i>) everyone; </grace><grace class="AllExamples"><grace class="BoldExamples"><b>alle miteinander</b></grace> all together;<br> <grace class="BoldExamples"><b>ihr seid/wir sind/sie sind ..., alle miteinander</b></grace> you/we/they are ..., all of you/us/them;<br> <grace class="BoldExamples"><b>alle auf einmal</b></grace> all at once;<br> <grace class="BoldExamples"><b>sprecht nicht alle auf einmal!</b></grace> don't all speak at once;<br> <grace class="BoldExamples"><b>am besten, wir gehen alle auf einmal zum Chef</b></grace> the best thing would be for us all to go and see the boss together;<br> <grace class="BoldExamples"><b>alle, die ...</b></grace> all those who ...;<br> <grace class="BoldExamples"><b>der Kampf aller gegen alle</b></grace> unfettered competition;<br> <grace class="BoldExamples"><b>in allem einverstanden sein</b></grace> agree <i>or</i> be agreed on everything;<br> <grace class="BoldExamples"><b>von allem etwas nehmen</b></grace> take a bit of everything;<br> <grace class="BoldExamples"><b>er ist bei allem, was er tut, sehr genau</b></grace> he is very precise in everything he does;<br> <grace class="BoldExamples"><b>sie ist in allem sehr empfindlich</b></grace> she is very sensitive about everything; </grace><br><grace class="LetterArticle"><span class="Letter">c) </span>(<i>Neutr. Sg.: alle Beteiligten</i>) </grace><grace class="AllExamples"><grace class="BoldExamples"><b>alles mal herhören!</b></grace> (<i>ugs.</i>) listen everybody!; (<i>stärker befehlend</i>) everybody listen!;<br> <grace class="BoldExamples"><b>alles war nach Hause gegangen</b></grace> (<i>ugs.</i>) everyone <i>or</i> everybody had gone home;<br> <grace class="BoldExamples"><b>alles aussteigen!</b></grace> (<i>ugs.</i>) everyone <i>or</i> all out!; (<i>vom Schaffner gesagt</i>) all change!</grace><br>
</>
a, A
<link rel="stylesheet" type="text/css" href="stylesheet.css"><font size="-2">Duden-Oxford Deutsch-Englisch</font><br><grace class="SglMngArticle"><span class="WordHead"><b>a, A</b></span> <grace class="IPA">/a:/</grace> <i>das;</i> <b>a/A, a/A</b> </grace><br><br><grace class="LetterArticle"><span class="Letter">a) </span>(<i>Buchstabe</i>) a/A; </grace><grace class="AllExamples"><b>kleines a</b> small a;<br> <b>großes A</b> capital A;<br> <b>das A und O</b> (<i>fig.</i>) the essential thing/things (<i>Gen.</i> for);<br> <b>von A bis Z</b> (<i>fig. ugs.</i>) from beginning to end;<br> <b>wer A sagt, muss auch B sagen</b> (<i>fig.</i>) if one starts a thing, one must go through with it; </grace><br><grace class="LetterArticle"><span class="Letter">b) </span>(<i>Musik</i>) [key of] A</grace><br>
</>
The whole story:
I'm making a HTML based dictionary in python using BeautifulSoup and Regular Expressions. The structure of the dictionary is mainly like this:
Headword | IPA
Article 1
...Article A
......All Examples (say German example with an English explaining)
......<b>
German example</b>
......English explaining;
......<b>
German example</b>
......English <i>
explaining;</i>
......and so forth...
...Article B
......All Examples
......and so forth...
In order to arrange all of them by CSS, I have to assign CSS classes to every element (Articles, examples...) in it. I used to do all of this in pure notepad environment with Regex find-and-replace. Everything works fine except for the fact that I want to process text chunk by chunk, namely I don't want a Regex to affect more than the part I'm working with. Say the element AllExamples, I give them a whole class AllExamples
first, then give the German example and English explaining different classes and add <br>
s following those semicolons at the end of English explainations. That's not easy, because:
This can't be done with pure notepad environment with a single Regex find-and-replace. In Editpad Pro, I can match the whole AllExample class by a Regex then use a second Regex to replace
;
with;<br>
within the matched selection. It's fine if there are few instances to process, but a whole dictionary needs an one-click batch processing.The reason why I have to match an area first is that there are many equivalent patterns somewhere out of the area which I don't want to touch.
There are exceptions in the structure. Notice the second English explaining with an
i
tag at the end, that's where my Regex adding<br>
follows;
fails. So in this case, I have to replace;<i> <b>
with;<i><br> <b>
. Again, the whole AllExample class should be matched first, because of those equivalent patterns outside the area.
So BeautifulSoup is the solution, I can easily match the area with it and feed a simple .replace()
to it. Here the problem is BeautifulSoup treat tags and strings as totally different things. However in my case, the tag </i>
and <b>
needs to be matched with a ;
which is a string.
So I'm going to mix tags and strings together and then do a find-and-replace like in a notepad environment. (I know some of you guys may create a certain complicated function in python to do this, but it seems hard for me.)
Then use the .replace_with()
function to give it back to BeautifulSoup like the topic I quoted at the beginning of my post. When I do this, however, all the sharp brackets change to like >
in the resulting print. What should I do to solve this issue please?
Related topic here:
Python - Find text using beautifulSoup then replace in original soup variable