I am forcing myself to learn how to script solely in AppleScript, but I am currently stuck trying to remove a particular tag that has a given class. I've tried to find solid documentation and examples, but so far they seem to be very limited.
Here is the HTML I have:
<p>Bacon ipsum dolor amet pork chop landjaeger short ribs boudin short loin jowl <span class="foo">shoulder</span> biltong shankle capicola drumstick pork loin rump spare ribs ham hock. <span class="bar">Pig brisket</span> jowl ham pastrami <span class="foo">jerky</span> strip steak bacon doner. Short loin leberkas jowl, filet mignon turducken chicken ribeye shank tail swine strip steak pork loin sausage. Frankfurter ground round porchetta, pork short ribs jowl alcatra flank sausage.</p>
What I am trying to do is remove the tags for a particular class while keeping their text, so it would remove <span class="foo"> and its closing </span>, with this result:
<p>Bacon ipsum dolor amet pork chop landjaeger short ribs boudin short loin jowl shoulder biltong shankle capicola drumstick pork loin rump spare ribs ham hock. <span class="bar">Pig brisket</span> jowl ham pastrami jerky strip steak bacon doner. Short loin leberkas jowl, filet mignon turducken chicken ribeye shank tail swine strip steak pork loin sausage. Frankfurter ground round porchetta, pork short ribs jowl alcatra flank sausage.</p>
I know how to do this with do shell script and through the terminal, but I want to learn what is available through AppleScript's dictionary.
In my research I was able to find a way to strip all HTML tags with:
on removeMarkupFromText(theText)
    set tagDetected to false
    set theCleanText to ""
    repeat with a from 1 to length of theText
        set theCurrentCharacter to character a of theText
        if theCurrentCharacter is "<" then
            set tagDetected to true
        else if theCurrentCharacter is ">" then
            set tagDetected to false
        else if tagDetected is false then
            set theCleanText to theCleanText & theCurrentCharacter as string
        end if
    end repeat
    return theCleanText
end removeMarkupFromText
but that removes all HTML tags, and that is not what I want. Searching SO, I was able to find how to extract the text between tags in "Parsing HTML source code using AppleScript", but I'm not looking to parse the file.
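To make the problem with that handler concrete, here is a minimal sketch that runs it on a shortened version of the sample paragraph. The sampleHTML string is my own abbreviation for illustration, not the full text; the handler is the one shown above.

```applescript
-- Handler from above: drops every character between "<" and ">".
on removeMarkupFromText(theText)
	set tagDetected to false
	set theCleanText to ""
	repeat with a from 1 to length of theText
		set theCurrentCharacter to character a of theText
		if theCurrentCharacter is "<" then
			set tagDetected to true
		else if theCurrentCharacter is ">" then
			set tagDetected to false
		else if tagDetected is false then
			set theCleanText to theCleanText & theCurrentCharacter as string
		end if
	end repeat
	return theCleanText
end removeMarkupFromText

-- Shortened sample (an assumption for illustration only).
set sampleHTML to "<p>jowl <span class=\"foo\">shoulder</span> biltong. <span class=\"bar\">Pig brisket</span> jowl</p>"

removeMarkupFromText(sampleHTML)
-- → "jowl shoulder biltong. Pig brisket jowl"
```

The "bar" span and the surrounding <p> are stripped along with the "foo" span, which is exactly the behavior I am trying to avoid.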
I am familiar with BBEdit's Balance Tags, known as Balance in the drop-down, but when I run:
tell application "BBEdit"
    activate
    find "<span class=\"foo\">" searching in text 1 of text document "test.html" options {search mode:grep, wrap around:true} with selecting match
    balance tags
end tell
it turns greedy and grabs the entire line from the first tag to the second-to-last closing tag, including the text in between, instead of limiting itself to the first tag and its text.
Further research in the dictionary under tag turned up find tag, which I could use like this: set spanTarget to (find tag "span" start_offset counter), then target the tag with the class via |class| of attributes of tag of spanTarget and use balance tags, but I am still running into the same issue as before.
So, in pure AppleScript, how can I remove a tag associated with a class without the match being greedy?