0

I'm trying to write a VBA script that goes through a column of cells and one, underlines text between html <u></u> tags and two, removes those tags from the text afterwards. Cells may have multiple tags inside of them, other text next two them, or no tags at all.

So far I've been able to get the script to underline between tags but when I try to remove them nothing really works (sometimes nothing is changed, sometimes the tag is underlined, etc.). I'm omitting input/output examples for brevity and in the hopes that there are glaringly obvious issues with my code but they are available on request.

Trying to solve this problem using VBA initially stems from an inability for me to do this in Python since the object model only goes as low as cells, not the content of the cell. Any solutions using Python to do this would also be appreciated!

Thank you so much for the help! Let me know if there's anything else I can do to help you all!

Sub PleaseUnderline()
'Holds the content between the tags
Dim s As String
'Holds the row number of the active cell
Dim a As Integer
'Holds the location of the beginning of the open tag
Dim b As Integer
'Holds the location of the beginning of the close tag
Dim e As Integer
Dim holder As String
    'Select the last cell in column A and make it the active cell
    Range("A" & ActiveCell.SpecialCells(xlLastCell).Row).Select
    For a = ActiveCell.Row To 1 Step -1
        Range("A" & a).Select
        holder = Range("A" & a).Value
        s = ""
        b = 1
        e = 1
        Do
            b = InStr(b, ActiveCell, "<u>")
            If b = 0 Then Exit Do
            e = b + 1
            e = InStr(e, ActiveCell, "</u>")
            If e = 0 Then
                Exit Do
            Else
                s = Mid(ActiveCell, b + 3, e - b - 3)
            End If
            holder = Replace(holder, "<u>", "", 1, 1)
            holder = Replace(holder, "</u>", "", 1, 1)
            Worksheets("Sheet").Range("A" & a).Value = holder
            ActiveCell.Characters(b, Len(s)).Font.Underline = True
            b = e + 1
        Loop
    Next a
End Sub
marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459

2 Answers2

0

Slight modifications, but this worked for me. I believe the issue is that you were adding 3 to your starting point (b + 3), where you don't need to as you're already removing the <u> from in front of it, so no need to offset by 3 characters.

Sub PleaseUnderline()

Dim i As Long, j As Long
Dim startpoint As Long, endpoint As Long
Dim holder As String

For i = 2 To Cells(Rows.Count, 1).End(xlUp).Row

    If InStr(Cells(i, 1).Value, "<u>") > 0 Then
        For j = 1 To Len(Cells(i, 1).Value)
            If Mid(Cells(i, 1).Value, j, 3) = "<u>" Then
                startpoint = j
            End If

            If Mid(Cells(i, 1).Value, j, 4) = "</u>" Then
                endpoint = j
            End If
        Next j

        holder = Cells(i, 1).Value
        holder = Replace(holder, "<u>", "")
        holder = Replace(holder, "</u>", "")
        Cells(i, 1).Value = holder
        Cells(i, 1).Characters(startpoint, endpoint - startpoint - 3).Font.Underline = True

    End If

Next i

End Sub

img1

dwirony
  • 5,487
  • 3
  • 21
  • 43
0

This worked for me:

Sub Tester()
    DoTags ActiveSheet.Range("A1")
End Sub

Sub DoTags(c As Range)

    Dim s As Long, e As Long, l As Long, arrTags, tag

    arrTags = Array("b", "i", "u")

    For Each tag In arrTags

        Positions c.Value, tag, s, e

        Do While s > 0 And e > 0
            With c.Characters(s + Len(tag) + 2, e - s).Font
                Select Case LCase(tag)
                    Case "u": .Underline = True
                    Case "b": .Bold = True
                    Case "i": .Italic = True
                End Select
            End With
            c.Characters(e, Len(tag) + 3).Delete '<<delete end tag first...
            c.Characters(s, Len(tag) + 2).Delete
            Positions c.Value, tag, s, e
        Loop

    Next tag
End Sub

'set start and end positions of a tag in a string
Sub Positions(txt As String, tag, ByRef s As Long, ByRef e As Long)
    e = 0: s = 0
    s = InStr(1, txt, "<" & tag & ">", vbTextCompare)
    If s > 0 Then e = InStr(s, txt, "</" & tag & ">", vbTextCompare)
End Sub

EDIT: since some of your content seems like it might be too long for the approach above, here's an alternative method (generic HTML >> formatted text conversion)

Sub Tester()
    Dim c As Range
    For Each c In ActiveSheet.Range("A2:C2").Cells
        HTMLtoFormattedText c
    Next c
End Sub

Private Sub HTMLtoFormattedText(c As Range)

    Dim objData As DataObject 'reference to "Microsoft Forms 2.0 Object Library"
    Set objData = New DataObject

    objData.SetText "<HTML>" & c.Text & "</HTML>"
    objData.PutInClipboard

    c.Parent.Activate
    c.Offset(1, 0).Select
    c.Parent.PasteSpecial Format:="Unicode Text"

End Sub
Tim Williams
  • 154,628
  • 8
  • 97
  • 125
  • Works great for most of my document but fails with this error: https://snag.gy/JKCSnN.jpg Contents of the first cell that triggers this error is: "1 wǒ, ní féi , chū shēng..." Any ideas as to how to fix it? – Boyd Christiansen Nov 01 '18 at 01:41
  • I only tested with simple ascii text, so maybe something to do with double-byte characters? – Tim Williams Nov 01 '18 at 02:40
  • Maybe? The rest of my sheet is in Chinese and it was working just fine with those cells though. Stepping through, it looks like it actually successfully underlines before throwing this error though. Any other thoughts? '^_^ Thanks for your time by the way! – Boyd Christiansen Nov 01 '18 at 03:28
  • Can you share a file with some sample content which triggers that error? I can take a look. – Tim Williams Nov 01 '18 at 03:56
  • Sure! Here it is: https://drive.google.com/file/d/1EcvTPdpdUPE1ta-uuPSqKM0-_2WyYuwb/view?usp=sharing – Boyd Christiansen Nov 01 '18 at 07:51
  • The issue is the length of the content in the "problem" cell - anything over 255 and you can't really use the Characters collection effectively. – Tim Williams Nov 01 '18 at 16:57