0

I have the VB code below

Sub CountWordFrequencies()
Dim InputSheet As Worksheet
Dim WordListSheet As Worksheet
Dim PuncChars As Variant, x As Variant
Dim i As Long, r As Long, b As Long
Dim txt As String
Dim wordCnt As Long
Dim AllWords As Range
Dim PC As PivotCache
Dim PT As PivotTable
Dim PF As PivotField
Application.ScreenUpdating = False
Set InputSheet = ActiveSheet
Set WordListSheet = Worksheets.Add(after:=Worksheets(Sheets.Count))
WordListSheet.Range("A1").Font.Bold = True
WordListSheet.Range("A1") = "All Words"
InputSheet.Activate
wordCnt = 2


PuncChars = Array(".", ",", ";", ":", "'", "!", "#", _
    "$", "%", "&", "(", ")", " - ", "_", "--", "+", _
    "=", "~", "/", "\", "{", "}", "[", "]", """", "?", "*")

r = 2

Dim NotRealWord As Variant
NotRealWord = Array("OF","THE")


Do While Cells(r, 1) <> ""

    txt = UCase(Cells(r, 1))

    For i = 0 To UBound(PuncChars)
        txt = Replace(txt, PuncChars(i), "")
    Next i

    txt = WorksheetFunction.Trim(txt)

    x = Split(txt)
    For i = 0 To UBound(x)
        WordListSheet.Cells(wordCnt, 1) = x(i)
        wordCnt = wordCnt + 1
    Next i
r = r + 1
Loop


WordListSheet.Activate
Set AllWords = Range("A1").CurrentRegion
Set PC = ActiveWorkbook.PivotCaches.Add _
    (SourceType:=xlDatabase, _
    SourceData:=AllWords)
Set PT = PC.CreatePivotTable _
    (TableDestination:=Range("C1"), _
    TableName:="PivotTable1")
With PT
    .AddDataField .PivotFields("All Words")
    .PivotFields("All Words").Orientation = xlRowField
    .PivotFields("All Words") _
        .AutoSort xlDescending, "Count of All Words"
End With
Set PF = ActiveSheet.PivotTables("PivotTable1").PivotFields("All Words")
With PF
    .ClearManualFilter
    .EnableMultiplePageItems = True
    For b = LBound(NotRealWord) To UBound(NotRealWord)
        .PivotItems(NotRealWord(b)).Visible = False
    Next b
End With
End Sub

This one is a Word Frequency Analysis function where the user will insert the list of strings in column A, starting from A2. They will click a button that run this script. The script will then break the strings into single words and create a pivot table that will count the frequency of each word, sorted by the frequency.

Here are the pictures showing the mechanism:

  • Input from user on column A enter image description here
  • Result enter image description here

The result

Now I have an issue with the filter. Ultimately, I want the pivot table to automatically filter out the list of words in the "NotRealWord" array because these are not useful words to analyze. My code works only when the script can find all values in the array list in the words being broken out. So in my example, I set NotRealWord = Array("OF", "THE") and the pivot table field does have these words so it works perfectly. But if I added "BY", it returns this error "Unable to get the PivotItems property of the PivotField class". How do I fix this?

Or even better, how can I make NotRealWord a dynamic array which takes the list of values in let's say column F so that the user can add in more words that they want to filter out without having to fix the code (my first picture also shows that column F).

Please note that I'm not super good at VB. I know how to read and adapt complicated codes but don't know the in and out of FB word

Long N
  • 21
  • 5

1 Answers1

2

Here's one possible approach which is a little different from your current one but should do what you want:

Sub WordCountTester()
    Dim d As Object, k, i As Long, ws As Worksheet
    
    Set ws = ActiveSheet
    Set d = WordCounts(ws.Range("A2:A" & ws.Cells(Rows.Count, "A").End(xlUp).Row), _
                       ws.Range("F2:F" & ws.Cells(Rows.Count, "F").End(xlUp).Row))
    'list words and frequencies
    For Each k In d.keys
        ws.Range("H2").Resize(1, 2).Offset(i, 0).Value = Array(k, d(k))
        i = i + 1
    Next k
End Sub

'rngTexts = range with text to be word-counted
'rngExclude = range with words to exclude from count
Public Function WordCounts(rngTexts As Range, rngExclude As Range) As Object 'dictionary
    Dim words, c As Range, dict As Object, regexp As Object, w, wd As String, m
    Set dict = CreateObject("scripting.dictionary")
    Set regexp = CreateObject("VBScript.RegExp") 'see link below for reference
    With regexp
        .Global = True
        .MultiLine = True
        .ignorecase = True
        .Pattern = "[\dA-Z-]{2,}" 'at least two characters...
     End With
     'loop over input range
     For Each c In rngTexts.Cells
        If Len(c.Value) > 0 Then
            Set words = regexp.Execute(UCase(c.Value))
            'loop over matches
            For Each w In words
                wd = w.Value 'the text of the match
                If Not IsNumeric(wd) Then  'EDIT: ignore numbers
                   'increment count if the word is not found in the "excluded" range
                    If IsError(Application.Match(wd, rngExclude, 0)) Then
                        dict(wd) = dict(wd) + 1
                    End If
                Else
                    Debug.Print "Filtered out", wd 'excluding numbers...
                End If '>1 char
            Next w
        End If
     Next c
     Set WordCounts = dict
End Function

Regular expressions reference: https://learn.microsoft.com/en-us/previous-versions/windows/internet-explorer/ie-developer/scripting-articles/ms974570(v=msdn.10)

Tim Williams
  • 154,628
  • 8
  • 97
  • 125
  • TIm, i appreciate this. Let me test this and let you know if I have any question. – Long N Feb 10 '21 at 19:46
  • Hi Tim, I was able to add "Set rngExclude = ActiveSheet.Range("D:D")" under the Public Function so that any strings under column D will be exlucded. May I ask what 'rngTexts = range with text to be word-counted is for? Without it, the function still works – Long N Feb 10 '21 at 19:54
  • 1
    You're supposed to call that function from another Sub and pass in the two ranges, as demonstrated by `WordCountTester` - you should not need to edit the function. – Tim Williams Feb 10 '21 at 19:56
  • Also, can you please help me add a logic to also not count anything with only 1 character? – Long N Feb 10 '21 at 20:03
  • Worked like a charm. Thank you so much. I already fixed Pattern = "\w+" to Pattern = "[A-Z]+" because I definitely don't want any digit. However, I want to count words that are separated by - as 1 word. So something like husband-in-law should be its own entry. Is this where I should play around more with the Pattern piece? – Long N Feb 11 '21 at 01:33
  • `.Pattern = "[A-Z-]+"` – Tim Williams Feb 11 '21 at 01:38
  • You are a life saver Tim! But sorry for bothering you 1 more thing. Now I do want to include strings that has a combination of number and digit. So I don't want to include things like 1000, 4, 8 but I do want to count 1000s, 4chan, 8-hole – Long N Feb 11 '21 at 22:26
  • I'm a bit of a wuss when it comes to anything other than basic regexp, so I'd tend towards filtering out numbers when processing the matches - see edit above – Tim Williams Feb 11 '21 at 22:44
  • Totally understood but there are things in our space that that happens a lot. Anyways, of course your fix worked. Thanks for all of the help again! – Long N Feb 12 '21 at 15:54