0

I have a code that has a variable range with many categories in each column that display data. I need to highlight the least commonly occurring values as a percentage of the total number of cells.

If there are 300 cells in the column, it needs to find the value (out of many possibly repeating values) that occurs least frequently. It is a bonus if the code can anticipate the total number, and give only 5% or 10% of the entire column as a result.

Currently my attempt is to use a function in the top cell that will find the least common occurrence, and the code will simply highlight whatever that value is down the cell as it repeats (and highlight every one of the least common ones.

The difficulty I am having is twofold.

  1. There may be more than one least common value that is still below 10% of the total values
  2. The ability to automate this search so that it may be performed and highlighted for all of more than 100 columns with different categories and different values in each column

If too vague, feel free to ask questions about what I am going for, and I will respond promptly.

This is what the data looks like. As you can see there are merged titles for each column with various blank spaces and sperratically placed data that matches some specific column. enter image description here

This is the proposed code which is still not highlighting what I would like it to. It has two problems. 1: It will highlight ALL of the data in one range if there is no differing value in the row. 2: It will highlight the titles of the columns. enter image description here

This is the highlighted data which is still insufficiently complete. enter image description here

In some cases the column truely do not match the purpose of the code, for example in one column, the number 12 was highlighted down the column (67 occurances) where there are fewer occurances of other numbers. (8 occurs 29 times and is not highlighted)

Community
  • 1
  • 1
Coding Novice
  • 437
  • 2
  • 8
  • 22
  • I am trying to wrap my head around this but don't succeed. YOu have code? Values in Rows? Categories in Columns? Which of these is variable? Does the number of columns per row vary? What do you mean with categories and what do you mean by "range" of a "code"? Could you possibly try to put down some sample data in your post as well as what excatly you would like to count and where the percenatge shall be displayed? – LocEngineer Jun 24 '15 at 14:49
  • you are referring to both rows and columns in a confusing way, especially in the second paragraph. Is the word "row" a typo there? – John Coleman Jun 24 '15 at 14:50
  • Here is the best I can do. There are 100ish categories that are labeled at the top of each column. Each column is an individual category. For example, one column might have the label "Pixel Size." Then underneath there will be cells that belong to that category. For pixel size the possible cells are 1: some various numbers. (ex: 24, 48, 92.. etc) 2: "Custom". 3: Blank cells – Coding Novice Jun 24 '15 at 15:05
  • if out of 100 rows 24 occurs 60 times, 28 occurs 20 times, and 92 occurs 10 times, and there are 6 "Custom" cells and 4 blank cells. I want the code to highlight the 6 "custom cells and possibly the 10 "92" cells depending on how much the total number of cells is (maybe the code will only pull show the bottom 10%. Does this make sense? – Coding Novice Jun 24 '15 at 15:05
  • Sorry about the confusion. I myself am also having trouble to describe what I want this code to do. I just can see the end result – Coding Novice Jun 24 '15 at 15:07
  • I would use a VBScript dictionary object (accessible in VBA if your project includes a reference to the Microsoft Scripting Runtime). Dictionaries are much better for maintaining frequency counts than any native VBA object. This seems especially important in your case since it seems that the values are heterogeneous (e.g. a mixture of integers and strings). – John Coleman Jun 24 '15 at 15:18
  • Why not just generate a list of the codes, and use COUNTIF with the code as the criteria. – Ron Rosenfeld Jun 24 '15 at 15:52

1 Answers1

1

I just hacked together a seemingly working example. Try this here:

Sub frequenz()
Dim col As Range, cel As Range
Dim letter As String
Dim lookFor As String
Dim frequency As Long, totalRows As Long
Dim relFrequency As Double
Dim ran As Range

ran = ActiveSheet.Range("A1:ZZ65535")
totalRows = 65535

For Each col In ran.Columns
    '***get column letter***
    letter = Split(ActiveSheet.Cells(1, col.Column).Address, "$")(1)
    '*******
    For Each cel In col.Cells
        lookFor = cel.Text
        frequency = Application.WorksheetFunction.CountIf(Range(letter & "2:" & letter & totalRows), lookFor)
        relFrequency = frequency / totalRows

        If relFrequency <= 0.001 Then
            cel.Interior.Color = ColorConstants.vbYellow
        End If
    Next cel

Next col

End Sub

It seemed to be doing just what you are looking for.

Edit: fixed the address getting.

LocEngineer
  • 2,847
  • 1
  • 16
  • 28
  • I am going to check on this right now. Ill get back asap as to whether or not this is a step in the right direction! Thank you! – Coding Novice Jun 26 '15 at 12:15
  • So the problem i am having running this is that it looks like perhaps the opposite effect is happening, everything turns yellow on the run except for the first value in the leading end of some of the rows. Odd... I do think this code is in the right direction, I will try to putz around with it to see what I can do. – Coding Novice Jun 26 '15 at 12:21
  • Funny. `col.Cells(1, col.Column).Address` seems to return a wrong address; therefore the lookup happens in the wrong column! Fixed it. Should work better now. – LocEngineer Jun 26 '15 at 12:43
  • So I am still having trouble, but I cant figure out why. It looks like to me that the code you have posted is highlighting by row rather than by column because one column completely filled with "False" is being highlighted. I am attempting to amend the code to inverse the scan direction if that is the problem. I wish I could share a picture of what the scanned data looks like. For the most part the yellow is highlighting entire columns to look like vertical bars. – Coding Novice Jun 26 '15 at 15:37
  • Single step through it to see what each variable holds at a given point in time. It seems that your data is organised slightly different to what I thought it would be. Some visual sample might be helpful. – LocEngineer Jun 27 '15 at 10:27
  • Okay, so I reviewed the data, and the code you made does in fact work very well. It still has a couple of problems for my use. It highlights titles (the actual data starts on row 7 and I cant seem to edit it to match. Secondly, rows that have no deviation in the data get completely highlighted. – Coding Novice Jun 29 '15 at 15:03
  • Also in some cases the lowest frequency values down a column are not being highlighted, I cant explain why. – Coding Novice Jun 29 '15 at 15:13
  • Holy smoking moly, Batman! If THAT truly is your sheet.... I'd say: forget "UsedRange". That won't work well enough with THAT spread... I've edited the above code using more hardcoded values. Please adapt the values according to your needs and try that. Woah what a mess. Thank god it's just a thumbnail pic up there... :-P – LocEngineer Jun 29 '15 at 15:51