1

the below code extract all numbers from string and even combine them.
But I need to extract only one whole number with rules:
1- the number is one or two digits (plus the decimal part if it exsists).
2- if the number is followed by " or inch or in , then extract it and ignore rest of numbers in string.
3- if the above condition (2) is not found, then extract the first numbers and ignore rest of numbers in string.

Current String Expected Result
INSPECT-8'' Water 12 Pipe 8
INSPECT- 8.5" Water 12 8.5
INSPECT- 4 Water 5.5" 5.5
PM- 6.5 inch From H44 6.5
PM-36in Pipe M1T 36
PM-36 Pipe From M2T 36
PM-18"*12" Pipe From M1T 18
PM-36 From 5" M1T 5
PM-123 Pipe From MT
Public Function GetNumeric(CellRef As String)
    Dim StringLength As Long, i As Long, Result As Variant
    StringLength = Len(CellRef)
    For i = 1 To StringLength
      If IsNumeric(Mid(CellRef, i, 1)) Then
         Result = Result & Mid(CellRef, i, 1)
      End If
    Next i
    GetNumeric = Result
End Function
Waleed
  • 847
  • 1
  • 4
  • 18

4 Answers4

6

Maybe create your own UDF making use of a regular expression. Perhaps something like:

Public Function RegexExtract(str, pat, Optional gFlag As Boolean = False, Optional pos As Integer = 0, Optional cse as Boolean = True) As String

Static RE As Object: If RE Is Nothing Then Set RE = CreateObject("vbscript.regexp")

RE.Pattern = pat
RE.Global = gFlag
RE.IgnoreCase = cse

If RE.Test(str) Then
    RegexExtract = RE.Execute(str)(pos)
Else
    RegexExtract = vbNullString
End If

End Function

Note that I created an optional global flag that is false by default which should just pull the very 1st hit in a cell. The optional pos variable is there to return a certain match in case you wish the somehow return other numbers when you set the global flag to true. Also note the use of a case flag set to true to match case-insensitive by default.

You can call the above like:

=IFERROR(--RegexExtract(A1,"\b\d\d?(?!\d)(?=\s*(?:""|''|in(?:ch)?\b)?)"),"")

The pattern used stands for:

  • \b\d\d? - A word-boundary with 1 digit and a 2nd optional one;
  • (?!\d) - A negative lookahead to assert no more digits;
  • (?=\s*(?:"|''|in(?:ch)?\b)?) - A positive lookahead to assert position is followed by 0+ (greedy) whitespace characters and:
    • " - A double quote, or;
    • '' - Two single quotes, or;
    • in(?:ch)?\b - Literally 'in' followed by optional 'ch' and a word-boundary to confirm the letters are not part of a larger substring to prevent false positives.

EDIT1:

As per OP's comments below; there are case where there could be a number of interest that is not at the 1st position. Since OP allows for a number without inches to also be matched, the addition here is to include a negative lookahead that will assert that there is no 2nd occurence of the valid pattern:

\b\d\d?(?!\d|.*\b\d+\s*(?:""|''|in(?:ch)?\b))(?=\s*(?:""|''|in(?:ch)?\b)?)

I suppose this is implicitly the same as:

\b\d\d?(?!\d|.*\b\d+\s*(?:""|''|in(?:ch)?\b))

EDIT2:

To allow for decimals you could include an optional non-capture group:

=IFERROR(--RegexExtract(A2,"\b\d\d?(?:\.\d+)?(?!\d|.*\b\d+\s*(?:""|''|in(?:ch)?\b))"),"")
JvdV
  • 70,606
  • 8
  • 39
  • 70
  • I tried also as `=RegexExtract(A1,"\b\d\d?(?!\d)(?=\s*(?:""|''|in(?:ch)?\b)?)")` and it works perfectly , so what is the need to `IFERROR` ? – Waleed Apr 11 '23 at 12:02
  • I played with all optional arguments and set `Optional gFlag As Boolean = True, Optional pos As Integer = 1`, But the output of this strings are not correct `1 Water 3" Pipe` and `4" Water 5` , the correct result should be `(3 , 4 )` respectively , but now it is `(3 , 5)` – Waleed Apr 11 '23 at 12:17
  • @waleed , then you misunderstood the variables. You should keep the optional default values to retrieve the right number. – JvdV Apr 11 '23 at 12:27
  • Yes I have kept the optional default values , but the output is incorrect result with this sample `(1 Water 3" Pipe)` it gives **1** and it should **3** – Waleed Apr 11 '23 at 12:39
  • 1
    Ah I see what you mean. This is inherent to my first assumption that you'd want to return the very 1st valid result. This would be correct with current pattern. You'd need to adjust the pattern to cater for your current sample. Something I can do later tonight when I'm back behind a pc. – JvdV Apr 11 '23 at 13:32
2

I think the answer already lays in your proposed requirements You need to added if statement as checks to do this or a case statement, I recommend defining an extra variable to check before you added to results. Define this as the character of your string you are looping through. Example:

Dim Check as string

Check = Mid(CellRef, i, 1)
If Check =  Chr(34) then GetNumeric = Result    

Dealing with multiple numbers without further definition will yield situations that are not solvable e.g you second to last example has no indication if the 1 or 36 is at the correct one.

Waly
  • 46
  • 5
  • Please could you add the full code, as it does not work with me? – Waleed Apr 11 '23 at 13:50
  • Sorry but this is just an idea for futher developement. You shoudl do a check for each of the desired indicators of length. And as I and other here stated. You will find problems for data which is not specific enough to extract the desired outcome – Waly Apr 11 '23 at 16:29
2

Here are two solutions which do not require Regex.

The first solution applies a series of transforms to the string so that we can use Split to get a sequence of strings, some of which will be numbers. The issue here is choosing the correct transforms so that Split can be applied to isolate numbers. Sometime this may not be possible.

The second solution just parses the string until it has extracted a sequence of characters that are numeric and then returns that numeric string for further processing. This is likely the best solution in your case.

Note that neither solution has been tested for edge cases.

Given that you are trying to parse what appears to be freeform text there could be lots of edge cases.

Sub Test()

    Dim myC As Collection
    Set myC = New Collection
    With myC
    
        .Add "INSPECT - 8" & Chr$(34) & " Water 12 Pipe    8"
        .Add "INSPECT- 18" & Chr$(34) & " Water 12   18"
        .Add "PM-6in Pipe From M37 st 6"
        .Add "PM- 6 inch Pipe From H44    6"
        .Add "PM-36 Pipe From M-1T    36"
        .Add "PM-123 Pipe From MT"
        
    End With
    

    Dim myItem As Variant
    Dim myNumber As Long
    For Each myItem In myC
    
    'Option 1
'        If TryGetFirstNumber(myItem, myNumber) Then
'            Debug.Print myNumber
'        End If
     ' option 2
'        Debug.Print ParseFirstNumber(VBA.CStr(myItem))
    Next
    
End Sub

' Pass ByVal so we don't alter the original string
Public Function ApplyTransforms(ByVal ipString As String) As String

    ipString = VBA.LCase(ipString)
    ipString = VBA.Replace(ipString, "-", " ")
    ipString = VBA.Replace(ipString, VBA.Chr(34), " ")
    ipString = VBA.Replace(ipString, "in ", " ")
    ipString = VBA.Replace(ipString, "inch ", " ")
    
    ApplyTransforms = ipString
    
End Function

'The try function indicates success by the returned boolean value, the result of the success is returned Byref in parameter opNumber
Public Function TryGetFirstNumber(ByRef ipString As Variant, ByRef opNumber As Long, Optional ipLength As Long = 2) As Boolean

    
    Dim myArray As Variant
    myArray = VBA.Split(ApplyTransforms(ipString))
    
    Dim myItem As Variant
    For Each myItem In myArray
    
        If VBA.IsNumeric(myItem) Then
        
            If VBA.Len(myItem) <= ipLength Then
            
                opNumber = VBA.CLng(myItem)
                TryGetFirstNumber = True
                Exit Function
                
            End If
            
        End If
            
    Next
    
    TryGetFirstNumber = False
    
End Function


Public Function ParseFirstNumber(ByRef ipString As String) As String

    Dim myIndex As Long
    myIndex = 1
    
    Dim myLen As Long
    myLen = VBA.Len(ipString)
    
    Dim myNumber As String
    myNumber = vbNullString

    Do While myIndex <= myLen
        If VBA.InStr("0123456789", VBA.Mid$(ipString, myIndex, 1)) > 0 Then
            Exit Do
        End If
        myIndex = myIndex + 1
    Loop

    If myIndex > myLen Then
        ParseFirstNumber = myNumber
        Exit Function
    End If

    Do While VBA.InStr("0123456789", VBA.Mid$(ipString, myIndex, 1)) > 0
        myNumber = myNumber & VBA.Mid$(ipString, myIndex, 1)
        myIndex = myIndex + 1
        If myIndex > myLen Then
            ParseFirstNumber = myNumber
            Exit Function
        End If
    Loop
    
    ParseFirstNumber = myNumber
    
End Function
freeflow
  • 4,129
  • 3
  • 10
  • 18
  • I tried The second solution, it works with all my samples except the last one `"PM-123 Pipe From MT`" the result should be blank as I required the number to be one or two characters. – Waleed Apr 11 '23 at 11:57
  • What do you think the term 'further processing is required' means? – freeflow Apr 11 '23 at 12:28
  • ok understood, I added `if` with `Len` , the problem now that it extract the first hit regardless the position of it. – Waleed Apr 11 '23 at 12:33
  • Correct. None of your examples showed that edge case. So as before, further processing may be required. – freeflow Apr 11 '23 at 12:47
2

Please, (also) test the next way. It uses standard VBA and an array to be processed. The processed array content will be dropped at the end of the code, so, it should be very fast even for large ranges. It assumes that the range to be processed starts from "A1", headers existing on the sheet first row:

Sub extractInchesNoFromAllRange()
   Dim sh As Worksheet, lastR As Long, arr, i As Long
   Dim dblQ As Long, sQ As Long, strIn As Long, No As String
   
   Set sh = ActiveSheet
   lastR = sh.Range("A" & sh.rows.count).End(xlUp).row
   
   arr = sh.Range("A2:B" & lastR).Value2
   
   For i = 1 To UBound(arr)
        dblQ = InStr(arr(i, 1), """") 'check if a double quote caracters exists and return its postition if it does
        sQ = InStr(arr(i, 1), "''")      'check if two simple quote caracters exists and return its postition if it does
        strIn = InStr(arr(i, 1), "in") 'the same as above for "in" string
        
        No = "" 'reinitialize the variable to keep the extracted number (as string...)
        If dblQ > 0 Or sQ > 0 Then 'if doble quote exists:
            If IsNumeric(Mid(arr(i, 1), IIf(dblQ > 0, dblQ, sQ) - 1, 1)) Then 'if a number exists before the quote ch
                No = Mid(arr(i, 1), IIf(dblQ > 0, dblQ, sQ) - 1, 1)                      'extract first digit
                arr(i, 2) = extractNo(IIf(dblQ > 0, dblQ, sQ) - 2, CStr(arr(i, 1)), No, True) 'call the function which make extraction by (backward) iteration
            End If
        ElseIf strIn > 0 Then 'if "in" exists:
            If Mid(arr(i, 1), strIn + 2, 1) = " " Or Mid(arr(i, 1), strIn + 2, 2) = "ch" Or strIn + 1 = Len(arr(i, 1)) Then
                If Mid(arr(i, 1), strIn - 1, 1) = " " Then
                    arr(i, 2) = extractNo(strIn - 2, CStr(arr(i, 1)), No, True)
                Else
                    arr(i, 2) = extractNo(strIn - 1, CStr(arr(i, 1)), No, True)
                End If
            End If
        Else
            arr(i, 2) = extractNo(0, CStr(arr(i, 1)), "")
        End If
   Next i
   
   'drop the processed arran content back in its range:
   sh.Range("A2").Resize(UBound(arr), UBound(arr, 2)).Value2 = arr
End Sub

Function extractNo(pos As Long, str As String, No As String, Optional boolChar = False) As Variant
   Dim i As Long, boolNo As Boolean
   
   On Error GoTo WrongPatt
   If boolChar Then 'if one of the searched characters has been found:
        For i = pos To 1 Step -1
            If IsNumeric(Mid(str, i, 1)) Or Mid(str, i, 1) = "." Then
                No = CStr(Mid(str, i, 1)) & No
            Else
                extractNo = CDbl(No): Exit For
            End If
        Next i
  Else 'if no searched string has been found:
        For i = 1 To Len(str)
            If IsNumeric(Mid(str, i, 1)) Then
                boolNo = True
                No = No & Mid(str, i, 1)
            Else
                If boolNo Then Exit For
            End If
        Next i
        If Len(No) <= 2 And No <> "" Then
                extractNo = CLng(No)
        Else
                extractNo = ""
        End If
  End If
  Exit Function
  
WrongPatt:
   extractNo = "Wrong pttern"
End Function

But the above code will process only the pattern string you show in your question. If, for instance, there will be more double quotes characters, with a different purpose **before the one having a number in front of it), the code will process only the first found. It may have problems if the searched strings are the first in the string and so on... It can be adapted to deal with more conditions, but we here are not the mind readers to cover such not shown cases...

FaneDuru
  • 38,298
  • 4
  • 19
  • 27
  • Ok I will remember what I have wrote yesterday evening. I tried your last edit and it works perfectly , I hope if ST allows two answers to be accepted at the same time. from my side, I adapted your code a little to use an additional array to hold the extracted numbers ( in fact I borrowed that idea from your other answers ) – Waleed Apr 13 '23 at 06:18
  • @Waleed Glad I could help! Yes, I am surprised related to comments deletion. Using another array is a better idea if the range to be processed contains a lot of columns, but for only two, a unique array looks better to me. The conde efficiency is rather similar for both cases. – FaneDuru Apr 13 '23 at 06:35
  • Please, are you fine? I did not see any answer by you from a long time. – Waleed Apr 30 '23 at 09:07
  • 1
    @Waleed I'm OK. I am involved in a project which does not let me much time a availability. And last week I spent my vacation in Tunisia. – FaneDuru May 01 '23 at 16:23