1

I need help with extracting 5-digit numbers only from one column to another in Excel 2010. These numbers can be in any position of the string (beginning of the string, anywhere in the middle, or at the end). They can be within brackets or quotes like:

(15478) or "15478" or '15478' or [15478]

I need to ignore any numbers that are less than 5 digits and include numbers that start with 1 or more leading zeros (like 00052, 00278, etc.) and ensure that leading zeros are copied over to the next column. Could someone help me with either creating a formula or UDF?

L42
  • 19,427
  • 11
  • 44
  • 68
Annette O.
  • 13
  • 1
  • 5
  • i will help because i know how annoying it can be to do this in excel , i just need to know - would there be more than 1 in a string? that wasn't made clear – Steven Martin Mar 09 '15 at 23:04
  • I think @Steven Martin answer is working fine. – shA.t Mar 10 '15 at 08:47
  • Just one - we are trying to copy Employee Numbers from Description field that contains other information like names, titles, locations, etc. to a separate column. – Annette O. Mar 10 '15 at 14:08
  • Just one - we are trying to copy Employee Numbers from Description field that contains other information like names, titles, locations, etc. to a separate column. – Annette O. Mar 10 '15 at 14:08

3 Answers3

2

Here is a formula-based alternative that will extract the first 5 digit number found in cell A1. I tend to prefer reasonably simple formula solutions over VBA in most situations as formulas are more portable. This formula is an array formula and thus must be entered with Ctrl+Shift+Enter. The idea is to split the string up into every possible 5 character chunk and test each one and return the first match.

=MID(A1,MIN(IF(NOT(ISERROR(("1"&MID(A1,ROW(INDIRECT("R1C[1]:R"&(LEN(A1)-4)&"C[1]",FALSE)),5)&".1")*1))*ISERROR(MID(A1,ROW(INDIRECT("R1C[1]:R"&(LEN(A1)-4)&"C[1]",FALSE))+5,1)*1)*ISERROR(MID(A1,ROW(INDIRECT("R1C[1]:R"&(LEN(A1)-4)&"C[1]",FALSE))-1,1)*1),ROW(INDIRECT("R1C[1]:R"&(LEN(A1)-4)&"C[1]",FALSE)),9999999999)),5)

Let's break this down. First we have an expression I used twice to return an array of numbers from 1 up to 4 less than the length of your initial text. So if you have a string of length 10 the following will return {1,2,3,4,5,6}. Hereafter the below formula will be referred to as rowlist. I used R1C1 notation to avoid potential circular references.

ROW(INDIRECT("R1C[1]:R"&(LEN(A1)-4)&"C[1]",FALSE))

Next we will use that array to split the text into an array of 5 letter chunks and test each chunk. The test being performed is to prepend a "1" and append ".1" then verify the chunk is numeric. The prepend and append eliminate the possibility of white space or decimals. We can then check the character before and the character after to make sure they are not numbers. Hereafter the below formula will be referred to as isnumarray.

NOT(ISERROR(("1"&MID(A1,rowlist,5)&".1")*1))
*ISERROR(MID(A1,rowlist+5,1)*1)
*ISERROR(MID(A1,rowlist-1,1)*1)

Next we need to find the first valid 5 digit number in the string by returning the current index from a duplicate of the rowlist formula and returning a large number for non-matches. Then we can use the MIN function to grab that first match. Hereafter the below will be referred to as minindex.

MIN(IF(isnumarray,rowlist,9999999999))

Finally we need to grab the numeric string that started at the index returned by the MIN function.

MID(A1,minindex,5)
L42
  • 19,427
  • 11
  • 44
  • 68
Mark Balhoff
  • 2,311
  • 4
  • 22
  • 30
  • Your formula also returns the first five digits of a six digit number – Ron Rosenfeld Mar 10 '15 at 00:54
  • @RonRosenfeld I gathered that he is not concerned about too many digits (only too few) based on "I need to ignore any numbers that are less than 5 digits and include numbers that start with 1 or more leading zeros (like 00052, 00278, etc.) and ensure that leading zeros are copied over to the next column." – Mark Balhoff Mar 10 '15 at 00:56
  • And I assumed that a five digit number was a five digit number, and not part of a longer string of numbers. OR it may be that his data will never have more than five digits in a row, in which case it wouldn't matter. If your assumption is correct, there is a non-array solution. – Ron Rosenfeld Mar 10 '15 at 00:58
  • @RonRosenfeld Yes the OP should clarify. But in your last assertion lies what I read into his explicit calling out of less than 5 digits despite that being a less obvious confusion point than more than 5 digits. If he does want what you suggest, the change to my formula would be minimal. Just one more condition. – Mark Balhoff Mar 10 '15 at 01:03
  • @RonRosenfeld Fixed it (Assuming OP wants that). – Mark Balhoff Mar 10 '15 at 01:40
  • I would also add that I need to extract 5 digits that come together - so in the string of "2007 Mark Jones 45238 clerk" i need the 45238 portion - not "2007 & 4" – Annette O. Mar 10 '15 at 14:18
  • Thanks! I tried the formula - given that my data starts in D2 position: =MID(D2,MIN(IF(NOT(ISERROR(("1"&MID(D2,ROW(INDIRECT("R1C[1]:R"&(LEN(D2)-4)&"C[1]",FALSE)),5)&".1")*1))*ISERROR(MID(D2,ROW(INDIRECT("R1C[1]:R"&(LEN(A1)-4)&"C[1]",FALSE))+5,1)*1)*ISERROR(MID(D2,ROW(INDIRECT("R1C[1]:R"&(LEN(D2)-4)&"C[1]",FALSE))-1,1)*1),ROW(INDIRECT("R1C[1]:R"&(LEN(D2)-4)&"C[1]",FALSE)),9999999999)),5) I'm getting #VALUE! output in the next column.. Something i did wrong? – Annette O. Mar 10 '15 at 14:37
  • @Annette Did you hit ctrl+shift+enter when leaving the formula cell instead of just enter? What is the value in D2? – Mark Balhoff Mar 10 '15 at 14:52
  • Hi! I'm not very handy with formulas -I apologize in advance. The data in in column d and i need my numbers in column E. So I clicked on E2 and pasted the formula in the formula bar at the top and hot enter. Then clicked on E2 and dragged it down 20 rows or so.. The first record (D2) has "Copyright 2007" string and shouldn't have any value in E2. Next One has "Brenda ANGELES 63215" - i would expect to see 63215 in E2.... – Annette O. Mar 10 '15 at 15:21
  • @AnnetteO. You missed an A1 --> D2 translation. You see Len(A1-4) there in the formula you posted?? That needs to be Len(D2-4). Change that and the formula will work. – Mark Balhoff Mar 10 '15 at 15:21
  • @AnnetteO. `=MID(D2,MIN(IF(NOT(ISERROR(("1"&MID(D2,ROW(INDIRECT("R1C[1]:R"&(LEN(D2)-4)&"C[1]",FALSE)),5)&".1")*1))*ISERROR(MID(D2,ROW(INDIRECT("R1C[1]:R"&(LEN(D2)-4)&"C[1]",FALSE))+5,1)*1)*ISERROR(MID(D2,ROW(INDIRECT("R1C[1]:R"&(LEN(D2)-4)&"C[1]",FALSE))-1,1)*1),ROW(INDIRECT("R1C[1]:R"&(LEN(D2)-4)&"C[1]",FALSE)),9999999999)),5)` – Mark Balhoff Mar 10 '15 at 15:30
  • Thanks! Missed this one.. still getting the #VALUE! in all column E cells...Here is a small set of the data i'm trying to retrieve the numbers from: – Annette O. Mar 10 '15 at 16:15
  • cccccc cccccccccc (19416) ddddddddd dddd 32554 eeeeeee (51503) eeeeeee (15139) fff ffffff 0123 ggggg hhhhhh (52416) hhhhhhh hhhhhhhh (52407) kkkkkk kkkkkk 37410 kkkkkk kkkkk (MG23) rrrrr rrrrr 49933 lllll lllll x5236 vvvvv vvvvv (55266) zzzzz zzzz(45828) yyyy yyyyy 37995 – Annette O. Mar 10 '15 at 16:18
  • my last comment came out as one long string.. For the most part, each entry end with a number whether is it EMPLID, extension number, location code or POS number.. – Annette O. Mar 10 '15 at 16:21
  • @AnnetteO. I'll take a look when i get back from lunch. – Mark Balhoff Mar 10 '15 at 16:32
  • @AnnetteO. If you aren't hitting Ctrl+Shift+Enter when finished pasting in the formula, you would get #VALUE. Are you absolutely sure you are hitting Ctrl+Shift+Enter to leave edit mode of E2 and not just Enter ??? – Mark Balhoff Mar 10 '15 at 17:20
2

The following UDF will return the first five digit number in the string, including any leading zero's. If you need to detect if there is more than one five digit number, the modifications are trivial. It will return a #VALUE! error if there are no five-digit numbers.

Option Explicit
Function FiveDigit(S As String, Optional index As Long = 0) As String
    Dim RE As Object
Set RE = CreateObject("vbscript.regexp")
With RE
    .Pattern = "(?:\b|\D)(\d{5})(?:\b|\D)"
    .Global = True
        FiveDigit = .Execute(S)(index).submatches(0)
End With
End Function

As you may see from the discussion between Mark and myself, some of your specifications are unclear. But if you would want to exclude decimal numbers, when the decimal portion has five digits, then the regex pattern in my code above should be changed:

.Pattern = "(?:\d+\.\d+)|(?:\b|\D)(\d{5})(?:\b|\D)"
Ron Rosenfeld
  • 53,870
  • 7
  • 28
  • 60
  • Both of your formulas will find strings like "0.015" and "-1500". Neither is valid – Mark Balhoff Mar 10 '15 at 01:48
  • That's true. Good pickup. The regex solution is more robust, and simpler. I'll delete the formulas. – Ron Rosenfeld Mar 10 '15 at 01:49
  • @MarkBalhoff Oh, and both your formulas, and my regex, will return the 23456 in 1.23456. – Ron Rosenfeld Mar 10 '15 at 01:53
  • Hmm... true. Good point. Which begs the question: Should "dfskf .32322" return 32322 or treat it as 0.32322 and return an error? – Mark Balhoff Mar 10 '15 at 02:08
  • I don't know. Obviously we're just playing around until we get clarification from the OP – Ron Rosenfeld Mar 10 '15 at 02:12
  • I don't think we have any decimal numbers in that field so it won't be a problem. EmployeeID are 5 digit numbers - which we need. But there might be POS number or Location code or year created in the data that I should ignore - these are less than 5 digits – Annette O. Mar 10 '15 at 14:15
0

I just wrote this UDF for you , basic but will do it...

It will find the first 5 consecutive numbers in a string, very crude error checking so it just says Error if anything isn't right

Public Function GET5DIGITS(value As String) As String
    Dim sResult As String
    Dim iLen As Integer
    sResult = ""
    iLen = 0

    For i = 1 To Len(value)       
        If IsNumeric(Mid(value, i, 1)) Then
            sResult = sResult & Mid(value, i, 1)
            iLen = iLen + 1
        Else
            sResult = ""
            iLen = 0
        End If
        If iLen = 5 Then Exit For
    Next

    If iLen = 5 Then
        GET5DIGITS = Format(sResult, "00000")
    Else
        GET5DIGITS = "Error"
    End If
End Function
Steven Martin
  • 3,150
  • 1
  • 20
  • 27
  • Your UDF will also return the first five digits of a six digit number – Ron Rosenfeld Mar 10 '15 at 00:54
  • Yes it does , I have written this , first 5 consecutive if the OP responds to my question I can modify – Steven Martin Mar 10 '15 at 04:48
  • 1
    Thank you so much for helping! I would also add that I need to extract 5 digits that come together with no spaces or other characters in between - so in the string of "2007 Mark Jones 45238 clerk" i need the 45238 portion - not "2007 & 4" – Annette O. Mar 10 '15 at 14:22