Provided that the 10-digit number is the first numeric part of every line in the file (let us call it `numbers.txt`), so that no other number precedes it, you could use the following:
@echo off
setlocal EnableExtensions EnableDelayedExpansion
rem // Define constants here:
set "_FILE=.\numbers.txt"
set /A "_DIG=10"
rem // The first delimiter is TAB, the last one is SPACE:
for /F "usebackq tokens=1 delims= ^!#$%%&'()*+,-./:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^^_`abcdefghijklmnopqrstuvwxyz{|}~ " %%L in ("!_FILE!") do (
    set "NUM=%%L#"
    if "!NUM:~%_DIG%!"=="#" echo(%%L
)
endlocal
exit /B
This makes use of `for /F` and its `delims` option string, which includes most ASCII characters except numerals. You may extend the `delims` option string to also hold extended characters (those with a code greater than `0x7F`); make sure the SPACE is the last character specified.
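The length test in the script may not be obvious at first sight: a sentinel `#` is appended to the token, and the substring starting at offset `_DIG` equals `#` only if exactly `_DIG` characters precede it. Here is a minimal sketch to try this in isolation (the three sample strings are made up for illustration and are not part of the script above):

```batch
@echo off
setlocal EnableExtensions EnableDelayedExpansion
rem // Required number of digits, as in the script above:
set /A "_DIG=10"
rem // Hypothetical sample strings with 10, 4 and 11 digits:
for %%S in (0123456789 0123 01234567890) do (
    rem // Append the sentinel and compare the substring at offset _DIG:
    set "NUM=%%S#"
    if "!NUM:~%_DIG%!"=="#" (echo %%S: match) else (echo %%S: no match)
)
endlocal
```

Only the first sample, which is exactly 10 characters long, should be reported as a match. Note that this test checks the length only; in the script it is the `delims` option that ensures the token consists of digits.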
This approach can extract the 10-digit number from a line like this:
garbage text>0123456789_more text0123-end
But it fails when the first number in a line is not the 10-digit one, as in this line:
garbage text: 0123 tel. 0123456789; end
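To see why, the token split can be reproduced on that line directly. This is a sketch with a reduced delimiter set that covers only the characters occurring in the sample line, which is an assumption made to keep the example short:

```batch
@echo off
setlocal EnableExtensions DisableDelayedExpansion
rem // Reduced delimiter list for this sample only; SPACE is the last character:
for /F "tokens=1 delims=abcdefghijklmnopqrstuvwxyz:.; " %%A in ("garbage text: 0123 tel. 0123456789; end") do (
    rem // With tokens=1 only the first numeric part is ever returned:
    echo first numeric token: %%A
)
endlocal
```

The loop returns `0123`, which has only four digits, so the length test rejects it and the 10-digit number later in the line is never examined.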
Here is a comprehensive solution based on the above approach. The character list for the `delims` option of `for /F` is created automatically here. This may take a few seconds, but it is done only once at the very beginning, so for large files the overhead will probably not be noticeable:
@echo off
setlocal EnableExtensions DisableDelayedExpansion
rem // Define constants here:
set "_FILE=.\numbers.txt"
set /A "_DIG=10"
rem // Define global variables here:
set "$CHARS="
rem // Capture current code page and set Windows default one:
for /F "tokens=2 delims=:" %%P in ('chcp') do set /A "CP=%%P"
> nul chcp 437
rem /* Generate list of escaped characters other than numerals (escaped means every character
rem is preceded by `^`); there are some characters excluded:
rem - NUL (this cannot be stored in an environment variable and should not occur anyway),
rem - CR + LF (they build up line-breaks, so they cannot occur within a line),
rem - SPACE (because this must be placed as the last character of the `delims` option),
rem - `"` (because this impairs the quotation within the following code portion),
rem - `!` + `^` (they may lead to unexpected results when delayed expansion is enabled): */
setlocal EnableDelayedExpansion
for /L %%I in (0x01,1,0xFF) do (
rem // Exclude codes of aforementioned characters:
set "SKIP="
if %%I GEQ 0x30 if %%I LSS 0x3A set "SKIP=#"
if not defined SKIP if %%I NEQ 0x00 if %%I NEQ 0x0A if %%I NEQ 0x0D (
if %%I NEQ 0x20 if %%I NEQ 0x21 if %%I NEQ 0x22 if %%I NEQ 0x5E (
rem // Convert code to character and append to list separated by `^`:
cmd /C exit %%I
for /F delims^=^ eol^= %%J in ('
forfiles /P "%~dp0." /M "%~nx0" /C "cmd /C echo 0x220x!=ExitCode:~-2!0x22"
') do (
set "$CHARS=!$CHARS!^^%%~J"
)
)
)
)
endlocal & set "$CHARS=%$CHARS%"
rem /* Apply escaped list of characters as delimiters and apply some of the characters
rem excluded before, namely SPACE, `"`, `!` and `^`;
rem read file using `type` in order to convert from Unicode, if applicable: */
for /F tokens^=1*^ eol^=^ ^ delims^=^!^"^^%$CHARS%^ %%K in ('type "%_FILE%"') do (
set "NUM=%%K#" & set "REST=%%L"
rem // Test whether extracted numeric string holds the given number of digits:
setlocal EnableDelayedExpansion
if "!NUM:~%_DIG%!"=="#" echo(%%K
endlocal
rem /* Current line holds more than a single numeric portion, so process them in a
rem sub-routine; this is not called if the line contains a single number only: */
if defined REST call :SUB REST
)
rem // Restore previous code page:
> nul chcp %CP%
endlocal
exit /B
:SUB ref_string
setlocal DisableDelayedExpansion
setlocal EnableDelayedExpansion
set "STR=!%~1!"
rem // Parse line string using the same approach as in the main routine:
:LOOP
if defined STR (
for /F tokens^=1*^ eol^=^ ^ delims^=^^^!^"^^^^%$CHARS%^ %%E in ("!STR!") do (
endlocal
set "NUM=%%E#" & set "STR=%%F"
setlocal EnableDelayedExpansion
rem // Test whether extracted numeric string holds the given number of digits:
if "!NUM:~%_DIG%!"=="#" echo(%%E
)
rem // Loop back if there are still more numeric parts encountered:
goto :LOOP
)
endlocal
endlocal
exit /B
This approach detects 10-digit numbers everywhere in the file, even if there are multiple ones within a single line.
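The conversion from character code to character, which the script uses to build the delimiter list, can be tried in isolation. The following is a minimal sketch, assuming it is saved as a batch file (`forfiles` is pointed at the script itself so that its command runs exactly once); the code 65 is an arbitrary sample value:

```batch
@echo off
setlocal EnableExtensions EnableDelayedExpansion
rem // Sample code 65 (0x41) chosen for illustration; set the exit code to it:
cmd /C exit 65
rem // The dynamic variable =ExitCode holds the code as eight hex digits:
echo hex code: !=ExitCode!
rem // forfiles replaces 0xHH in its command string by the character with that code;
rem // 0x22 becomes a quotation mark, which the ~ modifier removes again:
for /F delims^=^ eol^= %%J in ('forfiles /P "%~dp0." /M "%~nx0" /C "cmd /C echo 0x220x!=ExitCode:~-2!0x22"') do (
    echo character: %%~J
)
endlocal
```

This should print the hex code `00000041` followed by the character `A`. The script above repeats this step for every code from `0x01` to `0xFF`, apart from the excluded ones, which explains the start-up delay.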