I need to be able to extract a portion of a text file that occurs between two blank lines. The text file looks something like this...

```text
This is line 01 of the text file.
This is line 02 of the text file.
This is line 03 of the text file.

This is line 05 of the text file.
This is line 06 of the text file.
This is line 07 of the text file.
     > VALUE TO SEARCH <
This is line 09 of the text file.
This is line 10 of the text file.

This is line 12 of the text file.
This is line 13 of the text file.
```

So, I can search for and locate "> VALUE TO SEARCH <" in the text file, but I then need to grab everything up to the preceding blank line and everything down to the following blank line, i.e. just that one section. The number of lines per section varies, but there is always exactly one blank line between sections.
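Put another way, the file is a sequence of sections separated by blank lines, and the goal is to keep every section that contains the search string. Here is a minimal Python sketch of that logic, for illustration only: the function name `extract_sections` is hypothetical, and it assumes only completely empty lines count as separators.

```python
from typing import List


def extract_sections(text: str, needle: str) -> List[str]:
    """Return each blank-line-delimited section of text that contains needle."""
    sections: List[List[str]] = []
    current: List[str] = []
    # splitlines() accepts both \r\n (Windows) and \n (Unix) line endings
    for line in text.splitlines():
        if line == "":
            # an empty line closes the current section
            if current:
                sections.append(current)
            current = []
        else:
            current.append(line)
    if current:
        # the last section is not followed by an empty line
        sections.append(current)
    return ["\n".join(s) for s in sections if any(needle in ln for ln in s)]


# Rebuild the sample file from the question
lines = [f"This is line {n:02d} of the text file." for n in range(1, 14)]
lines[3] = lines[10] = ""
lines[7] = "     > VALUE TO SEARCH <"
sample = "\n".join(lines) + "\n"

for section in extract_sections(sample, "> VALUE TO SEARCH <"):
    print(section)
```

Every section that contains the string is returned, so a search string that appears in several sections yields several results.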

Can this be done via batch file? If so, how?

Thanks in advance!

user3208239
  • You should specify what platform you are working on. The solution may depend on that, e.g. Linux => Python, Perl, Bash; Windows => VBScript – Mauricio López Nov 05 '15 at 19:54
  • Sorry, I thought "batch" was enough. The OS is Windows XP, and I'm using a .BAT file to accomplish the task. Thanks. – user3208239 Nov 05 '15 at 20:01

1 Answer

Pure Windows batch is not very good at text processing.

I would use my JREPL.BAT regular expression text processing utility for this task. It is pure script (hybrid JScript/batch) that runs natively on any Windows machine from XP onward. Full documentation is available by executing jrepl /? from the command line.

Here is a solution using JREPL.BAT. Note that the search is a regular expression, so you will have to escape any characters in VALUE TO SEARCH that are regex metacharacters. The command reads "test.txt" and writes the result to "out.txt".

```bat
jrepl "([^\r\n]+\r?\n)*.*VALUE TO SEARCH.*\n?([^\r\n]+\r?\n?)*" "$0" /jmatch /m /f test.txt /o out.txt
```

You must use CALL JREPL if you put the command within a batch script.
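To see what the regular expression in that jrepl command selects, and how a search term can be escaped before it is pasted into the command, here is a small Python check. It is only an illustration and rests on an assumption: that Python's `re` module treats this particular pattern the way JREPL's JScript engine does (notably, Python's `.` matches `\r`, so the `.*` after the search text also consumes the CR of a CRLF line ending). The helper names `escape_regex` and `find_sections` are hypothetical.

```python
import re
from typing import List

# Same pattern as in the jrepl command, with the search text as a placeholder.
# Assumption: Python's re handles it like JScript here ('.' also matches '\r').
JREPL_PATTERN = r"([^\r\n]+\r?\n)*.*{term}.*\n?([^\r\n]+\r?\n?)*"

# Regex metacharacters (ECMAScript syntax characters) that must be escaped
META_CHARS = set("\\^$.|?*+()[]{}")


def escape_regex(term: str) -> str:
    """Backslash-escape metacharacters so the term is matched literally."""
    return "".join("\\" + ch if ch in META_CHARS else ch for ch in term)


def find_sections(text: str, term: str) -> List[str]:
    """Return every match of the jrepl pattern (one per found section)."""
    pattern = JREPL_PATTERN.replace("{term}", escape_regex(term))
    return [m.group(0) for m in re.finditer(pattern, text)]


# Rebuild the sample file from the question, with Windows (CRLF) line endings
lines = [f"This is line {n:02d} of the text file." for n in range(1, 14)]
lines[3] = lines[10] = ""
lines[7] = "     > VALUE TO SEARCH <"
text = "\r\n".join(lines) + "\r\n"

for section in find_sections(text, "> VALUE TO SEARCH <"):
    print(section, end="")

print(escape_regex("(v1.2)"))  # prints \(v1\.2\)
```

The leading group can only consume non-empty lines, and `.` never matches `\n`, so a match cannot cross a blank line in either direction. That is what confines it to the single section around the search text.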

It is possible to solve this using pure batch, but it is complicated and much slower. Here is one solution.

```bat
@echo off
setlocal enableDelayedExpansion

set "infile=test.txt"
set "outfile=out.txt"
set "find=VALUE TO SEARCH"

set "emptyFile=empty.txt"

:: Compute end of file as number of lines + 1
for /f %%N in ('find /c /v "" ^<"%infile%"') do set /a last=%%N+1

:: Get list of line numbers of empty lines and append end of file
>"%emptyFile%" (
  for /f "delims=:" %%N in ('findstr /n "^$" "%infile%"') do echo %%N
  echo !last!
)

<"%infile%" >"%outFile%" (
  set /a last=1

  %= iterate list of found line numbers, ignoring lines that have already been printed =%
  for /f "delims=:" %%A in ('findstr /nc:"!find!" "!infile!"') do if %%A geq !last! (

    %= Locate beginning and end of found section, and compute lines to skip =%
    set /a beg=0
    set "end="
    for /f "usebackq" %%B in ("%emptyFile%") do if not defined end (
      if %%B gtr %%A (set /a end=%%B, start=beg+1, stop=end-1) else set beg=%%B
    )

    %= Skip lines until beginning of found section =%
    for /l %%N in (!last! 1 !beg!) do set /p "ln="

    %= Write empty line delimiter if not first found section =%
    if !last! gtr 1 (echo()

    %= Read and write found section =%
    for /l %%N in (!start! 1 !stop!) do (
      set /p "ln="
      (echo(!ln!)
    )

    set /a last=end
  )
)

del "%emptyFile%"
```

The pure batch solution above has the following limitations:

  • Lines must be <= 1021 bytes long
  • Control characters will be stripped from the end of each line
  • Each line must be terminated by \r\n (Windows style). It will not work with \n (Unix style).
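The line-number bookkeeping in the pure batch script can be hard to follow, so here is a Python transcription of the same steps. It is a sketch meant to be read alongside the batch code, not a replacement for it, and `extract_by_line_numbers` is a hypothetical name. Like `findstr /n`, it uses 1-based line numbers. `0` stands for "no empty line before the hit", and the number of lines + 1 plays the role of the end-of-file entry appended to the empty-line list.

```python
from typing import List


def extract_by_line_numbers(lines: List[str], needle: str) -> List[str]:
    """Python transcription of the pure-batch algorithm (1-based line numbers)."""
    # Line numbers of empty lines (findstr /n "^$"), plus end of file = lines + 1
    empties = [n for n, line in enumerate(lines, start=1) if line == ""]
    empties.append(len(lines) + 1)

    out: List[str] = []
    last = 1  # first line number that has not been read yet
    # Line numbers containing the search string (findstr /nc:"...")
    hits = [n for n, line in enumerate(lines, start=1) if needle in line]
    for hit in hits:
        if hit < last:
            continue  # hit lies in a section that was already written
        # Nearest empty line before the hit (0 if none) and first one after it
        beg = max((e for e in empties if e < hit), default=0)
        end = min(e for e in empties if e > hit)
        if last > 1:
            out.append("")  # empty-line delimiter between found sections
        out.extend(lines[beg:end - 1])  # 1-based lines beg+1 .. end-1
        last = end
    return out


lines = [f"This is line {n:02d} of the text file." for n in range(1, 14)]
lines[3] = lines[10] = ""
lines[7] = "     > VALUE TO SEARCH <"

print("\n".join(extract_by_line_numbers(lines, "VALUE TO SEARCH")))
```

The `hit < last` check corresponds to `if %%A geq !last!` in the batch code: several hits inside one section produce that section only once.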
dbenham
  • Good morning, dbenham. First of all, thank you for your response. I implemented the code from your "Here is one solution" example, and it appears to run properly, because I can see the number of lines successfully written to emptyFile, but the outfile is left empty. In addition, the longest line in my infile is 221 characters, and every line is terminated with [CR][LF]. Any ideas? Thanks. – user3208239 Nov 06 '15 at 14:11
  • @user3208239 - I'm stumped. I tested the code, and it works perfectly for me. As currently written, the search is case sensitive. Not sure if that matters. – dbenham Nov 06 '15 at 14:46
  • Also, there could be issues if your search string contains the `!` character. That character would have to be escaped when defining `find`. – dbenham Nov 06 '15 at 14:49
  • Okay, my search string is "Java." I had been using lower case instead of the proper case, but the string doesn't contain an exclamation point. Unfortunately, changing the case didn't make a difference. As currently written, does the search string have to match the entire line, or can it be found anywhere within a line? Thanks again. – user3208239 Nov 06 '15 at 15:09
  • It should find the string anywhere. The search can ignore case if you change `findstr /nc` to `findstr /inc` – dbenham Nov 06 '15 at 16:40
  • Still nothing after adding the "i" to the findstr command. I just thought of something: would it matter if my infile contains non-alphanumeric characters other than an exclamation point? The non-alphanumeric characters are as follows: – user3208239 Nov 06 '15 at 16:47
  • square brackets, backslashes, at symbols, equal signs, double quote marks, colons, commas, periods, tilde symbols, parenthesis, dashes, plus signs, and curly brackets – user3208239 Nov 06 '15 at 16:54
  • Let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/31193/discussion-between-user3208239-and-dbenham). – user3208239 Nov 06 '15 at 16:56
  • I tried to initiate the chat, but it doesn't appear to be working properly. Thanks. – user3208239 Nov 09 '15 at 15:14