Batch filter duplicate lines and write to a new file (semi-finished)

Question

I have written a script that filters out duplicate lines in a file and stores the result in a semicolon-separated variable (a sort of "array"). I could not find a really good existing solution for this.

@echo off
setlocal enabledelayedexpansion

rem test.txt contains:
rem 2007-01-01
rem 2007-01-01
rem 2007-01-01
rem 2008-12-12
rem 2007-01-01
rem 2009-06-06
rem ... and so on

set file=test.txt

for /f "Tokens=* Delims=" %%i in ('type %file%') do (
    set read=%%i
    set read-array=!read-array!;!read!
)

rem removes the leading ";"
set read-array=!read-array:*;=!
echo !read-array!

for /f "Tokens=* Delims=" %%i in ('type %file%') do (
    set dupe=0
    rem searches the array for the current line (%%i) and, if it exists, deletes ALL occurrences of it
    echo !read-array! | find /i "%%i" >nul && set dupe=1
    if ["!dupe!"] EQU ["1"] (
        set read-array=!read-array:%%i;=!
        set read-array=!read-array:;%%i=!
    )
    rem searches array for the current read line (%%i) and if it does not exist, it adds it once
    echo !read-array! | find /i "%%i" >nul || set read-array=!read-array!;%%i
)

rem results: no duplicates
echo !read-array!

The contents of !read-array! are now 2008-12-12;2007-01-01;2009-06-06

I now want to extract each item from the array and write it to a new file, one item per line. Example:

2008-12-12
2007-01-01
2009-06-06

So this is what I've come up with so far.

The problem I'm having is that the second for-loop doesn't accept the !loop! variable as a token definition when it is nested. It does, however, accept %loop% if it's not nested. The reason I'm doing it this way is that !read-array! may contain an unknown number of items, so I count them as well. Any ideas?

rem count items in array
set c=0
for %%i in (!read-array!) do set /a c+=1

echo %c% items in array
for /l %%j in (1,1,%c%) do (
    set loop=%%j
    for /f "Tokens=!loop! Delims=;" %%i in ("!read-array!") do (
        echo %%i
        rem echo %%i>>%file%
    )
)
exit /b

score 2 · Accepted Answer · edited Nov 17 '11 at 08:51

At the end of your first section, when !read-array! contains 2008-12-12;2007-01-01;2009-06-06, you can split the elements of your "list" directly with a plain for, because the standard delimiters in Batch files are, besides spaces, commas, semicolons and equal signs:

for %%i in (%read-array%) do echo %%i
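
Since the goal is to write each item to a new file, the same loop can also be redirected as a whole. This is only a minimal sketch, assuming the list has already been built; the sample value and the output name out.txt are placeholders and not part of the original script:

```batch
@echo off
setlocal
rem Sample list; in the real script this is built by the earlier loops
set "read-array=2008-12-12;2007-01-01;2009-06-06"
rem out.txt is a hypothetical output file name
set "outfile=out.txt"
rem Redirect the whole loop once, so the file is created a single time
(for %%i in (%read-array%) do echo %%i) > "%outfile%"
rem Show the result: one item per line
type "%outfile%"
```

Redirecting the parenthesized loop with a single > overwrites the output file on each run, rather than appending with >> on every iteration.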

However, may I suggest a simpler method?

Why not define a "real" array that uses the content of each line as the subscript? That way, repeated lines store their value in the same array element. At the end, just display the values of the resulting elements:

@echo off
set file=test.txt
for /F "Delims=" %%i in (%file%) do (
    set read-array[%%i]=%%i
)
rem del %file%
for /F "Tokens=2 Delims==" %%i in ('set read-array[') do (
    echo %%i
    rem echo %%i>>%file%
)

EDIT Alternative solution

There is another method that assembles a semicolon-separated list of values, as you proposed. In this case each value is first removed from the current list and then immediately appended again, so at the end of the loop each value is present just once.

@echo off
setlocal EnableDelayedExpansion
set file=test.txt
for /F "Delims=" %%i in (%file%) do (
    set read-array=!read-array:;%%i=!;%%i
)
rem del %file%
for %%i in (%read-array%) do (
    echo %%i
    rem echo %%i>> %file%
)
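
On the nested-loop problem from the question: the observation that %loop% works while !loop! does not suggests that for /f reads its options before delayed expansion takes place. One possible workaround, sketched here under that assumption, is to pass the index to a subroutine so that it arrives as %1 and is substituted by ordinary percent expansion. The label name :show_token and the sample list are illustrative only:

```batch
@echo off
setlocal
rem Sample list; in the real script this is built by the earlier loops
set "read-array=2008-12-12;2007-01-01;2009-06-06"

rem Count the items; a plain FOR splits the list on semicolons
set c=0
for %%i in (%read-array%) do set /a c+=1

rem Pass each index to a subroutine instead of using !loop! in the options
for /l %%j in (1,1,%c%) do call :show_token %%j
exit /b

:show_token
rem %1 is replaced by percent expansion before FOR /F parses its options
for /f "tokens=%1 delims=;" %%i in ("%read-array%") do echo %%i
exit /b
```

Because for /f can only address a limited number of tokens per line, the plain for shown earlier is probably the more general choice for long lists.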

edited Nov 17 '11 at 08:51

Niklas J. MacDowall

answered Nov 15 '11 at 21:33

Aacini

At the end of your first section? Are you referring to both of these lines? `for /f "Tokens=* Delims=" %%i in ('type %file%') do (` I will also try your suggestion of using real arrays; I didn't know about them and haven't used them before. In any case, I would still like to know why I'm unable to nest the second `for`-loop as I do in my case. – Niklas J. MacDowall Nov 16 '11 at 07:42
@Niklas: Once the contents of `!read-array!` are `2008-12-12;2007-01-01;2009-06-06`, to separate each item just use `for %%i in (%read-array%) do echo %%i`. As for the nested second `for`, you may eliminate the loop variable and write it this way: `for /f "Tokens=%%j Delims=;" %%i in ("!read-array!") do (`. I also included another Batch file that uses your original method of a semicolon-separated list of values. – Aacini Nov 17 '11 at 00:28
I get `The syntax of the command is incorrect.` for this line: `for /F "Tokens=2 Delims==" %%i in ('set read-array[') do` and I don't know why. Your second proposal looks promising; I will try that. But your suggestion for the nested `for`-loop uses `%%j` as a token. Isn't that supposed to be `%%i`? Also, will that really return values line by line from the semicolon-separated variable? – Niklas J. MacDowall Nov 17 '11 at 07:09
Both of your solutions work perfectly. However, I've proposed an edit to your post to fix a typo that caused the syntax error. – Niklas J. MacDowall Nov 17 '11 at 08:31
