2

I searched extensively and found what I believe to be a solution to my problem, which is merging CSV files without duplicating headers each time. It looks like it works, except it's only copying the first file in the folder into the destination file. I think it's unable to open the files because they have a space in the name. I've been advised I probably just need to put quotes somewhere, but I'm not sure where they would go. Thanks in advance.

@ECHO OFF
SET first=y
SET newfile=new.csv
for %%F in (*.csv) do IF NOT %%F==%newfile% (
  if defined first (
    COPY /y "%%F" %newfile% >nul
    set "first="
  ) else (
    FOR /f "skip=1delims=" %%i IN (%%F) DO >> %newfile% ECHO %%i
  )
)
user3229528
  • 29
  • 1
  • 4
  • 8

5 Answers5

3
@echo off
    setlocal enableextensions disabledelayedexpansion

    rem configure paths
    set "source=*.csv"
    set "target=newfile.csv"

    rem remove output file if needed
    if exist "%target%" del "%target%" >nul 2>nul

    rem search for header row
    set "headerRow="
    for %%f in ("%source%") do (
        <"%%~ff" ( for /l %%a in (1 1 10) do if not defined headerRow set /p "headerRow=" )
        if defined headerRow goto haveHeader
    )
:haveHeader
    if not defined headerRow (
        echo ERROR: impossible to get header row. 
        goto endProcess
    )

    rem output header to header file to use as filter.
    rem header is cut to avoid findstr limitations on search strings
    set "headerFile=%temp%\%~nx0_headerFile.tmp"
    setlocal enableextensions enabledelayedexpansion
    > "%headerFile%" echo(!headerRow:~0,125!
    endlocal


    rem search for input files with matching headers to join to final file
    for /f "tokens=*" %%f in ('findstr /m /b /l /g:"%headerFile%" "%source%"') do (
        if not exist "%target%" (

                rem first file is directly copied
                copy "%%~f" "%target%" /y > nul 2>nul

            ) else (

                rem next files are filtered to exclude the header row
                findstr /v /b /l /g:"%headerFile%" "%%~f" >> "%target%"
        )
        echo ... [%%~ff] joined to %target%
    )

    rem remove the temporary header file
    del "%headerFile%" >nul 2>nul

:endProcess
    endlocal
MC ND
  • 69,615
  • 8
  • 84
  • 126
  • This seems to work, but regardless of what program I use to try and open the file, it freezes up and crashes when I try to scroll down. It can't be because of a huge file size or anything because I can open other text/csv files that are larger and its fine. Any idea? – user3229528 Jan 24 '14 at 13:24
  • @user3229528, I've tried joining monthly 160MB partial files to get full year without problems. Maybe some kind of problem with the encoding of files. Have you tried with a simply `more` or one `type` for a fast preview? – MC ND Jan 24 '14 at 15:25
  • I didn't realize at first when I posted this question, because someone just handed me these files and it didn't really cross my mind to check.. but they are actually unicode. Also FYI 'more' tends to get stuck on large files for sure, I've experienced it and read about several others running into it as well. – user3229528 Jan 24 '14 at 18:19
  • @user3229528: Default cmd file output is ansi, not unicode. This should allow you to do `for %f in (*.log) do type "%~f" > "%~nf.txt%~xf"` and get the files in ansi (and half the size). Of course, check you are not loosing anything important in the conversion. – MC ND Jan 24 '14 at 18:27
  • This works if I do it from a cmd prompt, but if I put it into a batch file it doesn't work. Can you tell me why this is? – user3229528 Jan 24 '14 at 19:06
  • @user3229528, `for` commands inside batch files need the replaceable parameter with the percent doubled. For a batch file use `for %%f in (*.log) do type "%%~f" > "%%~nf.txt%%~xf"` – MC ND Jan 24 '14 at 19:09
  • Oh right. So, I have that working, but now I can't get the original answer you posted to work. It says "'Account_Number' is not recognized as an internal or external command, operable program or batch file." Account_Number is the 2nd header name in the files. – user3229528 Jan 24 '14 at 19:19
  • @user3229528, answer edited. Are your CSV files fields separated by `|`? I think the fail was in the echo line that creates the header filter file. Changed to delayed expansion sintax to avoid cmd parser interpreting special characters in variable. – MC ND Jan 25 '14 at 07:34
1
@echo off &setlocal disableDelayedExpansion
set "NewFile=new.csv"

>"%NewFile%" cd .
for /f "tokens=1*delims=:" %%a in ('dir /b /a-d /od *.csv ^|findstr /nvx "%NewFile%"') do (
    if %%a equ 1 (
        copy /b "%%~b" "%NewFile%" >nul
    ) else (
        for /f "skip=1delims=" %%c in ('type "%%~b"') do >>"%NewFile%" echo(%%c
    )
)

sed for Windows

Endoro
  • 37,015
  • 8
  • 50
  • 63
  • I tried something similar to this earlier today, and it seems to work on smaller files. However, when I have larger files around 15-20mb it gets stuck on the more +1 part I believe. Someone else had this same problem. – user3229528 Jan 23 '14 at 22:22
  • 'sed' is not recognized as an internal or external command, operable program or batch file. – user3229528 Jan 23 '14 at 22:51
  • 1
    Okay, so I got sed installed and working. This code appears to work, except that about 1/4 of the way through the destination file, every other line starts to be chinese looking characters. Like this: 䌀䬀簀㐀 㠀㘀㤀簀㄀㔀㄀簀㠀㄀簀␀㌀⸀㄀㠀簀䌀䬀䌀䐀 䐀䔀䈀䤀吀      ㄀㈀⼀ 㘀   㨀     倀䔀一一 匀 䌀䄀吀䘀䤀匀䠀 䌀䠀䤀䌀䬀䔀一 䴀䔀刀䤀 䴀匀簀㄀㈀⼀㤀⼀㈀ ㄀㌀ഀഀ – user3229528 Jan 24 '14 at 04:20
  • Is there any way to make it work without converting the text? – user3229528 Jan 24 '14 at 04:43
  • This just produces an empty new.csv file – user3229528 Jan 24 '14 at 04:52
  • no, tested it, works fine. If you can you can upload some text files for testing, but _only packed_ (zip or rar). – Endoro Jan 24 '14 at 05:05
  • Ok, not sure what I did the first time. Now it's getting hung up much like the "more" solution from earlier I guess because of large files? – user3229528 Jan 24 '14 at 05:11
  • yes, might be. But also might be, you need more patience :) please upload some packed test files. – Endoro Jan 24 '14 at 05:32
  • I don't think it's a patience issue. The destination file takes on the exact file size of the last CSV in the folder and then just sits there forever. I can't really upload the files I'm working with because they contain account numbers, SSNs, etc. =/ – user3229528 Jan 24 '14 at 05:35
  • To convert the text you can try [iconv](http://gnuwin32.sourceforge.net/packages/libiconv.htm): `iconv -f UTF-16 -t CP850 infile.txt > outfile.txt` – Endoro Jan 24 '14 at 06:11
1

Here's another option.

@echo off
set "newfile=new.txt"
del "%newfile%" 2>nul
for %%a in (*.csv) do (
  if not exist "%newfile%" (type "%%a" > "%newfile%") else (more +1 "%%a" >> "%newfile%")
)
ren "%newfile%" "new.csv"
foxidrive
  • 40,353
  • 10
  • 53
  • 68
0

I think the line near the end starting "FOR /f" is mixed up and it should be:

@ECHO OFF
SET first=y
SET newfile=new.csv
for %%F in (*.csv) do IF NOT %%F==%newfile% (
  if "%first%"=="y" (
    COPY /y "%%F" %newfile% >nul
    set "first="
  ) else (
    FOR /f "skip=1delims=" %%i IN ("%%F") DO ECHO %%i >> %newfile%
  )
)
Mark Setchell
  • 191,897
  • 31
  • 273
  • 432
0
@ECHO OFF
SET first=y
SET "newfile=new.txt"
del new.csv 2>nul >nul
for %%F in (*.csv) do (
  if defined first (
    COPY /y "%%F" %newfile% >nul
    set "first="
  ) else (
    FOR /f "usebackqskip=1delims=" %%i IN ("%%F") DO >> %newfile% ECHO %%i
  )
)
ren %newfile% new.csv

The set "var=value" syntax ensures that any trailing spaces on the batch line are not included in the value assigned to var.

First step is to delete the new.csv file - the 2>nul >nul redirects messages and error messages from del so that the command is totally silent - whether the file exists or not.

Next, you don't need to check whether the new.csv is selected as %%F because it's just been deleed if it did exist, and the output is now to new.txt (filename not critical - actually, I'd be tempted to call it new.vsc. The critical thing is that it isn't .csv so for doesn't need to check it)

Other than the first file (a copy is faster than reading and echoing), the name of the file (in %%F) being read into %%i, since it needs to be "quoted" (to tell CMD that the spaces are not separators) you need to add the usebackq to the for/f controls.

Finally, rename your file to the desired new name.

This should fix the problem.

Magoo
  • 77,302
  • 8
  • 62
  • 84
  • 1
    This still isn't working for me. The resulting csv file contains only the data from the first csv in the folder. The other 7 files are not included. – user3229528 Jan 24 '14 at 03:15
  • Works perfectly for me. Did you cut-and-paste or try retyping it? – Magoo Jan 25 '14 at 07:49