0

I'm trying to convert the eBay File Exchange download into a tab-delimited format my shipping software can read.

If each and every column were quoted, this would be easy--but they're not. Only some columns (name, item listing title, etc) are quoted, and some quoted columns contain commas. The rest are bare of quotes.

I need a way to parse and convert this in a .bat file, but using comma as a delimiter splits the quoted fields if they contain a comma too, giving me unusable data. I'm certain there's a simple fix for this, I just can't figure it out.

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278

2 Answers2

2

Eric J is correct - solving this kind of problem with batch is not simple. But it is possible :-)

The main problem is how to differentiate between quoted and unquoted commas - jeb solved a similar problem with quoted vs. unquoted semicolons at 'Pretty print' windows %PATH% variable - how to split on ';' in CMD shell. The code below looks very different, but the fundamental concept is the same.

The code below should work for pretty much any CSV as long as all lines are less than ~8000 bytes long. Batch variable values are limited to 8191 bytes, and some characters are temporarily expanded to two bytes.

The code assumes there are not any existing TABs within the CSV file.

It does not modify any existing quotes.

As I say, the code should work, but it will be painfully SLOW if you have a large file. You would be much better off with a .NET solution as Eric J suggested.

@echo off
setlocal disableDelayedExpansion

set "file=optionalPathinfo\yourFile.csv"

:: Define a TAB variable
for /f "delims=" %%A in (
  'forfiles /p "%~dp0." /m "%~nx0" /c "cmd /c echo(0x09"'
) do set "TAB=%%A"


:: Read each line from CSV, convert it, and write to new file with .new extension
>"%file%.new" (
  for /f usebackq^ delims^=^ eol^= %%A in ("%file%") do (
    set "line=%%A"
    call :processLine
  )
)
exit /b


:processLine
setlocal enableDelayedExpansion

:: Protect problem characters
set "line=!line:@=@A!"
set "line=!line:^=@K!"
set "line=!line:&=@M!"
set "line=!line:|=@P!"
set "line=!line:<=@L!"
set "line=!line:>=@G!"

:: Mark commas with leading caret (escape)
set "line=!line:,=^,!"

:: Remove mark from unquoted commas, but first temporarily
:: disable delayed expansion to protect any ! characters
setlocal disableDelayedExpansion
set ^"line=%line%"
setlocal enableDelayedExpansion

:: Protect remaining marked commas
set "line=!line:^,=@C!"

:: Convert remaining commas to TAB
set "line=!line:,=%TAB%!"

:: Restore protected characters
set "line=!line:@C=,!"
set "line=!line:@G=>!"
set "line=!line:@L=<!"
set "line=!line:@P=|!"
set "line=!line:@M=&!"
set "line=!line:@K=^!"
set "line=!line:@A=@!"

:: Write modified line
echo(!line!
exit /b
Community
  • 1
  • 1
dbenham
  • 127,446
  • 28
  • 251
  • 390
  • The file is never more than two weeks of orders--so 200 or less lines. I'll give this a shot, and report back. – Edwin Rogriguez Dec 19 '13 at 15:35
  • Okay, that works. It takes a while, as you said, but it does the trick. Now I just have to combine it with the standard start-of-day conversion. – Edwin Rogriguez Dec 19 '13 at 17:30
  • I have come across issues of exhausting "setlocal" when in a loop. Although this code has no loop but a "call", when you do "setlocal disableDelayedExpansion" then a matching endlocal should done to end the localization. Just saying, as casual coders may develop misunderstandings of coding. Would you agree? Does "exit /b" do an implicit clear of all issued "setlocal" commands ? Why do you use "(" instead of space char in commands "echo(0x09" and "echo(!line!". Does "(" do something? – Skip R Jan 01 '15 at 12:05
  • 1
    `ENDLOCAL` is not needed because yes, `EXIT /B` (actually, any form of exiting the script or routine) does an implicit `ENDLOCAL` for all active `SETLOCAL` issued within the routine. `ECHO(` prevents the "ECHO is off." message from occurring if the following text is empty or nothing but white space. `ECHO.` is more commonly used, but there are obscure scenarios where that can fail. `ECHO(` is the only form that is guaranteed to work in all scenarios. – dbenham Jan 01 '15 at 14:11
  • Always good to learn something new. `:-)` Just one more question why is caret escape on first double quote in `set ^"line=%line%"` but not on second double quote as well? – Skip R Jan 01 '15 at 14:33
  • @SkipR - Just in case there are an odd number of quotes in the value of `%line%`, which would turn the `^` into a literal character. The quotes are only there to prevent any trailing white space that might accidentally be there from being included in the assignment - all text starting with the last `"` (escaped or not) is ignored if there is a `"` before the variable name. – dbenham Jan 01 '15 at 16:11
1

There's a further complication: A field with a quote and a comma will also have the quote escaped:

Jim "Smitty" Smith, Jr.

would be represented in the CSV file as

"Jim ""Smitty"" Smith, Jr."

This is not the kind of problem that is easily solved in a batch file. However, there is preexisting functionality to deal with the CSV format that can be used from any .NET compatible language including Powershell. If that is an option, have a look at

http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader

For information on calling the .NET methods to read CSV files from Powershell, have a look at

http://blogs.msdn.com/b/mattbie/archive/2010/02/23/how-to-call-net-and-win32-methods-from-powershell-and-your-troubleshooting-packs.aspx

Eric J.
  • 147,927
  • 63
  • 340
  • 553
  • +1, I agree that batch is not a good choice for this problem. But it can be done :-) See my answer. – dbenham Dec 19 '13 at 05:28