2

I wrote a script that replaces the semicolon delimiter in a text file to commas, but it also puts quotes around the text of the 9th column since the 9th column contains commas. When I output to the new text file it process very slow, takes maybe 4-5 minutes, the text file that it is reading from is 50MB. Is there faster way or more efficient way to do this? Here is my FOR loop:

FOR /f "usebackq tokens=1-9* delims=;" %%a IN ("%FILENAME%") DO (
SET C10=%%j
ECHO(%%a,%%b,%%c,%%d,%%e,%%f,%%g,%%h,"%%i",!C10:;=,! >> "%MYPATH%\Filename %MMDDYYYY%.csv")

or should I just learn python.....

Thank you.

arealhobo
  • 447
  • 1
  • 6
  • 17
  • The FOR command reads in the entire file before it begins processing it. I personally use Dave Benham's Parse CSV. It is a hybrid script that should process the file faster. http://www.dostips.com/forum/viewtopic.php?t=5702 – Squashman Feb 17 '17 at 02:25
  • If you want speed, do not use Batch. – abelenky Feb 17 '17 at 03:31
  • What's wrong with using a text editor for editing text? For example Vim does it in a few seconds, depending mostly on the performance of the storage device on which the file resides. – AlexP Feb 17 '17 at 06:39
  • @AlexP: a text editor is fine (and the first choice) if you have to do it just once. If you have to do it on a regular basis, it sucks. – Stephan Feb 17 '17 at 06:47
  • Some text editors accept commands as arguments, you know... – AlexP Feb 17 '17 at 07:02
  • @AlexP and running an editor with arguments several (dozends? hundreds?) times day by day is better than a script - how? – Stephan Feb 17 '17 at 07:44
  • It is better than running a script dozens or hundreds of times a day. Editors are much more efficient at editing text than scripts. For example, [`sed`](http://gnuwin32.sourceforge.net/packages/sed.htm) is available for Win32. – AlexP Feb 17 '17 at 07:45
  • My aim is to automate a process without having to install external tools, and to have this text file formatted a specific times a day multiple times at day, so having to manually do this in a text editor is not convenient. – arealhobo Feb 17 '17 at 07:52
  • @Squashman - I hadn't thought of using [parseCSV.bat](http://www.dostips.com/forum/viewtopic.php?t=5702) that way, as it was not the original purpose. `type old.csv|parseCSV "/i;" >new.csv` will probably work for the OP. But it will fail if the original csv contains internal `"` that is escaped as `""`, because parseCSV will un-escape the quotes. – dbenham Feb 18 '17 at 14:02
  • @dbenham I know that was not your original purpose of the program but you built the functionality into it by allowing input and output delimiters. – Squashman Feb 18 '17 at 14:16
  • Would you be ok with using a VBscript that calls out to Excel? – Squashman Feb 18 '17 at 16:04
  • 1
    @Squashman - See my answer - I've taken your idea, and modified the utility so that it now can properly do the transformation, even if the input contains quote literals. – dbenham Feb 19 '17 at 05:48

3 Answers3

6

One thing that will make the script run "faster" is to avoid opening and closing the output file for each write operation

>> "%MYPATH%\Filename %MMDDYYYY%.csv" (
    FOR /f "usebackq tokens=1-9* delims=;" %%a IN ("%FILENAME%") DO (
        SET C10=%%j
        ECHO(%%a,%%b,%%c,%%d,%%e,%%f,%%g,%%h,"%%i",!C10:;=,!
    )
)

Instead of redirecting each line, redirect the full for command.

MC ND
  • 69,615
  • 8
  • 84
  • 126
  • I'm still a noob batch scripter, how would would I actually use this in the script, what goes before >> ? – arealhobo Feb 24 '17 at 01:42
  • @MPineda, code is a direct replacement of the code in your question. Before the `>>` you should have the same code you had, at least the delayed expansion enabled and initialization of the used variables. – MC ND Feb 24 '17 at 06:53
  • I see, just tried it and it worked flawlessly, took less than 5 seconds, thank you! – arealhobo Feb 24 '17 at 17:51
4

if you want to change the separators to correctly open the csv in Excel: there is no need to do that. There is a (undocumented) trick to tell Excel, what char to use as separator:

(echo Sep=;) > new.csv
type old.csv >> new.csv
move /y new.csv old.csv

Note: Excel reads the Sep=;, and imports the rest of the file with that separator, but it's not part of the spreadsheed, so if you save again with Excel, the Sep=; is lost (Excel uses , or ; as separator, depending on it's install language).

Stephan
  • 53,940
  • 10
  • 58
  • 91
  • Ooh, I'll have to try and remember this one. Could come in handy. – dbenham Feb 18 '17 at 13:36
  • Hmm. I knew about the sep option but I did not think it could change the delimiter of the actual data. I normally use it to specify that the delimiter is a pipe so that when I open a file with a csv extension, Excel knows to use the pipe as the delimited. – Squashman Feb 18 '17 at 14:11
  • @Squashman - Stephan is proposing using it exactly as you describe. He is postulating that the file delimiter need not be physically changed if the OP simply wants to open the file in Excel. True, this answer does not answer the actual question. But one of the most common reasons why people think they must reformat their data is so that it can be opened in Excel, and this is a good alternative to reformatting the data. The OP did not state the purpose of the reformat, so this answer may or may not be relevant to the OP. – dbenham Feb 18 '17 at 17:14
  • @dbenham: definitively handy - we use it a lot (different localizations). But remember: as far as I know, this is not part of `csv` definition. I don't know if other spread sheet programs will handle it and even Microsoft could decide to not support it in a future version of Excel. – Stephan Feb 20 '17 at 16:47
  • What is "sep", how do I actually implement this into my script? Is "old.csv" my file? – arealhobo Feb 24 '17 at 01:45
  • Nevermind I figured it out, I see it adds Sep=; to the first line of the file, my guess is, excel reads that first line to know what to treat as a delimiter? I will have to see if the system I submit this file to will skip over that first line. – arealhobo Feb 24 '17 at 17:31
1

Squashman had a good idea to use my parseCSV.bat utility to do the conversion. The utility is a hybrid script that uses both batch and JScript. This utility is much faster than any pure batch solution, and it runs natively on any Windows machine from XP onward - no 3rd party exe is required.

The utility was actually produced to allow convenient parsing of CSV files via FOR /F within batch files. Version 1.0 had one problem which made it less than ideal for using the utility to transform CSV formats for other uses - Escaped quote literals were unescaped, so that "" becomes ". This transformation is useful for FOR /F parsing, but it is not valid CSV format.

I modified the utility to have an option to preserve the escape of quote literals. Now you can safely use the following to quickly convert from semicolon to comma delimiters.

parseCSV /I:; /Q:E <input.csv >output.csv

Because parseCSV is itself a batch script, you must use call parseCSV if you use the command within another batch script.

I was able to transform a 53MB file with parseCSV.bat in 2.5 minutes.

Here is the code for parseCSV. However, I do not promise to keep this code up-to-date. I recommend you get the code from the DosTips post. There you will also find a good description of the other features of the utility.

@if (@X)==(@Y) @end /* harmless hybrid line that begins a JScrpt comment

::************ Documentation ***********
::parseCSV.bat version 1.2
:::
:::parseCSV  [/option]...
:::
:::  Parse stdin as CSV and write it to stdout in a way that can be safely
:::  parsed by FOR /F. All columns will be enclosed by quotes so that empty
:::  columns may be preserved. It also supports delimiters, newlines, and
:::  escaped quotes within quoted values. Two consecutive quotes within a
:::  quoted value are converted into one quote by default.
:::
:::  Available options:
:::
:::    /I:string = Input delimiter. Default is a comma (,)
:::
:::    /O:string = Output delimiter. Default is a comma (,)
:::
:::         The entire option must be quoted if specifying poison character
:::         or whitespace literals as a delimiters for /I or /O.
:::
:::         Examples:  pipe = "/I:|"
:::                   space = "/I: "
:::
:::         Standard JScript escape sequences can also be used.
:::
:::         Examples:       tab = /I:\t  or  /I:\x09
:::                   backslash = /I:\\
:::
:::    /E = Encode output delimiter literal within value as \D
:::         Encode newline within value as \N
:::         Encode backslash within value as \S
:::
:::    /D = escape exclamation point and caret for Delayed expansion
:::         ! becomes ^!
:::         ^ becomes ^^
:::
:::    /L = treat all input quotes as quote Literals
:::
:::    /Q:QuoteOutputFormat
:::
:::       Controls output of Quotes, where QuoteOutputFormat may be any
:::       one of the following:
:::
:::         L = all columns quoted, quote Literals output as "   (Default)
:::         E = all columns quoted, quote literals Escaped as ""
:::         N = No columns quoted, quote literals output as "
:::
:::       The /Q:E and /Q:N options are useful for transforming data for
:::       purposes other than parsing by FOR /F
:::
:::    /U = Write unix style lines with newline (\n) instead of the default
:::         Windows style of carriage return and linefeed (\r\n).
:::
:::parseCSV  /?
:::
:::  Display this help
:::
:::parseCSV  /V
:::
:::  Display the version of parseCSV.bat
:::
:::parseCSV.bat was written by Dave Benham. Updates are available at the original
:::posting site: http://www.dostips.com/forum/viewtopic.php?f=3&t=5702
:::

::************ Batch portion ***********
@echo off
if "%~1" equ "/?" (
  setlocal disableDelayedExpansion
  for /f "delims=: tokens=*" %%A in ('findstr "^:::" "%~f0"') do echo(%%A
  exit /b 0
)
if /i "%~1" equ "/V" (
  for /f "delims=:" %%A in ('findstr /bc:"::%~nx0 version " "%~f0"') do echo %%A
  exit /b 0
)
cscript //E:JScript //nologo "%~f0" %*
exit /b 0


************ JScript portion ***********/
var args     = WScript.Arguments.Named,
    stdin    = WScript.Stdin,
    stdout   = WScript.Stdout,
    escape   = args.Exists("E"),
    literalQ = args.Exists("L"),
    escapeQ  = (args.Item("Q")&&args.Item("Q").toUpperCase()=="E"),
    quoteCol = (args.Item("Q")&&args.Item("Q").toUpperCase()=="N") ? '' : '"',
    delayed  = args.Exists("D"),
    inDelim  = args.Item("I") ? eval('"'+args.Item("I")+'"') : ",",
    outDelim = args.Item("O") ? eval('"'+args.Item("O")+'"') : ",",
    newline  = args.Exists("U") ? "\n" : "\r\n",
    quote    = false,
    ln, c, n, out;
while (!stdin.AtEndOfStream) {
  ln=stdin.ReadLine();
  out="";
  if (!quote) stdout.Write(quoteCol);
  for (n=0; n<ln.length; n++ ) {
    c=ln.charAt(n);
    if (c == '"') {
      if (literalQ) {
        if (escapeQ) c+='"';
      } else if (quote && ln.charAt(n+1) == '"') {
        n++;
        if (escapeQ) c+='"';
      } else {
        quote=!quote;
        continue;
      }
    }
    else if (c == inDelim && !quote) c=quoteCol+outDelim+quoteCol;
    else if (escape) {
      if (c == outDelim) c="\\D";
      if (c == "\\") c="\\S";
    }
    else if (delayed) {
      if (c == "!") c="^!";
      if (c == "^") c="^^";
    }
    out+=c;
  }
  out += (quote) ? ((escape) ? "\\N" : newline) : quoteCol+newline;
  stdout.Write(out);
}
dbenham
  • 127,446
  • 28
  • 251
  • 390