A Perl regular expression for finding and selecting the field value in third data column in the CSV file is:
^[^\r\n,]*,(?:"[^"]*")+,\K[^\r\n,]*
This search expression works only under the following conditions:
- The first field value contains never a comma and is not enclosed in
"
for that reason too.
- The second field value must be enclosed in
"
and can be therefore a string containing a comma as well as newline characters and also "
on being correct encoded as described by the Wikipedia article about comma-separated values format.
- The third field value must be without a comma and is not enclosed in
"
for that reason too.
Explanation of the regular expression in Perl syntax:
^
instructs the Perl regular expression engine to start each search at beginning of a line.
[^\r\n,]*
is a negative class definition to find any character NOT being a carriage return or a line-feed or a comma zero or more times. That expression matches the first comma separated value. There could be used also [0-9]+
for the given example.
,
must be found next (and not a newline character) being the first separator in the data row.
(?:
...)
is a non-marking / non-capturing group to define an expression on which the multiplier +
is applied for finding strings matching the expression inside the group one or more times.
"[^"]*"
is for finding strings beginning with "
and ending with "
and consisting of zero or more characters NOT being the character "
which includes also newline characters like carriage return and line-feed. This expression applied one or more times matches the string value of second field value in each data row.
,
must be found next being the second separator in the data row.
\K
instructs the Perl regular expression engine to keep back the already matched string which results in getting this part of the found string not selected in UltraEdit and ignored for the replace operation. If the replace should modify the first or second field value in each data row in any way, remove \K
from the search expression.
[^\r\n,]*
is again a negative class definition to find any character NOT being a carriage return or a line-feed or a comma zero or more times. That last part of the expression matches and selects the third field value in each data row on not being a string value which could contain a comma, carriage return, line-feed or double quote which would require enclosing the field value in "
for valid CSV format.
It is also possible with a CSV file not containing newline characters inside a field value clicking in UltraEdit on third ribbon tab Edit in fourth group Column / block on command item CSV convert or in contemporary menu Edit in submenu CSV convert on first menu item Convert to fixed-width or in traditional menu Column on menu item Convert to fixed column for opening the dialog window Convert to Fixed Columns.
In this dialog window should be unchecked the first option Scan first line only (vs. complete file).
The second option Keep delimiter with fixed columns can be checked or unchecked as preferred by the user.
The third option Ignore separator in quotes (') should be unchecked in this case with CSV file using correct "
to enclose a value with a comma, a double quote or a newline character in double quotes.
The fourth option Ignore separator in doubles quotes (") must be checked for this CSV file.
The fifth option Separator character (^t for tab): should be configured with ,
as separator.
A click on button Scan fills the last option Field widths (separated by comma ',') and a click on button Convert makes the conversion from comma-separated to fixed column width format.
Now replacements on field values can be done much easier by using either a Perl regular expression replace with ^.{x}
to match x
number of characters from the beginning of each line or there are used the advanced replace options In column x
to y
to do the replace only in the specified character column range on each line in the file. Note: The first character column has the column number 0 in the replace window. In the status bar is the column number shown with value 1 for the first character column.
The file can be converted back to CSV file with the command Convert fixed-width to CSV (ribbon mode and toolbar/menu mode with contemporary menus) respectively Convert to character delimited (toolbar/menu mode with traditional menus).
There can be used also an UltraEdit script for true CSV file parsing and do the modification with the script code.
Examples are: