
I need to convert a dataset from a key/value-paired list (Informix DB-Access output) into a columned CSV. I'm fairly certain this can be done easily with awk or sed.

UPDATE: The solution needs to be a single-line command. I am using NSH (which is based on ZSH), so some of the typical "bashy" commands will not work.

Here is my data sample set:

part_no            100000001
date_part          2010-10-13 12:12:12
history_code       ABCD
user_id            rsmith
other_information   note: Monday, December 10
pool_no            101011777

part_no            100000002
date_part          2010-10-21 12:12:12
history_code       GHIJ
user_id            jsmith
other_information
pool_no            101011888

part_no            100000002
date_part          2010-10-27 12:12:12
history_code       LMNO
user_id            fevers
other_information   [Mail]
pool_no            101011999

part_no            100000003
date_part          2010-11-13 12:12:12
history_code       QXRT
user_id            sjohnson
other_information   note: Tuesday, August 31
pool_no            101011111

I need it to look like this:

part_no,date_part,history_code,user_id,other_information,pool_no
100000001,10/13/2010 12:12:12,ABCD,rsmith,note: Monday, December 10,101011777
100000002,10/21/2010 12:12:12,GHIJ,jsmith,,101011888
100000002,10/27/2010 12:12:12,LMNO,fevers,[Mail],101011999
100000003,11/13/2010 12:12:12,QXRT,sjohnson,note: Tuesday, August 31,101011111
27560
    Welcome to SO. Stack Overflow is a question and answer site for professional and enthusiast programmers. The goal is that you add some code of your own to your question to show at least the research effort you made to solve this yourself. – Cyrus Jun 29 '18 at 15:28
    Does the date format have to change from `2010-10-13` to `10/13/2010`? Also, there are commas within fields, but the suggested output doesn't use the convention to double quote fields and is thus ambiguous. – Benjamin W. Jun 29 '18 at 15:34
  • Data format does not have to change, and the output can use the double quote fields (commas will be present in the data) – 27560 Jun 29 '18 at 15:42
    Update your question to show the expected output plus what you've tried so far and do it quick before your question gets closed as unclear and/or you get any more crazy answers with 20 sed commands, cats, cuts, pipes and the batman symbol. – Ed Morton Jun 29 '18 at 15:54
  • Can we rely on all the keys being in the same order for each record? – glenn jackman Jun 29 '18 at 16:06
  • @EdMorton I'm trying each of the suggestions as they come in. I need the command to be a single line solution (not multiple lines). I will update my question accordingly. Also, FYI - this is my 1st stack overflow post – 27560 Jun 29 '18 at 16:13
  • @glennjackman I can sort the output from informix dbaccess, so yes, I think – 27560 Jun 29 '18 at 16:14
  • Note that newlines are acceptable within a quoted string, so that a one-liner awk command can still have quoted newlines in it. I don't know anything about NSH, but if it didn't allow that, I'd be disappointed. – glenn jackman Jun 29 '18 at 16:28
  • @glennjackman Can you format the output to be one line? I'm not following what you mean – 27560 Jun 29 '18 at 16:39
  • I mean that RavinderSingh13's first snippet of code is an awk one-liner. Do the tools you have to use prevent such a (readable) command? – glenn jackman Jun 29 '18 at 16:44
    You create 'single line' solutions by creating a shell script that does the job, and running the shell script from your 'single line' system. Use an absolute path name if necessary. Otherwise, you're fighting with one hand tied behind your back (actually, both hands tied behind your back). – Jonathan Leffler Jun 29 '18 at 20:32
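Jonathan Leffler's wrapper-script suggestion can be sketched end to end like this. The file names (`/tmp/kv2csv`, `data.txt`) and the awk body are illustrative, not from the thread; the point is only that the multi-line logic lives in a file, so the constrained NSH session needs just one line to run it:

```shell
# Sample of the key/value input (a reduced two-record slice).
cat > data.txt <<'EOF'
part_no            100000001
user_id            rsmith

part_no            100000002
user_id            jsmith
EOF

# Hypothetical wrapper script holding the multi-line conversion logic.
cat > /tmp/kv2csv <<'EOF'
#!/bin/sh
# Treat each blank-line-separated block as one record; strip the key
# from the front of each line and emit the values as a quoted CSV row.
awk 'BEGIN { RS=""; FS="\n" }
     { out = ""
       for (i = 1; i <= NF; i++) {
           v = $i
           sub(/^[^[:space:]]+[[:space:]]*/, "", v)
           out = out (i > 1 ? "," : "") "\"" v "\""
       }
       print out }' "$@"
EOF
chmod +x /tmp/kv2csv

/tmp/kv2csv data.txt   # the single line actually typed in NSH
```

On the sample above this prints one quoted CSV row per record, leaving the multi-line awk program out of the interactive shell entirely.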

5 Answers


Your question isn't clear but this MAY be what you're looking for:

$ cat tst.awk
BEGIN { RS=""; FS="\n"; OFS=","; ofmt="\"%s\"%s" }
{
   for (i=1; i<=NF; i++) {
       tag = val = $i
       sub(/[[:space:]].*/,"",tag)
       sub(/[^[:space:]]+[[:space:]]+/,"",val)
       tags[i] = tag
       vals[i] = val
    }
}
NR==1 {
    for (i=1; i<=NF; i++) {
        printf ofmt, tags[i], (i<NF ? OFS : ORS)
    }
}
{
    for (i=1; i<=NF; i++) {
        printf ofmt, vals[i], (i<NF ? OFS : ORS)
    }
}

$ awk -f tst.awk file
"part_no","date_part","history_code","user_id","other_information","pool_no"
"100000001","2010-10-13 12:12:12","ABCD","rsmith","note: Monday, December 10","101011777"
"100000002","2010-10-21 12:12:12","GHIJ","jsmith","other_information","101011888"
"100000002","2010-10-27 12:12:12","LMNO","fevers","[Mail]","101011999"
"100000003","2010-11-13 12:12:12","QXRT","sjohnson","note: Tuesday, August 31","101011111"
Ed Morton
  • Can your awk statement be run from a pipe (on a single line)? I am using this as part of an "extended object" in BMC Server Automation. – 27560 Jun 29 '18 at 16:10
  • Of course. Just replace every newline with a `;` and call it as `whatever | awk 'script'`. I have no idea what `an "extended object" in BMC Server Automation` is of course. – Ed Morton Jun 29 '18 at 18:21
    This worked beautifully! Thank you! `cat dbaccessoutput | awk 'BEGIN { RS=""; FS="\n"; OFS=","; ofmt="\"%s\"%s" }; {; for (i=1; i<=NF; i++) {; tag = val = $i; sub(/[[:space:]].*/,"",tag); sub(/[^[:space:]]+[[:space:]]+/,"",val); tags[i] = tag; vals[i] = val; }; }; NR==1 {; for (i=1; i<=NF; i++) {; printf ofmt, tags[i], (i – 27560 Jul 03 '18 at 14:08
  • You're welcome. You don't need to cat the file to awk, though, awk is perfectly capable of opening a file on its own. Google UUOC to understand the issue and use `awk 'script' file` rather than `cat file | awk 'script'`. – Ed Morton Jul 03 '18 at 14:30
  • I was just catting the file for testing, in the EO (extended object) I am performing my dbaccess command, doing some greps, then piping to the awk statement – 27560 Jul 03 '18 at 17:29
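For completeness, here is a sketch of the same script collapsed onto one line, with newlines replaced by `;` as Ed Morton describes in the comments above (the file name `sample.txt` stands in for the real dbaccess output):

```shell
# Reduced two-record sample of the key/value input.
cat > sample.txt <<'EOF'
part_no            100000001
user_id            rsmith

part_no            100000002
user_id            jsmith
EOF

# tst.awk from this answer as a one-liner; same logic, no newlines.
awk 'BEGIN { RS=""; FS="\n"; OFS=","; ofmt="\"%s\"%s" } { for (i=1; i<=NF; i++) { tag = val = $i; sub(/[[:space:]].*/,"",tag); sub(/[^[:space:]]+[[:space:]]+/,"",val); tags[i] = tag; vals[i] = val } } NR==1 { for (i=1; i<=NF; i++) printf ofmt, tags[i], (i<NF ? OFS : ORS) } { for (i=1; i<=NF; i++) printf ofmt, vals[i], (i<NF ? OFS : ORS) }' sample.txt > out.csv

cat out.csv
```

On the sample this emits the quoted header row followed by one quoted row per record, matching the multi-line version.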

I'm tackling this as an Informix question rather than an Awk question.

Using the standard Informix SQL commands, you could create an external table in CSV format, too — but you have to know that there is an undocumented format "DB2" that you can use:

DROP TABLE IF EXISTS data_table;

CREATE TABLE data_table
(
        part_no            INTEGER,
        date_part          DATETIME YEAR TO SECOND,
        history_code       VARCHAR(4),
        user_id            VARCHAR(32),
        other_information  VARCHAR(64),
        pool_no            INTEGER
);

INSERT INTO data_table VALUES(100000001, "2010-10-13 12:12:12", "ABCD", "rsmith", "note: Monday, December 10", 101011777);
INSERT INTO data_table VALUES(100000002, "2010-10-21 12:12:12", "GHIJ", "jsmith", NULL, 101011888);
INSERT INTO data_table VALUES(100000002, "2010-10-27 12:12:12", "LMNO", "fevers", "[Mail]", 101011999);
INSERT INTO data_table VALUES(100000003, "2010-11-13 12:12:12", "QXRT", "sjohnson", "note: Tuesday, August 31", 101011111);

DROP TABLE IF EXISTS csv_data;
CREATE EXTERNAL TABLE csv_data
(
    part_no            INTEGER,
    date_part          DATETIME YEAR TO SECOND,
    history_code       VARCHAR(4),
    user_id            VARCHAR(32),
    other_information  VARCHAR(64),
    pool_no            INTEGER
)
USING (FORMAT "DB2", DELIMITER ",", DATAFILES("DISK:/tmp/data/csv_data.csv"));

INSERT INTO csv_data
        SELECT part_no, date_part, history_code, user_id, other_information, pool_no
          FROM data_table;

The content of /tmp/data/csv_data.csv then looks like:

100000001,2010-10-13 12:12:12,"ABCD","rsmith","note: Monday, December 10",101011777
100000002,2010-10-21 12:12:12,"GHIJ","jsmith",,101011888
100000002,2010-10-27 12:12:12,"LMNO","fevers","[Mail]",101011999
100000003,2010-11-13 12:12:12,"QXRT","sjohnson","note: Tuesday, August 31",101011111

UNLOAD format converted to CSV

The default output from DB-Access is not readily amenable to parsing in practice. It may be doable in some limited cases, such as the one you show, but you'd do better using the UNLOAD format instead of the command-line output, and then converting the UNLOAD data format to CSV.

I have a Perl script that does that. It uses the Perl Text::CSV module to handle CSV formatting. It doesn't pretend to handle the first line with column names; those aren't present in the UNLOAD format file.

#!/usr/bin/env perl
#
# @(#)$Id: unl2csv.pl,v 1.3 2018/06/29 20:36:58 jleffler Exp $
#
# Convert Informix UNLOAD format to CSV

use strict;
use warnings;
use Text::CSV;
use IO::Wrap;

my $csv = Text::CSV->new({ binary => 1 }) or die "Failed to create CSV handle ($!)";
my $dlm = defined $ENV{DBDELIMITER} ? $ENV{DBDELIMITER} : "|";
my $out = wraphandle(\*STDOUT);
my $rgx = qr/((?:[^$dlm]|(?:\\.))*)$dlm/sm;

# $csv->eol("\r\n");

while (my $line = <>)
{
    MultiLine:
    while ($line eq "\\\n" || $line =~ m/[^\\](?:\\\\)*\\$/)
    {
        my $extra = <>;
        last MultiLine unless defined $extra;
        $line .= $extra;
    }
    my @fields = split_unload($line);
    $csv->print($out, \@fields);
}

sub split_unload
{
    my($line) = @_;
    my @fields;

    while ($line =~ m/$rgx/g)
    {
        push @fields, $1;
    }
    return @fields;
}

__END__

=head1 NAME

unl2csv - Convert Informix UNLOAD to CSV format

=head1 SYNOPSIS

unl2csv [file ...]

=head1 DESCRIPTION

The unl2csv program converts a file from Informix UNLOAD file format to
the corresponding CSV (comma separated values) format.

The input delimiter is determined by the environment variable
DBDELIMITER, and defaults to the pipe symbol "|".
It is not assumed that each input line is terminated with a delimiter
(there are two variants of the UNLOAD format, one with and one without
the final delimiter).

=head1 EXAMPLES

Input:

  10|12|excessive|cost \|of, living|
  20|40|bou\\ncing tigger|grrrrrrrr|

Output:

  10,12,"excessive","cost |of, living"
  20,40,"bou\ncing tigger",grrrrrrrr

=head1 PRE-REQUISITES

Text::CSV_XS

=head1 AUTHOR

Jonathan Leffler <jonathan.leffler@hcl.com>

=cut

You would use a command such as this (via DB-Access):

UNLOAD TO "datatable.unl" SELECT * FROM DataTable;

and then run:

perl unl2csv datatable.unl > datatable.csv

SQLCMD program

If you have my SQLCMD program (available from the IIUG web site in the software repository — and wholly unrelated to Microsoft's johnny-come-lately with the same name), then you can unload direct to CSV format:

sqlcmd -d database -F csv -e 'unload to "data_table.csv" select * from data_table'
Jonathan Leffler

Try this:

cat $file | cut -d ' ' -f 2- | sed 's/^[ \t]*//' | sed 's/$/,/' \
| xargs  | sed 's/ , /\n/g' | sed 's/.$//' | sed 's/, /,/g' \
| sed '1ipart_no,date_part,history_code,user_id,other_information,pool_no'
  • I should have mentioned that I am using NSH (network shell), which is based off of ZSH. Here is the output I received: `sed: : No such file or directory`, `nsh: command not found: xargs`, `sed: : No such file or directory`, `sed: 1: "1ipart_no,date_part,his ...": command i expects \ followed by text` – 27560 Jun 29 '18 at 15:53
  • You're right, you should have. I tried it with `Ubuntu` and it worked. Sorry dude. `$file` is the name of the file you want to convert. – Bernat Pedrol Vozmediano Jun 29 '18 at 15:54
  • I changed $file to the name of my file, and that was the output I received – 27560 Jun 29 '18 at 15:57

Could you please try the following and let me know if this helps you.

awk -v s1="," '/part_no/ && value{if(header){print header;flag=1;header=""};print value;value=""}  NF{if(!flag){header=(header?header s1 "":"")$1};sub(/^[^[:space:]]+[[:space:]]+/,"");value=value?value s1 $0:$0} END{if(value){print value}}'  Input_file

Output will be as follows.

part_no,date_part,history_code,user_id,other_information,pool_no
100000001,2010-10-13 12:12:12,ABCD,rsmith,note: Monday, December 10,101011777
100000002,2010-10-21 12:12:12,GHIJ,jsmith,,101011888
100000002,2010-10-27 12:12:12,LMNO,fevers,[Mail],101011999
100000003,2010-11-13 12:12:12,QXRT,sjohnson,note: Tuesday, August 31,101011111

Adding a non-one-liner form of the solution too now.

awk -v s1="," '
/part_no/ && value{
  if(header){
    print header;
    flag=1;
    header=""}
  print value;
  value=""
}
NF{
  if(!flag){
    header=(header?header s1 "":"")$1}
  sub(/^[^[:space:]]+[[:space:]]+/,"")
  value=value?value s1 $0:$0
}
END{
  if(value){
    print value}
}'   Input_file
RavinderSingh13
    Can this become a single line statement? – 27560 Jun 29 '18 at 16:11
  • Here is the output I got: `part_no,date_part,history_code,user_id,other_information,pool_no ,101011777day,,December,10 ,,101011888,12:12:12 ,1010119997,12:12:12 ,101011111sday,,August,31` – 27560 Jun 29 '18 at 16:17
  • @MattRagland, I am getting proper output, added it to my post too now. Can you check whether your file has carriage returns by doing `cat -v file`? If yes, then remove them by doing `awk '{gsub(/\r$/,"")} 1' Input_file` and let me know?? – RavinderSingh13 Jun 29 '18 at 16:22
    Ahh good catch, yes, mine had carriage returns, I converted it to unix format, and that did the trick, however I am getting extra commas where there shouldn't be any. See the other information column? – 27560 Jun 29 '18 at 16:25
  • I did mark your reply positive, however it said that since my reputation is less that 15 (since this is my 1st post) it wouldn't show up globally. The dates in the other_information column has extra commas in it, that needs to be cleaned up – 27560 Jun 29 '18 at 16:30
  • @MattRagland, if a value is NULL then it is printing like `,,`, which is as per the need, right? – RavinderSingh13 Jun 29 '18 at 16:32
  • That isn't the problem, the date value has spaces in it, look at the other_information column, the value in it gets split into 3 different columns Monday,,December,10 It should be "Monday, December 10" Also can you add quotes to the columns, so that way commas are accepted values? – 27560 Jun 29 '18 at 16:38
    Take a sec to think about your ternary expression and you'll understand why it can fail. Also, assigning to $1 will transform all other white space in the input to commas. Finally, it's obviously bad practice to hard-code the header line and thereby tightly couple your script to the specific values and the order they appear in the input data when the values are already present in the input in the order they need to be printed. – Ed Morton Jun 29 '18 at 17:08
  • @EdMorton, sure Ed, my apologies. As per your suggestion I have removed the hard-coded headers, but I am not sure how to write the regex for the `$1` part so that I can remove its nullification. Could you please guide me here? Will be grateful to you, sir. – RavinderSingh13 Jun 29 '18 at 17:23
    Your ternaries instead of `var = var ? var s1 $X: s1 $X` should be `var = (var == "" ? "" : var s1) $X` to avoid duplicate code, not cause syntax errors in some awks, and so they don't fail when the first value of var evaluates numerically to 0. wrt the $1 assignment - I'm really not sure what you're trying to do but I THINK it might be to just get rid of the header value from each line and that'd be just `sub(/^[^[:space:]]+[[:space:]]+/,"")` instead of `$1=""; gsub(/^ +|^,/,"");` – Ed Morton Jun 29 '18 at 17:44
    @EdMorton, sure Ed have done that and as usual THANKS A TON for guiding here, really grateful to you, happy weekend. – RavinderSingh13 Jun 29 '18 at 22:53

I know the OP said awk but bash was just sitting there.

#
# line to be printed
line=""

#
# first value on a line flag
first=""

#
# read the file
while read -r key val; do
    #
    # if key is empty then the input line is empty.
    if [ "$key" = "" ] ; then
        #
        # skip leading blank lines in the file
        if [ "$line" = "" ] ; then
            continue
        else
            #
            # print and reset the line
            echo "$line"
            line=""
            first=""
        fi
    else
        #
        # place the first comma after the first value
        if [ "$first" = "" ] ; then
            line="\"$val\""
            first="1"
        else
            line="$line,\"$val\""
        fi
    fi
done < file.txt

#
# print the last line, if there is one
if [ "$line" != "" ] ; then
    echo "$line"
fi
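A quick end-to-end check of the loop above, run on a two-record slice of the sample data (the loop is wrapped in a function here purely for demonstration; the logic is unchanged):

```shell
# Reduced sample input for the read-loop approach from this answer.
cat > file.txt <<'EOF'
part_no            100000001
user_id            rsmith

part_no            100000002
user_id            jsmith
EOF

kv2csv() {
    line=""
    first=""
    while read -r key val; do
        if [ "$key" = "" ]; then
            # blank line: end of a record (skip leading blank lines)
            if [ "$line" = "" ]; then
                continue
            fi
            echo "$line"
            line=""
            first=""
        elif [ "$first" = "" ]; then
            # first value on the line starts the row
            line="\"$val\""
            first="1"
        else
            line="$line,\"$val\""
        fi
    done
    # print the last record, which has no trailing blank line
    if [ "$line" != "" ]; then
        echo "$line"
    fi
}

kv2csv < file.txt
```

On this sample it prints one quoted CSV row per record, including the final record even though the file does not end with a blank line.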
7 Reeds