How do we build Normalized table from DeNormalized text file one?

Question

How do we build Normalized table from DeNormalized text file one?

Thanks for your replies/time.

We need to build a Normalized DB Table from DeNormalized text file. We explored couple of options such as unix shell , and PostgreSQL etc. I am looking learn better ideas for resolutions from this community.

The input text file is various length with comma delimited records. The content may look like this:

XXXXXXXXXX , YYYYYYYYYY, TTTTTTTTTTT, UUUUUUUUUU, RRRRRRRRR,JJJJJJJJJ
111111111111,   22222222222, 333333333333, 44444444, 5555555, 666666
EEEEEEEE,WWWWWW,QQQQQQQ,PPPPPPPP

We like to normalize as follows (Split & Pair):

XXXXXXXXXX , YYYYYYYYYY
TTTTTTTTTTT, UUUUUUUUUU
RRRRRRRRR,JJJJJJJJJ
111111111111,   22222222222
333333333333, 44444444
5555555, 666666
EEEEEEEE,WWWWWW
QQQQQQQ,PPPPPPPP

Do we need to go with text pre-process and Load approach?

If yes, what is the best way to pre-process?

Are there any single SQL/Function approach to get the above?

Thanks in helping.

Appears to be an edited version of http://stackoverflow.com/q/20433479/398670 — Craig Ringer, Dec 09 '13 at 07:36

Jotne · Answer 1 · 2013-12-07T20:23:39.250

4

Using gnu awk (due to the RS)

awk '{$1=$1} NR%2==1 {printf "%s,",$0} NR%2==0' RS="[,\n]" file
XXXXXXXXXX,YYYYYYYYYY
TTTTTTTTTTT,UUUUUUUUUU
RRRRRRRRR,JJJJJJJJJ
111111111111,22222222222
333333333333,44444444
5555555,666666
EEEEEEEE,WWWWWW
QQQQQQQ,PPPPPPPP

{$1=$1} Cleans up and remove extra spaces
NR%2==1 {printf "%s,",$0} prints odd parts
NR%2==0 prints even part and new line
RS="[,\n]" sets the record to , or newline

edited Dec 07 '13 at 20:23

answered Dec 07 '13 at 19:45

Jotne

40,548
12
51
55

Thanks for your suggestion. I tried. It works. **Will RS =.. put ^M for some records at the end?** I have the same format with lot of trailing commas at the end of each records. Like 'XXXXXXXXXX , YYYYYYYYYY, TTTTTTTTTTT, UUUUUUUUUU, RRRRRRRRR,JJJJJJJJJ,,,,,,,,,,,,' How do I process this file using your awk command? – user2647763 - RIMD Dec 10 '13 at 02:42
It will not add `^M`. You can just test for `$0` to not get the extra commas like this: `awk '{$1=$1} NR%2==1 && $0 {printf "%s,",$0} NR%2==0 && $0' RS="[,\n]" file` – Jotne Dec 12 '13 at 11:45

score 1 · Answer 2 · answered Dec 12 '13 at 01:28

Here is an update. Here is what I did in Linux server.

    sed -i 's/\,,//g' inputfile   <------ Clean up lot of trailing commas

    awk '{$1=$1} NR%2==1 {printf "%s,",$0} NR%2==0' RS="[,\n]" inputfile <----Jotne's idea

    dos2unix -q -n inputfile outputfle <------ to remove ^M in some records

    outputfile is ready to process as comma delimited format

Any thoughts to improve above steps further?

Thanks in helping.

How do we build Normalized table from DeNormalized text file one?

2 Answers2

Linked