How can I replace the pattern ",," with a newline (<RETURN>) in awk?

Question

I am running an ldapsearch query which returns the results as follows:

John Joe jjoe@company.com +1 916 662-4727  Ann Tylor Atylor@company.com (987) 654-3210  Steve Harvey sharvey@company.com 4567893210  (321) 956-3344  ...

As you can see, the fields of each personal record are separated by a single space. The phone numbers may or may not start with +1 and may contain spaces or parentheses. Consecutive personal records are separated by two spaces.

I would like to transform these entries to the following format:

John,Joe,jjoe@company.com,(916) 662-4727
Ann,Tylor,Atylor@company.com,(987) 654-3210
Steve,Harvey,sharvey@company.com,(456) 789-3210,(321) 956-3344
...

So basically I want to replace each single blank with a comma "," and each double blank with a newline, so that in the end I have one comma-separated personal record per line.

I am trying awk and have managed to replace each <blank> with ",", which turns <blank><blank> into a double comma ",,".
But I can't figure out how to turn ",," into a <RETURN>.
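A minimal sketch of the two-step approach described above, wrapped in a hypothetical helper named commas_to_lines: the first gsub() turns every single space into a comma (so a double space becomes ",,"), and the second turns each ",," into a newline, because awk accepts "\n" in a gsub() replacement string. Note that the spaces inside the phone numbers also become commas, and a double space in front of a second phone number (as in the Steve Harvey record) would start a new line too.

```sh
# commas_to_lines is a hypothetical helper name.
# Step 1: every single space becomes a comma, so a double space becomes ",,".
# Step 2: every ",," becomes a newline.
commas_to_lines() {
  awk '{ gsub(/ /, ","); gsub(/,,/, "\n"); print }'
}

printf '%s\n' 'John Joe jjoe@company.com +1 916 662-4727  Ann Tylor Atylor@company.com (987) 654-3210' | commas_to_lines
```

This prints one record per line, with the phone number pieces still separated by commas (for example `+1,916,662-4727`), so the numbers need a separate normalization step.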

UPDATE, 11/22/2017:

I made this thread too crowded. I will post a fresh question with more details.

What have you tried? Where does your attempt have problems? Please add your attempt and your results to the question, so that we know what you need help with. — ghoti, Nov 17 '17 at 03:40
Also, does Steve Harvey have two phone numbers? The double space before the last telephone number in your sample input makes it appear to be a new record. — ghoti, Nov 17 '17 at 03:43
I was doing: `ldapsearch -LLL -x -H ldaps: -b "ou=people,dc=,dc=edu" -D uid=,ou=applications,dc=,dc=edu -w givenname sn mail telephoneNumber | awk -F ":" '{printf $2}{printf "\n"}' | awk -F "uid" '{printf $1}' | tr " " ","` — Asghar, Nov 17 '17 at 17:52
The stuff you have tried is a vital part of your question, and should be included in your question, not just added to comments after-the-fact. Next time you use StackOverflow, consider including your work so far in the question, and I'm sure you'll get a larger number of high quality answers. — ghoti, Nov 17 '17 at 20:39
@ghoti Excellent point. This was my first time using StackOverflow. I will certainly follow your recommendation next time! — Asghar, Nov 20 '17 at 17:13
I thought the solution from @tshiono would resolve my issue. It does up to a point, but not completely, because I found there are other fields in the data which I was not aware of before. Here is the full picture of the problem and what I have done so far: The data in the file can have one of the following formats: — Asghar, Nov 21 '17 at 19:14

score 2 · Answer 1 · answered Nov 17 '17 at 04:09

Your request needs quite a few substitutions, all of which can be done with sed:

$ cat sed-script
s/\ \ ([A-Za-z])/\n\1/g;        # a double space followed by a letter starts a new record: replace the spaces with '\n'
s/\ \ /,/g;                     # replace the remaining double spaces with ','
s/([A-Za-z]) /\1,/g;            # replace a space that follows a letter with ','
s/\+1//g;                       # remove every +1 prefix (not just the first one)
s/[ ()-]//g;                    # remove spaces, parentheses and dashes
s/([^0-9])([0-9]{3})/\1(\2) /g; # wrap the first 3 digits of each number in parentheses
s/([0-9]{4})([^0-9]|$)/-\1\2/g; # insert a '-' before the last 4 digits, also at the end of the input

$ sed -r -f sed-script file 
John,Joe,jjoe@company.com,(916) 662-4727
Ann,Tylor,Atylor@company.com,(987) 654-3210
Steve,Harvey,sharvey@company.com,(456) 789-3210,(321) 956-3344,...
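For comparison, here is a rough awk sketch of the same record-splitting rule the sed script relies on (two or more spaces followed by a letter start a new record), combined with phone-number normalization. The helper name ldap_to_csv is hypothetical, and the script assumes that tokens containing a letter are name or mail fields and that a run of letter-free tokens forms one 10-digit phone number with an optional leading +1; data that breaks these assumptions will not be handled correctly.

```sh
# ldap_to_csv is a hypothetical helper name; it reads the one-line dump on stdin.
# Assumptions: 2+ spaces followed by a letter start a new record, tokens with a
# letter are name/mail fields, letter-free tokens form a 10-digit phone (+1 optional).
ldap_to_csv() {
  awk '
  function flush_phone(   d) {
      d = phone; phone = ""
      gsub(/[^0-9]/, "", d)                     # keep digits only
      if (d == "") return                       # e.g. a trailing "..."
      if (length(d) == 11 && substr(d, 1, 1) == "1") d = substr(d, 2)   # drop the +1
      if (length(d) != 10) { out = out "," d; return }                  # unexpected length: bare digits
      out = out ",(" substr(d, 1, 3) ") " substr(d, 4, 3) "-" substr(d, 7)
  }
  {
      n = split($0, chunk, /  +/)               # split on runs of two or more spaces
      for (i = 1; i <= n; i++) {
          if (chunk[i] ~ /^[A-Za-z]/ && out != "") { print out; out = "" }   # new record
          m = split(chunk[i], tok)              # single spaces inside a chunk
          for (j = 1; j <= m; j++) {
              if (tok[j] ~ /[A-Za-z]/) { flush_phone(); out = (out == "" ? tok[j] : out "," tok[j]) }
              else phone = phone tok[j]         # collect the pieces of a phone number
          }
          flush_phone()                         # a run of 2+ spaces ends a phone number
      }
      if (out != "") print out
      out = ""
  }'
}
```

Because it reads standard input, it can be used as `printf '%s\n' "$dump" | ldap_to_csv`, where `$dump` holds the one-line output. With the sample line from the question, this sketch keeps both of Steve Harvey's numbers on his record, matching the desired output.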

Thanks for the feedback, this is great. — Asghar, Nov 17 '17 at 17:56

RavinderSingh13 · Accepted Answer · 2017-11-17T02:55:51.490

If your Input_file is the same as the sample shown, the following awk command may help you.

awk --re-interval '{gsub(/[0-9]{3}-[0-9]{4} +/,"&\n");print}'  Input_file

I have an old version of gawk, so I added --re-interval to enable interval expressions such as {3}. Newer versions of gawk (4.0 and later) support them by default, so the option is not needed there.

Explanation:

awk --re-interval '{               ##--re-interval enables interval expressions such as {3} in older versions of gawk.
gsub(/[0-9]{3}-[0-9]{4} +/,"&\n"); ##gsub (global substitution): match 3 digits, a dash (-), 4 digits and the following space(s), and replace each match with itself (&) followed by a newline.
print                              ##print the current line of Input_file.
}'  Input_file                     ##name of the Input_file to read.
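If your awk does not support interval expressions at all (older mawk releases, for example), a sketch of the same substitution can spell out the repetition instead of using {3} and {4}, so no --re-interval option is needed. The helper name split_records is hypothetical; like the original command, it keeps the trailing spaces in front of each inserted newline and does not convert the remaining spaces to commas.

```sh
# split_records is a hypothetical helper name.
# [0-9][0-9][0-9] spells out [0-9]{3}, so the regex works without interval support.
split_records() {
  awk '{ gsub(/[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9] +/, "&\n"); print }' "$@"
}

printf '%s\n' 'John Joe jjoe@company.com +1 916 662-4727  Ann Tylor Atylor@company.com (987) 654-3210' | split_records
```

Passing a file name works as well, for example `split_records Input_file`, since "$@" is handed straight to awk.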

Thank you very much for the detailed solution and also the comments. — Asghar, Nov 21 '17 at 18:32

score 0 · Answer 3 · answered Nov 17 '17 at 07:20


Just for your interest, with Perl you could write:

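perl -e '
while (<>) {
    s/  /\n/g;    # two spaces separate records: turn them into newlines
    s/ /,/g;      # single spaces separate fields: turn them into commas
    # normalize phone numbers: drop an optional +1, then print them as (NNN) NNN-NNNN
    s/(\+1,)?\(?(\d{3})\)?[-,]?(\d{3})[-,]?(\d{4})/($2) $3-$4/g;
    print;
}' file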

answered by tshiono

@tshiono Your solution was it! Thanks. — Asghar, Nov 17 '17 at 17:51
