awk/gsub - print everything between double quotes in multiple occurrences per line

Question

I'm attempting to print all data between double quotes (sampleField="sampleValue"), but I'm having trouble getting awk and/or sub/gsub to return all instances of data between the double quotes. I'd then like to print all instances on the lines they were found on, to keep the data together.

Here is a sample of the input.txt file:

deviceId="1300", deviceName="router 13", deviceLocation="Corp"
deviceId="2000", deviceName="router 20", deviceLocation="DC1"

The output I'm looking for is:

"1300", "router 13", "Corp"
"2000", "router 20", "DC1"

I'm having trouble using gsub to remove all of the data between a , and =. Each time I've tried a different approach, it just returns the first field and moves on to the next line.

UPDATE:

I forgot to mention that I won't know how many double-quoted fields will be on each line. It could be 1, 3, or 5,000. I'm not sure if this affects the solution, but I wanted to make sure it was out there.
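The question doesn't show the exact command that was tried, so this is only a guess at the cause: awk's sub replaces only the first match on a line, while gsub replaces every match. A minimal gsub sketch, assuming the keys are identifier-like words (letters, digits, underscores) and that no quoted value itself contains text of the form word=; `strip_keys` is a hypothetical helper name:

```shell
# Hypothetical helper: delete every "key=" prefix on each line.
# gsub (unlike sub) replaces all matches, so the number of fields per
# line does not matter. Assumes keys look like identifiers and that no
# quoted value contains "word=" text of its own.
strip_keys() {
  awk '{ gsub(/[A-Za-z_][A-Za-z0-9_]*=/, ""); print }'
}

# Usage with the two sample lines from the question
printf '%s\n' \
  'deviceId="1300", deviceName="router 13", deviceLocation="Corp"' \
  'deviceId="2000", deviceName="router 20", deviceLocation="DC1"' |
  strip_keys
```

For the sample input this prints the two requested lines, "1300", "router 13", "Corp" and "2000", "router 20", "DC1".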

score 5 · Accepted Answer · answered Jan 23 '13 at 20:12

A sed solution:

sed -r 's/[^\"]*([\"][^\"]*[\"][,]?)[^\"]*/\1 /g' \
    <<< 'deviceId="1300", deviceName="router 13", deviceLocation="Corp"'

Output:

"1300", "router 13", "Corp"

Or for a file:

sed -r 's/[^\"]*([\"][^\"]*[\"][,]?)[^\"]*/\1 /g' input.txt
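The command above leaves a trailing space after the last value, because every quoted value (the last one included) is replaced by itself plus a space. Below is a sketch of the same substitution written with -E (accepted by both GNU and BSD sed) and a plain " inside the bracket expressions (inside single quotes the backslash is not needed), plus a second command that trims that space. `extract_quoted_sed` is a hypothetical wrapper name:

```shell
# Hypothetical wrapper around the accepted answer's substitution.
# Each match is: anything up to a quote, one quoted value plus an
# optional comma (captured as \1), then anything up to the next quote.
# It is replaced by the captured value and a space; "s/ $//" then
# removes the space left after the last value.
extract_quoted_sed() {
  sed -E 's/[^"]*("[^"]*",?)[^"]*/\1 /g; s/ $//'
}

printf '%s\n' 'deviceId="1300", deviceName="router 13", deviceLocation="Corp"' |
  extract_quoted_sed
```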

score 2 · Answer 2 · jim mcnamara · 2013-01-23T20:33:40.863

awk -F '"' '{ printf("%c%s%c, %c%s%c, %c%s%c\n", 34, $2, 34, 34, $4, 34, 34, $6, 34) }' \
    input.txt > newfile

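is another, simpler approach: it uses the double quote as the field separator, so the quoted values are fields $2, $4 and $6 (34 is the character code of the double quote). It only handles exactly three quoted values per line.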
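To see why the values end up in $2, $4 and $6, here is a small sketch that prints every field awk produces with -F '"' (`show_fields` is a hypothetical helper name):

```shell
# Hypothetical helper: print each field awk sees when FS is a double quote.
show_fields() {
  awk -F '"' '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }'
}

printf '%s\n' 'deviceId="1300", deviceName="router 13", deviceLocation="Corp"' |
  show_fields
```

For the first sample line this prints $1=[deviceId=], $2=[1300], $3=[, deviceName=], $4=[router 13], $5=[, deviceLocation=], $6=[Corp] and $7=[]: the odd fields hold the key= text between values, and the empty $7 comes from the line ending in a quote.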

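awk -F '"' 'BEGIN { q = sprintf("%c", 34) }   # q holds a literal double quote
     { out = ""
       for (i = 2; i <= NF; i += 2)           # quoted values are the even fields
           out = out (i > 2 ? ", " : "") q $i q
       print out }' infile > outfile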

This is a more general awk approach that does not assume a fixed number of fields.

edited Jan 23 '13 at 20:33

answered Jan 23 '13 at 20:05

jim mcnamara


Does this account for not knowing how many fields could be on each line? – Travis Crooks Jan 23 '13 at 20:07
No, it is based on your example input. I will post a more general solution, since you need one. – jim mcnamara Jan 23 '13 at 20:09

score 1 · Answer 3 · answered Jan 23 '13 at 22:52

awk -F \" '
    {
        sep=""
        for (i=2; i<=NF; i+=2) {
            printf "%s\"%s\"", sep, $i
            sep=", "
        }
        print ""
    }
' << END
deviceId="1300", deviceName="router 13", deviceLocation="Corp", foo="bar"
deviceId="2000", deviceName="router 20", deviceLocation="DC1"
END

outputs

"1300", "router 13", "Corp", "bar"
"2000", "router 20", "DC1"

Vietnhi Phuvan · Answer 4 · 2014-03-23T08:33:19.847

awk with sub/gsub is probably neither the most direct nor the easiest way to get this done. I like one-liners when they make sense:

(1) In Perl:

$ perl -pe 's/device.*?=//g' input.txt
"1300", "router 13", "Corp"
"2000", "router 20", "DC1"

where
-p wraps the code in a loop that reads each input line and prints the line after the code has run
-e means execute the statement between the single quotes
s is the substitution operator, s/pattern/replacement/
g is a modifier that makes the substitution apply to every match on the line, not just the first one
/device.*?=// is the pattern/replacement pair: it replaces with an empty string '' the prefix "device" followed by the shortest possible run of characters up to and including the next "=" sign (*? is the non-greedy form of *). Note that "deviceId=", "deviceName=" and "deviceLocation=" all start with the prefix "device" and end with "=", so each of them is removed.

(2) With sed, from the shell:

$ sed "s/deviceId=//; s/deviceName=//; s/deviceLocation=//" input.txt
"1300", "router 13", "Corp"
"2000", "router 20", "DC1"

In this case, sed runs three substitutions in a row, replacing "deviceId=", "deviceName=" and "deviceLocation=" each with an empty string ''.

It is unfortunate that sed (like awk's sub and gsub) has much weaker regular expression support than Perl, which is the gold standard for full regular expression support. In particular, neither sed nor sub/gsub supports the non-greedy quantifier "?" (as in ".*?"), and this considerably complicates my life.
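The usual workaround in sed and awk is a negated bracket expression: [^=]* cannot run past an "=", so device[^=]*= stops at the first "=" just as device.*?= does in Perl, and one global substitution replaces the three separate ones. A sketch, assuming (as the Perl version does) that no value contains the word "device":

```shell
line='deviceId="1300", deviceName="router 13", deviceLocation="Corp"'

# sed: [^=]* plays the role of Perl's non-greedy .*?
printf '%s\n' "$line" | sed 's/device[^=]*=//g'

# awk: the same pattern with gsub
printf '%s\n' "$line" | awk '{ gsub(/device[^=]*=/, ""); print }'
```

Both commands print "1300", "router 13", "Corp" for the sample line.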

score 0 · Answer 5 · answered Jan 24 '13 at 05:57


Try this:

awk -F\" '{ a = ""; for (i = 2; i <= NF; i += 2) a = a (i > 2 ? ",\t" : "") "\"" $i "\""; print a }' temp.txt

output

"1300",  "router 13",     "Corp"
"2000",  "router 20",     "DC1"


Mirage


score 0 · Answer 6 · answered Feb 26 '17 at 18:02


This is late, but one possibly easier solution would be:

 $ awk -F"=|," '{print $2,$4,$6}' input.txt
"1300" "router 13" "Corp"
"2000" "router 20" "DC1"


krock1516


You can add OFS=", " before the file name, like this: awk -F"=|," '{print $2,$4,$6}' OFS=", " input.txt, to get a comma and a space between the values as well. – Claes Wikner Feb 26 '17 at 19:18
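Splitting on "=" or "," works for the sample data, but it assumes no value contains a comma or an "=". The sketch below adds a hypothetical second record whose name contains a comma to show the fields shifting. OFS is set with -v here, which for this program has the same effect as the command-line assignment in the comment:

```shell
# The second record is hypothetical: its deviceName value contains a comma.
data='deviceId="1300", deviceName="router 13", deviceLocation="Corp"
deviceId="3000", deviceName="router 30, rack 2", deviceLocation="DC2"'

printf '%s\n' "$data" | awk -F'=|,' -v OFS=', ' '{ print $2, $4, $6 }'
# Line 1: "1300", "router 13", "Corp"
# Line 2: "3000", "router 30,  deviceLocation   (the comma inside the value shifted the fields)
```

Splitting on the double quote itself, as the loop-based awk answers do, is not affected by commas inside the values.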
