
The data looks like this:

There is stuff here (word, word number phrases)
(word number anything, word phrases), even more
...

There are a lot of these lines in different files. There are other kinds of data around them, too, that aren't in the same format. The data inside the parentheses can't change, and it's always on a single line. I do not have to deal with:

(stuff number,
maybe more here)

I would like to be able to replace the comma inside the parentheses with a colon.

The desired output would be:

There is stuff here (word: word number phrases)
(word number anything: word phrases), even more
...

5 Answers


Assuming there's only one comma to be replaced inside the parentheses, this POSIX BRE sed expression will replace it with a colon:

sed 's/(\(.*\),\(.*\))/(\1:\2)/g' file

If there is more than one comma, only the last one will be replaced, because .* is greedy.

In the multiple-comma scenario, you can replace only the first one with:

sed 's/(\([^,]*\),\([^)]*\))/(\1:\2)/g' file
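The difference between the two expressions can be seen on a line with several commas. This is a small sketch; the sample line and the variable names line, last and first are made up for illustration:

```shell
line='There (a, b, c) end'

# Greedy .* backtracks to the last comma before the final ")".
last=$(printf '%s\n' "$line" | sed 's/(\(.*\),\(.*\))/(\1:\2)/g')

# [^,]* cannot cross a comma, so the first comma is the one replaced.
first=$(printf '%s\n' "$line" | sed 's/(\([^,]*\),\([^)]*\))/(\1:\2)/g')

printf '%s\n' "$last" "$first"
# There (a, b: c) end
# There (a: b, c) end
```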
randomir

Here's a version for awk that uses the parentheses as record separators:

awk -v RS='[()]' 'NR%2 == 0 {sub(/,/,":")} {printf "%s%s", $0, RT}' file

The text between parentheses will be every even-numbered record. The RT variable, a GNU awk extension, holds the text that matched the RS pattern for this record.

Note that this only replaces the first comma of the parenthesized text. If you want to replace all of them, use gsub in place of sub.
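Since RT is specific to GNU awk, here is a sketch of a different technique for other awk implementations: it walks through each line with match() and rewrites every parenthesized segment with gsub. The function name paren_commas_to_colons is hypothetical, and the code assumes the parentheses are not nested:

```shell
paren_commas_to_colons() {
  awk '{
    out = ""
    # Find the next "(...)" segment that contains no other parentheses.
    while (match($0, /\([^()]*\)/)) {
      seg = substr($0, RSTART, RLENGTH)
      gsub(/,/, ":", seg)                      # all commas in this segment
      out = out substr($0, 1, RSTART - 1) seg  # text before it, then the segment
      $0 = substr($0, RSTART + RLENGTH)        # continue with the rest of the line
    }
    print out $0
  }'
}

echo 'i,j,k (a,b,c) bar (1,2)' | paren_commas_to_colons
# i,j,k (a:b:c) bar (1:2)
```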

glenn jackman

While @randomir's sed solution focuses on replacing a single comma inside parentheses, there is a way to replace multiple commas inside parentheses with sed, too.

Here is the code:

sed '/(/ {:a s/\(([^,()]*\),/\1:/; t a}'

or

sed '{:a;s/\(([^,()]*\),/\1:/;ta}'

or

sed -E '{:a;s/(\([^,()]*),/\1:/;ta}'

See an online demo.

In all cases, the main part is between the curly braces. Here are the details for the POSIX ERE (sed with the -E option) version:

  • :a; - define a label named a
  • s/(\([^,()]*),/\1:/; - find the pattern below, capturing everything before the comma into Group 1:
    • \( - a ( char
    • [^,()]* - zero or more chars other than ,, ( and ) (so a comma is replaced only when no other ( or ) stands between it and the ( on its left; a comma that follows a closed nested group, as in (a(b),c), is left alone)
    • , - the comma to replace
    • \1: - the replacement: the Group 1 contents followed by a colon
  • ta - jump back to :a if the preceding substitution made a replacement.
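As a quick check of the loop, the BRE form can be run on a line with several groups. This sketch splits the same script into separate -e parts, a spelling that does not depend on how a given sed parses text after a label; the variable name looped is made up:

```shell
looped=$(echo 'i,j,k (a,b,c) bar (1,2)' |
  sed -e ':a' -e 's/\(([^,()]*\),/\1:/' -e 'ta')

printf '%s\n' "$looped"
# i,j,k (a:b:c) bar (1:2)
```

Each pass replaces one comma, and ta restarts the substitution until no comma preceded by an opening parenthesis is left.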
Wiktor Stribiżew

Using awk

$ awk -v FS="" -v OFS="" '{ c=0; for(i=1; i<=NF; i++){ if( $i=="(" || $i ==")" ) c=1-c; if(c==1 && $i==",") $i=":" } }1' file
There is stuff here (word: word number phrases)
(word number anything: word phrases), even more

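-v FS="" -v OFS="" sets FS and OFS to the empty string so that each character is treated as a separate field. This works in gawk and mawk, but POSIX leaves the behavior of an empty FS unspecified.

The variable c is reset to 0 for each line. The for loop iterates over the fields and toggles the value of c whenever ( or ) is encountered.
If c==1 and the field is a comma, it is replaced with a colon.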
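For awk implementations where an empty FS does not split a line into characters, the same idea can be sketched with substr. This variant is an assumption of mine rather than the code above: the function name toggle_commas is hypothetical, and it sets the flag on ( and clears it on ) instead of toggling it:

```shell
toggle_commas() {
  awk '{
    inside = 0; out = ""
    for (i = 1; i <= length($0); i++) {
      ch = substr($0, i, 1)
      if (ch == "(") inside = 1                    # entering parentheses
      else if (ch == ")") inside = 0               # leaving parentheses
      else if (inside && ch == ",") ch = ":"       # comma inside parentheses
      out = out ch
    }
    print out
  }'
}

echo 'i,j,k (a,b,c) bar (1,2)' | toggle_commas
# i,j,k (a:b:c) bar (1:2)
```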

Rahul Verma

With perl

$ perl -pe 's/\([^()]+\)/$&=~s|,|:|gr/ge' ip.txt
There is stuff here (word: word number phrases)
(word number anything: word phrases), even more

$ echo 'i,j,k (a,b,c) bar (1,2)' | perl -pe 's/\([^()]+\)/$&=~s|,|:|gr/ge'
i,j,k (a:b:c) bar (1:2)

$ # since only a single character is changed, tr can be used as well
$ echo 'i,j,k (a,b,c) bar (1,2)' | perl -pe 's/\([^()]+\)/$&=~tr|,|:|r/ge'
i,j,k (a:b:c) bar (1:2)
  • the e modifier allows Perl code in the replacement section
  • \([^()]+\) matches non-nested () with one or more characters inside
  • $&=~s|,|:|gr performs another substitution on the matched text; the r modifier returns the modified text instead of changing $& in place
Sundeep