4

I have a file like this:

id=1+5
id=1+9
id=25100+10
xyz=1+
abc=123456
conf_string=LMN,J,IP,25100+1,0,3,1

I would like to replace each instance of x+y with the value of (x+y). That is, 1+5 is replaced by 6, 25100+1 is replaced by 25101, and so on.

I was trying this with gawk by matching with a regex like /[:digit:]++[+digit:]+/. Using the following, I could replace some of the instances:

gawk 'BEGIN {FS = "[=+,]"} ; /[:digit:]++[+digit:]+/ {print $1 "=" ($2 + $3)} ! /[:digit:]++[+digit:]+/ {print $0}' /tmp/1.txt 
id=6
id=10
id=25110
xyz=1+
abc=123456
conf_string=LMN,J,IP,25100+1,0,3,1

I am unsure how to match and replace 25100+1 in the above example. Ideally, I would like to extract every instance of <number>+<number> and replace it with the sum. It will always be a sum of two numbers.

javed

3 Answers

6

With GNU awk:

$ awk 'BEGIN{r = @/([0-9]+)\+([0-9]+)/}
       { while(match($0, r, m)) sub(r, m[1] + m[2]) } 1' ip.txt
id=6
id=10
id=25110
xyz=1+
abc=123456
conf_string=LMN,J,IP,25101,0,3,1
  • r = @/([0-9]+)\+([0-9]+)/ saves the regex in a variable; [0-9] matches any digit
  • match($0, r, m) is true when the regex matches, and the matched portions are available in the m array (m[1] and m[2] hold the two captured numbers)
  • m[1] + m[2] sums the two numbers, and sub(r, ...) replaces the first remaining match with that sum
  • For older gawk versions that reject the @/.../ syntax, use awk '{while(match($0, /([0-9]+)\+([0-9]+)/, m)) sub(/([0-9]+)\+([0-9]+)/, m[1] + m[2]) } 1' ip.txt, which repeats the regex literal instead of saving it in a variable
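
For an awk without gawk's three-argument match(), the same idea can probably be sketched with the two-argument form, which sets RSTART and RLENGTH. This is only a sketch and not part of the answer above: sum_pairs is a hypothetical helper name, and unlike the gawk loop it scans left to right without re-scanning text it has already replaced.

```shell
# sum_pairs is a hypothetical helper name (an assumption, not from the answer).
# It reads the named files, or stdin when no file is given.
sum_pairs() {
  awk '{
    out = ""; rest = $0
    # Two-argument match() sets RSTART and RLENGTH for the leftmost match.
    while (match(rest, /[0-9]+[+][0-9]+/)) {
      pair = substr(rest, RSTART, RLENGTH)      # e.g. "25100+1"
      split(pair, n, "[+]")                     # n[1] and n[2] are the operands
      out = out substr(rest, 1, RSTART - 1) (n[1] + n[2])
      rest = substr(rest, RSTART + RLENGTH)     # continue after the match
    }
    print out rest
  }' "$@"
}
```

It would be called as sum_pairs ip.txt, or at the end of a pipeline.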

Note

  • [:digit:] should be used inside a character class [[:digit:]]
  • ++ should be +\+ as you intend the second one to match + literally
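
As a quick check of the two corrections in the note, the repaired pattern can be tried on its own with grep -E before using it in awk. The helper name has_sum is hypothetical and only there for illustration:

```shell
# has_sum is a hypothetical helper name (an assumption, not from the answer).
# It prints only the lines that contain <digits>+<digits>.
has_sum() {
  # [[:digit:]] sits inside a bracket expression; \+ matches a literal plus.
  grep -E '[[:digit:]]+\+[[:digit:]]+' "$@"
}
```

On the sample input, a line such as xyz=1+ is not selected, because no digits follow the plus sign.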

See also: How to coerce AWK to evaluate string as math expression?


With perl, you can simply use the e flag to evaluate the replacement as code:

perl -pe 's/(\d+)\+(\d+)/$1+$2/ge' ip.txt
# or    
perl -pe 's/\d+\+\d+/$&/gee' ip.txt
Sundeep
  • Very nice and concise! +1 – SiegeX Jun 07 '20 at 05:46
  • @Sundeep I get a syntax error when running the command you posted. I am not well versed in awk syntax, so I could not easily resolve it. Running `gawk 'BEGIN{r = @/([0-9]+)\+([0-9]+)/} { while(match($0, r, m)) sub(r, m[1] + m[2]) } 1' 3.txt` prints `gawk: cmd. line:1: BEGIN{r = @/([0-9]+)\+([0-9]+)/}` followed by `gawk: cmd. line:1: ^ syntax error`. – javed Jun 07 '20 at 17:04
  • @javed could be because of version differences, I tested it on `gawk version 5.1.0` ... does `awk '{while(match($0, /([0-9]+)\+([0-9]+)/, m)) sub(/([0-9]+)\+([0-9]+)/, m[1] + m[2]) } 1'` work? – Sundeep Jun 08 '20 at 02:28
  • Yes, the awk command worked. I have 'GNU Awk 4.1.4', which is perhaps why the error occurred. Thank you for the updated command. – javed Jun 14 '20 at 07:45
  • I have accepted this solution, because it worked for me the earliest. Other solutions from @RavinderSingh13 & agc also worked out equally well. Thank you everyone for helping me out. – javed Jun 16 '20 at 05:31
3

Could you please try the following, written and tested with GNU awk on the shown samples. This solution also handles cases where a value in the 2nd field starts or ends with +, like the 4th line of the output. The edited code was tested at https://ideone.com/x2EQ0P

awk '
BEGIN{
  FS=OFS="="
}
{
  num=split($2,array1,",")
  for(i=1;i<=num;i++){
    num1=split(array1[i],array2,"+")
      for(k=1;k<=num1;k++){
        if(num1==1){array1[i]=array2[k] }
        if(array2[k]~/^[0-9]+$/){
           val+=array2[k]
           array1[i]=(array2[1]!=""?"":"+") val (array2[num1]!=""?"":"+")
        }
     }
     val=""
  }
  for(o=1;o<=num;o++){
    value=(value?value ",":"")array1[o]
  }
  $2=value
  value=""
}
1
' Input_file


Explanation: Adding a detailed explanation for the above.

awk '                                             ##Start the awk program.
BEGIN{                                            ##Start the BEGIN section.
  FS=OFS="="                                      ##Set the field separator and output field separator to =.
}
{
  num=split($2,array1,",")                        ##Split the 2nd field on commas into array1.
  for(i=1;i<=num;i++){                            ##Loop over each comma-separated item.
    num1=split(array1[i],array2,"+")              ##Split the current item on + into array2.
    for(k=1;k<=num1;k++){                         ##Loop over each +-separated piece.
      if(num1==1){array1[i]=array2[k] }           ##No + in the item: keep it unchanged.
      if(array2[k]~/^[0-9]+$/){                   ##Only purely numeric pieces are added.
        val+=array2[k]                            ##Accumulate the running sum in val.
        array1[i]=(array2[1]!=""?"":"+") val (array2[num1]!=""?"":"+")    ##Store the sum, restoring a leading or trailing + if one was present.
      }
    }
    val=""                                        ##Reset val for the next item.
  }
  for(o=1;o<=num;o++){                            ##Loop over array1.
    value=(value?value ",":"")array1[o]           ##Rejoin the items with commas into value.
  }
  $2=value                                        ##Assign value back to the 2nd field.
  value=""                                        ##Reset value for the next line.
}
1                                                 ##1 prints every line.
' Input_file                                      ##Input file name.
RavinderSingh13
  • That's another good systematic approach. But @sundeep pulled a rabbit out of his hat with the `match` solution. I learned from both. – David C. Rankin Jun 07 '20 at 07:16
  • @DavidC.Rankin, Thank you sir for encouragement. Yeah Sundeep answer is very Good :) – RavinderSingh13 Jun 07 '20 at 07:17
  • @RavinderSingh13 Thank you so much for explaining the solution in detail. While it mostly works, the one issue I see is that it makes unintended replacements in the last line: it prints `conf_string=0,0,0,25101,0,3,1`, which should have been `conf_string=LMN,J,IP,25101,0,3,1`. – javed Jun 07 '20 at 17:06
  • @javed, ok I have edited code as per your shown sample and tested it on mobile over https://ideone.com/x2EQ0P link too please check it once and lemme know in case of queries here. – RavinderSingh13 Jun 07 '20 at 17:45
  • @javed, Javed bhai, did this work for you? Lemme know in case of any queries. – RavinderSingh13 Jun 08 '20 at 12:36
  • Yes, it worked after your changes. Thank you so much. Sorry for the late reply. – javed Jun 14 '20 at 07:42
2

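Using GNU sed to wrap the shell arithmetic expansion $(( )) around the sum, and then the dangerous e flag of the s command (which executes the result as a shell command) to run it:

sed 's#\(.*[^0-9]\)\([0-9]\++[0-9]\+\)\(.*\)#printf "\1$((\2))\3"#e' file

...or, with fewer backslashes \:

sed -r 's#(.*[^0-9])([0-9]+\+[0-9]+)(.*)#printf "\1$((\2))\3"#e' file

The [^0-9] guard keeps the greedy leading .* from swallowing the first digits of the left operand; without it, id=25100+10 would come out as id=251010. It assumes each sum is preceded by a non-digit character, as in the sample, and each line is handled as a single substitution.

Output of either:

id=6
id=10
id=25110
xyz=1+
abc=123456
conf_string=LMN,J,IP,25101,0,3,1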
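
Because the e flag hands text to a shell, it may help to preview what would be run by leaving the flag off, so each rewritten line is printed instead of executed. The sketch below is an illustration rather than the answer's own command: preview_cmds is a hypothetical helper name, and the (^|.*[^0-9]) guard before the sum is an assumption of this sketch that stops the leading greedy group from taking digits of the left operand.

```shell
# preview_cmds is a hypothetical helper name (an assumption, not from the answer).
# Same substitution shape as above, but WITHOUT the e flag: sed prints the
# shell command it would have executed instead of running it.
preview_cmds() {
  sed -E 's#(^|.*[^0-9])([0-9]+\+[0-9]+)(.*)#printf "\1$((\2))\3"#' "$@"
}
```

Lines without a sum pass through unchanged, which makes it easy to spot input containing quotes or other characters that would be unsafe to execute.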
agc
  • The above code assumes the input is clean and isn't full of single and double quotes. If it were, additional code would need to be prepended to quote those before they could cause trouble with the evaluation done by the `e` flag. – agc Jun 07 '20 at 17:54