using awk and gensub to remove the part in a string ending with "character+number+S"

Question

My goal is to remove the trailing "1S" together with the letter immediately before it, in this case "M". How do I achieve that? My non-working code:

echo "14M3856N61M1S" | gawk '{gensub(/([^(1S)]*)[a-zA-Z](1S$)/, "\\1", "g") ; print $0}'
>14M3856N61M1S

The desired results should be

>14M3856N61

Some additional information:

1. I do not think substr alone will work here, since my actual target strings vary in length.
2. I prefer not to define a special delimiter, because this will be used together with "if" as part of an awk conditional operation, and the delimiter is already defined globally.

Thank you in advance!

@Cyrus, Thanks. I prefer an awk solution, since this will be part of an awk script with a conditional operation. — Aron, Oct 08 '18 at 04:55
It looks like you are hoping that `[^(1S)]` will do something it doesn't do. It matches a single character which is not `(` or `1` or `S` or `)`. — tripleee, Oct 08 '18 at 05:04
Possible duplicate of [sed: Can my pattern contain an “is not” character? How do I say “is not X”?](https://stackoverflow.com/questions/7520704/sed-can-my-pattern-contain-an-is-not-character-how-do-i-say-is-not-x/) — tripleee, Oct 08 '18 at 05:05
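
A small sketch of what the comments above point at. A bracket expression is a set of single characters, not a grouped sequence. Separately, and not stated in the thread, gensub() returns the modified string instead of changing $0, which is why the command in the question prints its input untouched. The function names show_set and strip_tail are made up for illustration, and gensub() is a gawk extension, so the second function assumes gawk is installed.

```shell
# [^(1S)] is a set: it matches ONE character that is not "(", "1", "S" or ")".
# In "a(1S)b" only "a" and "b" qualify, so only they are replaced.
show_set() {
  printf '%s\n' "$1" | awk '{ gsub(/[^(1S)]/, "-"); print }'
}
show_set "a(1S)b"

# gensub() (gawk only) returns the modified copy and leaves $0 alone,
# so the result has to be printed or assigned.
strip_tail() {
  printf '%s\n' "$1" | gawk '{ print gensub(/[a-zA-Z]1S$/, "", 1) }'
}
strip_tail "14M3856N61M1S"
```

The first call prints -(1S)- and the second prints 14M3856N61.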

Inian · Accepted Answer · 2018-10-08T08:04:48.600

Why not use a simple substitution that matches the 1S at the end of the string together with the single character before it?

echo "14M3856N61M1S" | awk '{sub(/[[:alnum:]]{1}1S$/,"")}1'
14M3856N61

Here [[:alnum:]] is the POSIX character class that matches alphanumeric characters (digits and letters), and {1} means match exactly one. A bracket expression already matches a single character, so the {1} is redundant and can be dropped; some older awk versions do not support interval expressions like {1} at all. If you are sure that only letters can occur before the 1S, replace [[:alnum:]] with [[:alpha:]].
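
To make the difference between the two classes concrete, here is a minimal sketch. The function names trim_alnum and trim_alpha and the second input string (a digit before the 1S) are hypothetical and only serve the comparison; it assumes an awk that supports POSIX character classes.

```shell
# Compare the two character classes on the same inputs.
trim_alnum() { printf '%s\n' "$1" | awk '{ sub(/[[:alnum:]]1S$/, "") } 1'; }
trim_alpha() { printf '%s\n' "$1" | awk '{ sub(/[[:alpha:]]1S$/, "") } 1'; }

# Letter before 1S: both variants strip "M1S".
trim_alnum "14M3856N61M1S"
trim_alpha "14M3856N61M1S"

# Digit before 1S: only [[:alnum:]] matches and strips "11S";
# [[:alpha:]] finds no match and the line is printed unchanged.
trim_alnum "14M3856N611S"
trim_alpha "14M3856N611S"
```

Which class is right depends on whether a digit directly before the 1S should also be removed.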

To answer the OP's question about putting the result in a separate variable: use match(), because sub() modifies its target in place and returns only the number of substitutions made, not the substituted string.

echo "14M3856N61M1S" | awk 'match($0,/[[:alnum:]]{1}1S$/){str=substr($0,1,RSTART-1); print str}'

RavinderSingh13 · Answer 2 · 2018-10-08T05:12:55.470

EDIT: As per OP's comment I am adding solutions where OP could get the result into a bash variable too as follows.

var=$(echo "14M3856N61M1S" | awk 'match($0,/[a-zA-Z]1S$/){print substr($0,1,RSTART-1)}' )
echo "$var"
14M3856N61

You could also try the following.

echo "14M3856N61M1S" | awk 'match($0,/[a-zA-Z]1S$/){$0=substr($0,1,RSTART-1)} 1'
14M3856N61

Explanation of above command:

echo "14M3856N61M1S" |        ## Print the sample string with echo and pipe its standard output to awk as standard input.
awk '                         ## Start the awk program.
  match($0,/[a-zA-Z]1S$/){    ## Use match() to find 1S at the end of the line with a letter (lower or upper case) before it.
   $0=substr($0,1,RSTART-1)   ## If a match is found, rebuild the current line from character 1 up to RSTART-1; match() sets the built-in variables RSTART and RLENGTH.
  }                           ## Close the match block.
1'                            ## The 1 is an always-true pattern, so every line is printed, edited or not.
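
To see what match() actually stores, this small sketch prints RSTART and RLENGTH next to the trimmed line. The function name show_match is made up for illustration.

```shell
show_match() {
  printf '%s\n' "$1" | awk '
    match($0, /[a-zA-Z]1S$/) {
      # RSTART is the 1-based index where the match begins, RLENGTH is its length
      print RSTART, RLENGTH, substr($0, 1, RSTART - 1)
      next
    }
    # After a failed match() RSTART is 0 and RLENGTH is -1
    { print RSTART, RLENGTH, $0 }
  '
}
show_match "14M3856N61M1S"
show_match "14M3856N61"
```

For the sample string the match "M1S" starts at position 11 and is 3 characters long, so the first call prints 11 3 14M3856N61. The second call has nothing to strip and prints 0 -1 14M3856N61.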

this is indeed pretty close to what I tried to achieve. Very much appreciated ! — Aron, Oct 08 '18 at 05:38
@ RavinderSingh13, I am learning how to do just that. Got it now. — Aron, Oct 08 '18 at 05:53

score 1 · Answer 3 · answered Oct 08 '18 at 05:12

echo "14M3856N61M1S" | awk -F '.1S$' '{print $1}'

Output:

14M3856N61

Cyrus
That's the first thing I thought of myself too. I prefer not to define the delimiter as it's already defined globally in the overall awk code. Thank you though, for taking the time to respond. – Aron Oct 08 '18 at 05:25

With its own array and separator: `echo "14M3856N61M1S" | awk '{split($1,a,".1S$"); print a[1]}'` – Cyrus Oct 08 '18 at 05:36
A neat solution. Cool. Thank you ! – Aron Oct 08 '18 at 05:42
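
A sketch of why the split() variant fits the OP's constraint: split() applies its separator only to the string it is given, so the global field separator keeps working for the rest of the script. The two-column input layout and the names trim_field and parts are assumptions for illustration.

```shell
# Fields are still split on the global FS (default whitespace here);
# split() uses its own separator only on the value passed to it.
trim_field() {
  printf '%s\n' "$1" | awk '{
    split($2, parts, /[a-zA-Z]1S$/)   # parts[1] holds the text before the match
    print $1, parts[1]
  }'
}
trim_field "read1 14M3856N61M1S"
```

This prints read1 14M3856N61. If the second field does not end in a letter followed by 1S, parts[1] is simply the whole field.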
