Remove all characters before specific number in Nth column at first match NOT last

Question

I am trying to remove a random block of characters from the 5th column in a dataset.

Sample data:

A | 12 | AA | 24 | Test to go and keep 192.168.1.1 > 192.168.2.1 | B

Result should look like:

A | 12 | AA | 24 | 192.168.1.1 > 192.168.2.1 | B

I have this so far:

awk 'BEGIN{FS=OFS="|"} {gsub(".*? 192","", $5 )} 1' file.txt

However this removes everything in the 5th column before the last match.

What the code does now:

.168.2.11

I need to remove everything before the first match, not the last.

Currently no awk variant supports non-greedy (`.*?`) matching. If you need that behaviour, you can write your own awk function from the built-ins: use match() to locate the first occurrence of the pattern, then use RSTART and RLENGTH together with substr() to remove the unwanted part. – αғsнιη, Jun 25 '21 at 08:12
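A minimal sketch of the approach described in that comment, assuming the text to keep starts at the first occurrence of ` 192` in the 5th field. The function name `strip_before_first` is made up for illustration. Because match() reports the leftmost match, RSTART points at the first occurrence rather than the last:

```shell
# Hypothetical helper: keep field 5 from the first " 192" onward.
strip_before_first() {
  awk 'BEGIN { FS = OFS = "|" }
  {
    # match() sets RSTART to the start of the leftmost match (0 if none).
    if (match($5, / 192/))
      $5 = substr($5, RSTART)
    print
  }' "$@"
}

# Reads stdin when no file name is given.
printf '%s\n' 'A | 12 | AA | 24 | Test to go and keep 192.168.1.1 > 192.168.2.1 | B' |
  strip_before_first
```

Lines whose 5th field does not contain ` 192` are printed unchanged.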

RavinderSingh13 · Answer 1 · 2021-06-25T07:07:21.570

2

With your shown samples, please try the following awk code. Brief explanation: set the field separator and output field separator to | for all lines of Input_file, then globally remove letters and whitespace from the 5th field, add a space before and after the 5th field as in the shown samples, and finally print the current line, whether edited or not.

awk 'BEGIN{FS=OFS="|"} {gsub(/[[:alpha:]]+|[[:space:]]+/,"",$5);$5=" "$5" "} 1' Input_file

EDIT: If you always want to match the "IP address > IP address" form in the 5th field, simply try the following.

awk 'BEGIN{FS=OFS="|"} match($0,/([0-9]+\.){3}[0-9]+ > ([0-9]+\.){3}[0-9]+/){$5=substr($0,RSTART,RLENGTH)} 1' Input_file
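The command above searches the whole line ($0) and assigns the bare match to $5, so the spaces that padded the 5th field are dropped. If that padding matters, a variant that searches only $5 and re-adds one space on each side might look like the sketch below. The function name `keep_ip_pair` is made up for illustration, and the pattern is the same loose IPv4 shape as above, not a strict validator.

```shell
# Hypothetical helper: reduce field 5 to the first "IP > IP" pair it contains.
keep_ip_pair() {
  awk 'BEGIN {
    FS = OFS = "|"
    # Loose IPv4 shape, written without {3} so older awks accept it.
    ip = "[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+"
    pair = ip " > " ip
  }
  # Search only field 5; lines without a match are printed unchanged.
  match($5, pair) { $5 = " " substr($5, RSTART, RLENGTH) " " }
  { print }' "$@"
}

# Usage: keep_ip_pair Input_file
```

Spelling out the repetition instead of using `{3}` is only a portability choice; with an awk that supports interval expressions the original pattern works the same way.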

edited Jun 25 '21 at 07:07

answered Jun 25 '21 at 07:01


@AUSBRIS88, with your shown samples this code works fine; could you please let me know where it's not working? – RavinderSingh13 Jun 25 '21 at 07:04
Almost, however there was some still missing. For the sake of simplicity I simplified the data, but the actual data set I am trying to refine is: A | 12 | AA | 24 | Ipv4Header (tos 0x0 DSCP Default ECN Not-ECT ttl 64 id 0 protocol 6 offset (bytes) 0 flags [none] length: 56 192.168.1.1 > 192.168.2.1 | B. I just need that 5th column to show: 192.168.1.1 > 192.168.2.1 – nobody-at-all Jun 25 '21 at 07:09
@AUSBRIS88, could you please add these samples to your question and let me know? – RavinderSingh13 Jun 25 '21 at 07:10
@AUSBRIS88, I believe my EDIT solution will work for you; please check it and let me know. – RavinderSingh13 Jun 25 '21 at 07:12

This answer worked for me!! awk 'BEGIN{FS=OFS="|"} match($0,/([0-9]+\.){3}[0-9]+ > ([0-9]+\.){3}[0-9]+/){$5=substr($0,RSTART,RLENGTH)} 1' Input_file Thank you so much for sharing! I was trying for a few hours to get this working – nobody-at-all Jun 25 '21 at 07:15

pii_ke · Answer 2 · 2021-06-26T21:47:24.063

0

If the IP address in column 5 of your file always starts with the same numbers, for example " 192.168.", then you can use:

awk 'BEGIN{FS=OFS="|"}{$5=substr($5, index($5, " 192.168."))}1' file.txt

edited Jun 26 '21 at 21:47

answered Jun 26 '21 at 20:04


Are you sure that works? `awk: fatal: 1 is invalid as number of arguments for index` – tink Jun 26 '21 at 20:36
@tink thank you for pointing it out. I have fixed the error. – pii_ke Jun 26 '21 at 21:43
