
I am trying to remove a random block of characters from the 5th column in a dataset.

Sample data:

A | 12 | AA | 24 | Test to go and keep 192.168.1.1 > 192.168.2.1 | B

Result should look like:

A | 12 | AA | 24 | 192.168.1.1 > 192.168.2.1 | B

I have this so far:

awk 'BEGIN{FS=OFS="|"} {gsub(".*? 192","", $5 )} 1' file.txt

However, this removes everything in the 5th column up to the last match.

What the code does now:

.168.2.11

I need to remove everything before the first match, not the last.

marc_s
  • awk does not currently support non-greedy (`.*?`) matching. If you need that behaviour, you can write your own awk function from built-ins: call match() to locate the first occurrence of the pattern, then use RSTART and RLENGTH together with substr() to remove the unwanted part. – αғsнιη Jun 25 '21 at 08:12
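The approach suggested in the comment above could look roughly like the sketch below. It assumes, as the question's own attempt does, that the text to keep starts at the first " 192" in the 5th field; the function name `strip_before_first` is made up for illustration. Because match() reports the leftmost match, RSTART points at the first occurrence rather than the last.

```shell
# Hypothetical helper: drop everything in field 5 that precedes the first " 192".
strip_before_first() {
  awk 'BEGIN { FS = OFS = "|" }
  {
    # match() sets RSTART to the position of the leftmost match (0 if none)
    if (match($5, / 192/))
      $5 = substr($5, RSTART)   # keep from the match to the end of the field
  }
  1' "$@"
}

printf '%s\n' 'A | 12 | AA | 24 | Test to go and keep 192.168.1.1 > 192.168.2.1 | B' |
  strip_before_first
# prints: A | 12 | AA | 24 | 192.168.1.1 > 192.168.2.1 | B
```

Note that a hard-coded prefix such as " 192" would also match an unrelated number earlier in the field (for example a length of 192), so this is only as reliable as the prefix is unique.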

2 Answers

With your shown samples, please try the following awk code. A brief explanation: set both the field separator and the output field separator to | for every line of Input_file. Then globally replace letters and whitespace in the 5th field with an empty string. Pad the 5th field with one space on each side, as in the shown samples, and finally print the current line, whether it was edited or not.

awk 'BEGIN{FS=OFS="|"} {gsub(/[[:alpha:]]+|[[:space:]]+/,"",$5);$5=" "$5" "} 1' Input_file


EDIT: If the 5th field should always end up in the form IP address > IP address, then simply try the following.

awk 'BEGIN{FS=OFS="|"} match($0,/([0-9]+\.){3}[0-9]+ > ([0-9]+\.){3}[0-9]+/){$5=substr($0,RSTART,RLENGTH)} 1' Input_file
RavinderSingh13
  • @AUSBRIS88, with your shown samples this code works fine. Could you please let me know where it's not working? – RavinderSingh13 Jun 25 '21 at 07:04
  • Almost, however some of it was still missing. For the sake of simplicity I simplified the data, but the actual data set I am trying to refine is: A | 12 | AA | 24 | Ipv4Header (tos 0x0 DSCP Default ECN Not-ECT ttl 64 id 0 protocol 6 offset (bytes) 0 flags [none] length: 56 192.168.1.1 > 192.168.2.1 | B I just need that 5th column to show: 192.168.1.1 > 192.168.2.1 – nobody-at-all Jun 25 '21 at 07:09
  • @AUSBRIS88, could you please add these samples to your question and let me know? – RavinderSingh13 Jun 25 '21 at 07:10
  • @AUSBRIS88, I believe my EDIT solution will work for you. Please check it and let me know. – RavinderSingh13 Jun 25 '21 at 07:12
  • This answer worked for me!! awk 'BEGIN{FS=OFS="|"} match($0,/([0-9]+\.){3}[0-9]+ > ([0-9]+\.){3}[0-9]+/){$5=substr($0,RSTART,RLENGTH)} 1' Input_file Thank you so much for sharing! I was trying for a few hours to get this working – nobody-at-all Jun 25 '21 at 07:15
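For reference, here is a hedged variant of the EDIT command applied to the longer sample from the comments. It differs from the answer in three ways, all of which are assumptions of this sketch rather than part of the original: it matches against $5 instead of $0, it pads the result with a space on each side so the column spacing is preserved, and it spells out the four octets instead of using `{3}`, since some older awk implementations do not support interval expressions. The function name `keep_ip_pair` is hypothetical.

```shell
# Hypothetical helper: reduce field 5 to the "IP > IP" pair it contains.
keep_ip_pair() {
  awk 'BEGIN {
    FS = OFS = "|"
    # Dotted quad written out in full; [.] matches a literal dot
    ip = "[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+"
  }
  # Only rewrite the field when an "IP > IP" pair is actually present
  match($5, ip " > " ip) {
    $5 = " " substr($5, RSTART, RLENGTH) " "
  }
  1' "$@"
}

printf '%s\n' 'A | 12 | AA | 24 | Ipv4Header (tos 0x0 DSCP Default ECN Not-ECT ttl 64 id 0 protocol 6 offset (bytes) 0 flags [none] length: 56 192.168.1.1 > 192.168.2.1 | B' |
  keep_ip_pair
# prints: A | 12 | AA | 24 | 192.168.1.1 > 192.168.2.1 | B
```

Lines whose 5th field contains no such pair are printed unchanged.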

If the IP address in column 5 of your file always starts with the same prefix, for example " 192.168.", then you can use:

awk 'BEGIN{FS=OFS="|"}{$5=substr($5, index($5, " 192.168."))}1' file.txt
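A small generalisation of the index() idea, as a sketch only: the prefix is passed in as an argument instead of being hard-coded, and the field is left untouched when the prefix is absent. The function name `trim_to_prefix` and its argument order are made up for this example.

```shell
# Hypothetical helper: usage is  trim_to_prefix PREFIX [file...]
trim_to_prefix() {
  prefix=$1
  shift
  awk -v p="$prefix" 'BEGIN { FS = OFS = "|" }
  {
    i = index($5, p)             # position of the first literal occurrence, 0 if absent
    if (i) $5 = substr($5, i)    # keep from that occurrence to the end of the field
  }
  1' "$@"
}

printf '%s\n' 'A | 12 | AA | 24 | Test to go and keep 192.168.1.1 > 192.168.2.1 | B' |
  trim_to_prefix ' 192.168.'
# prints: A | 12 | AA | 24 | 192.168.1.1 > 192.168.2.1 | B
```

Unlike a regex, index() treats the prefix as a literal string, so the dots need no escaping. Values passed with -v still undergo awk escape processing, so a prefix containing backslashes would need extra care.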

pii_ke