iterate over a series of strings and replace spaces between two numbers with zeros

Question

I have a file such as this:

ME45 P   1311 41130 1.253
ME39 P   1311 41130 7.700
ME38 P   1311 41130 7.776
ME37 P   1311 41130 8.285
ME36 P   1311 41130 8.689
ME30 P   1311 4113010.252
ME26 P   1311 4113010.486
ME29 P   1311 41130 9.598
ME28 P   1311 41130 9.356
ME21 P   1311 41130 9.911
ME20 P   1311 4113010.465
ME17 P   1311 4113010.984

and I need to replace the space between two immediately adjacent numbers with a zero (e.g. replace the gap between the second column, where there is 1311, and the third column, where there is 41130, with 0), which would give me the desired output, such as:

KALI P   131104113008.580
IMOB P   131104113001.863

By "the space between two immediately adjacent numbers" I mean that there is only one space between the two numbers, and I want to replace that space with a zero.

So far, I have been using awk to try and solve this:

awk '{gsub("1311 41130", "1311041130")}1' myfile > myfile_tmp && mv myfile_tmp myfile

but unfortunately the file contains thousands of lines, and as the series of numbers changes, it becomes painful to look at each block of columns one by one.

My idea is to iterate over each line as a series of characters, store them in a variable or an array, find the index of each blank space, use that index to check whether both neighbouring elements are digits, and replace the space with zero if they are. However, I don't know whether this is doable in bash or awk. I have a better understanding of Python, but the blank space is a hurdle for me: Python might recognise the space as a delimiter.

Is there any way to solve this problem elegantly?

Python does not see spaces as delimiters. Go ahead and try it. (Don't write code that treats spaces as delimiters, though.) — Jongware, Mar 04 '20 at 17:31
you state: "replace spaces between two immediate adjacent numbers", but you didn't convert '24.3465 5' ('immediate adjacent numbers') to '24.346505' ... ??? also, how many spaces is defined as 'immediate adjacent' ... since '580 24.3465' could be considered as 'adjacent' if you include 6 spaces as matching your requirement; also, what's between each column ... tab? fixed number of spaces? (ie, how do you define what is 'column 1', 'column 2', 'column 3') ... if using 'white space' as a delimiter then the first line has 7 columns while the last line has 5 columns — markp-fuso, Mar 04 '20 at 17:34
@markp I wasn't clear enough; edited my question for better clarity. — dex10, Mar 04 '20 at 17:48
_Python might recognise this space as a delimiter._ What do you mean? — AMC, Mar 04 '20 at 18:08
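
For reference, the index-based idea described in the question can be written almost literally in Python, since iterating over a string yields every character, spaces included. This is only a minimal sketch: the function name `zero_fill_by_index` is made up for illustration, and it assumes each line is plain ASCII text like the sample data.

```python
def zero_fill_by_index(line):
    """Replace a single space that sits directly between two digits with '0'."""
    chars = list(line)  # a string is just a sequence of characters; spaces are kept
    # Skip the first and last positions: they cannot have a neighbour on both sides.
    for i in range(1, len(line) - 1):
        # Check neighbours against the original line, not the partly edited list.
        if line[i] == " " and line[i - 1].isdigit() and line[i + 1].isdigit():
            chars[i] = "0"
    return "".join(chars)


if __name__ == "__main__":
    for sample in ("ME45 P   1311 41130 1.253", "ME30 P   1311 4113010.252"):
        print(zero_fill_by_index(sample))
```

The run of three spaces after `P` is left alone because none of those spaces has a digit on both sides.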

score 4 · Answer 1 · answered Mar 04 '20 at 18:09

You can use a simple sed regex with 2 capture groups that match two digits separated by a single space:

sed -E 's/([0-9]) ([0-9])/\10\2/g' file

ME45 P   131104113001.253
ME39 P   131104113007.700
ME38 P   131104113007.776
ME37 P   131104113008.285
ME36 P   131104113008.689
ME30 P   131104113010.252
ME26 P   131104113010.486
ME29 P   131104113009.598
ME28 P   131104113009.356
ME21 P   131104113009.911
ME20 P   131104113010.465
ME17 P   131104113010.984

anubhava

It's funny, because the same output is shown in both answers, but this one didn't work. – anubhava Mar 05 '20 at 10:11

score 2 · Accepted Answer · answered Mar 04 '20 at 20:45

$ awk 'BEGIN{FS=OFS="   "} {gsub(/ /,0,$2)} 1' file
ME45 P   131104113001.253
ME39 P   131104113007.700
ME38 P   131104113007.776
ME37 P   131104113008.285
ME36 P   131104113008.689
ME30 P   131104113010.252
ME26 P   131104113010.486
ME29 P   131104113009.598
ME28 P   131104113009.356
ME21 P   131104113009.911
ME20 P   131104113010.465
ME17 P   131104113010.984

Ed Morton

thanks for the answer, but unfortunately I still have some gaps. – dex10 Mar 05 '20 at 09:32
I take back what I said. It resolved my problem, thanks! However, in some instances I get trailing zeros at the end (e.g. ME17 P '131104113010.98400'); it happens only rarely, on a few lines, and I don't know what causes them. Could you please explain why I need to declare `BEGIN {FS=OFS=" "}`? Does it mean that awk needs to perform gsub after a separator of `" "` (four spaces)? – dex10 Mar 05 '20 at 10:07
Yes, but it's 3 spaces. That's what splits the record into 2 fields, where $1 is `ME45 P` and $2 is `1311 41130 1.253`, so the gsub() works on the 2nd field only. If your input can have undesirable trailing blanks that are getting converted to zeros, then just add `sub(/ +$/,"");` before the `gsub()` to first remove those trailing blanks. – Ed Morton Mar 05 '20 at 14:16
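
For readers more at home in Python, the field-splitting logic explained above can be sketched as follows. This is an approximation, not a translation of the awk program: the function name `zero_fill_second_field` and its parameters are invented for illustration, and it assumes each line contains a single three-space separator, as in the sample data.

```python
def zero_fill_second_field(line, sep="   ", strip_trailing=True):
    """Split on the first three-space separator and zero-fill spaces after it."""
    if strip_trailing:
        # Same intent as sub(/ +$/,"") in awk: drop trailing blanks first,
        # so they are not turned into trailing zeros.
        line = line.rstrip(" ")
    head, found, tail = line.partition(sep)
    if not found:
        return line  # no separator on this line: leave it untouched
    return head + sep + tail.replace(" ", "0")


if __name__ == "__main__":
    padded = "ME17 P   1311 4113010.984  "  # two trailing blanks
    print(zero_fill_second_field(padded))
    print(zero_fill_second_field(padded, strip_trailing=False))
```

With `strip_trailing=False` the two trailing blanks become `00`, which reproduces the `131104113010.98400` symptom from the comment above.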
