I have a string with comma separated values, like:

742108,SOME-STRING_A_-BLAHBLAH_1-4MP0RTTYE,SOME-STRING_A_-BLAHBLAH_1-4MP0-,,,

As you can see, the 3rd comma-separated value sometimes has a special character, such as a dash (-), at the end. I want to use sed, or preferably a perl command (with the -i option, so that the existing file is edited in place), to replace this string with the same string in the same position (i.e. the 3rd comma-separated value) but without the special character at the end. So the result for the example above should be:

742108,SOME-STRING_A_-BLAHBLAH_1-4MP0RTTYE,SOME-STRING_A_-BLAHBLAH_1-4MP0,,,

Since the file contains many lines like the one above, I am using a while loop in a shell/bash script to go through and manipulate every line of the file. I have assigned the string values above to variables so that I can replace them using perl. My while loop is:

while read mystr
do
myNEWstr=$(echo $mystr | sed s/[_.-]$// | sed s/[__]$// | sed s/[_.-]$//)
perl -pi -e "s/\b$mystr\b/$myNEWstr/g" myFinalFile.txt
done < myInputFile.txt

where:

$mystr is the "SOME-STRING_A_-BLAHBLAH_1-4MP0-"
$myNEWstr result is the "SOME-STRING_A_-BLAHBLAH_1-4MP0"

Note that myInputFile.txt contains the 3rd comma-separated values of myFinalFile.txt. Those EXACT string values ($mystr) are checked for a special character at the end (underscore, dash, dot, or double underscore). If one is found, it is removed to form the new string ($myNEWstr), and that new string then replaces the old one in myFinalFile.txt. The result should look like the final example string shown above, i.e. with the 3rd comma-separated value WITHOUT the special character at the end (a dash (-) in the example).

Thank you.

Kostas75

1 Answer

You could use the following regex:

s/^([^,]*,[^,]*,[^,]*)-,/$1,/

This defines CSV fields as a series of characters other than a comma (empty fields are allowed). We are looking for a dash at the very end of the third CSV field. The regex captures everything up to that point, and then replaces the match with the captured text, omitting the dash.

$ cat t.txt
742108,SOME-STRING_A_-BLAHBLAH_1-4MP0RTTYE,SOME-STRING_A_-BLAHBLAH_1-4MP0-,,,
$ perl -p -e 's/^([^,]*,[^,]*,[^,]*)-,/$1,/' t.txt
742108,SOME-STRING_A_-BLAHBLAH_1-4MP0RTTYE,SOME-STRING_A_-BLAHBLAH_1-4MP0,,,
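
The question also mentions underscore, dot and double underscore as trailing characters. As a rough sketch only (not part of the original answer), the same idea can be extended to strip any run of those characters from the end of the third field. The function name `strip_field3` and the second and third sample lines are made up for illustration. The pattern assumes the third field is followed by a comma and contains at least one character that is not `_`, `.` or `-`; lines that do not fit are left unchanged.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): remove a trailing run of
# '_', '.' or '-' from the 3rd comma-separated field of each input line.
# Uses only POSIX basic regular expressions, so no sed extensions are needed.
strip_field3() {
    # \( ... \)   captures fields 1-2 plus field 3 up to its last
    #             character that is not one of '_', '.', '-'
    # [_.-]\{1,\} matches the trailing run to drop (dash last = literal)
    sed 's/^\([^,]*,[^,]*,[^,]*[^,_.-]\)[_.-]\{1,\},/\1,/'
}

# Sample input: the first line is from the question, the others are invented.
printf '%s\n' \
    '742108,SOME-STRING_A_-BLAHBLAH_1-4MP0RTTYE,SOME-STRING_A_-BLAHBLAH_1-4MP0-,,,' \
    '742109,OTHER,OTHER_STRING__,,,' \
    '742110,CLEAN,CLEAN-STRING,,,' | strip_field3
```

The same pattern in Perl syntax would be `s/^([^,]*,[^,]*,[^,]*[^,_.-])[_.-]+,/$1,/`, which can be combined with `perl -pi -e` to edit the file in place. Because the substitution works on the field position rather than on a looked-up string, no separate input file or while loop is needed.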
GMB
  • Can this be done for all the special characters I mentioned, not only the dash, meaning underscore, dash, dot and double underscore? Moreover, I want to replace only the EXACT string every time. With perl, at least with my version (v5.8.3 built for IA64.ARCHREV_0-thread-multi), there seems to be a problem replacing a string that has a dash (or another special character) at the end: it just does nothing, i.e. the string is not replaced with the new string. – Kostas75 Nov 05 '19 at 10:33
  • @Kostas: well, you can expand the list of stop characters by using a character class (surrounded with brackets), like `s/^([^,]*,[^,]*,[^,]*)[_.-],/$1,/` (underscore, dot, dash; the dash goes last in the class so that it is not read as a range). – GMB Nov 06 '19 at 00:12