2

I have a .CSV file with 7 fields, and the 3rd and 4th columns of the file has a number starting with an apostrophe ( ' ). Please see the example below.

col0,col1,col2,col3,col4,col5,col6,
1value0,1value1,'8972991766941,'8972991766941,1value4,1value5,1value6,
2value0,2value1,'8912988876583,'8912988876583,2value4,2value5,2value6,
3value,3value1,'8912981226981,'8912981226981,3value4,3value5,3value6,
2value0,4value1,'8912971783681,'8912971783681,4value4,4value5,4value6,

How do I get rid of the apostrophes in the 3rd and 4th columns only using either sed or awk?

Dhruuv
  • 343
  • 10
  • 24

3 Answers3

4

You can use awk:

awk -F, '{for (i=3;i<=4;i++) sub("'\''", "", $i)}1' OFS=, file
user000001
  • 32,226
  • 12
  • 81
  • 108
  • Could you please explain what the substitution is doing in the code above? I see like 3 apostrophes being used in the code. Also, can you help with one more way to achieve this in `sed`? – Dhruuv Sep 10 '13 at 18:24
  • You see three apostorphes because we have to escape it, otherwise it will terminate the string. Does `sed 's/'\''//g' file` work for you, or do you want only the third and fourth column? – user000001 Sep 10 '13 at 18:29
  • do all of the columns have `'stuff` or just three and four? can you be sure that the the Xth apostrophe will be in the third or fourth column? I ask this because you can do something like `sed s/\'//3` to replace the third apostrophe in each line with nothing. if that is the case, then you we can do this dependably with sed. another case where we could dependably do it is with the third and fourth column being a fixed length – PsychoData Sep 10 '13 at 23:17
  • @PsychoData: First two columns have the names, and I am not sure if the data in those columns might or might not have any apostrophes. Also, the 3rd and 4th columns are not Fixed length. They are number fields and the length of the number might vary... any other solutions we can think of? – Dhruuv Sep 11 '13 at 13:22
  • 2
    @Dhruuv if we arent sure that it will be say, the third and fourth apostrophe, in a line, and the column you need to replace aren't in a set pattern that wont be in another column, then I know of no way to replace only apostrophes in the third and fourth columns, without using awk anyway. Should be able to use awk to print column one and two and then use sed to replace apostrophes in three and four, then print the rest of the columns. But thats already working fine with the sub command in sed. – PsychoData Sep 11 '13 at 15:07
  • @user000001,@psyhodata: a quick question, how do we do for the columns which are **not consecutive**? lets say the **3rd** column and **7th** column has the apostrophes... any code tweaks? – Dhruuv Sep 12 '13 at 21:04
  • @Dhruuv I guess you could do something like `awk -F, '{sub("'\''", "", $3);sub("'\''", "", $7)}1' OFS=, file`. Also, since sub accepts a regex, you can do `sub("^'\''", ...` to replace only if it is in the beginning, or `gsub("'\''",...` to replace all occurrences in the field, etc. – user000001 Sep 12 '13 at 21:17
2

If ' is not needed at all, remove it everywhere.

awk '{gsub("'\''","")}1' file
Jotne
  • 40,548
  • 12
  • 51
  • 55
  • I need to get rid of this only in the 3rd and 4th columns of the file, as I do not want to change the data of other fields just in case they might have an apostrophe(s) in them... thanks Jotne. – Dhruuv Sep 11 '13 at 13:24
-3

Press CTRL+H on Notepad, replace "'" with "" (Nothing)

Raza
  • 3,147
  • 2
  • 31
  • 35
  • He can only use `sed` or `awk`, so it is unlikely that notepad is available in his environment and is also not scriptable like `sed` or `awk` are. Also, this will remove apostrophes from **all** columns; he only wants them gone form the 3rd and 4th columns. – Justin L. Sep 10 '13 at 17:55
  • 1
    Thanks for the response Raza, however, like Justin said, I have to get rid of this in a .CSV file which is present in linux environment. – Dhruuv Sep 10 '13 at 17:59