
My text file has more than ten thousand lines. Each line starts with a word or a phrase followed by a tab and the content, such as:

[line 1] This is the first line. [tab] Here is the content.[end of line]

I want to find every character s in the words between the beginning of each line and the tab (\t), and replace it with a pipe (|) so that the text looks like:

[line 1] Thi| i| the fir|t line. [tab] Here is the content.[end of line]

Here is what I have done:

Search: ^(.*)s+(.*)?\t 
Replace: \1|\2\t

It works, but the problem is that it does not replace every s in one pass. I have to click Replace All several times before the s in all the words is replaced.

So my question is: how can I replace all occurrences of the character s in just one search and replace?

Note that I'm working in TextWrangler, but I'm OK with other editors.

Thanks a lot.

Niamh Doyle

1 Answer


You are searching for whole lines that contain an s, so each pass can match only once per line. Instead, you should search for the s directly and use a lookahead to ensure that it is followed by a tab.

Search: s(?=.*\t)
Replace: |

Note that this catches all s's up to the last tab. This will be a problem if your main content can contain tabs.
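
To check the lookahead version outside the editor, here is a minimal sketch in Python. Using Python's `re` module is my assumption for illustration (it is not what TextWrangler runs), and the second sample line is made up to show the multiple-tab caveat.

```python
import re

# Sample line from the question: head, one tab, then the content.
line = "This is the first line.\tHere is the content."

# Replace every "s" that still has a tab somewhere after it on the line.
result = re.sub(r"s(?=.*\t)", "|", line)
print(repr(result))
# 'Thi| i| the fir|t line.\tHere is the content.'

# Caveat: with a second tab in the content, an "s" between the first
# and the last tab is replaced too.
multi = "first\tsome content\tlast column"
multi_result = re.sub(r"s(?=.*\t)", "|", multi)
print(repr(multi_result))
# 'fir|t\t|ome content\tlast column'
```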

To stop catching s's after the first tab you have to cheat, since variable-length lookbehind is not supported in most regexp dialects, AFAIK.

However, if we make the match on the last s before the first tab swallow the rest of the line, no later s can be matched:

Search: (?:(^[^s\t]*\t.*$)|s([^s\t]*(?:(?=s.*\t)|\t.*$)))
Replace: |\1\2

This will catch the whole line in the case where no s occurs before the first tab, and put a | in front of that line. I see no way around this.
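
As a rough check of how this pattern behaves, here is a sketch in Python. The `re` module, the function name `replace_before_first_tab`, and the sample lines are my assumptions for illustration. It also assumes every line contains at least one tab, because `[^s\t]*` can otherwise run across a line break.

```python
import re

# Same pattern as above. MULTILINE lets ^ and $ work on each line.
PATTERN = re.compile(
    r"(?:(^[^s\t]*\t.*$)|s([^s\t]*(?:(?=s.*\t)|\t.*$)))",
    re.MULTILINE,
)

def _repl(match):
    # Equivalent to the replacement "|\1\2"; an unmatched group counts as empty.
    return "|" + (match.group(1) or "") + (match.group(2) or "")

def replace_before_first_tab(text):
    """Replace each "s" before the first tab of every line with "|"."""
    return PATTERN.sub(_repl, text)

print(repr(replace_before_first_tab("This is the first line.\tHere is the content.")))
# 'Thi| i| the fir|t line.\tHere is the content.'

# Extra tabs in the content are left alone.
print(repr(replace_before_first_tab("first\tsome content\tlast column")))
# 'fir|t\tsome content\tlast column'

# No "s" before the first tab: the whole line matches and gets a leading "|".
print(repr(replace_before_first_tab("word\tsome content")))
# '|word\tsome content'
```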

Taemyr
  • Many thanks. Absolutely, that will be a problem because my main content sometimes contains a couple of tabs. How can I limit it so that it stops searching and replacing after the first tab in each line? – Niamh Doyle Oct 22 '13 at 16:15
  • I have given a suggestion to get around this. Although it's not finished. – Taemyr Oct 22 '13 at 16:30
  • @NiamhDoyle OK. I think the current solution is about the best I can do. (Barring typos) – Taemyr Oct 22 '13 at 16:38
  • Thanks a lot for your help, @Taemyr. That's the problem as well because the part before the tab sometimes contains no `s` at all. Ok, I may have to stick with my way: run the regex several times until it finishes replacing all `s`. – Niamh Doyle Oct 22 '13 at 16:39
  • Your regexp also misbehaves when there are multiple tabs. :) – Taemyr Oct 22 '13 at 16:43
  • Yeah :-), but anyway I cannot get your latest version to work normally. It adds the pipe `|` at the end of any line containing no `s` before the first tab. – Niamh Doyle Oct 22 '13 at 16:59
  • @NiamhDoyle I think I have fixed this, see current regexp. Assuming the problem was that it added at both the end and the beginning. And not that it added at the end rather than at the beginning. However it will still add at the beginning. – Taemyr Oct 22 '13 at 17:19
  • It may be the best workaround for now. I think I will run your regex to replace `s` before the first tab. I will then remove all `|` at the end of each line. So voila! I will accept your answer. Thank you very much for your time. – Niamh Doyle Oct 22 '13 at 17:25
  • @NiamhDoyle Note that your solution will break any lines starting with an s. So unless you know that this will never be the case, I would do it in three steps. First add a space to the beginning of every line. Then replace the s's. Then replace `^\|?\s` with nothing. – Taemyr Oct 22 '13 at 17:38
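
The three-step workaround from the last comment can be sketched per line in Python. This is only an illustration under assumptions: the `re` module, the helper name `replace_s_in_head`, and the sample lines are mine. The cleanup pattern is anchored to the start of the line and uses a literal space, so it removes only the padding and any stray leading pipe.

```python
import re

# Pattern from the answer, applied to one line at a time.
PATTERN = re.compile(r"(?:(^[^s\t]*\t.*$)|s([^s\t]*(?:(?=s.*\t)|\t.*$)))")

def _repl(match):
    # Equivalent to the replacement "|\1\2"; an unmatched group counts as empty.
    return "|" + (match.group(1) or "") + (match.group(2) or "")

def replace_s_in_head(line):
    """Replace "s" with "|" before the first tab of a single line."""
    padded = " " + line                 # step 1: pad so no line starts with "s"
    piped = PATTERN.sub(_repl, padded)  # step 2: run the answer's replacement
    return re.sub(r"^\|? ", "", piped)  # step 3: drop the padding and stray "|"

print(repr(replace_s_in_head("word\tsome content")))
# 'word\tsome content'
print(repr(replace_s_in_head("sun rise\tsky")))
# '|un ri|e\tsky'
```

A line with no `s` before the first tab comes back unchanged, and a line that starts with `s` keeps its leading pipe, which is the case the padding step is meant to protect.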