
Let's imagine I have a text file with records taken from different sources. The file looks like this:

1000 Once upon a time, happy end.
1001 Tornado in NY city, the statue was finally found.
1002 I bought her an iphone 
yes 
for $1000. And then

happy end.
1003 How many times 
have I seen it?
not many. Actually.
1004 5 Cars. 2 Toys. 3 Birds.

Each record starts on a new line (\n) with a record number in the range 1000...2000. The record number is separated from the text by a tab (\t).

So how do I count the occurrences of "." in each record with sed?

Can sed substitute all chars except the ones given in a pattern, without grouping them into [^...]?

The output should look like this:

1000 1
1001 1
1002 2
1003 2
1004 3
minerals

1 Answer


Here's one method:

$ awk -v r=1000 '{print r++,split($0,a,".")-1}' RS="\n[0-9]+\t" file 
1000 1
1001 1
1002 2
1003 2
1004 3
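
A note on how the one-liner works, plus a hedged alternative. `split($0,a,".")` cuts the record at every literal dot and returns the number of pieces, so subtracting 1 gives the dot count. The counter `r` assumes the records are numbered consecutively from 1000. Setting `RS` to a regular expression works in gawk and mawk, but POSIX leaves a multi-character `RS` unspecified. The sketch below is not the method above: it reads the file line by line, assumes a record begins at any line that starts with digits followed by a tab, and takes the record number from the file itself. The function name `count_dots` is made up for illustration.

```shell
#!/bin/sh
# count_dots FILE: print "<record number> <dot count>" for each record.
# Assumption: a record begins at a line starting with digits and a tab.
count_dots() {
    awk '
        /^[0-9]+\t/ {                   # first line of a new record
            if (id != "") print id, n   # flush the previous record
            id = $1                     # record number is the first field
            n = 0
        }
        { n += gsub(/\./, ".") }        # gsub returns how many dots it matched
        END { if (id != "") print id, n }
    ' "$1"
}
```

Calling `count_dots file` should print one line per record. A continuation line that itself starts with digits and a tab would be taken for a new record, which is the same ambiguity the `RS` pattern has.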
Chris Seymour
  • This counts all the dots in the file; I need to count the dots within each record only. Note why I assign RS in my query. – minerals Apr 10 '13 at 14:34
  • @minerals adding the expected output is always a good idea. I don't know where the tabs are in your file, so I couldn't test. Please add the output of `cat -t file` to your question so I know where the tabs are. – Chris Seymour Apr 10 '13 at 14:42
  • @minerals your question is much clearer now, see edit, should do the trick. – Chris Seymour Apr 10 '13 at 14:53
  • Excellent, your solution worked correctly, many thanks. As I see it, the only sane way to do this is with awk; sed is no good. – minerals Apr 10 '13 at 14:54
  • @minerals yes, `awk` is definitely the tool for this. – Chris Seymour Apr 10 '13 at 15:02