How to find and print a line that strictly matches a pattern

Question

The problem is as follows. I have a file, testfile, with the following lines:

string2 var var33
string2 HD loop 334
000:000:7878:7878:8978
string1 var var33    
string1 HD loop
000:000:7878:7878:8978
string3 var var33
string3 HD loop4343
000:000:7878:7878:8978

I need to find a line that strictly matches a pattern such as "HD loop", and then output the line where it was found and the line after it. In other words, the result should look like this:

string1 var var33
string1 HD loop
000:000:7878:7878:8978

Why is `string1 var var33` in the output? – arutaku Oct 08 '12 at 09:59

Steve · Answer 1 · score 1 · 2012-10-08T22:43:49.547

Perhaps you're looking for context lines (`-C`) and a maximum match count (`-m`). Use GNU grep:

grep -C 1 -m 1 "HD loop" file.txt

If you only want to output the matching line and the line after it, change `-C 1` to `-A 1`.
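A minimal sketch for comparing the two context options on the question's sample, assuming GNU grep; the `mktemp` file stands in for `testfile`, and the variable names are only for illustration. With the plain pattern, the first match is `string2 HD loop 334`, not the `string1` line the question asks for.

```shell
#!/bin/sh
# Sketch: compare grep's -C (context) and -A (after-context) options.
# Assumes GNU grep; a temporary file stands in for the question's testfile.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT

printf '%s\n' \
    'string2 var var33' \
    'string2 HD loop 334' \
    '000:000:7878:7878:8978' \
    'string1 var var33' \
    'string1 HD loop' \
    '000:000:7878:7878:8978' \
    'string3 var var33' \
    'string3 HD loop4343' \
    '000:000:7878:7878:8978' > "$tmp"

# One line of context on each side of the first match
around=$(grep -m 1 -C 1 "HD loop" "$tmp")
# Only the first match and the line after it
after=$(grep -m 1 -A 1 "HD loop" "$tmp")

printf '%s\n---\n%s\n' "$around" "$after"
```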

EDIT:

grep -P -C 1 "\bHD loop\b" file.txt

EDIT:

grep -C 1 -E '(^|[^a-zA-Z0-9_])HD loop([^a-zA-Z0-9_]|$)' file.txt
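If "exact" means that the line has to end with `HD loop` (the reading taken by the `$`-anchored sed answers on this page), anchoring the pattern with `$` excludes both `HD loop 334` and `HD loop4343` without needing `-P`. A sketch under that assumption, using GNU grep and a temporary copy of the question's sample:

```shell
#!/bin/sh
# Sketch: match lines that END in "HD loop" and print one line of context.
# Assumes GNU grep; the temporary file holds the question's sample lines.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT

printf '%s\n' \
    'string2 var var33' \
    'string2 HD loop 334' \
    '000:000:7878:7878:8978' \
    'string1 var var33' \
    'string1 HD loop' \
    '000:000:7878:7878:8978' \
    'string3 var var33' \
    'string3 HD loop4343' \
    '000:000:7878:7878:8978' > "$tmp"

# "$" anchors the pattern to the end of the line, so "HD loop 334"
# and "HD loop4343" no longer match.
result=$(grep -C 1 'HD loop$' "$tmp")
printf '%s\n' "$result"
```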

edited Oct 08 '12 at 22:43

answered Oct 08 '12 at 10:50

Steve


Looks good, but it finds only the first occurrence of the pattern. If the block I want is, say, the second one, it returns only the first block. For example, with `string1 var var33 string1 HD loop 334 000:000:7878:7878:8978 string2 var var33 string2 HD loop 000:000:7878:7878:8978 string3 var var33 string3 HD loop4343 000:000:7878:7878:8978` the result is `string1 HD loop 334 000:000:7878:7878:8978 string2 var var33` instead of `string2 var var33 string2 HD loop 000:000:7878:7878:8978`. – Alex Oct 08 '12 at 12:27
Simply remove the `-m 1` from the command above to find all possible matches. If I've misunderstood your request, please edit your question. Cheers. – Steve Oct 08 '12 at 12:32
I've corrected the post a bit: I need lines that match the pattern exactly. That is, "HD loop" must not match "HD loops4343". – Alex Oct 08 '12 at 12:41
BusyBox has no -P option in sed, which is why I can't see the result of this one. – Alex Oct 08 '12 at 13:24
@Alex: I'm assuming you mean your version of `grep` does not support the `-P` option. You can match word boundaries without using Perl regex; please see the edit I've made. However, using `[^a-zA-Z0-9_]` is not the same as `\b`, therefore, if there are other characters you'd like to make word boundaries, simply add them into the brackets after the underscores (`_`). HTH. – Steve Oct 08 '12 at 22:54

Gilles Quénot · Answer 2 · score 0 · 2012-10-08T10:16:03.377

In awk, from a shell:

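awk '
    # Read the whole file into an array, indexed from 0
    {
        arr[c++] = $0
    }
    END {
        # Walk the array in file order ("for (a in arr)" has no defined order)
        for (i = 0; i < c; i++) {
            if (arr[i] ~ /HD loop/) {
                printf("%s\n%s\n%s\n", arr[i-1], arr[i], arr[i+1])
                exit
            }
        }
    }
' FILE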

Another implementation that doesn't need to keep the whole file in memory:

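awk '
    /HD loop/ {
        print var            # the line before the match
        print                # the matching line
        if ((getline) > 0)   # read the line after the match, if there is one
            print
        exit
    }
    {
        var = $0             # remember the current line for the next record
    }
' FILE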
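The streaming version stops at the first match and uses an unanchored pattern. A sketch of a variant that prints every block whose middle line ends with `HD loop` (the `$` anchor is an assumption about what "exact" means here), run on a temporary copy of the question's sample:

```shell
#!/bin/sh
# Sketch: print the line before, the matching line, and the line after
# for every line ending in "HD loop", without storing the whole file.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT

printf '%s\n' \
    'string2 var var33' \
    'string2 HD loop 334' \
    '000:000:7878:7878:8978' \
    'string1 var var33' \
    'string1 HD loop' \
    '000:000:7878:7878:8978' \
    'string3 var var33' \
    'string3 HD loop4343' \
    '000:000:7878:7878:8978' > "$tmp"

result=$(awk '
    /HD loop$/ {
        if (NR > 1) print prev      # line before the match (none on line 1)
        print                       # the matching line itself
        if ((getline) > 0) print    # line after the match, if there is one
        prev = $0                   # the last line read becomes the previous line
        next
    }
    { prev = $0 }                   # remember every other line
' "$tmp")
printf '%s\n' "$result"
```

One limitation of this sketch: the line read by `getline` is not itself tested against the pattern, so two matching lines in a row are not both treated as matches.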

edited Oct 08 '12 at 10:16

answered Oct 08 '12 at 10:06

Gilles Quénot


The result is `string1 var var33 string1 HD loop 334 000:000:7878:7878:8978`. This is not what I need. – Alex Oct 08 '12 at 12:50

score 0 · Answer 3 · edited Oct 09 '12 at 03:27


Since you are asking for a sed program, here is one that gives the result you want:

sed -n -f xfile.sed xfile.txt

where xfile.txt is your sample input file and xfile.sed is:

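# On a match: swap in the previous line, append the match, print both,
# then read and print the next line and quit
/HD loop/{
  x
  G
  p
  n
  p
  q
}
# Otherwise keep only the current line in the hold space
h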

edited Oct 09 '12 at 03:27

Community


answered Oct 08 '12 at 10:23

Marichyasana


This gives the same result as `grep -C 1 -m 1 "HD loop" file.txt`, so it doesn't work properly. – Alex Oct 08 '12 at 12:35

Janito Vaqueiro Ferreira Filho · Answer 4 · 2012-10-08T14:02:03.777

Using sed:

#!/bin/sed -nf

/HD loop$/ {
    x
    G
    N
    p
    s/.*\n\([^\n]*\)/\1/
}
h

When "HD loop" is found at the end of a line (indicated by the $ character), a command block is executed. This block starts by swapping the contents of the hold space (an auxiliary buffer) with the contents of the pattern space (the working buffer), using the x exchange command. As we will see, the hold space always contains the previous line. The G command then appends the contents of the hold space (which now contains the current line) to the pattern space, and the N command reads the next line of input and appends it to the pattern space. We can then print the pattern space with the p command. The last thing to do is to restore the hold space, which takes two commands. The first is a substitute command that removes all lines except the last one from the pattern space. Then we copy the pattern space to the hold space with the h command.
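The script above can also be tried without making it executable, by saving the same commands to a file and passing that file to `sed -n -f`. A sketch, assuming GNU sed (which accepts `\n` inside a bracket expression) and temporary files created with `mktemp`:

```shell
#!/bin/sh
# Sketch: run the hold-space script via "sed -n -f" on the question's sample.
# Assumes GNU sed; both files are temporary.
tmp=$(mktemp)
script=$(mktemp)
trap 'rm -f "$tmp" "$script"' EXIT

printf '%s\n' \
    'string2 var var33' \
    'string2 HD loop 334' \
    '000:000:7878:7878:8978' \
    'string1 var var33' \
    'string1 HD loop' \
    '000:000:7878:7878:8978' \
    'string3 var var33' \
    'string3 HD loop4343' \
    '000:000:7878:7878:8978' > "$tmp"

# Same commands as the script above, minus the #! line
cat > "$script" <<'EOF'
/HD loop$/ {
    x
    G
    N
    p
    s/.*\n\([^\n]*\)/\1/
}
h
EOF

result=$(sed -n -f "$script" "$tmp")
printf '%s\n' "$result"
```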

Even if the line doesn't match "HD loop", it is copied to hold space. By doing this, the hold space will always contain the contents of the previous line. Beware that because of the way we set up the hold space after a match is found, it doesn't properly recognize two matches that appear on successive lines. If you want to consider this, some special treatment is needed:

#!/bin/sed -nf

/HD loop$/ b next
h

:start
n

/HD loop$/ {
    x
    G
    :next
    N
    p
    s/.*\n\([^\n]*\)/\1/
    # The last line printed also matches: print each following line once
    :again
    /HD loop$/ {
        n
        p
        b again
    }
    d
}
h
b start

For a more complete and general version, we must first consider what happens when "HD loop" is found on the first line. The previous version would print an empty line followed by the "HD loop" line. Since that output suggests the "HD loop" line was preceded by an empty line, this case needs special treatment: we replace sed's own evaluation loop with our own.

We define a start label with the : command, which marks the start of our loop. Then, at the end of the script, we use the b branch command to jump back to the start of the loop. To fully mimic sed's evaluation loop, the first command after the start label is the n next command, which reads the next input line into the pattern space.

With our loop defined, we can handle the first special case, which is when the first line ends with "HD loop". If it does, we have to skip loading the contents of the hold space, because we know it doesn't contain any useful data. So we define a label next right after the G command that appends the contents of the hold space. We can now use /HD loop$/ b next to skip the hold space manipulation and just print the current line and the line that follows.

If the first line doesn't end with "HD loop", we must store it in the hold space before n replaces it with the next line. We do that with the h command.

The next special case is when two "HD loop" lines follow each other. In that case, at the end of the block, we check whether the last line read (which has just been printed) also ends with "HD loop". If it does, a small inner loop (the again label) reads the following line with n, prints it with p, and checks it in turn, so any number of consecutive "HD loop" lines is handled without printing a line twice.

The last special case is when two "HD loop" lines are separated by a single line. Left as is, the script would print the line between the "HD loop" lines twice. To handle this, we must act as if the hold space doesn't need to be printed when an "HD loop" line is found right after a match. Because this situation is analogous to looking at the first line of the input, we can use the d delete command at the end of the match block to clear the pattern space and restart the whole script. The script then behaves as if the next line were the first input line, and won't print the hold space if the line after the match is an "HD loop" line.

UPDATE: If you only want the first result, you can simplify a few things:

#!/bin/sed -nf

/HD loop$/ b next
h

:start
n

/HD loop$/ {
    x
    G
    :next
    N
    p
    q
}
h
b start

Now, instead of performing all of the previous operations after printing the line, we can just quit with the q command.
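A sketch for checking the "first result only" script on the question's sample and on a hypothetical two-line input whose first line matches (the case the `b next` at the top is meant for). It assumes GNU sed, which resolves the `next` label even though it is defined later, inside a block; the file names come from `mktemp`.

```shell
#!/bin/sh
# Sketch: run the "first result only" script on the question's sample
# and on a hypothetical two-line input whose first line matches.
tmp=$(mktemp)
first=$(mktemp)
script=$(mktemp)
trap 'rm -f "$tmp" "$first" "$script"' EXIT

printf '%s\n' \
    'string2 var var33' \
    'string2 HD loop 334' \
    '000:000:7878:7878:8978' \
    'string1 var var33' \
    'string1 HD loop' \
    '000:000:7878:7878:8978' \
    'string3 var var33' \
    'string3 HD loop4343' \
    '000:000:7878:7878:8978' > "$tmp"
# Hypothetical input: the match is on line 1, so there is no previous line
printf '%s\n' 'string9 HD loop' 'after the match' > "$first"

# Same commands as the script above, minus the #! line
cat > "$script" <<'EOF'
/HD loop$/ b next
h

:start
n

/HD loop$/ {
    x
    G
    :next
    N
    p
    q
}
h
b start
EOF

result=$(sed -n -f "$script" "$tmp")
edge=$(sed -n -f "$script" "$first")
printf '%s\n---\n%s\n' "$result" "$edge"
```

For the two-line input, no empty line is printed before the match, because the hold space is never swapped in.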

Hope this helps =)

The result of your code `sed -n '/HD loop/{x;G;N;p;s/.*\n\([^\n]*\)/\1/}h;'` is `string1 var var33 string1 HD loop 334 000:000:7878:7878:8978 string2 var var33 string2 HD loop 000:000:7878:7878:8978 string3 var var33 string3 HD loop4343 000:000:7878:7878:8978`. This is not working properly. – Alex, Oct 08 '12 at 12:31
Hi. I'm not sure I understood what you wanted. I've updated the post adding `$` to the `HD loop` pattern, in case you want to search for lines that explicitly end with "HD loop", and I also added another example to find only the first match. — Janito Vaqueiro Ferreira Filho, Oct 08 '12 at 14:05

score 0 · Answer 5 · answered Oct 08 '12 at 14:01

awk '{for(i=1;i<NF;i++)if($i" "$(i+1)=="HD loop"){print x;print;getline;print}};{x=$0}' your_file

Tested as follows:

> cat temp
    000:000:7878:7878:8978
    string1 var var33    
    string1 HD loop
    000:000:7878:7878:8978
    string3 var var33
    string3 HD loop4343
    000:000:7878:7878:8978

> awk '{for(i=1;i<NF;i++)if($i" "$(i+1)=="HD loop"){print x;print;getline;print}};{x=$0}' temp
    string1 var var33    
    string1 HD loop
    000:000:7878:7878:8978
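The test file above leaves out the question's `string2` lines; with them, `string2 HD loop 334` also contains the adjacent fields `HD` and `loop` and would be printed too. A sketch of a variant that only accepts the pair as the last two fields of a line (an interpretation of "exact", not the original answer's rule), run on a temporary copy of the full sample:

```shell
#!/bin/sh
# Sketch: accept "HD loop" only as the LAST two fields of a line.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT

printf '%s\n' \
    'string2 var var33' \
    'string2 HD loop 334' \
    '000:000:7878:7878:8978' \
    'string1 var var33' \
    'string1 HD loop' \
    '000:000:7878:7878:8978' \
    'string3 var var33' \
    'string3 HD loop4343' \
    '000:000:7878:7878:8978' > "$tmp"

# NF >= 2 guards against reading field $(-1) on short lines
result=$(awk '
    NF >= 2 && ($(NF-1) " " $NF) == "HD loop" {
        if (NR > 1) print prev      # line before the match
        print                       # the matching line
        if ((getline) > 0) print    # line after the match, if any
    }
    { prev = $0 }                   # remember the last line read
' "$tmp")
printf '%s\n' "$result"
```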

score 0 · Answer 6 · answered Oct 08 '12 at 23:58


This might work for you (GNU sed):

sed '$!N;/HD loop$/!D;$!N;p;d' file
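A quick check of this one-liner on a temporary copy of the question's sample, assuming GNU sed. The pattern space holds a two-line window, and `$` in the address matches at the end of the whole pattern space, so only a window whose second line ends with `HD loop` counts as a match:

```shell
#!/bin/sh
# Sketch: run the N/D sliding-window one-liner on the question's sample.
# Assumes GNU sed.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT

printf '%s\n' \
    'string2 var var33' \
    'string2 HD loop 334' \
    '000:000:7878:7878:8978' \
    'string1 var var33' \
    'string1 HD loop' \
    '000:000:7878:7878:8978' \
    'string3 var var33' \
    'string3 HD loop4343' \
    '000:000:7878:7878:8978' > "$tmp"

# $!N            append the next line (except on the last line)
# /HD loop$/!D   no match at the end of the window: drop its first line
#                and restart the cycle with the remaining line
# $!N;p;d        match: append the following line, print all three, clear
result=$(sed '$!N;/HD loop$/!D;$!N;p;d' "$tmp")
printf '%s\n' "$result"
```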

answered Oct 08 '12 at 23:58

potong

