extract data from text files

Question

I have some text files as shown below. I would like to extract the numbers in the second column, but only where the fourth column has a run of 5 or more consecutive values greater than 0.

file1.txt

68.2    408     68.2    0
33.4    409     30.3    3.1
12.6    410     7.5     5.1
90.7    411     55.0    35.7
25.2    412     12.1    13.1
55.9    413     .4      55.5
27.3    414     4.8     22.5
46.0    415     42.5    3.5
10.6    421     10.6    0
2.3     422     2.3     0

file2.txt

    72.2    63   62.4   9.8
    10.7    65    .0    10.7
    64.4    66   7.9    56.5
    40.8    67   .0     40.8
    16.0    68  15.0    1
    21.2    69  21.2    0
    31.5    70  2.6     28.9
    26.0    71  21.3    4.7
    112.1   72  74.9    37.2
    86.8    73  86.2    .6
    12.1    74  7.2     4.9

Desired output

*file1.txt
409
410
411
412
413
414
415
*file2.txt
63
65
66
67
68
*file2.txt
70
71
72
73
74

How can I achieve this? Any suggestions would be appreciated!

Is this a .txt file? Are the columns laid out on regular lines? – user3165438, Jul 21 '14 at 07:56

konsolebox · Accepted Answer · 2014-07-21T08:21:25.447

With awk:

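#!/usr/bin/awk -f
# Requires GNU awk 4.0 or later: ENDFILE is a gawk extension.

# Print the buffered run if it has at least 5 entries, then reset it.
function print_all() {
    if (i >= 5) {
        print "*" FILENAME
        for (j = 1; j <= i; ++j)
            print a[j]
    }
    i = 0
}

# Column 4 is positive: buffer column 2 and move on to the next line.
$4 > 0 {
    a[++i] = $2
    next
}

# Any other line ends the current run.
{
    print_all()
}

# Flush a run that is still open when an input file ends.
ENDFILE {
    print_all()
}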

Example:

awk -f script.awk file1.txt file2.txt

Condensed version:

awk 'function print_all() { if (i >= 5) { print "*" FILENAME; for (j = 1; j <= i; ++j) print a[j] } i = 0 } $4 > 0 { a[++i] = $2; next } { print_all() } ENDFILE { print_all() }' file1.txt file2.txt

Output:

*file1.txt
409
410
411
412
413
414
415
*file2.txt
63
65
66
67
68
*file2.txt
70
71
72
73
74

edited Jul 21 '14 at 08:21

answered Jul 21 '14 at 08:14

konsolebox

What is the purpose of ENDFILE? – Jul 21 '14 at 11:18
@Jidder It runs every time processing of an input file ends. – konsolebox Jul 21 '14 at 11:18
@konsolebox I meant that in this case it doesn't seem to be needed? – Jul 21 '14 at 11:37
@Jidder It is needed when a buffered run has not been printed yet by the time the file reaches EOF. – konsolebox Jul 21 '14 at 11:39
@konsolebox Ah, I had the files the other way round. Also, ENDFILE doesn't work with my awk (`GNU Awk 3.1.8`). Any idea why? – Jul 21 '14 at 11:42
@Jidder I just tested it with 3.1.8. It works for me. – konsolebox Jul 21 '14 at 11:49
`ENDFILE` was introduced in gawk 4.0, see http://www.gnu.org/software/gawk/manual/gawk.html#Feature-History – Ed Morton Jul 21 '14 at 14:54
Sorry about that. `3.1.8` did work on the problem, but it didn't need `ENDFILE`. – konsolebox Jul 21 '14 at 14:59
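In an awk that does not know `ENDFILE` (gawk before 4.0, mawk, BWK awk), the word is just an unset variable used as a pattern, so that block never runs and a run still open at the end of the last file (70 to 74 in file2.txt) is silently dropped. A minimal portable sketch, assuming only POSIX awk features, flushes at the start of each new file (`FNR == 1 && NR > 1`) and once more in `END`. The wrapper name `extract_runs` is hypothetical, not part of the original answer.

```shell
#!/bin/sh
# Portable sketch for awks without ENDFILE (gawk < 4.0, mawk, BWK awk).
# extract_runs is a hypothetical helper name, not part of the original answer.
extract_runs() {
    awk '
    # Print the buffered run if it has at least 5 entries, then reset it.
    function print_run() {
        if (n >= 5) {
            print "*" file
            for (k = 1; k <= n; k++)
                print buf[k]
        }
        n = 0
    }
    # A new file has started: flush any run left over from the previous one.
    FNR == 1 && NR > 1 { print_run() }
    # Column 4 is positive: buffer column 2 and remember which file it came from.
    $4 > 0 { buf[++n] = $2; file = FILENAME; next }
    # Any other line ends the current run.
    { print_run() }
    # Flush a run that is still open at the end of the last file.
    END { print_run() }
    ' "$@"
}
```

Usage: `extract_runs file1.txt file2.txt` should print the same output as the gawk script. The file name is saved in `file` while buffering because, by the time the `FNR == 1` rule fires, `FILENAME` already names the new file.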