Linux Terminal: Finding number of lines longer than x

Question

I come to you with a problem that has me stumped. I'm attempting to find the number of lines in a file (in this case, the html of a certain site) longer than x (which, in this case, is 80).

For example: google.com has 7 lines (checked with wc -l), two of which are longer than 80 (checked with awk '{print NF}'). I'm trying to find a way to count how many lines are longer than 80 and then output that number.

My command so far looks like this: wget -qO - google.com | awk '{print NF}' | sort -g

I was thinking of just counting which lines have values larger than 80, but I can't figure out the syntax for that. Perhaps 'awk'? Maybe I'm going about this the clumsiest way possible and have hit a wall for a reason.

Thanks for the help!

Edit: The unit of measurement is characters. The command should find the number of lines with more than 80 characters in them.

Do you mean `80` characters or `80` fields? `This is a test` has `14` characters (`15` counting the trailing newline) and `4` fields. – Jotne, Nov 20 '14 at 08:28

score 5 · Accepted Answer · answered Nov 19 '14 at 20:25


If you want the number of lines that are longer than 80 characters (your question is missing the units), grep is a good candidate:

grep -c '.\{80\}'

So:

wget -qO - google.com | grep -c '.\{80\}'

outputs 6.

answered Nov 19 '14 at 20:25

gniourf_gniourf



You're right, I totally forgot to mention the units I was trying to account for (I seem to forget one crucial piece of information when asking a question, no matter how careful I try to be). That said, `grep -c` worked like a charm. I was trying to do some brace expansion with `grep`, which didn't work out well. Thanks for the concise and efficient answer! – Doestovsky Nov 19 '14 at 21:54

If I'm not mistaken, `'.\{80\}'` matches lines with 80 or more characters, so here it should be `'.\{81\}'`. – Ana Borges Jan 03 '22 at 15:51
@ana-borges: you're right! (In English, "more than" is meant in the strict sense.) – gniourf_gniourf Jan 03 '22 at 18:02
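To illustrate the point raised in the comments, here is a small sketch that compares the two patterns on a made-up sample file (the temporary file and its contents are assumptions for the demonstration, not part of the original answer). It builds lines of exactly 79, 80, and 81 characters and counts matches for each pattern.

```shell
# Build a sample file with lines of exactly 79, 80 and 81 characters.
sample=$(mktemp)
for n in 79 80 81; do
    # sprintf pads an empty string to n spaces; gsub turns them into x's.
    awk -v n="$n" 'BEGIN { s = sprintf("%" n "s", ""); gsub(/ /, "x", s); print s }'
done > "$sample"

# '.\{80\}' matches any line containing at least 80 characters.
at_least_80=$(grep -c '.\{80\}' "$sample")

# '.\{81\}' matches any line containing at least 81 characters,
# i.e. strictly more than 80.
more_than_80=$(grep -c '.\{81\}' "$sample")

echo "80 or more characters: $at_least_80"
echo "more than 80 characters: $more_than_80"

rm -f "$sample"
```

So for a strict "longer than 80" count on the page from the question, the pipeline would presumably be `wget -qO - google.com | grep -c '.\{81\}'`.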

score 3 · Answer 2 · edited May 23 '17 at 12:01


Blue Moon's answer (in its original version) will print the number of fields, not the length of the line. Since awk splits fields on whitespace by default, you will get a word count, not the length of the line.

Try this:

wget -qO - google.com | awk '{ if (length($0) > 80) count++; } END{print count}'
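A quick way to see the difference between fields and characters is to run both on a single known line. This is only a sketch; the sample line is borrowed from the comment under the question, and the variable names are made up for the example.

```shell
line='This is a test'

# NF is the number of whitespace-separated fields on the line.
fields=$(printf '%s\n' "$line" | awk '{ print NF }')

# length($0) is the length of the whole line, without the trailing newline.
chars=$(printf '%s\n' "$line" | awk '{ print length($0) }')

echo "fields: $fields, characters: $chars"
```

Note that `wc -c` on the same input reports one more than `length($0)`, because it also counts the trailing newline.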

edited May 23 '17 at 12:01

Community


answered Nov 19 '14 at 20:52

philbrooksjazz


P.P · Answer 3 · 2014-11-19T20:36:17.893


Using awk:

wget -qO - google.com | awk 'NF>80{count++} END{print count}'

This gives 2 as output as there are two lines with more than 80 fields.

If you mean number of characters (I presumed fields based on what you have in the question) then:

wget -qO - google.com | awk 'length($0)>80{c++} END{print c}'

which gives 6.
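Since the question asks about lines longer than an arbitrary x, the character-counting command can be wrapped so the limit is a parameter. The following is a minimal sketch: `count_longer` is a hypothetical helper name, and the sample file is invented for the demonstration. Printing `c + 0` instead of `c` makes awk output `0` rather than an empty line when nothing matches.

```shell
# count_longer LIMIT [FILE...]
# Hypothetical helper: prints how many lines are longer than LIMIT.
# With no FILE arguments it reads standard input.
count_longer() {
    limit=$1
    shift
    awk -v limit="$limit" 'length($0) > limit { c++ } END { print c + 0 }' "$@"
}

# Made-up sample data with lines of 5, 17 and 1 characters.
sample=$(mktemp)
printf '%s\n' 'short' 'a bit longer line' 'x' > "$sample"

longer_than_4=$(count_longer 4 "$sample")
longer_than_100=$(count_longer 100 "$sample")

echo "longer than 4: $longer_than_4"
echo "longer than 100: $longer_than_100"

rm -f "$sample"
```

With that helper defined, the case from the question would be `wget -qO - google.com | count_longer 80`. Depending on the awk implementation and locale, `length` may count bytes rather than characters for non-ASCII text.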

edited Nov 19 '14 at 20:36

answered Nov 19 '14 at 20:15

P.P


Thanks, this worked perfectly as well. I did want to count the _characters_ rather than the _fields_, so thanks to @philbrooksjazz for catching that. I picked gniourf's answer over yours because `grep` manages to accomplish the same thing a bit more concisely for my purposes. Thanks! – Doestovsky Nov 19 '14 at 22:06
