5

I come to you with a problem that has me stumped. I'm attempting to find the number of lines in a file (in this case, the html of a certain site) longer than x (which, in this case, is 80).

For example: google.com (checking with wc -l) has 7 lines, two of which are longer than 80 (checking with awk '{print NF}'). I'm trying to find a way to count how many lines are longer than 80 and then output that number.

My command so far looks like this: wget -qO - google.com | awk '{print NF}' | sort -g

I was thinking of just counting which lines have values larger than 80, but I can't figure out the syntax for that. Perhaps 'awk'? Maybe I'm going about this the clumsiest way possible and have hit a wall for a reason.

Thanks for the help!

Edit: The unit of measurement is characters. The command should be able to find the number of lines with more than 80 characters in them.

Doestovsky
  • Do you mean `80` characters or `80` fields? `This is a test` has `14` characters and `4` fields. – Jotne Nov 20 '14 at 08:28

3 Answers

5

If you want the number of lines that are longer than 80 characters (your question is missing the units), grep is a good candidate:

grep -c '.\{80\}'

So:

wget -qO - google.com | grep -c '.\{80\}'

outputs 6.
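
One boundary detail is worth checking here: `.\{80\}` matches any line that contains at least 80 characters, so a line of exactly 80 characters is counted too. The sketch below runs on synthetic input instead of the google.com page and compares it with `.\{81\}`, which counts only lines strictly longer than 80. The `make_line` and `sample` helpers are hypothetical names made up for this demo.

```shell
# make_line N: print one line of N 'x' characters (hypothetical demo helper)
make_line() {
    awk -v n="$1" 'BEGIN { for (i = 0; i < n; i++) printf "x"; print "" }'
}

# sample: three lines that are 79, 80 and 81 characters long
sample() {
    make_line 79
    make_line 80
    make_line 81
}

sample | grep -c '.\{80\}'   # prints 2: lines with 80 or more characters
sample | grep -c '.\{81\}'   # prints 1: lines with more than 80 characters
```

If no line in the page is exactly 80 characters long, both patterns report the same count.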

gniourf_gniourf
  • You're right, I totally forgot to mention the units that I was trying to account for (I seem to forget one crucial piece of information when asking a question, no matter how careful I try to be). With that being said, `grep -c` worked like a charm. I was trying to do some brace expansion with `grep`. That didn't work out well. Thanks for the concise and efficient answer! – Doestovsky Nov 19 '14 at 21:54
  • If I'm not mistaken, `'.\{80\}'` matches lines with 80 or more characters, so here it should be `'.\{81\}'`. – Ana Borges Jan 03 '22 at 15:51
  • @ana-borges: you're right! ("more" in English is meant in the strict sense) – gniourf_gniourf Jan 03 '22 at 18:02
3

Blue Moon's answer (in its original version) will print the number of fields, not the length of the line. Since the default field separator in awk is ' ' (a single space, which awk treats as any run of blanks), you will get a word count, not the length of the line.

Try this:

wget -qO - google.com | awk '{ if (length($0) > 80) count++; } END{print count}'
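
To see the difference between fields and characters on a small input, here is a sketch that uses a made-up sample line instead of the google.com page:

```shell
# NF is the number of whitespace-separated fields on the line;
# length($0) is the number of characters in the whole line.
printf 'one two three\n' | awk '{ print NF, length($0) }'   # prints: 3 13
```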
Community
2

Using awk:

wget -qO - google.com | awk 'NF>80{count++} END{print count}'

This gives 2 as output as there are two lines with more than 80 fields.

If you mean number of characters (I presumed fields based on what you have in the question) then:

wget -qO - google.com | awk 'length($0)>80{c++} END{print c}'

which gives 6.
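
Since the question asks about lines longer than some x, the threshold can also be passed in as a variable. The following is only a sketch: `count_long_lines` is a hypothetical function name, not something from the answers here. Printing `c + 0` makes awk print 0 when no line matches, where printing `c` alone would give an empty line.

```shell
# count_long_lines MAX: read stdin and print how many lines
# have more than MAX characters (hypothetical helper name).
count_long_lines() {
    awk -v max="$1" 'length($0) > max { c++ } END { print c + 0 }'
}

# Synthetic input: lines of 3, 5 and 7 characters, threshold 4
printf 'abc\nabcde\nabcdefg\n' | count_long_lines 4   # prints: 2
```

With the page from the question, this would be used as `wget -qO - google.com | count_long_lines 80`.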

P.P
  • Thanks, this worked perfectly as well. I did want to count the _characters_ rather than the _fields_, so thanks to @philbrooksjazz for catching that. I picked gniourf's answer over yours because `grep` manages to accomplish the same thing a bit more concisely for my purposes. Thanks! – Doestovsky Nov 19 '14 at 22:06