
I recently started running a text-only Usenet server. Storage is set up so that each news article (post) lives in a directory like the following:

/var/spool/news/articles/alt/i/love/cat/1 2 3

The numbers are the individual articles. The server has been running for only a few hours and I already have 150M+ of articles (so Usenet is obviously not as dead as expected). I would like to start harvesting spammers so I can add them to my spam filter proactively. One common trait of spammers is that they tend to post to lots of newsgroups at once. Each message header contains the following line:

Newsgroups: a,b,c

Each letter represents a newsgroup. I would like to run a daily report that tells me which articles have more than 3 commas in the Newsgroups line of the header. Here's what I've come up with so far:

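# List regular files only (-type f skips directories, including ones with digits in their names).
# For each article, print its path, then the number of commas in its first Newsgroups line.
find /var/spool/news/articles/ -type f > list.txt &&
while IFS= read -r i; do echo "$i"; grep -m 1 '^Newsgroups:' "$i" | grep -o ',' | wc -l; done < list.txt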

The find command will create a list of every message minus any directories. The while-loop will display the message path and on a separate line it will display the number of commas.

Ideally I would like to have the output be something like this:

/var/spool/news/articles/alt/i/love/cat/1 2

I could then put the output into a spreadsheet and sort by the number of commas, but the line break between the path and the number is messing me up. Sorry for the long post, but I figured I would explain what I'm doing in case someone else tries to do something like this in the future.
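
For reference, here is a minimal sketch of one way to get the path and the comma count on a single line. It is not tested against a real spool: `count_newsgroup_commas` is a hypothetical helper name, and the sketch assumes the Newsgroups header sits on one unfolded line before the first blank line of each article. It uses awk's `gsub`, which returns the number of replacements made, to count the commas.

```shell
# count_newsgroup_commas DIR   (hypothetical helper name)
# Prints "path comma_count" on one line for each article under DIR.
# Assumption: the Newsgroups header sits on a single, unfolded line
# before the first blank line of the article.
count_newsgroup_commas() {
    find "$1" -type f -exec awk '
        FNR == 1          { in_header = 1 }   # first line of each file: headers begin
        in_header && /^$/ { in_header = 0 }   # a blank line ends the header block
        in_header && tolower($0) ~ /^newsgroups:/ {
            n = gsub(/,/, ",")                # gsub returns how many commas it replaced
            print FILENAME, n
            in_header = 0                     # ignore anything further in this file
        }
    ' {} +
}
```

Something like `count_newsgroup_commas /var/spool/news/articles | sort -k2,2nr > report.txt` should then give a list sorted by comma count, as long as no article path contains whitespace.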

I would also appreciate it if anyone has any suggestions for doing this in a more "sane" manner than I have.

elmerjfudd
  • Isn't there already existing software for managing Usenet servers that automates these types of checks? – Barmar Oct 09 '20 at 16:24
  • `grep 'Newsgroups: .*,.*,.*,.*,'` will match lines that have at least 4 commas. – Barmar Oct 09 '20 at 16:25
  • There are several ways to get the two items on one line, e.g.: `printf "${i}"` (no trailing `\n`, so the cursor stays on the current line), after which the `grep ... | grep ...` output lands at the end of the current line; `echo "${i} $(grep ... | grep ...)"`; `gcount=$(grep ... | grep ...); echo "${i} ${gcount}"`; and so on. – markp-fuso Oct 09 '20 at 16:30
  • The `printf "${i}"` is exactly what I needed. I replaced the echo command with `printf "${i}"; printf ","` so I can output everything in CSV format. – elmerjfudd Oct 09 '20 at 16:46
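
Putting the comments together, a rough sketch might look like the following. It uses the pattern from Barmar's comment to pick out articles with at least 4 commas, then prints each one as a `path,count` CSV row with the path passed as a `printf` argument rather than as the format string. `crossposted_csv` is a hypothetical name, and the sketch assumes an unfolded Newsgroups header. Because `grep -l` searches the whole file, an article that quotes such a header line in its body would also be listed.

```shell
# crossposted_csv DIR   (hypothetical helper name)
# Prints "path,comma_count" for articles whose Newsgroups line has >= 4 commas.
crossposted_csv() {
    # grep -l prints only the names of files containing a matching line.
    find "$1" -type f -exec grep -l -i '^Newsgroups: .*,.*,.*,.*,' {} + |
    while IFS= read -r article; do
        # Count the commas on the first Newsgroups line of this article.
        commas=$(grep -i '^Newsgroups:' "$article" | head -n 1 | tr -cd ',' | wc -c)
        # The path is an argument, so a "%" in a file name is printed literally.
        printf '%s,%d\n' "$article" "$commas"
    done
}
```

Running `crossposted_csv /var/spool/news/articles > report.csv` would give a file that opens directly in a spreadsheet, assuming paths contain no commas or newlines.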

0 Answers