From abuse.ch one can get a plain text file of malware-distributing URIs. I want to use it as a blacklist for the Squid proxy (I am not yet sure about the runtime behavior). It should not be too hard to convert the URI file into a regex file for `acl aclname url_regex ...` using sed, but I am struggling to find a description of Squid's regex syntax that identifies all the special characters I have to escape.

Thomas P
  • The Squid project has a nice wiki: https://wiki.squid-cache.org/SquidFaq/SquidAcl – djdomi Mar 01 '22 at 17:46
  • I know this wiki page, but it describes the acl syntax, not the regex syntax. – Thomas P Mar 02 '22 at 07:06
  • You must be more specific and produce at least one example clearly stating what you intend to do. Anyway, assuming you just need to parse a hosts file [https://urlhaus.abuse.ch/downloads/hostfile/](https://urlhaus.abuse.ch/downloads/hostfile/), you may try this: search for `^(#.*$(\n|\r\n)?|127.*\t)` and replace with `""` – mjoao Jun 22 '22 at 10:11
  • I'm looking for a description of the regex syntax itself: which metacharacters, quantifiers, modifiers, ... are allowed. This differs slightly between Perl, PHP, Java, and so on. – Thomas P Jun 24 '22 at 09:24

1 Answer

Squid understands GNU regex, i.e. Extended Regular Expressions (ERE).
It does not fully understand Perl-compatible regular expressions (PCRE).
E.g., the following are not supported: \w, \d, \W, \D, lookahead, negative lookahead, shy (non-capturing) grouping, atomic groups, etc.

Working examples:

^(outlook-[1-9]\.cdn|attachments|res\.cdn)\.office\.net$
^c[0-9]+.*(powerpoint|word|excel|visio).*[0-9]{2}\.cdn\.office\.net$
^trello-[a-zA-Z0-9]+\.s3\.amazonaws\.com$

Non-working examples (valid PCRE, but not valid ERE):
^(outlook-\d\.cdn|attachments|res\.cdn)\.office\.net$
^c\d+.*(powerpoint|word|excel|visio).*\d{2}\.cdn\.office\.net$
^trello-\w+\.s3\.amazonaws\.com$
^rr?[1-9]-{2,4}sn-(?!.*-apn[a-z]).*\.googlevideo\.com$
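
To spot patterns like the non-working ones before loading them into Squid, a rough lint can flag the PCRE-only constructs listed above. This is a minimal sketch in Python, not part of Squid: the function name `pcre_only_constructs` is hypothetical, and it only looks for \w, \d, \W, \D and `(?...)` groups. It does not treat bracket expressions specially, so a literal `(?` inside `[...]` would be flagged as well.

```python
import re

# Split a pattern into escape pairs (backslash + one char), "(?" group openers,
# and single characters. Escape pairs are consumed first, so "\(" followed by
# "?" is not mistaken for the start of a "(?...)" group.
TOKEN = re.compile(r'\\.|\(\?|.', re.DOTALL)

# Constructs the answer lists as PCRE-only (assumption: this list is not exhaustive).
PCRE_ONLY_TOKENS = {r'\w', r'\d', r'\W', r'\D', '(?'}


def pcre_only_constructs(pattern):
    """Return the PCRE-only tokens found in pattern, in order of appearance."""
    return [tok for tok in TOKEN.findall(pattern) if tok in PCRE_ONLY_TOKENS]


if __name__ == '__main__':
    for candidate in (
        r'^trello-[a-zA-Z0-9]+\.s3\.amazonaws\.com$',
        r'^trello-\w+\.s3\.amazonaws\.com$',
    ):
        hits = pcre_only_constructs(candidate)
        print(candidate, '->', 'OK' if not hits else 'PCRE-only: ' + ' '.join(hits))
```

An empty result only means none of these specific constructs were found; it is not proof that Squid will accept the pattern.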

More info:
https://www.gnu.org/software/gnulib/manual/html_node/Regular-expressions.html
https://www.gnu.org/software/grep/manual/html_node/Regular-Expressions.html
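
For the conversion the question asks about, the characters that are special in ERE outside a bracket expression are `\ . [ ( ) * + ? { | ^ $`. A minimal sketch of the escaping step follows, written in Python rather than sed. The helper names `ere_escape` and `urls_to_patterns` are hypothetical, and the input format (one URL per line, comment lines starting with `#`) is an assumption about the downloaded file. Anchoring each pattern with `^` is also a design choice, not something Squid requires.

```python
import re

# Characters that are special in ERE outside a bracket expression.
# "]" and "}" are left alone: they are literal once "[" and "{" are escaped.
ERE_SPECIAL = re.compile(r'([\\.\[()*+?{|^$])')


def ere_escape(text):
    """Backslash-escape ERE metacharacters so that text matches literally."""
    return ERE_SPECIAL.sub(r'\\\1', text)


def urls_to_patterns(lines):
    """Turn URL-list lines into anchored ERE patterns.

    Assumes one URL per line; blank lines and lines starting with '#' are skipped.
    """
    patterns = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        patterns.append('^' + ere_escape(line))
    return patterns


if __name__ == '__main__':
    # Made-up sample input using reserved example hosts.
    sample = [
        '# comment line',
        'http://example.com/a.php?id=1',
        '',
        'http://198.51.100.7/bin(1).exe',
    ]
    for pattern in urls_to_patterns(sample):
        print(pattern)
```

The printed lines could then be written to a file and referenced from an `acl aclname url_regex` line. How Squid behaves at runtime with a very large pattern list is not covered by this sketch.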

mjoao