I can obtain the file listing for a regular Common Crawl crawl from:
https://commoncrawl.s3.amazonaws.com/crawl-data/CC-MAIN-2017-09/wet.paths.gz
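For context, a listing like this is a gzip-compressed text file with one path per line, so it can be fetched and read roughly as follows. This is only a minimal sketch in Python: the URL is the one quoted above, while the helper names `parse_listing` and `fetch_listing` are hypothetical and not part of any Common Crawl tooling.

```python
import gzip
import urllib.request
from typing import List

# URL copied from the question; everything else here is illustrative.
LISTING_URL = (
    "https://commoncrawl.s3.amazonaws.com/"
    "crawl-data/CC-MAIN-2017-09/wet.paths.gz"
)


def parse_listing(gz_bytes: bytes) -> List[str]:
    """Decompress a *.paths.gz payload and return its non-empty lines."""
    text = gzip.decompress(gz_bytes).decode("utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def fetch_listing(url: str = LISTING_URL) -> List[str]:
    """Download a listing file and return the paths it contains."""
    with urllib.request.urlopen(url) as response:
        return parse_listing(response.read())


# Example usage (performs a network request):
#     paths = fetch_listing()
#     print(len(paths), "paths; first:", paths[0])
```

If the URL does not exist, `urllib.request.urlopen` raises `urllib.error.HTTPError`, which is the kind of failure the CC-NEWS attempts below run into.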
How can I do the same for the Common Crawl News dataset (CC-NEWS)?
I tried several URL variants, but each one returns an error:
https://commoncrawl.s3.amazonaws.com/crawl-data/CC-NEWS-2017-09/warc.paths.gz
https://commoncrawl.s3.amazonaws.com/crawl-data/CC-NEWS/2017/09/warc.paths.gz