I need to store a trillion lists of URLs, where each list contains about 50 URLs. What would be the most space-efficient way to compress them for on-disk storage?
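For a sense of scale, here is a back-of-the-envelope estimate. The question does not give an average URL length, so the 60 bytes used below is a purely hypothetical figure:

$$
N_{\text{URLs}} = 10^{12}\ \text{lists} \times 50\ \tfrac{\text{URLs}}{\text{list}} = 5 \times 10^{13}\ \text{URLs},
\qquad
5 \times 10^{13} \times 60\ \text{B} = 3 \times 10^{15}\ \text{B} \approx 3\ \text{PB}.
$$

At this scale, every byte saved per URL saves roughly $5 \times 10^{13}$ B, or about 50 TB.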
I was thinking of first removing redundant information such as the "http://" prefix, then building a minimal finite state automaton from the URLs and saving that.
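One way to make the automaton idea concrete is a minimal acyclic DFA (often called a DAWG), built incrementally with Daciuk et al.'s construction for sorted input. The sketch below is an in-memory illustration, not an on-disk format. It merges states that have identical suffixes and reports state and edge counts as a rough proxy for size. The names `MinimalDFA`, `State` and `strip_http` are hypothetical, and the code assumes URLs are inserted in sorted order.

```python
# Sketch: minimal acyclic DFA (DAWG) built incrementally from sorted strings,
# following Daciuk et al.'s algorithm for sorted input. Names are hypothetical.


def strip_http(url):
    # Drop the "http://" prefix. Lossless only if every other URL keeps its
    # own scheme (e.g. "https://"), so a missing scheme implies "http://".
    return url[len("http://"):] if url.startswith("http://") else url


class State:
    __slots__ = ("id", "final", "edges")

    def __init__(self, state_id):
        self.id = state_id
        self.final = False
        self.edges = {}  # char -> State

    def signature(self):
        # Two states are equivalent iff they agree on finality and have the
        # same outgoing edges (children are minimized before their parents).
        return (self.final,
                tuple(sorted((c, s.id) for c, s in self.edges.items())))


class MinimalDFA:
    def __init__(self):
        self._next_id = 0
        self.root = self._new_state()
        self._prev = ""
        self._unchecked = []  # path of (parent, char, child) not yet minimized
        self._register = {}   # signature -> canonical state

    def _new_state(self):
        state = State(self._next_id)
        self._next_id += 1
        return state

    def insert(self, word):
        if word < self._prev:
            raise ValueError("words must be inserted in sorted order")
        # Length of the prefix shared with the previously inserted word.
        common = 0
        for a, b in zip(word, self._prev):
            if a != b:
                break
            common += 1
        # States below the shared prefix can no longer change: minimize them.
        self._minimize(common)
        node = self._unchecked[-1][2] if self._unchecked else self.root
        for ch in word[common:]:
            child = self._new_state()
            node.edges[ch] = child
            self._unchecked.append((node, ch, child))
            node = child
        node.final = True
        self._prev = word

    def finish(self):
        # Call once after the last insert to minimize the remaining path.
        self._minimize(0)

    def _minimize(self, down_to):
        # Walk the unchecked path bottom-up, replacing each state with an
        # equivalent registered state if one exists, else registering it.
        while len(self._unchecked) > down_to:
            parent, ch, child = self._unchecked.pop()
            key = child.signature()
            if key in self._register:
                parent.edges[ch] = self._register[key]
            else:
                self._register[key] = child

    def __contains__(self, word):
        node = self.root
        for ch in word:
            node = node.edges.get(ch)
            if node is None:
                return False
        return node.final

    def count_states_and_edges(self):
        # Count only states reachable from the root (merged-away ones are gone).
        seen, stack, edges = {self.root.id}, [self.root], 0
        while stack:
            node = stack.pop()
            edges += len(node.edges)
            for child in node.edges.values():
                if child.id not in seen:
                    seen.add(child.id)
                    stack.append(child)
        return len(seen), edges
```

Note that an automaton like this represents a sorted set of strings, so the original order within each list and any duplicate URLs are lost. Keeping list boundaries would need extra bookkeeping (for example, one automaton per list, or a list ID attached to each URL), and serializing the states and edges compactly to disk is a separate design problem.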
Another option is to build a comma-separated string of the URLs and compress it with a general-purpose compressor such as GZIP or BZ2.
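The concatenate-and-compress option can be measured directly with Python's standard `gzip`, `bz2` and `lzma` modules. This sketch makes a few assumptions. It joins URLs with a newline rather than a comma, because a comma is a legal character inside a URL while a raw newline is not. The helper names `pack`, `unpack` and `compressed_sizes` are hypothetical, and `lzma` (the xz format) is included only as an extra point of comparison.

```python
import bz2
import gzip
import lzma

# Separator assumption: "\n" instead of ",", since commas may appear in URLs
# but an unescaped newline may not.
SEP = "\n"


def pack(url_list):
    # Concatenate one list of URLs into a single byte string.
    return SEP.join(url_list).encode("utf-8")


def unpack(blob):
    # Inverse of pack(): recover the original list, order included.
    return blob.decode("utf-8").split(SEP)


def compressed_sizes(url_list):
    # Size in bytes of the packed list, raw and under each compressor
    # at its strongest standard setting.
    raw = pack(url_list)
    return {
        "raw": len(raw),
        "gzip": len(gzip.compress(raw, compresslevel=9)),
        "bz2": len(bz2.compress(raw, 9)),
        "lzma": len(lzma.compress(raw, preset=9 | lzma.PRESET_EXTREME)),
    }
```

Running `compressed_sizes` on a realistic sample of lists is the simplest way to compare the approaches on actual data. Compressing each ~50-URL list on its own gives a general-purpose compressor very little context to work with, so results on single lists may differ noticeably from results on larger batches.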
If I don't care about speed, which solution would give the best compression?