I have several applications that create a unique (with high probability), human-readable checksum or digital signature by applying a cryptographic hash like MD5, then using the resulting bits with an arithmetic coder to select words from a list. I've simply been using /usr/share/dict/words, but recently a client (rightly) complained about receiving a document whose checksum included offensive words or trigger words. (More details at my answer to Generate User Friendly Codes).
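For concreteness, here is a minimal sketch of this kind of scheme. It is an illustration only, not the actual code: it swaps the arithmetic coder for plain base-N digit extraction (repeated `divmod` by the list length), and the name `words_from_hash`, the `count` parameter, and the `-` separator are all hypothetical.

```python
import hashlib
from typing import Sequence


def words_from_hash(data: bytes, wordlist: Sequence[str], count: int = 4) -> str:
    """Map the MD5 hash of `data` to `count` words from `wordlist`.

    Simplification: base-N digit extraction stands in for an arithmetic coder.
    """
    if not wordlist:
        raise ValueError("wordlist must not be empty")
    # Treat the 128-bit digest as one large integer.
    n = int.from_bytes(hashlib.md5(data).digest(), "big")
    chosen = []
    for _ in range(count):
        # Peel off one "digit" in base len(wordlist) and use it as an index.
        n, index = divmod(n, len(wordlist))
        chosen.append(wordlist[index])
    return "-".join(chosen)
```

With a list of tens of thousands of words, each word carries roughly 15 to 16 bits of the hash, which is why any offensive entry in the list will eventually surface in someone's checksum.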
For this application, long lists are important because they make repeated words unlikely; the list I'm using has many tens of thousands of words.
Does anyone know either how to remove offensive and trigger words from such a list, or where to find a list of innocuous words?
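Given a blocklist, the removal step itself is easy; the hard part is obtaining the blocklist. A minimal sketch, assuming a blocklist is available as one word per line (the helper names `load_words` and `filter_words` are hypothetical, and no particular blocklist source is implied):

```python
def load_words(path):
    """Read one word per line, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def filter_words(words, blocklist):
    """Return `words` minus any entry that matches the blocklist, ignoring case."""
    blocked = {w.casefold() for w in blocklist}
    return [w for w in words if w.casefold() not in blocked]
```

Usage would look like `filter_words(load_words("/usr/share/dict/words"), load_words("blocklist.txt"))`, where `blocklist.txt` is a placeholder file name. Exact matching like this misses inflected forms and compounds, so it only helps if the blocklist is thorough.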