4

I was asked to write a program that encrypts and decrypts "normal English" text based on letter frequencies.

My question is: where can I find text samples whose letter frequencies match the official ones?

So far I have tried "War and Peace" by Lev Tolstoy, but it didn't work well.

Edit: I don't need just a list of words; I need a text sample to run some processing on.
Edit 2: The goal is to guess 20 of the 26 letters in a 2000-character text.

sdadffdfd

3 Answers

2

Check out infochimps; they have a bunch of freely available datasets that may be useful.

Noon Silk
2

You're searching for English text corpora, e.g. http://faculty.washington.edu/ebender/corpora/corpora.html#modern. Out of what's listed there, I know that Project Gutenberg is free; many of the others might not be.

I'm not sure what you mean by the official frequencies. The point of a frequency table is to match what you find in the wild, and if it doesn't, that's the frequency table's problem.
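
To check how a given sample compares with whatever table you were handed, you can measure the frequencies yourself. Here is a minimal Python sketch, assuming the cipher is a simple letter-for-letter substitution (the question doesn't say so). The function names `letter_frequencies`, `rank_letters`, and `guess_substitution` are made up for illustration. Matching letters purely by frequency rank is only a naive first pass; on a 2000-character text the rarer letters will likely be mixed up.

```python
from collections import Counter
import string


def letter_frequencies(text):
    """Return the relative frequency of each letter a-z in text, ignoring case."""
    letters = [ch for ch in text.lower() if ch in string.ascii_lowercase]
    counts = Counter(letters)
    total = len(letters)
    # Letters that never occur get frequency 0.0.
    return {ch: (counts[ch] / total if total else 0.0)
            for ch in string.ascii_lowercase}


def rank_letters(freqs):
    """Return letters sorted from most to least frequent (ties alphabetical)."""
    return sorted(freqs, key=lambda ch: (-freqs[ch], ch))


def guess_substitution(cipher_text, reference_text):
    """Map each ciphertext letter to the reference letter of the same rank.

    Assumption: reference_text is a plain-English sample (for example a
    Project Gutenberg book) that you have loaded yourself.
    """
    cipher_rank = rank_letters(letter_frequencies(cipher_text))
    reference_rank = rank_letters(letter_frequencies(reference_text))
    return dict(zip(cipher_rank, reference_rank))
```

Calling `rank_letters(letter_frequencies(sample))` on a few different corpora and comparing the orderings shows how much the ranking shifts from one source to another.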

Darius Bacon
1

Try this list of English words:

http://www.openbsd.org/cgi-bin/cvsweb/src/share/dict/