
I am looking to do some benchmarking of full-text search indexes in PostgreSQL, SQL Server and Lucene.

Any ideas on where to find a good, large sample database to run queries against?

Thanks a lot in advance.

Pablo Santa Cruz

1 Answer


I think a great source would be Wikipedia's database dumps, since they contain a really large amount of text. They are available here: http://dumps.wikimedia.org/
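The dumps are large, so you will want to stream them rather than load them whole. Below is a minimal sketch, assuming you have downloaded a bzip2-compressed "pages-articles" XML dump. The function names `iter_pages` and `iter_dump` are hypothetical helpers, not part of any Wikimedia tool. It yields raw wikitext, which still contains markup you would probably want to strip before feeding it to PostgreSQL, SQL Server or Lucene.

```python
import bz2
import xml.etree.ElementTree as ET
from typing import IO, Iterator, Tuple


def _local(tag: str) -> str:
    # MediaWiki export XML puts a "{namespace}" prefix on every tag; drop it.
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream: IO[bytes]) -> Iterator[Tuple[str, str]]:
    """Yield (title, wikitext) for each <page> without loading the whole dump."""
    title, text = None, None
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = _local(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            if title is not None and text is not None:
                yield title, text
            title, text = None, None
            elem.clear()  # release the finished page's subtree to keep memory flat


def iter_dump(path: str) -> Iterator[Tuple[str, str]]:
    # Assumption: the dump file is bzip2-compressed (e.g. *-pages-articles.xml.bz2).
    with bz2.open(path, "rb") as f:
        yield from iter_pages(f)
```

Each (title, text) pair can then be inserted as one row/document in whichever engine you are benchmarking, so all three indexes are built from the same corpus.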

You could also try a Usenet archive, but there it is harder to pick a target language, and the quality of the language used is also lower.

Danubian Sailor