I'd like to test how third-party tools (including closed-source ones) for tasks such as synchronization and de-duplication behave in the presence of files that have the same size and the same checksum or digest (popular ones being CRC32, MD5, SHA-1, etc.). Some of these hashing methods have known vulnerabilities, so there are ways of generating collisions.
Do you know of a source of such datasets (other than brute-forcing some myself :) ), or of generators for creating them?
To be clear: I am interested in sets of files with the same checksum and the same file size, but different contents!
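
To illustrate what a generator could look like for the weakest case: CRC32 is only 32 bits wide, so a simple birthday search over random equal-length inputs should hit a collision after roughly tens of thousands of tries. Below is a minimal sketch in Python; the function name `find_crc32_collision` and its parameters are made up for illustration. It does not carry over to MD5 or SHA-1, where brute force is hopeless and a dedicated collision attack would be needed.

```python
import random
import zlib


def find_crc32_collision(size=16, seed=0):
    """Birthday search for two distinct byte strings of `size` bytes
    that share the same CRC32. Hypothetical helper, for illustration only."""
    if size <= 4:
        # CRC32 maps inputs of 4 bytes or fewer to distinct values,
        # so no same-size collision exists and the loop would never end.
        raise ValueError("size must be greater than 4 bytes")

    rng = random.Random(seed)   # seeded so the result is reproducible
    seen = {}                   # CRC32 value -> payload that produced it

    while True:
        data = rng.getrandbits(size * 8).to_bytes(size, "big")
        crc = zlib.crc32(data)
        other = seen.get(crc)
        if other is not None and other != data:
            return other, data  # same size, same CRC32, different contents
        seen[crc] = data


if __name__ == "__main__":
    a, b = find_crc32_collision()
    # Write the pair out as two test files for the tool under test.
    for name, payload in (("collide_a.bin", a), ("collide_b.bin", b)):
        with open(name, "wb") as fh:
            fh.write(payload)
    print(a.hex(), b.hex(), format(zlib.crc32(a), "08x"))
```

The two output files have identical size and CRC32 but different bytes, which is the property I want to feed into the tools; what I am still missing is the equivalent for the stronger digests.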