I ran a few "real-world" tests with 4,292,907 different strings. I used a 11-char long substring of the HEXed hash (in other words, a 44-bit portion). Example:
HASH: 629a09633488e9b2aaf2f5a606706da3
Test 1: 629a0963348
Test 2: 29a09633488
Test 3: 9a09633488e
...
Theoretically, I calculated the probability of collision to be ~41% (based on the "birthday paradox probability" formula). But that was theory, which assumes real random distribution. So I wanted to empirically test both MD5 and SHA-1 and see the results. Here they are (the numbers on the right show the number of collisions):
[MD5] [SHA-1]
Test No 1: 2 Test No 1: 0
Test No 2: 0 Test No 2: 0
Test No 3: 1 Test No 3: 0
Test No 4: 0 Test No 4: 1
Test No 5: 0 Test No 5: 0
Test No 6: 0 Test No 6: 1
Test No 7: 1 Test No 7: 0
Test No 8: 2 Test No 8: 0
Test No 9: 1 Test No 9: 0
Test No 10: 1 Test No 10: 0
Test No 11: 0 Test No 11: 1
Test No 12: 0 Test No 12: 1
Test No 13: 0 Test No 13: 0
Test No 14: 0 Test No 14: 1
Test No 15: 0 Test No 15: 1
Test No 16: 0 Test No 16: 1
Test No 17: 1 Test No 17: 1
Test No 18: 1 Test No 18: 1
Test No 19: 0 Test No 19: 0
Test No 20: 0 Test No 20: 1
TOTAL: 8 TOTAL: 10 // No of tests with at least 1 collision
Conclusion: Neither MD5 nor SHA-1 showed significantly worse probability of collision, compared to the "theoretical" one calculated via the "birthday paradox probability" formula. I am aware this test isn't perfect and should be taken with a grain of salt, but to me, at least, it shows that I can heavily rely on calculating the chance of collision via the "theoretical" formula w/o worrying that my calculations are too far from the truth.