I'm trying to download Google's phishing and malware lists from their Safe Browsing API. I want to use the new v3 API.
I managed to get the redirect URLs for the list. Here is the response I get:
n:1710
i:googpub-phish-shavar
u:safebrowsing-cache.google.com/safebrowsing/rd/ChRnb29ncHViLXBoaXNoLXNoYXZhcjgBQAJKDAgBEOeYARjqmAEgAUoMCAEQu5gBGOWYASABSgwIARDplwEYuZgBIAFKDAgBENmXARjnlwEgAUoMCAEQxpcBGNeXASABSgwIARDDlwEYxJcBIAFKDAgBELCXARjBlwEgAUoMCAEQgpcBGK6XASABSgwIARD-lgEYgJcBIAFKDAgBEOGWARj8lgEgAUoMCAEQ2pYBGN-WASABSgwIARDQlgEY2JYBIAFKDAgBEMKWARjOlgEgAUoMCAEQvZYBGMCWASABSgwIARC6lgEYu5YBIAFKDAgBELWWARi4lgEgAUoMCAEQsJYBGLOWASABSgwIARCmlgEYrpYBIAFKDAgBEJ6WARihlgEgAUoMCAEQm5YBGJyWASABSgwIARCWlgEYmZYBIAFKDAgBEJOWARiUlgEgAUoMCAEQjZYBGJGWASABSgwIARD-lQEYi5YBIAFKDAgBEPeVARj7lQEgAUoMCAEQ9JUBGPSVASABSgwIARDolQEY8JUBIAFKDAgBEOSVARjmlQEgAUoMCAEQ4JUBGOKVASABSgwIARDYlQEY3JUBIAFKDAgBENGVARjVlQEgAUoMCAEQzZUBGM-VASABSgwIARDIlQEYyZUBIAFKDAgBEMCVARjGlQEgAUoMCAEQvpUBGL6VASABSgwIARC7lQEYvJUBIAFKDAgBELiVARi4lQEgAUoMCAEQs5UBGLaVASABSgwIARCwlQEYsZUBIAFKDAgBEK6VARiulQEgAUoMCAEQqpUBGKyVASABSgwIARCmlQEYqJUBIAFKDAgBEKKVARiilQEgAUoMCAEQnZUBGJ2VASABSgwIARCWlQEYl5UBIAFKDAgBEJSVARiUlQEgAUoMCAEQj5UBGJCVASABSgwIARCNlQEYjZUBIAFKDAgBEIWVARiIlQEgAUoMCAEQgZUBGIOVASABSgwIARD7lAEY_5QBIAFKDAgBEPWUARj4lAEgAUoMCAEQ8JQBGPCUASAB
u:safebrowsing-cache.google.com/safebrowsing/rd/ChRnb29ncHViLXBoaXNoLXNoYXZhcjgBQAJKEAgAEIydExicqRMgASoC0QU
u:safebrowsing-cache.google.com/safebrowsing/rd/ChRnb29ncHViLXBoaXNoLXNoYXZhcjgBQAJKEAgAEMCOExiLnRMgASoC0gg
u:safebrowsing-cache.google.com/safebrowsing/rd/ChRnb29ncHViLXBoaXNoLXNoYXZhcjgBQAJKEAgAEKP-Ehi_jhMgASoC_A4
u:safebrowsing-cache.google.com/safebrowsing/rd/ChRnb29ncHViLXBoaXNoLXNoYXZhcjgBQAJKFAgAENjsEhii_hIgASoGsALLBeMF
u:safebrowsing-cache.google.com/safebrowsing/rd/ChRnb29ncHViLXBoaXNoLXNoYXZhcjgBQAJKIAgAEObcEhjX7BIgASoS0QOuBfQGgwfnCLAJsQmyCd0J
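To make the structure of that response concrete, here is a minimal sketch of how it could be split into its fields. It rests on assumptions I have not confirmed: that `n:` is the delay in seconds before the next poll, that `i:` names the list the following lines belong to, and that each `u:` line is a redirect URL with the scheme omitted (the sketch prepends `https://`). The names `DownloadResponse` and `parse_download_response` are made up for illustration and are not part of any Google library.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DownloadResponse:
    # Assumed meaning of "n:": seconds to wait before polling again.
    next_poll_seconds: Optional[int] = None
    # Redirect URLs grouped by the list name given on the preceding "i:" line.
    redirects: Dict[str, List[str]] = field(default_factory=dict)


def parse_download_response(body: str) -> DownloadResponse:
    """Split a 'key:value' per-line response into poll delay and redirect URLs."""
    result = DownloadResponse()
    current_list: Optional[str] = None
    for raw in body.splitlines():
        line = raw.strip()
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip blank or malformed lines
        if key == "n":
            result.next_poll_seconds = int(value)
        elif key == "i":
            current_list = value
            result.redirects.setdefault(current_list, [])
        elif key == "u" and current_list is not None:
            # Assumption: the scheme is omitted in the response, so add one.
            result.redirects[current_list].append("https://" + value)
    return result


# Shortened sample built from two of the redirect lines shown above.
SAMPLE = "\n".join([
    "n:1710",
    "i:googpub-phish-shavar",
    "u:safebrowsing-cache.google.com/safebrowsing/rd/ChRnb29ncHViLXBoaXNoLXNoYXZhcjgBQAJKEAgAEIydExicqRMgASoC0QU",
    "u:safebrowsing-cache.google.com/safebrowsing/rd/ChRnb29ncHViLXBoaXNoLXNoYXZhcjgBQAJKEAgAEMCOExiLnRMgASoC0gg",
])

parsed = parse_download_response(SAMPLE)
for list_name, urls in parsed.redirects.items():
    print(list_name, len(urls))
```

This only covers the three line types that appear in my response; any other line type is silently ignored here.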
My problems are:
1. How do I save the list into the DB? Is each row in a chunk file just a hash, or do I need to deserialize it using Protocol Buffers?
2. How do I check whether a given URL is bad? Do I need to hash it?
3. How do I know which chunks I already have?