I'd like to build a web app to help other students at my university create their schedules. To do that, I need to crawl the master schedules (one huge HTML page), along with the link to each course's detailed description, and store them in a database, preferably using Python. I also need to log in to access the data.
- How would that work?
- What tools/libraries can/should I use?
- Are there good tutorials on that?
- How do I best deal with binary data (e.g. nicely formatted PDFs)?
- Are there good existing solutions for that?
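
To make the question concrete, here is a minimal sketch of the parse-and-store step as I imagine it, using only Python's standard library (`html.parser` and `sqlite3`). The sample HTML, the `courses` table, and the names `CourseLinkParser` and `store_courses` are all made up for illustration; the real page will look different, and the login and download steps are left out entirely.

```python
import sqlite3
from html.parser import HTMLParser


class CourseLinkParser(HTMLParser):
    """Collect (link text, href) pairs from every <a href=...> on the page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def store_courses(html, conn):
    """Parse the page and insert one row per link; return the row count."""
    parser = CourseLinkParser()
    parser.feed(html)
    parser.close()
    # Hypothetical schema: one title and one detail link per course.
    conn.execute("CREATE TABLE IF NOT EXISTS courses (title TEXT, detail_url TEXT)")
    conn.executemany("INSERT INTO courses VALUES (?, ?)", parser.links)
    conn.commit()
    return len(parser.links)


# Made-up stand-in for the (already downloaded) master schedule page.
SAMPLE_HTML = """
<table>
  <tr><td><a href="/courses/cs101">CS 101 Intro to Programming</a></td></tr>
  <tr><td><a href="/courses/ma201">MA 201 Linear Algebra</a></td></tr>
</table>
"""

conn = sqlite3.connect(":memory:")
count = store_courses(SAMPLE_HTML, conn)
rows = conn.execute("SELECT title, detail_url FROM courses ORDER BY title").fetchall()
```

This treats every link on the page as a course, which is certainly too naive for the real schedule; I assume some filtering on the URL or the surrounding table structure would be needed.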