
I am new to screen scraping. When I use a proxy server and track the HTTP transactions, I can see my POST data. So my questions are:

1) Will this data be stored on the server side, or is it visible only on the client side?

2) Is there an option to encrypt the POST data in screen scraping?

3) Is it advisable to use screen scraping for banking applications?

I am using the screen-scraper tool (Enterprise version), which I downloaded from http://www.screen-scraper.com/download/choose_version.php.

Thanks in advance.

Preethi

3 Answers


My experience with scraping is that if you aren't doing anything super complex (such as logging into a secure site like an online banking website), Python has some great libraries that will help you out a lot.

To answer your questions:

1) You may need to be more clear, but this really depends on your server/client architecture.

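2) As a matter of fact you do, although the encryption does not come from urllib itself. Urllib and Urllib2 (built-in Python libraries) can both make POST requests, but urllib.urlencode() only encodes the form fields; it does not encrypt them. The encryption comes from sending the POST to an https:// URL, which urllib2 supports, so the data is protected by TLS in transit. For most applications, this will suffice.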
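A minimal Python 3 sketch of that distinction (in Python 3, urllib2's request classes live in urllib.request). The helper name build_login_request, the URL, the field names, and the credentials are all hypothetical placeholders. It builds the POST without sending it and shows that the urlencoded body is trivially reversible, so confidentiality has to come from the https:// scheme.

```python
from urllib.parse import parse_qs, urlencode
from urllib.request import Request


def build_login_request(url, username, password):
    """Build (but do not send) a form POST. Hypothetical helper.

    urlencode() only *encodes* the fields as key=value&key=value text; it
    does not encrypt anything. Confidentiality comes from the https:// URL,
    whose connection is TLS-encrypted when the request is actually sent.
    """
    if not url.lower().startswith("https://"):
        raise ValueError("refusing to send credentials over plain HTTP")
    body = urlencode({"username": username, "password": password}).encode("ascii")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Placeholder URL and credentials, for illustration only.
req = build_login_request("https://bank.example.com/login", "alice", "s3cret")
print(req.get_method(), req.type)  # POST https

# Anyone who sees the body can decode it: encoding is not encryption.
print(parse_qs(req.data.decode("ascii")))
# {'username': ['alice'], 'password': ['s3cret']}
```

To actually send it you would pass req to urllib.request.urlopen(); since Python 3.4.3 that verifies the server's certificate by default for HTTPS URLs.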

3) I actually have done scraping on online banking sites! I'm not exactly familiar with that tool, but I would recommend using something a little different from a scraper. Selenium, which is a "web driver", lets you simulate using a browser, meaning anything the browser does in the background to validate the session is taken care of automatically. The main problem I ran into while trying to scrape the banking site was the loss of important session data.

Selenium - https://pypi.python.org/pypi/selenium

Other libraries you may find useful are urllib, urllib2, and Mechanize.
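Selenium keeps session cookies for you. If you stay with urllib instead, losing session data usually means cookies are not carried from one request to the next. Below is a minimal sketch of the standard-library fix, assuming Python 3, where urllib2 became urllib.request and cookielib became http.cookiejar. The "bank" is a hypothetical local server, and the /login and /account paths and the session cookie name are made up for the demo.

```python
import threading
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import HTTPCookieProcessor, ProxyHandler, build_opener


class FakeBankHandler(BaseHTTPRequestHandler):
    """Hypothetical local stand-in for a site that tracks logins by cookie."""

    def do_POST(self):
        # Consume the form body, then issue a session cookie.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        self.send_response(200)
        self.send_header("Set-Cookie", "session=abc123; Path=/")
        self.end_headers()
        self.wfile.write(b"logged in")

    def do_GET(self):
        # Serve the account page only if the session cookie comes back.
        ok = "session=abc123" in self.headers.get("Cookie", "")
        self.send_response(200 if ok else 403)
        self.end_headers()
        self.wfile.write(b"balance page" if ok else b"session lost")

    def log_message(self, format, *args):
        pass  # keep the demo quiet


server = HTTPServer(("127.0.0.1", 0), FakeBankHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

jar = CookieJar()
opener = build_opener(
    ProxyHandler({}),          # ignore any system proxy for this local demo
    HTTPCookieProcessor(jar),  # store Set-Cookie headers and resend them
)
opener.open(base + "/login", data=b"user=alice&pw=secret").read()
page = opener.open(base + "/account").read()
print(page)  # b'balance page'

server.shutdown()
server.server_close()
```

Without the HTTPCookieProcessor, the second request would arrive with no cookie and this toy server would answer 403, which urllib raises as an HTTPError.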

I hope I was somewhat helpful!

    Here is a link to a question I had that may help you more: http://stackoverflow.com/questions/15605408/logging-into-website-with-multiple-pages-using-python-urllib2-and-cookielib – Shawn Carmichael Jul 01 '13 at 20:16

1) What do you mean by server side? Your proxy server or the screen-scraper software? Either of them can read or store your information.

2) If you are connecting through HTTPS, your software should warn you about a malicious proxy server: https://security.stackexchange.com/questions/8145/does-https-prevent-man-in-the-middle-attacks-by-proxy-server
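In a Python scraper, that warning comes from certificate verification. A sketch, assuming a hypothetical proxy at 127.0.0.1:8080: ssl.create_default_context() requires a valid certificate and a matching hostname, so a proxy that intercepts the HTTPS connection with its own certificate makes the request fail instead of leaking the POST data. One caveat, offered as a likely explanation rather than a certainty: debugging proxies that display decrypted HTTPS traffic usually ask you to trust their own root certificate, and once it is trusted, verification passes and the proxy can read everything.

```python
import ssl
from urllib.request import HTTPSHandler, ProxyHandler, build_opener

# Hypothetical proxy address, e.g. a local tool you use to trace traffic.
PROXY = "http://127.0.0.1:8080"

# create_default_context() turns on certificate and hostname verification.
ctx = ssl.create_default_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)  # True True

# Route HTTPS traffic through the proxy but keep verification on. If the
# proxy impersonates the site with an untrusted certificate, opener.open()
# raises urllib.error.URLError (wrapping ssl.SSLCertVerificationError)
# before any POST data is sent.
opener = build_opener(
    ProxyHandler({"https": PROXY}),
    HTTPSHandler(context=ctx),
)
# opener.open("https://bank.example.com/login", data=b"...")  # not run here
```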

3) I don't think they have a logger that they can read. But if you are concerned, you can try to write your own scraper. There are libraries that let you read HTML easily with jQuery syntax (https://pypi.python.org/pypi/pyquery) or with XPath (http://net.tutsplus.com/tutorials/javascript-ajax/web-scraping-with-node-js/).
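pyquery and XPath libraries are third-party. If you want to try writing your own scraper with nothing but the standard library, html.parser is a lower-level alternative: it is event-based and has no selectors, so this is a swapped-in technique rather than the jQuery/XPath approach. The sketch below collects the name/value pairs of every input element, which is the kind of thing a login scraper needs when a form carries hidden fields. The page markup and the csrf_token field are hypothetical.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collect name/value pairs of every <input> element on the page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            attrs = dict(attrs)
            name = attrs.get("name")
            if name:
                # A missing value (or a bare attribute) is recorded as "".
                self.fields[name] = attrs.get("value") or ""


# Hypothetical login page markup.
page = """
<form action="/login" method="post">
  <input type="hidden" name="csrf_token" value="9f8e7d">
  <input type="text" name="username">
  <input type="password" name="password">
</form>
"""

parser = FormFieldParser()
parser.feed(page)
print(parser.fields)
# {'csrf_token': '9f8e7d', 'username': '', 'password': ''}
```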

Akira Yamamoto

I've used screen-scraper to scrape banking sites before. It interacts with the site just like your browser does: if the site uses encryption, the connection from screen-scraper to the site will be encrypted too.

If you have a client page sending data to screen-scraper, you probably should encrypt that. I generally just make the connection via SSH.

Jason Bellows