
I'm developing a web app that needs to download the HTML from a website and then iterate through it to find a specific but ever-changing value (in our case, the price of a product).

For this, I was thinking about asking the user (upon installation and setup) to provide the system with a few lines of HTML from the page that contains the price. From then on, every time we need to fetch the price, we would search for those lines and find the price.

Now, I believe this is a horrible and slow way of doing this, but since there are no rules and the HTML can be totally different from one website to another (even the same website might change), I couldn't find a better way.

One improvement I thought about was to record, on the first pass, the line at which we find the code. On subsequent fetches we would start the search a few lines before the expected location. Any thoughts on how I can improve on this?

I posted this question on https://cstheory.stackexchange.com/ but they commented that it's not on topic and that I should post it here.

I have the code for the above and if needed I can post it, I'm simply thinking that there must be a better, faster way of doing this.
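For illustration only (this is not the code mentioned above), the line-hint idea could be sketched roughly as follows. The function name `find_snippet`, the `hint` and `window` parameters, and the sample page are all hypothetical; the sketch assumes the page and the user-supplied marker lines have already been split into lists of lines and that the marker lines match exactly.

```python
def find_snippet(page_lines, snippet_lines, hint=None, window=5):
    """Return the index where snippet_lines starts in page_lines, or -1.

    If hint (the index recorded on an earlier run) is given, positions
    within `window` lines of it are tried first; the whole page is
    scanned only if that fails.
    """
    n = len(snippet_lines)
    last_start = len(page_lines) - n  # last index where a match could begin

    def matches_at(i):
        return page_lines[i:i + n] == snippet_lines

    if hint is not None:
        lo = max(0, hint - window)
        hi = min(last_start, hint + window)
        for i in range(lo, hi + 1):
            if matches_at(i):
                return i

    # Fallback: the page layout may have shifted, so scan everything.
    for i in range(last_start + 1):
        if matches_at(i):
            return i
    return -1


# Hypothetical page and marker lines, for demonstration only.
page = [
    "<html>", "<body>",
    "<div class='p'>", "<span>Price:</span>", "<b>$19.99</b>",
    "</div>", "</body>", "</html>",
]
marker = ["<div class='p'>", "<span>Price:</span>"]
position = find_snippet(page, marker, hint=3)
```

The hint only saves work when the page is stable; whenever it misses, the cost is the same full scan as before, which is part of why this approach feels fragile.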

hjavaher
  • I know the value is changing, but is there any predictable clue that points to the correct value? Like a jQuery-style selector that could resolve it (or at least narrow it down)? – Jason Sperske Feb 16 '14 at 22:45
  • @JasonSperske Unfortunately there are no guarantees, and that's the main issue. The only constant is that the value is a price, but even then the HTML markup might be dramatically different (not to mention the currency symbol and currency format of the specific country). – hjavaher Feb 16 '14 at 23:14
  • Maybe you can add some examples of the markup you are attempting to parse? – Jason Sperske Feb 17 '14 at 00:59
  • @jasonsperske I'm not at the office and on my phone. I'll be in the office in a few hours and will update the question with a few examples – hjavaher Feb 17 '14 at 01:32
  • @JasonSperske This morning I woke up to update the question with the snippet of code you asked for, and what you were saying finally clicked! I'm sorry, I was probably out of it after a long day. The best way is simply to use jQuery selectors. I can probably come up with a very precise selector for every website and go from there. Would you mind putting it in the form of an answer so I can give you credit for it (not that you need it haha, but still)? – hjavaher Feb 17 '14 at 16:27
  • If you arrive at a solution you can try and post it on CodeReview (another Stack Exchange site) and see if anyone can help clean up any edge cases or improve performance. Good luck :) – Jason Sperske Feb 17 '14 at 17:55

1 Answer

This is actually something I tried for a project recently (using BeautifulSoup and Python). The solution that worked for me was to work out CSS selectors (which can map to jQuery selectors) that targeted the elements containing the values I was looking for. In my case I was able to narrow the full document down to just those elements, but if you can't get exactly what you're after, you could combine this with some extra logic, such as testing whether the text looks like a price (via regex) or checking what it is next to.
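As a rough sketch of the "looks like a price" test, assuming the candidate strings are the text of elements already matched by a selector: the names `PRICE_RE`, `parse_price`, and `first_price` are made up for this example, and the pattern only covers a leading currency symbol from a small set with `1,234.56`-style numbers. Other formats (such as `19,99 €`) would need their own patterns.

```python
import re
from decimal import Decimal

# Assumed format: one of a few currency symbols, optional whitespace,
# digits with optional comma thousands separators, optional decimals.
PRICE_RE = re.compile(
    r"[$€£¥]\s*(?P<amount>(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d{1,2})?)"
)


def parse_price(text):
    """Return the first price-like amount in text as a Decimal, or None."""
    match = PRICE_RE.search(text)
    if match is None:
        return None
    # Drop thousands separators before converting.
    return Decimal(match.group("amount").replace(",", ""))


def first_price(candidates):
    """Return the first parseable price among candidate strings, or None."""
    for text in candidates:
        price = parse_price(text)
        if price is not None:
            return price
    return None


# Hypothetical element texts, as a selector might return them.
texts = ["Free shipping", "Only 3 left", "Now £5.50"]
price = first_price(texts)
```

Requiring a currency symbol keeps plain numbers such as stock counts from being mistaken for prices, at the cost of missing prices that are written without one.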

Jason Sperske
  • Yes, I honestly don't know how I missed that. This is the simplest way of doing this. Although each website is different, the same website seldom changes its structure, and by using CSS selectors we're making the browser do the work instead of the server (which was how I was planning on doing it). Well done sir, thank you! – hjavaher Feb 17 '14 at 18:12