Using C# to efficiently pull data from a webpage with changing source code?

Question

I have already put together code using the System.Net.WebClient class to pull the source of a webpage, which I then run a string search on to get specific information. This works fine in itself, but the page source changes every few seconds, and I would like the data I have received to change accordingly. I understand that I could simply set up a loop to repeat this process, but my current code takes a full 2.7 seconds to complete, and I would like to avoid that lag. In addition, I want to avoid spamming the webpage with requests if possible. I was thinking about a StreamReader that stays open, so that multiple requests wouldn't have to be sent, but I wasn't entirely sure how to go about this...

So to sum it up, is there a way to pull updating information from a website using the System.Net namespace that is both fast and avoids spamming the website with requests?

@jay kreeler I think that the best you can do is issue a HEAD request every few minutes and check the Last-Modified header to see if the data changed. But I might be wrong. — formatc, Jun 15 '12 at 18:24
@AustinSalonen: Apparently they are some very productive programmers. — Ed S., Jun 15 '12 at 18:28
@AustinSalonen It is updating pricing on virtual items, and I would like to keep track of both average and current lowest prices so that I can analyze the data for future use — Ari, Jun 15 '12 at 18:28
I have no idea whose site you're pulling from, but there is a chance they have an API or some other means to do this. — Jon B, Jun 15 '12 at 18:30
To clarify, when I say that the page source is changing, I don't mean the coding itself, but specific numbers on the page. — Ari, Jun 15 '12 at 18:30
Unless your page supports Comet, which is unlikely, or there is an API exposed to the internet, you are out of luck — Filip, Jun 15 '12 at 18:47
If you want to avoid repeated requests of any kind while still refreshing the data on your side, you should turn to http://physics.stackexchange.com and ask about quantum physics-based information transfer. — Ondrej Tucny, Dec 26 '15 at 16:55

score 1 · Accepted Answer · edited May 23 '17 at 12:15

I am afraid that the HTTP protocol is not well suited to your real-time data refresh requirement. Other than by polling with HTTP requests at regular intervals, you cannot know whether the data has changed on the server, nor get the fresh data.
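If polling is the only option, it can at least be made cheaper. The following is a minimal sketch, not a definitive implementation: it reuses one HttpClient so the connection can stay alive between polls, and it sends conditional GET headers so the server can answer 304 Not Modified without a body. It assumes the server actually emits ETag or Last-Modified headers, which many dynamically generated pages do not. The names Poller, BuildRequest, PollAsync and onChanged are made up for illustration.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading;
using System.Threading.Tasks;

public static class Poller
{
    // One shared client, so the underlying connection can be reused between polls.
    private static readonly HttpClient Client = new HttpClient();

    // Builds a GET that lets the server skip the body when nothing has changed.
    public static HttpRequestMessage BuildRequest(
        string url, EntityTagHeaderValue etag, DateTimeOffset? lastModified)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, url);
        if (etag != null) request.Headers.IfNoneMatch.Add(etag);
        if (lastModified.HasValue) request.Headers.IfModifiedSince = lastModified;
        return request;
    }

    // Polls until the token is cancelled (Task.Delay then throws TaskCanceledException).
    public static async Task PollAsync(
        string url, TimeSpan interval, Action<string> onChanged, CancellationToken token)
    {
        EntityTagHeaderValue etag = null;
        DateTimeOffset? lastModified = null;

        while (!token.IsCancellationRequested)
        {
            using (var request = BuildRequest(url, etag, lastModified))
            using (var response = await Client.SendAsync(request, token))
            {
                // 304 means our cached copy is still current; anything else that
                // succeeded carries a fresh body and (possibly) fresh validators.
                if (response.StatusCode != HttpStatusCode.NotModified
                    && response.IsSuccessStatusCode)
                {
                    etag = response.Headers.ETag;
                    lastModified = response.Content.Headers.LastModified;
                    onChanged(await response.Content.ReadAsStringAsync());
                }
            }
            await Task.Delay(interval, token);
        }
    }
}
```

If the server sends neither validator, every poll simply returns the full page, so this degrades to plain polling; the interval then is the only thing protecting the site from being hammered.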

WebSocket technology is better suited to such scenarios. Of course, the data provider must implement it so that clients can subscribe to the live feed.
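For illustration only, here is roughly what the client side could look like if the provider did expose a WebSocket feed. This is a sketch under that assumption: the feed URI, the text-message format, and the names LiveFeed, ListenAsync and onMessage are all hypothetical. It uses ClientWebSocket from System.Net.WebSockets, which only became available after this question was asked (.NET 4.5).

```csharp
using System;
using System.IO;
using System.Net.WebSockets;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

public static class LiveFeed
{
    // Connects once, then invokes onMessage for every complete text message.
    public static async Task ListenAsync(
        Uri feedUri, Action<string> onMessage, CancellationToken token)
    {
        using (var socket = new ClientWebSocket())
        using (var pending = new MemoryStream())
        {
            await socket.ConnectAsync(feedUri, token);
            var buffer = new byte[4096];

            while (socket.State == WebSocketState.Open)
            {
                var result = await socket.ReceiveAsync(new ArraySegment<byte>(buffer), token);

                if (result.MessageType == WebSocketMessageType.Close)
                {
                    await socket.CloseAsync(
                        WebSocketCloseStatus.NormalClosure, "closing", token);
                    break;
                }

                // A message may arrive in several fragments; collect the bytes and
                // decode only once the message is complete, so multi-byte UTF-8
                // characters split across fragments are handled correctly.
                pending.Write(buffer, 0, result.Count);
                if (result.EndOfMessage)
                {
                    if (result.MessageType == WebSocketMessageType.Text)
                        onMessage(Encoding.UTF8.GetString(pending.ToArray()));
                    pending.SetLength(0); // binary messages are discarded here
                }
            }
        }
    }
}
```

The point of the design is that a single connection stays open and the server pushes updates, so no request is sent per update.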

There's also another way to implement this feature over the HTTP protocol. It uses an iframe to implement long polling. The idea is that the server uses chunked transfer encoding and sends a continuous stream of data over the connection. The client subscribes to this stream and is notified of changes as they occur on the server. Once again, this technology must be implemented on the server side before you, as a client, can take advantage of it.
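This is close to the "stream reader that stays open" idea from the question. A rough sketch of the client side follows, assuming the server keeps the response open and writes one update per line; that line-based format, the URL, and the names StreamingClient, ReadLinesAsync and SubscribeAsync are assumptions made for the example. HttpCompletionOption.ResponseHeadersRead makes HttpClient hand back the response as soon as the headers arrive instead of buffering the whole body.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class StreamingClient
{
    private static readonly HttpClient Client = new HttpClient();

    // Invokes onLine for each line as it arrives; returns when the stream ends.
    public static async Task ReadLinesAsync(Stream stream, Action<string> onLine)
    {
        using (var reader = new StreamReader(stream))
        {
            string line;
            while ((line = await reader.ReadLineAsync()) != null)
                onLine(line);
        }
    }

    // Sends a single request and keeps reading the body for as long as the
    // server keeps the connection open.
    public static async Task SubscribeAsync(
        string url, Action<string> onLine, CancellationToken token)
    {
        using (var response = await Client.GetAsync(
            url, HttpCompletionOption.ResponseHeadersRead, token))
        // Disposing the response on cancellation aborts a pending read,
        // which then surfaces as an exception in ReadLinesAsync.
        using (token.Register(response.Dispose))
        {
            response.EnsureSuccessStatusCode();
            using (var body = await response.Content.ReadAsStreamAsync())
                await ReadLinesAsync(body, onLine);
        }
    }
}
```

Against an ordinary page this gains nothing: the server sends the HTML once and closes the response, so the loop ends after the first document.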

If all that the server provides is data via an HTML page, you are doomed to do screen scraping by hammering this server with HTTP requests until your IP address gets blacklisted and denied access.

Thank you very much; this answer (although a bit disappointing) answers my question fully. Much appreciated~ — Ari, Jun 15 '12 at 18:59
