Presumably you're doing this to save on network traffic between the MySQL server and the machine your application runs on. As it happens, you're not saving the MySQL server any work: it still has to fetch the entire LONGTEXT item from disk and then run it through SUBSTRING().
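For concreteness, the pattern under discussion looks something like this sketch. The table name (`articles`), column (`body`), id, and connection details are all hypothetical; the driver here is the `mysql-connector-python` package.

```python
# Hypothetical example of the paging query under discussion.
# Table/column names and credentials are made up.
import mysql.connector

conn = mysql.connector.connect(
    host="localhost", user="app", password="secret", database="library"
)
cur = conn.cursor()

# SUBSTRING is 1-indexed: this returns the 5000 characters starting
# at position 30001. Only the network transfer shrinks; the server
# still reads the whole LONGTEXT row before slicing it.
cur.execute(
    "SELECT SUBSTRING(body, %s, %s) FROM articles WHERE id = %s",
    (30001, 5000, 42),
)
page = cur.fetchone()[0]
```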
Presumably you've already decided, based on solid performance analysis, that you must save this network traffic. You might want to revisit that analysis now that you know it doesn't save much MySQL server workload. Your savings will be marginal unless you have zillions of very long LONGTEXT items and lots of traffic retrieving and displaying parts of them.
In other words, this is an optimization task. YAGNI? http://en.wikipedia.org/wiki/YAGNI
If you do need it, you'll have to write software that processes the LONGTEXT item word by word. Your best bet is to do this in your client software: start by retrieving the first page plus a kilobyte or two of the article, then parse the text looking for complete words. Once you find the last complete word in the first page and its trailing whitespace, that character position is the starting place for the next page.
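Here is a minimal sketch of that client-side scan; the page size and the whitespace-based definition of a word break are assumptions:

```python
def next_page_start(chunk: str, page_size: int) -> int:
    """Return the offset within `chunk` where the next page begins:
    just past the last complete word (and its trailing whitespace)
    that fits inside the first page_size characters."""
    if len(chunk) <= page_size:
        return len(chunk)  # the rest of the article fits on this page
    i = page_size
    # Back up to the whitespace after the last complete word, so no
    # word is split across two pages.
    while i > 0 and not chunk[i].isspace():
        i -= 1
    # Skip past that run of whitespace; the next word starts the next page.
    while i < len(chunk) and chunk[i].isspace():
        i += 1
    return i
```

The next fetch then starts at this page's article offset plus `next_page_start(...)`.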
This kind of task is a huge pain in the neck in a MySQL stored procedure. Plus, when you do it in a stored procedure you're going to use processing cycles on a shared and hard-to-scale-up resource (the MySQL server machine) rather than on a cloneable client machine.
I know I didn't give you clean code to just do what you ask. But it's not obviously a good idea to do what you're suggesting.
Edit:
An observation: a gigabyte of server RAM costs roughly USD 20, and a caching system like memcached does a good job of exploiting USD 100 worth of memory. That's plenty of cache for the use case you've described.
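If you do add a cache, the client-side pattern is about this simple. This is a sketch using the `pymemcache` client; the key scheme, the one-hour TTL, and the `fetch_segment` helper are all assumptions:

```python
from pymemcache.client.base import Client

cache = Client(("localhost", 11211))

def fetch_segment(article_id: int, start: int, length: int) -> bytes:
    ...  # placeholder for the SUBSTRING query shown earlier

def cached_segment(article_id: int, start: int, length: int) -> bytes:
    """Serve a segment from memcached, falling back to MySQL on a miss."""
    key = f"art:{article_id}:{start}:{length}"  # assumed key scheme
    segment = cache.get(key)
    if segment is None:
        segment = fetch_segment(article_id, start, length)
        cache.set(key, segment, expire=3600)  # cache for an hour
    return segment
```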
Another observation: many companies that serve large-scale documents use file systems rather than DBMSs to store them. File systems can be shared or replicated among content servers very easily, and files can be random-accessed trivially, without any query-processing overhead.
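That random access really is just a seek; a sketch (the path and byte offsets are hypothetical):

```python
def read_segment(path: str, start: int, length: int) -> str:
    """Read a byte range straight out of a document file: no query,
    no row fetch. Offsets are bytes, so decode leniently in case the
    range cuts a multi-byte character."""
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(length).decode("utf-8", errors="replace")
```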
It's a bit innovative to store whole books in single BLOBs or CLOBs. If you can break up the books by some kind of segment -- page? chapter? thousand-word chunk? -- and create separate data rows for each segment, your DBMS will scale up MUCH MUCH better than what you have described.
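A sketch of what that segmented layout could look like; the table and column names are made up:

```python
# Hypothetical segmented schema: one modest row per chunk instead of
# one giant LONGTEXT per book.
SEGMENT_DDL = """
CREATE TABLE book_segment (
    book_id    INT  NOT NULL,
    segment_no INT  NOT NULL,  -- page, chapter, or thousand-word chunk
    body       TEXT NOT NULL,  -- one segment, not the whole book
    PRIMARY KEY (book_id, segment_no)
)
"""

# Fetching a page becomes a small indexed row lookup instead of a
# SUBSTRING over megabytes of LONGTEXT.
PAGE_QUERY = """
SELECT body FROM book_segment
WHERE book_id = %s AND segment_no = %s
"""
```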
If you're going to do it anyway, here's what you do:
- Always retrieve 100 characters more than you need for each segment. For example, when you need characters 30000 - 35000, retrieve 30000 - 35100.
- After you retrieve the segment, look for the first word break in the data (except on the very first segment) and display starting from that word.
- Similarly, find the first word break within the 100 extra characters at the end, and display only up to that word break; the sketch below puts these steps together.
So your fetched data might be 30000 - 35100 and your displayed data might be 30013 - 35048, but it would be whole words.
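Here is a sketch of that trimming logic, pure string handling on the client; treating whitespace as the word-break test is an assumption:

```python
def trim_to_words(chunk: str, is_first_segment: bool, extra: int = 100) -> str:
    """Trim a fetched segment to whole words, per the steps above.
    `chunk` is the requested range plus `extra` padding characters."""
    start = 0
    if not is_first_segment:
        # Skip the partial word at the front; start at the first word break.
        while start < len(chunk) and not chunk[start].isspace():
            start += 1
    # Within the `extra` padding at the end, stop at the first word break.
    end = max(0, len(chunk) - extra)
    while end < len(chunk) and not chunk[end].isspace():
        end += 1
    return chunk[start:end].strip()
```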
`…` - what would your query do if it encounters a `…` near the 1000th character? – user1690753 Sep 22 '12 at 11:23
`…` or `…` or `…` or `…`. Any one of these, if reached, would be the end of this page, of course after at least 1000 characters. So the query should be something like this: "give me at least 1000 characters starting from position x until you reach `…` or `…` or `…`". That's a hell of a query ;) Before I forget, the position x is where the previous page ended!!!! It's so complicated :( – user1690753 Sep 22 '12 at 15:19