0

I have two main data sources I need to do a mashup with in php. The first one is an SQL database and the other is a feed. Both sources contain similar items that need to be sorted by "name" or "weight". There are more than 1k items in each of the sources. Simply storing the feed in the db is not an option, as the feed changes far too often.

What I want to achieve is an accurate paginated output of the two. For example: if I display page two of the results containing 50 items sorted by "name" I don't read all data from feed and database but only the necessary items from both sources.

I currently assume this requires the database data to be fundamental data and to somehow merge the feed data into it, but: 1) I may be wrong, 2) I cannot seem to get my head around this no matter how hard I seem to try.

Any suggestions on how to achieve this?

P.S. The results do not have to be completely accurate. They only have to seem like it.

Lukas
  • 409
  • 1
  • 5
  • 15

1 Answers1

1

Consider caching feed results once in a while. Then you can figure approximate position of items you need.

For your example of user requesting second page, using your cache you figure that (at the moment of caching) accurate results would be items #20-50 from database and items #30-50 from feed. Since feed changes often, you can't just read same items and show them to user. Instead, you can read extra items, for example 10 both sides. For above, it would be items #10-60 from database and #20-60 from feed. Using extra items and finding matches with cache, you will be able to provide more accurate results, then you would by requesting same #20-50 and #30-50.

This way results won't be completely accurate, as you allow in "P.S.". But the more often users do requests, the more accurate the answer will be, since every request will update the cache a little bit. You will have to implement cache expiration tracking also, in case some pages were not requested by users for a long time and therefore are completely outdated.

Possible implementation of the above would be an extra table with a row for every "page"-"sort type" combination. In your example, for 50 items per page and 2000 items total, there will be 40 rows for "sort by name" and 40 rows for "sort by weight". Each row consists information about which items from both sources were taken last time that particular page was accessed, and that information's last update timestamp.

P.S. You don't post any details, which make it hard to come out with the more specific answer, but I hope what I wrote will help!

Ranty
  • 3,333
  • 3
  • 22
  • 24