
At the company I work for, we run a comparison website. Our "Products" are the services we compare from both internal and external sources.

The problem I have is that we have a backend CMS-style management system where managers and product administrators can add, remove and modify products.

When a new third-party company comes along and wants to be part of our service, we essentially scrape their API for all of their products and save them in our database, delegating only pricing and service availability to their API for real-time figures.

The benefit of this (and the sole reason we have done it this way) is that it allows our product management team to explicitly control each product's commission settings and availability (i.e. we can switch a product off and prevent it from showing through our API, web services and sites).

An obvious downside is that if new products become available on a partner's API, or if the products we expect change, we have more points of failure to cover. However, the major problem I am having (and the reason for this post) is that we have a few new integrations coming on board with lots of products, and entering them all into our system manually is simply unfeasible.

My question is: how have other people dealt with this kind of product-catalogue integration scenario?

Thanks, G

Gary Doublé
  • could you be more specific? why it's infeasible? your application indexes the data from N clients and displays them. what exactly is the problem? – piotrek Nov 10 '15 at 13:29
  • The problem is that their data doesn't fit with ours, and often their search / purchase processes differ from what we have already built. Currently the only way to integrate these external catalogues is to literally scrape their APIs for every possible combination of products / services and dump that in our database. This is troublesome because their data doesn't always map easily. I'm wondering if there is a pattern or architectural style that allows me to use their APIs in real-time and still maintain control over what is available? – Gary Doublé Nov 11 '15 at 14:57

1 Answer


If I understand you correctly, you have two different options, plus a few hybrids.

API-based solution: you assume all your partners will have roughly similar APIs (product, quantity, features, price, etc.). Then, for each partner, you write a converter/adapter/anti-corruption layer to import their objects into your model. Sometimes a partner may be required to do some work on their side. This may be the easiest way, but also a dangerous one, because you may encounter partners with completely different, non-convertible models. For example, some auction portals don't have the concept of an 'item' (there is only an auction/description). There may be nothing like 'quantity', only 'availability in the partner's store'. The price may not be fixed either, but may depend on the current auction status or popularity (plane tickets). When you encounter such a model, there might be no way to plug it into your system.
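A minimal sketch of the adapter/anti-corruption-layer idea, assuming a hypothetical internal `Product` model and a made-up partner ("Acme") whose API returns dicts with different field names; all names here are illustrative, not from the original post:

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class Product:
    """Internal canonical product model (fields are illustrative)."""
    sku: str
    name: str
    commission_rate: float  # controlled by the product team, not the partner
    available: bool

class PartnerAdapter(Protocol):
    """Anti-corruption layer: one adapter per partner, all producing Products."""
    def fetch_products(self) -> list[Product]: ...

class AcmeAdapter:
    """Hypothetical partner whose API uses 'id'/'title'/'in_stock' keys."""
    def __init__(self, raw_items: list[dict]):
        self.raw_items = raw_items

    def fetch_products(self) -> list[Product]:
        # Translate the partner's shape into our model at the boundary,
        # so nothing downstream ever sees partner-specific field names.
        return [
            Product(
                sku=item["id"],
                name=item["title"],
                commission_rate=0.0,  # set later via the backend CMS
                available=item.get("in_stock", False),
            )
            for item in self.raw_items
        ]
```

The point of the pattern is that the rest of the system depends only on `PartnerAdapter`, so adding a new integration means writing one new adapter rather than touching the catalogue code.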

Crawler-like solution: you almost completely ignore the partner's API. Instead, you just scrape their websites and offer your clients full-text search. This way you don't have any compatibility problems, but you also don't have structured data (e.g. price).
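A toy illustration of the full-text-search idea over already-scraped page text; a real system would use a search engine such as Elasticsearch or Solr, and the tokenisation here is deliberately naive:

```python
from collections import defaultdict

def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each lower-cased token to the set of page URLs containing it."""
    index: defaultdict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for token in text.lower().split():
            index[token].add(url)
    return index

def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the pages containing every token of the query (AND semantics)."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results
```

Note that nothing in the index knows what a "product" or a "price" is; that is exactly the trade-off the answer describes.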

Hybrids: you may go the crawler route and use any existing API to get the required structured information (price, etc.). Instead of an API, you may use machine-learning methods to extract the required information from the scraped data.
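The hybrid could be sketched as a thin enrichment step, assuming a hypothetical per-partner price lookup (`get_price` here stands in for whatever structured endpoint, or extraction model, a given partner offers):

```python
from typing import Callable, Optional

def enrich_with_prices(
    urls: set[str],
    get_price: Callable[[str], Optional[float]],
) -> dict[str, Optional[float]]:
    """Attach structured price data to each crawled search result.

    `get_price` is the partner-specific lookup (API call or ML extractor);
    None marks results for which only unstructured text is available.
    """
    return {url: get_price(url) for url in urls}
```

The crawler supplies recall, the structured lookup supplies the fields your catalogue actually needs, and results without a price can still be shown as text-only matches.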

piotrek