
I'm currently trying to build a web app that would allow many users to query an external API (for various reasons, I cannot retrieve all the data served by this API at regular intervals to populate my PostgreSQL database). I've read several things about ACID and MVCC, but I'm still not sure there won't be any problems if several users are populating/reading my PostgreSQL database at the very same time. So here I'm asking for advice (I'm very new to this field)!

Let's say my users query the external API to retrieve articles. They submit their search via a form; the back end gets it, queries the API, populates the database, then queries the database to return some data to the front end.

  • Would it be okay to simply create a single table to store the articles returned by the API when users query it?
  • Shall I rather store the articles returned by the API and associate each of them with the user that requested it (the Article model would contain a foreign key to the User model)?
  • Or shall I give each user a table (data isolation would be good but that sounds very inefficient)?

Thanks for your help!

Lo Bellin
  • What is the reason behind using the database and not just serving the API response directly? Performance, reliability, statistics, pagination, the user is saving the data in their profiles, ...? – Kevin Languasco Mar 25 '20 at 23:25
  • Eliminate the 3rd option; inefficient probably, a maintenance nightmare absolutely. Your need for the other options depends upon other processing requirements. If you need to create a local copy or need to track which articles have been viewed, you will need the 1st option. Do you also need to track which users have viewed which articles and/or user comments about them? In that case it's probably best to create a 3rd table, User_Article (or so). This table would contain FKs to both the Users and Articles tables, and any data that is specific to the combination, like comments. – Belayer Mar 25 '20 at 23:37
  • @KevinLanguasco Many metrics are associated with each article and there could be a lot of articles to retrieve. I need to calculate some KPIs before returning the articles (they are used to populate a kind of dashboard). Even if the API is queried from the user side with AJAX, I still need to process things in the back end, and that seems faster and easier using a database. Still, if you have some resources describing how to do that kind of thing in Django, I would be glad to change my code! – Lo Bellin Mar 26 '20 at 08:00

1 Answer


Would it be okay to simply create a single table to store the articles returned by the API when users query it?

Yes. If the articles have unique keys (doi?) you could use INSERT...ON CONFLICT DO NOTHING to handle the (presumably very rare) case that an article is requested by two people nearly simultaneously.
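
For illustration, here is a minimal sketch of what that could look like, assuming a hypothetical articles table keyed on the article's DOI (table and column names are placeholders, not anything from the question):

    -- Hypothetical shared table, keyed on the article's DOI.
    CREATE TABLE articles (
        doi        text PRIMARY KEY,
        title      text NOT NULL,
        metadata   jsonb,
        fetched_at timestamptz NOT NULL DEFAULT now()
    );

    -- If two users trigger a fetch of the same article at nearly the same
    -- time, the second INSERT is silently skipped instead of raising a
    -- unique-violation error.
    INSERT INTO articles (doi, title, metadata)
    VALUES ('10.1000/example', 'Some title', '{"source": "external api"}')
    ON CONFLICT (doi) DO NOTHING;

Concurrent readers are unaffected either way; with MVCC they simply see the new row once the inserting transaction commits.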

Shall I rather store the articles returned by the API and associate each of them with the user that requested it (the Article model would contain a foreign key to the User model)?

Do you want to? Is there a reason to? Do you care who requested each article? It sounds like you're anticipating storing only the first person to request each article, and not every request?
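
If you do decide you need the full request history, the comments suggest a separate join table rather than a user column on the article itself. A rough sketch with placeholder names, assuming a users table with an integer id and the articles table sketched above:

    -- Hypothetical join table recording every (user, article) request,
    -- so the shared articles row carries no per-user data.
    CREATE TABLE article_requests (
        user_id      integer     NOT NULL REFERENCES users (id),
        doi          text        NOT NULL REFERENCES articles (doi),
        requested_at timestamptz NOT NULL DEFAULT now(),
        PRIMARY KEY (user_id, doi, requested_at)
    );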

Or shall I give each user a table (data isolation would be good but that sounds very inefficient)?

Right, you would be hitting the API a lot more often (assuming some large fraction of articles are requested more than once) and storing a lot of duplicates. It might not even solve the problem, if one person hits "submit" twice in a row, or has multiple tabs open, or writes a bot to hit your service in parallel.

jjanes