
I am pretty new to Ruby on Rails and have been studying it using the Ruby on Rails Tutorial by Michael Hartl.

I am now working on my own project, which allows users to log in to the website, provide personal biometric information, and upload a CSV file of their choice (workout data) to populate the database with the workout information.

I sought help from friends with more experience, and their advice was to create a staging table and use it to populate the other tables (I currently have eight different tables for workout measurements).

I did quite a bit of research on staging table usage online, but couldn't find a solid answer on how to effectively use a staging table to import a CSV file into multiple models.

From my understanding of staging tables, I should reset the staging table every time I (the user) am done uploading and importing the CSV file into the database, but I could not find anything online on whether this is the right practice.

Is this the right approach to using staging tables? The only other option I can think of is creating and dropping a staging table every time the user uploads a file, but that seems too costly to be correct.

Thanks!

codeinstyle

1 Answer


A "staging table" is simply an intermediate table whose columns match the format of the expected CSV. When a user uploads a CSV file, you can read it row by row and populate this table. A staging table has the advantage that any expensive processing on the data, prior to populating the actual domain tables, can be done in the background. Two approaches for doing that are described below:

  1. Trigger background processing after saving a data set to the staging table.

    Once the data has been uploaded to the staging table, you can trigger a background job to process it and populate the models asynchronously. I'd recommend the sidekiq library for this purpose. Many other alternatives are available in the Ruby Toolbox.

  2. Cron jobs

    With this approach, a function periodically checks the staging table and moves whatever data has accumulated so far into the relevant target tables. A suitable Ruby library for scheduling this is the whenever gem.
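As a rough sketch of approach 1, a Sidekiq-style worker could look like the snippet below. The column names (`date`, `distance`, `duration`) are hypothetical, and to keep the snippet self-contained it maps a staging row (passed as a hash) to model attributes rather than touching ActiveRecord; in a real Rails app `perform` would receive a staging row id, look the record up, and call `Workout.create!` with the mapped attributes.

```ruby
require "date"

# Hypothetical background job: turns one staging row into attributes
# for a domain model. In a real app you would `include Sidekiq::Worker`
# unconditionally and persist the result with Workout.create!.
class StagingRowImportJob
  include Sidekiq::Worker if defined?(Sidekiq)

  def perform(staging_row)
    # Coerce the raw CSV strings into properly typed attributes.
    {
      "date"     => Date.parse(staging_row["date"]),
      "distance" => staging_row["distance"].to_f,
      "duration" => staging_row["duration"].to_i
    }
    # In Rails: Workout.create!(attrs), then destroy the staging row.
  end
end
```

After saving a staging row, the controller would enqueue the job with `StagingRowImportJob.perform_async(row.id)`; with the whenever gem (approach 2) you would instead schedule a similar sweep from `config/schedule.rb`.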

You need not process the staging table in one go, and dropping the staging table after each operation is certainly not recommended: what would happen if someone tried to upload data to the staging table while it was being dropped? Client-server systems should be designed so that they can be used by multiple users concurrently. A good strategy is to lazily process the data in the staging table one row at a time, deleting each row once it has been processed.
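The lazy strategy can be sketched in plain Ruby as below, with an array standing in for the staging table (in Rails each element would be a staging record created from the uploaded CSV, and the column names are hypothetical):

```ruby
require "csv"

# Stand-in for the staging table, populated from the uploaded CSV.
staging = CSV.parse(
  "date,distance,duration\n2014-07-01,5.2,31\n2014-07-02,3.0,20\n",
  headers: true
).map(&:to_h)

imported = []
# Lazily process one staging row at a time, removing it once imported,
# so a concurrent upload can keep adding rows without conflict.
until staging.empty?
  row = staging.shift  # take (and delete) the oldest row
  imported << { date: row["date"], distance: row["distance"].to_f }
end

imported.size  # => 2
```

In Rails the `shift`/append pair would become a `StagingRow.first` lookup, a `Workout.create!`, and a `row.destroy`, ideally wrapped in a transaction so a failed import leaves the staging row in place.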

Also, for a simpler use case (a single save-process-discard sequence) you can simply save the CSV on disk and process it in the background using the strategies mentioned above, eliminating the need for a staging table. A staging table is especially useful if you plan to populate multiple data stores (possibly spread across geographic boundaries) and/or perform the processing with several workers crunching the data concurrently.
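For the save-to-disk variant, Ruby's standard CSV library can stream the file row by row, so even large uploads never need to fit in memory. A minimal sketch, using a temp file to simulate the saved upload and hypothetical column names:

```ruby
require "csv"
require "tempfile"

# Simulate the "save the upload to disk, process later" flow.
file = Tempfile.new(["workout", ".csv"])
file.write("date,distance\n2014-07-01,5.2\n")
file.close

rows = []
# CSV.foreach streams the file line by line instead of loading it whole.
CSV.foreach(file.path, headers: true) do |row|
  rows << { date: row["date"], distance: row["distance"].to_f }
  # In Rails: Workout.create!(...) or enqueue a background job here.
end

rows.first[:distance]  # => 5.2
```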

lorefnon