In most companies I've seen, the prod/staging dichotomy is articulated as follows, using Facebook's live feed as a running example:
- prod (obviously) stays up to date with new incoming posts, messages, etc.
- staging environments are prod snapshots with their own databases, which are not updated with new prod data
- when a user (in prod) creates a new post, the post appears in prod but not in any pre-existing staging environment
With such a setup, when a new feature needs to be tested "live" before going to prod, the only option is to create a staging environment based on a snapshot of the prod database and test the new feature there.
This breaks down when the features under test depend heavily on, and interact with, a constant stream of newly incoming data. Such a feature might change how new posts appear in other users' feeds, for instance, or suggest content based on a post a user has just liked. It might also depend on external data flowing in, such as weather or articles from a news agency, or be a machine learning system that predicts what users will do next and reacts to incoming data in real time.
Past the obvious preliminary testing phase, there comes a time when we need to assess how the new piece of functionality will really behave in the ACTUAL prod environment as new data flows in, not how it behaves in theory against a few static mock use cases.
It seems to me that this requirement would be best met by a "live staging" environment: an exact replica of prod that keeps being updated with new data, just like prod itself. The only difference from prod would be the new bit of code under test. In this environment, QA testers could observe how the new system behaves as fresh data arrives.
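The duplication I'm describing can be sketched as fanning each incoming prod event out to both environments, with the shadow copy strictly best-effort so it can never break prod. This is a minimal illustrative sketch; the handler names are hypothetical, and a real setup would involve message queues or traffic mirroring at the load balancer:

```python
def mirror_events(events, prod_handler, shadow_handler):
    """Deliver every event to prod; send a best-effort copy to live staging."""
    for event in events:
        prod_handler(event)           # the authoritative prod path
        try:
            shadow_handler(event)     # same data, new code under test
        except Exception:
            # A failure in the staging copy must never affect prod.
            pass

# Hypothetical usage: both environments see the same live event stream.
prod_log, shadow_log = [], []
mirror_events(
    ["post:1", "like:2", "post:3"],
    prod_log.append,
    shadow_log.append,
)
# prod_log and shadow_log now both contain all three events.
```

The key design point is the one-way dependency: prod never waits on, or fails because of, the live staging copy.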
However, I must say that none of the companies I've worked for so far has had this capability. In all of them, "staging" means a frozen snapshot of prod, and the only way to handle the "live-data-dependent" features described above is laborious manual testing with awkward mock data ingestion (creating fake posts with fake users, or mocking external incoming data), which is both inefficient and unreliable.
When I raise this internally, the answer is usually some variation of "we don't have the time for it" or "I don't see how it could be useful; just add some data manually and you'll be fine". So my questions are:
- do your current/past companies have such "live staging" environments available?
- is there something inherently difficult in making this kind of environment available?
- is such a capability common practice in "established" companies (Facebook, Google, Twitter...)?
- what would be your advice to a company aiming to develop that capability?
- last but not least, is there a common industry term for such a capability?