In a production process:
- Steps depend on the completion of other steps.
- Material can arrive asynchronously, meaning subsequent steps wait for the product to arrive before they work on it. However, this does not mean unlimited material can arrive uncontrolled, only the material to be consumed for that specific manufacture order. If your scenario allows an unlimited stream of data to pour in, you must organize it before the process to avoid mixing components of different products. Don't compromise the structure of the process by trying to handle asynchronously arriving data in some ad hoc buffer, because manufacturing data products involves relational data, not raw material.
- Subcomponents may be completed in joining branches, meaning the assembling step waits for the coordinated set of related components to arrive before assembly begins.
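To illustrate the point about organizing arriving material before the process, here is a minimal sketch that groups a finite batch of arrivals by manufacture order and releases only the orders whose component set is complete. The `Component` record, the `Organizer` class and the `"x"`/`"z"` kind labels are hypothetical names invented for this example; they are not part of POWER.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical description of one arriving component (illustrative only).
record Component(int OrderId, string Kind, object Payload);

static class Organizer
{
    // Groups arrivals by manufacture order and keeps only the orders for
    // which every required kind of component has arrived.
    public static Dictionary<int, List<Component>> CompleteOrders(
        IEnumerable<Component> arrivals, ISet<string> requiredKinds)
    {
        return arrivals
            .GroupBy(c => c.OrderId)
            .Where(g => requiredKinds.IsSubsetOf(g.Select(c => c.Kind)))
            .ToDictionary(g => g.Key, g => g.ToList());
    }
}

static class OrganizerDemo
{
    static void Main()
    {
        var arrivals = new[]
        {
            new Component(1, "x", "raw x for order 1"),
            new Component(2, "x", "raw x for order 2"),
            new Component(1, "z", "raw z for order 1"),
        };
        var required = new HashSet<string> { "x", "z" };

        // Order 1 has both files; order 2 is still missing its z file,
        // so it is held back and never mixed into order 1's process.
        foreach (var order in Organizer.CompleteOrders(arrivals, required))
            Console.WriteLine($"order {order.Key}: {order.Value.Count} components ready");
    }
}
```

The grouping happens outside the process, so the process itself only ever sees one coordinated set of inputs per manufacture order.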
I am the creator of POWER, the only collaborative (manufacturing) architecture to date. There is a lot to learn about this subject, but you can find my articles and code online: http://www.powersemantics.com/
Here is what your process looks like in manufacturing's model for work:
// Note: array types are written object[,] and object[] so the declarations are
// valid C#; the index names i and j are kept in the comments.
class MyProduct
{
    public object[,] x_clean { get; set; } // x_clean[i,j]
    public object[] y_clean { get; set; }  // y_clean[j]
    public object[] z_clean { get; set; }  // z_clean[j]
    // final product
    public object[] w_clean { get; set; }  // w_clean[j]
}

class MyProcess : Producer<MyProduct>, IProcess, IMachine, IOrganize
{
    // process inputs
    public object[,] x { get; set; } // raw file x[i,j]
    public object[] z { get; set; }  // raw file z[j]

    // machines
    public CleanerA Cleaner1 { get; set; }
    public Aggregator Aggregator1 { get; set; }
    public CleanerB Cleaner2 { get; set; }
    public Assembler Assembler1 { get; set; }

    public void D()
    {
        // instantiates properties and machines
    }

    public void O()
    {
        // bind machines to work on the same data points
        // allows maintenance to later remove cleaners if it becomes possible
        // for the process to receive data in the correct form
        Cleaner1.x = x;
        Cleaner1.Product.x_clean = Product.x_clean;
        Aggregator1.x_clean = Product.x_clean;
        Aggregator1.Product.y_clean = Product.y_clean;
        Cleaner2.z = z;
        Cleaner2.Product.z_clean = Product.z_clean;
        Assembler1.z_clean = Product.z_clean;
        Assembler1.y_clean = Product.y_clean;
        Assembler1.Product.w_clean = Product.w_clean;
    }

    // hardcoded synchronous controller
    public void M()
    {
        Cleaner1.M();
        Aggregator1.M();
        Cleaner2.M();
        Assembler1.M();
    }
}

// these class pairs are Custom Machines, very specific work organized
// by user requirements rather than in terms of domain-specific operations
class CleanerAProduct
{
    public object[,] x_clean { get; set; } // x_clean[i,j]
}

class CleanerA : Producer<CleanerAProduct>, IMachine
{
    public object[,] x { get; set; } // raw file x[i,j]

    public void M()
    {
        // clean the raw file x[i,j] and store it as x_clean[i,j]
    }
}

class AggregatorProduct
{
    public object[] y_clean { get; set; } // y_clean[j]
}

class Aggregator : Producer<AggregatorProduct>, IMachine
{
    public object[,] x_clean { get; set; } // x_clean[i,j]

    public void M()
    {
        // aggregate the results from x_clean[i,j] and store them as y_clean[j]
    }
}

class CleanerBProduct
{
    public object[] z_clean { get; set; } // z_clean[j]
}

class CleanerB : Producer<CleanerBProduct>, IMachine
{
    public object[] z { get; set; } // raw file z[j]

    public void M()
    {
        // clean a raw file z[j] and store it as z_clean[j]
    }
}

class AssemblerProduct
{
    public object[] w_clean { get; set; } // w_clean[j]
}

class Assembler : Producer<AssemblerProduct>, IMachine
{
    public object[] y_clean { get; set; } // y_clean[j]
    public object[] z_clean { get; set; } // z_clean[j]

    public void M()
    {
        // combine z_clean[j] and y_clean[j] and store the result as w_clean[j]
    }
}
Normal usage of a production process class:
- Instantiate. Call D() to instantiate machines and product.
- Assign any inputs to the process.
- Call O() to have the process distribute those inputs to machines as well as bind the machines to operate on the end product. This is your last chance to override those assignments before production.
- Call M() to execute the process.
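The four usage steps can be sketched end to end as follows. This is a self-contained toy, not POWER itself: `Producer<T>` and `IMachine` here are simplified stand-ins I am assuming so the example compiles on its own, and `Cleaner`, `Summer` and `DemoProcess` are hypothetical machines invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed stand-ins so the sketch compiles alone; NOT the real POWER types.
interface IMachine { void M(); }
abstract class Producer<T> where T : new()
{
    public T Product { get; set; } = new T();
}

class CleanProduct { public List<double> Clean { get; set; } = new(); }
class Cleaner : Producer<CleanProduct>, IMachine
{
    public List<double> Raw { get; set; } = new();
    // Drops NaN readings. Writes into the bound list instead of replacing it,
    // so every holder of the same list reference sees the result.
    public void M() => Product.Clean.AddRange(Raw.Where(v => !double.IsNaN(v)));
}

class SumProduct { public List<double> Total { get; set; } = new(); }
class Summer : Producer<SumProduct>, IMachine
{
    public List<double> Clean { get; set; } = new();
    public void M() => Product.Total.Add(Clean.Sum());
}

class DemoProduct
{
    public List<double> Clean { get; set; } = new();
    public List<double> Total { get; set; } = new();
}

class DemoProcess : Producer<DemoProduct>, IMachine
{
    public List<double> Raw { get; set; } = new(); // process input
    public Cleaner Cleaner1 { get; set; } = new();
    public Summer Summer1 { get; set; } = new();

    // D: instantiate the product and the machines.
    public void D()
    {
        Product = new DemoProduct();
        Cleaner1 = new Cleaner();
        Summer1 = new Summer();
    }

    // O: distribute inputs and bind machines to the shared end product.
    public void O()
    {
        Cleaner1.Raw = Raw;
        Cleaner1.Product.Clean = Product.Clean;
        Summer1.Clean = Product.Clean;
        Summer1.Product.Total = Product.Total;
    }

    // M: hardcoded synchronous controller.
    public void M()
    {
        Cleaner1.M();
        Summer1.M();
    }
}

static class LifecycleDemo
{
    static void Main()
    {
        var process = new DemoProcess();
        process.D();                                          // 1. instantiate
        process.Raw = new List<double> { 1.0, double.NaN, 2.5 }; // 2. assign inputs
        process.O();                                          // 3. organize and bind
        process.M();                                          // 4. execute
        Console.WriteLine(process.Product.Total[0]);          // Total[0] is 3.5
    }
}
```

Because the machines write into lists that the process product also references, the result ends up in `process.Product` without any machine handing data directly to another.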
Most source code welds producers and consumers together within the same function body, which makes it a pain to maintain later. Functions then e-mail the data to one another like useless office workers who don't keep an e-mail trail. That causes problems when you later want to make vertical integration decisions, such as replacing a machine or extending the process, all of which I've documented with sources. POWER is the only architecture which avoids complexities like centralization. I released it in February.
There are ETL tools and other solutions like TPL Dataflow, but production processes are not going to organize or manage themselves for programmers. All programmers need to learn POWER to correctly handle the responsibilities of waste, integration, control and instrumentation. Employers look at us funny when we write automated code and then can't stop live execution on a dime, but our education only prepares us to create processes, not to architect them the way manufacturing does.