I have a use case where I need to:
- iterate through each Input node in an XML document
- perform a time-intensive calculation on each Input, and
- write the results to an XML file.
The input looks something like this:
<Root>
  <Input>
    <Case>ABC123</Case>
    <State>MA</State>
    <Investor>Goldman</Investor>
  </Input>
  <Input>
    <Case>BCD234</Case>
    <State>CA</State>
    <Investor>Goldman</Investor>
  </Input>
</Root>
and the output:
<Results>
  <Output>
    <Case>ABC123</Case>
    <State>MA</State>
    <Investor>Goldman</Investor>
    <Price>75.00</Price>
    <Product>Blah</Product>
  </Output>
  <Output>
    <Case>BCD234</Case>
    <State>CA</State>
    <Investor>Goldman</Investor>
    <Price>55.00</Price>
    <Product>Ack</Product>
  </Output>
</Results>
I would like to run the calculations in parallel; a typical input file may have 50,000 Input nodes, and the total processing time without threading may be 90 minutes. Approximately 90% of the processing time is spent on the second step (the calculations).
I can stream the Input nodes from an XmlReader and hand them to Parallel.ForEach easily enough:
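static IEnumerable<XElement> EnumerateAxis(XmlReader reader, string axis)
{
    reader.MoveToContent();
    while (!reader.EOF)
    {
        if (reader.NodeType == XmlNodeType.Element && reader.Name == axis)
        {
            // ReadFrom consumes the whole element and leaves the reader on the
            // node that follows it, so Read() must not be called again here;
            // otherwise an immediately adjacent <Input> would be skipped.
            XElement el = XElement.ReadFrom(reader) as XElement;
            if (el != null)
                yield return el;
        }
        else
        {
            reader.Read();
        }
    }
}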
...
Parallel.ForEach(EnumerateAxis(reader, "Input"), node =>
{
    // do calc
    // lock the XmlWriter, write, unlock
});
I'm currently inclined to use a lock when writing to the XmlWriter to ensure thread safety.
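For concreteness, a minimal sketch of the locking version I have in mind is below. Calculate is a hypothetical stand-in for the real calculation, the gate object and the class name are made up for illustration, and the writer targets a StringBuilder only to keep the sketch self-contained (the real code would write to a file).

```csharp
using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;
using System.Xml;
using System.Xml.Linq;

static class LockedWriterSketch
{
    // Hypothetical placeholder for the expensive calculation: copies the
    // Input children into an Output element and appends dummy results.
    static XElement Calculate(XElement input) =>
        new XElement("Output",
            input.Elements(),
            new XElement("Price", "0.00"),
            new XElement("Product", "TBD"));

    public static string Run(IEnumerable<XElement> inputs)
    {
        var sb = new StringBuilder();
        var gate = new object();
        using (XmlWriter writer = XmlWriter.Create(sb))
        {
            writer.WriteStartElement("Results");
            Parallel.ForEach(inputs, node =>
            {
                // The calculation runs in parallel with no lock held.
                XElement output = Calculate(node);

                // XmlWriter is not thread-safe, so only one thread at a time
                // may write; each Output element is written whole.
                lock (gate)
                {
                    output.WriteTo(writer);
                }
            });
            writer.WriteEndElement();
        }
        return sb.ToString();
    }
}
```

With this version the Output elements are written in whatever order the calculations finish, which is not necessarily the input order.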
Is there a more elegant way to handle the XmlWriter in this case? Specifically, should I have the Parallel.ForEach code pass the results back to the originating thread and have that thread handle the XmlWriter, avoiding the need to lock? If so, I'm unsure of the correct approach for this.
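To illustrate what I mean by passing results back to the originating thread, here is a rough sketch using PLINQ instead of Parallel.ForEach. I am not sure this is the right approach; it assumes that enumerating a ParallelQuery with a plain foreach is an acceptable way to bring results back to the calling thread. Calculate and the class name are again hypothetical, and the StringBuilder target is only there to keep the sketch self-contained.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;

static class SingleWriterSketch
{
    // Hypothetical placeholder for the expensive calculation.
    static XElement Calculate(XElement input) =>
        new XElement("Output",
            input.Elements(),
            new XElement("Price", "0.00"),
            new XElement("Product", "TBD"));

    public static string Run(IEnumerable<XElement> inputs)
    {
        var sb = new StringBuilder();
        using (XmlWriter writer = XmlWriter.Create(sb))
        {
            writer.WriteStartElement("Results");

            // The calculations run on worker threads; AsOrdered asks PLINQ
            // to yield the results in input order.
            IEnumerable<XElement> outputs =
                inputs.AsParallel().AsOrdered().Select(Calculate);

            // The foreach body runs on the calling thread, so the XmlWriter
            // is only ever touched by one thread and needs no lock.
            foreach (XElement output in outputs)
            {
                output.WriteTo(writer);
            }

            writer.WriteEndElement();
        }
        return sb.ToString();
    }
}
```

The trade-off, as far as I can tell, is that AsOrdered may hold back finished results until earlier ones are ready; dropping it would give completion order, as with the lock.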