
Update

I should have added from the outset - this is in Microsoft Dynamics CRM 2011


I know CRM well, but I'm at a loss to explain behaviour on my current deployment.

Please read the outline of my scenario to help me understand which of my presumptions / understandings is wrong (and therefore what is causing this error). It's not consistent with my expectations.

Basic Scenario

  • Requirement demands that a web service is called every X minutes (it adds pending items to a database index)
  • I've opted to use a workflow / custom entity trigger model (i.e. I have a custom entity which has a CREATE plugin registered. The plugin executes my logic. An accompanying workflow is started when "completed" time + [timeout period] expires. On expiry, it creates a new trigger record and the workflow ends).
  • The plugin logic works just fine. The workflow concept works fine to a point, but after a period of time the workflow stalls with a failure:

    This workflow job was canceled because the workflow that started it included an infinite loop. Correct the workflow logic and try again. For information about workflow logic, see Help.

So in a nutshell - standard infinite loop detection. I understand the concept and why it exists.

Specific deployment

Firstly, I think it's quite safe for us to ignore the content of the plugin code in this scenario. It works fine, it's atomic and hardly touches CRM (to be clear, it is a pre-event plugin which runs the remote web service, awaits a response and then sets the "completed on" date/time attribute on my Trigger record before passing the Target entity back into the pipeline). So long as a Trigger record is created, this code runs and does what it should.

Having discounted the content of the plugin, there might be an issue that I don't appreciate in having the plugin registered on the pre-create step of the entity...

So that leaves the workflow itself. It's a simple one. It runs thusly:

  1. On creation of a new Trigger entity...
  2. it has a Timeout of Trigger.new_completedon + 15 minutes
  3. on timeout, it creates a new Trigger record (with no "completed on" value - this is set by the plugin remember)
  4. That's all - no explicit "end workflow" (though I've just added one now and will set it testing...)

With this set-up, I manually create a new Trigger record and the process spins nicely into action. Roll forwards 1h 58 mins (based on the last cycle I ran - remembering that my plugin code may take a minute to finish running), after 7 successful execution cycles (i.e. new workflow jobs being created and completed), the 8th one fails with the aforementioned error.

What I already know (correct me where I'm wrong)

Recursion depth, by default, is set to 8. If a workflow / plugin calls itself 8 times then an infinite loop is detected.

Recursion depth is reset every one hour (or 10 minutes - see "Warnings" in linked blog?)

Recursion depth settings can be set via PowerShell or SDK code using the Deployment Web Service in an on-premise deployment only (via the Set-CrmSetting Cmdlet)

What I don't want to hear (please)

"Change recursion depth settings"

I cannot change the Deployment recursion depth settings as this is not an option in an online scenario - ultimately I will be deploying to CRM Online too.

"Increase the timeout period on your workflow"

This is not an option either - the reindex needs to occur every 15 minutes, ideally sooner.

Update

@Boone suggested below that the recursion depth timeout is reset after 60 minutes of inactivity rather than every 60 minutes. Therein lies the first misunderstanding.

While discussing with @alex, I suggested that there may be some persistence of CorrelationId between creating an entity via the workflow and the workflow that ultimately gets spawned... Well, there is. The CorrelationId is the same in the plugin, in the workflow and in any records that spool from that thread. I am now looking at ways to decouple the CorrelationId (or perhaps the creation of records) from the entity and the workflow.
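One way to confirm the shared CorrelationId for yourself is to stamp it onto each Trigger record as it is created. The following is a minimal sketch, assuming a pre-operation Create step on the Trigger entity; the attribute names `new_correlationid` (text) and `new_depth` (whole number) are hypothetical and would have to be added to the entity first.

```csharp
using System;
using Microsoft.Xrm.Sdk;

// Sketch only: register on the pre-operation Create step of the Trigger entity.
public class RecordCorrelationPlugin : IPlugin
{
    public void Execute(IServiceProvider serviceProvider)
    {
        var context = (IPluginExecutionContext)
            serviceProvider.GetService(typeof(IPluginExecutionContext));

        if (!context.InputParameters.Contains("Target"))
        {
            return;
        }

        var target = context.InputParameters["Target"] as Entity;
        if (target == null)
        {
            return;
        }

        // Hypothetical attributes: they must exist on the entity.
        // Values set on Target before the core operation are saved with the record.
        target["new_correlationid"] = context.CorrelationId.ToString();
        target["new_depth"] = context.Depth;
    }
}
```

If consecutive Trigger records show the same `new_correlationid` and a rising `new_depth`, the platform is treating the whole chain as one call stack.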

Greg Owens
  • The workflow is calling itself, that's why you end up in an infinite loop. You'll have to rethink the whole approach, increasing the depth or the timeout period would only delay the killing of the process due to infinite looping. – Alex May 15 '12 at 12:07
  • Thanks for your answer Alex - you've made it sound simple but I don't think it is so. Even if we accept that the workflow is calling itself (MSCRM thinks it is, I think it isn't - the workflow creates a new record, rather than explicitly calling itself again as a child workflow. I would expect this to be a new "thread" - though perhaps the CorrelationId of the plugin inherits from the parent and passes down to subsequent child workflows), the depth of the recursion should be reset every 10 or 60 minutes but it isn't. – Greg Owens May 15 '12 at 12:12
  • I believe MSCRM actually recognizes the indirect recursion taking place there due exactly to CorrelationId s, that's why the loop detection kicks in. I'm intrigued, will experiment a bit and get back to you (this might actually be useful to me too, who knows if/when I'll receive a similar requirement? (: ) – Alex May 15 '12 at 12:39
  • I guess that if that's the case, it might be possible to decouple the process by moving the plugin that sets "completion time" to be asynchronous and post-operation perhaps... I'll try that too I think. Interested to hear your independent findings too :) – Greg Owens May 15 '12 at 13:29
  • Moving the plugin execution point and/or changing to async not only introduces different issues for me to address (not insurmountable, to be fair) but more importantly doesn't make a difference to the CorrelationId. – Greg Owens May 16 '12 at 08:11

2 Answers


I doubt this can be solved like this.

I'd suggest a different approach: deploy a simple application alongside CRM and let it call the web service, which in turn can use the XRM endpoints in order to change the records.

UPDATE

Or, you can try something like this upon your crm service initialization in the plugin (dug it up from one of my plugins) leaving your workflow untouched:

CrmService service = new CrmService();
//initialize service here, then...

// copy the correlation details from the plugin context into a new token
CorrelationToken corToken = new CorrelationToken();
corToken.CorrelationId = context.CorrelationId;
corToken.CorrelationUpdatedTime = context.CorrelationUpdatedTime;

// WILD GUESS: Enforce unlimited depth ?
corToken.Depth = 0; // THIS WAS: context.Depth;

//updating correlation token
service.CorrelationTokenValue = corToken;

I admit I don't really remember much about this (code dates back to about 2 years ago), but it might help.

Alex
  • Thanks again Alex. I know that there are umpteen ways I might go about architecting a solution that achieves the same aim, but each has its own disadvantages too. I want to fully understand the issues I face in this model before discounting it out of hand. The behaviour I am seeing here is not consistent with my understanding of the documentation. I rather feel that it _is_ possible, but may need certain things to be done slightly differently. – Greg Owens May 15 '12 at 12:19
  • @Greg i added some code which might help (or not, i have no way of testing it) – Alex May 15 '12 at 13:34
  • +1 Though it's a more aggressive approach than I might have hoped for, this is probably the route I'd have gone for in the end. Hopefully I will get a better understanding of why it's occurring from this question though. Intrigued by Boone's anecdotal "60 mins inactivity" period. – Greg Owens May 15 '12 at 13:42
  • @Greg What you're seeing is matching Microsoft's expectations. We've run into the exact same problem ourselves and have gone through very lengthy discussions internally and with Microsoft on the issue. We re-architected our solution to avoid this scenario altogether. What you're seeing is sadly expected in this scenario. I believe alex has the right line of thinking here to work around the "feature". – GotDibbs May 16 '12 at 03:27
  • @GotDibbs I think the issues for me were a) not appreciating that the timeout value is "timeout after inactivity" and b) that the CorrelationId persists between plugin and workflow, even though they didn't feel as closely coupled as that **in this scenario**... Alex's suggestion is sound in principle (or in Version 4) but I'm not sure this is going to be feasible in 2011 (specifically CRM Online). – Greg Owens May 16 '12 at 08:08

For the one hour "reset" to take place you have to have NO activity for an hour. It doesn't reset just 1 hour from the original. So since you have an activity every 15 minutes, it never has a chance to reset. I don't know that this is set in stone anywhere, but that is my experience.
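To make the arithmetic concrete, here is a small stand-alone simulation of that behaviour. It is a sketch of the model described in this thread, not CRM's actual implementation: it assumes a depth counter that goes up by one per cycle, a limit of 8, a reset only when the gap between cycles is at least the inactivity window, and cancellation as soon as the counter reaches the limit (chosen to match the "7 succeed, 8th fails" observation in the question). All names are made up for illustration.

```csharp
using System;

public static class LoopDetectionSketch
{
    // Returns the 1-based cycle that would be cancelled,
    // or 0 if no cycle is cancelled within maxCycles.
    public static int FirstCancelledCycle(
        TimeSpan interval, TimeSpan inactivityReset, int maxDepth, int maxCycles)
    {
        int depth = 0;
        for (int cycle = 1; cycle <= maxCycles; cycle++)
        {
            // Assumption: the counter only starts over when the chain
            // has been idle for at least the inactivity window.
            if (cycle > 1 && interval >= inactivityReset)
            {
                depth = 0;
            }

            depth++;

            // Assumption: the job is cancelled once depth reaches the limit.
            if (depth >= maxDepth)
            {
                return cycle;
            }
        }
        return 0;
    }

    public static void Main()
    {
        TimeSpan reset = TimeSpan.FromMinutes(60);

        // 15-minute cycles never leave a 60-minute gap: prints 8.
        Console.WriteLine(FirstCancelledCycle(TimeSpan.FromMinutes(15), reset, 8, 100));

        // 61-minute cycles reset the counter every time: prints 0.
        Console.WriteLine(FirstCancelledCycle(TimeSpan.FromMinutes(61), reset, 8, 100));
    }
}
```

Under these assumptions any interval shorter than the inactivity window fails on the eighth cycle, no matter how long the chain has been running in total.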

In CRM 4 it was possible to create a CRM Service (Google creating a CRM service in the child pipeline) and reset the correlation ID (using CorrelationToken.NewToken()). I don't see anything so easy in the 2011 SDK. No idea if this trick worked in the online environment. Is 2011 online backwards compatible with CRM 4 plug-ins?

One thing you could try would be to use the IExecutionContext.CorrelationId to scavenge the asyncoperation (System Job) table. But according to the metadata, the attributes I think might be useful (CorrelationId, CorrelationUpdatedTime, Depth) are NOT valid for update. Maybe you could delete the rows? Even that may not help.

John Hoven
  • Thanks Boone. I hadn't considered the "one hour" meaning 60 mins of inactivity and still struggle to read the SDK to mean that - but interested to hear that this is your experience (no surprise if the docs are inaccurate...!). I did very briefly consider doing something with CorrelationId, without going the whole route of logging it and checking it as the cause (which I will do now following Alex's comments above) but noted that it only has a getter so cannot be updated. I don't believe that v4 plugins are valid for CRM Online so the CRM service trick won't be viable either, I fear. – Greg Owens May 15 '12 at 13:30
  • Yes - it might be difficult to achieve in sand boxed plugins. I don't think you can use reflection (which maybe could be used to get at the private member/setter). In the online environment you may very well be limited to an external service. Even if you were able to get Alex's code to work, you don't want to set the CorrelationId from the previous context, you want to use a new Id (that's the point). – John Hoven May 15 '12 at 13:48
  • Well thought out question by the way. Here is hoping I don't have to update my correlation-changing CRM 4 plug-ins to the sand boxed online environment :) – John Hoven May 15 '12 at 13:49
  • +1 @Boone - I found confirmation (by strong implication) that the "60 minute" period is indeed 60 minutes of inactivity. See link below - note that the property name is Min **Inactive** Seconds: [link](http://msdn.microsoft.com/en-us/library/microsoft.xrm.sdk.deployment.workflowsettings.mininactiveseconds) – Greg Owens May 15 '12 at 15:54
  • (Strong implication) - that's usually the best you get lol. I assume you're going to look for another way to make your requirement work in 2011? – John Hoven May 16 '12 at 11:15
  • Yes will have to. Luckily, I already have an external web service on this project (integration to/from CRM) so can extend that. The aim is to break the CorrelationId inheritance by writing an additional plugin which will call my integration WS which will then call back to CRM and create an additional "trigger" record with different attributes. In effect I'll have trigger + plugin to create the reindexing request (created by the workflow, invoking my WS) and a trigger + plugin to "execute" the original reindex request (created by the WS). Hard to explain, looks simpler on paper! – Greg Owens May 16 '12 at 12:37
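The record-creation half of that plan could look roughly like the sketch below. It assumes the `IOrganizationService` instance is built by the external web service from its own connection to CRM, not taken from a plugin's service factory, and that `new_trigger` and `new_name` are hypothetical schema names for the Trigger entity. Whether a record created this way really starts with a fresh CorrelationId is the working theory of this thread, not something the sketch proves.

```csharp
using System;
using Microsoft.Xrm.Sdk;

public static class TriggerFactory
{
    // Called from the external web service, outside the plugin pipeline.
    // "new_trigger" and "new_name" are hypothetical names for illustration.
    public static Guid CreateTrigger(IOrganizationService service)
    {
        var trigger = new Entity("new_trigger");
        trigger["new_name"] = "Reindex " + DateTime.UtcNow.ToString("o");

        // Create returns the id of the new record.
        return service.Create(trigger);
    }
}
```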