I have some experience with defining transactions and have reviewed both the transaction definition guide and bookshelf. We're using CA APM Release 9.1.5. I have a 4-part transaction that I have captured in a recording session below.

Result of recorded transaction

After promoting the recording, tweaking the identifying transaction's match criteria, and removing cacheable from the transactions, I have this business transaction:

[screenshot: business transaction definition]

After synching monitors, I am seeing results. However, for each transaction captured I am getting 3 defects: a missing transaction for all of the non-identifying transactions.

The identifying transaction is correctly defined (I'm not getting bleed-over from other transactions that don't have this callchain). The non-identifying transactions are also correctly defined; to prove this, I changed the identifying transaction from registration-form to login.fcc, picked up traffic unique to this use case, and still got 3 defects per transaction (the 3 non-identifying transactions missing, this time with registration-form missing). The most tantalizing thing is that there was one successful transaction recorded today (among many more failures). Since there was one success, I thought there was a chance that the timeout definition was too short, so I increased it to 20s with no change.

Summary of potential issues and why they aren't the cause:

  • Not synching between changes.
    • I made sure to do this between every change.
  • Identifying transaction too vague/capturing irrelevant traffic.
    • The match criteria are applicable only to this definition.
  • Non-identifying transaction definitions incorrect.
    • The match criteria are applicable only to this definition.
    • Switching one transaction to be the identifying one correctly matched traffic for just that piece.
  • Transaction timeout is too short.
    • Increased transaction timeout to 20s with no success.
  • Transactions marked not cacheable when they should be.
    • Each transaction is a required step; even if caching was involved, most users never execute the chain more than once ever (so at least the majority would succeed).
  • APM correctly reporting failures.
    • Able to complete successful transaction chain myself and lots of alarms would be going off if it wasn't working.

Any ideas? I can provide more details if required.

Tyler Hoppe

1 Answer

After countless failed attempts to solve the issue, I abandoned hope of getting it resolved. That, paired with the bogus transaction sequences CEM was putting together, made me suspect a serious defect in our current version of the application. Then, weeks (and a pleasant vacation) after I had given up, I accidentally stumbled across the root cause.

CEM binds transactions and components together using session identifiers (source). At a point in the recent past, we had a slight change to this on the application-side for a security patch. Not only that, the original configuration wasn't correct to begin with. If session identification is not set up properly, CEM will group transactions and components seemingly randomly. Below is one example where two different session components get grouped together to get false timing metrics. The opposite can also occur: other transactions not being bound into the request chain, despite actually occurring, resulting in missing transactions/components.

Example of transaction misidentification.
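To make the failure mode concrete, here is a toy model in Python of binding requests into chains by session key. It is not CEM's actual algorithm: the step names `step-3` and `step-4`, the cookie name `OLD_COOKIE`, the request dictionaries, and the `missing_steps` helper are all made up for illustration. It only shows why a session key that never resolves leaves the identifying transaction alone in its chain, with every other step reported missing.

```python
from collections import defaultdict

# Hypothetical 4-step business transaction. Only registration-form and
# login.fcc are named in the question; the other two names are placeholders.
STEPS = ["registration-form", "login.fcc", "step-3", "step-4"]
IDENTIFYING = "registration-form"


def missing_steps(requests, session_key):
    """Group requests by session_key(request). For every group that contains
    the identifying step, list the other steps that were not bound to it."""
    sessions = defaultdict(set)
    for i, req in enumerate(requests):
        key = session_key(req)
        # A request with no recognizable session ends up in a group of its own.
        sessions[key if key is not None else ("unbound", i)].add(req["url"])
    return {
        key: [step for step in STEPS if step not in urls]
        for key, urls in sessions.items()
        if IDENTIFYING in urls
    }


# One user walking through all four steps with the same cookie value.
requests = [{"url": step, "cookies": {"CPSI": "abc"}} for step in STEPS]

# Keyed on a cookie that is actually sent: one chain, nothing missing.
print(missing_steps(requests, lambda r: r["cookies"].get("CPSI")))
# Keyed on a cookie that is no longer sent: the identifying step stands alone.
print(missing_steps(requests, lambda r: r["cookies"].get("OLD_COOKIE")))
```

In this model the second call reports three missing steps for a single identifying transaction, which is the same shape as the three defects per transaction described in the question.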

In our case, we were using SiteMinder authentication and a cookie in our session identification config. The cookie no longer existed and, because the two conditions were AND'd, the session grouping never happened. I updated the cookie name, resulting in this config:

Session config with SM and cookie presence AND'd.

However, this still wasn't working properly. Firstly, it wouldn't work for unauthenticated pages because we have a public part of our site that can be accessed before SiteMinder authentication. The CPSI cookie is present on these public pages, though. Not only that, but authenticated pages weren't working somehow either. I'm still not sure why authenticated pages would be impacted, since they should have both parts present. I attempted to fix unauthenticated pages by using an OR relationship instead, like so:

[screenshot: session config with SM and cookie presence OR'd]

Unauthenticated pages started tracking time properly. Also, this somehow fixed authenticated pages too! Suddenly the multi-step transaction I gave up on started reporting success, and the whole application's response times changed drastically. It seems like the whole app was being misreported (especially high-traffic transactions) due to the bogus chains CEM was building.
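The difference between the two configs can be sketched as a small rule evaluator. This is a guess at the semantics, not CEM's implementation: the `session_id` function, the `sm_session` field, and the request dictionaries are hypothetical, and only the CPSI cookie name comes from the config above. Under AND, a request must carry every identifier to be tied to a session; under OR, any one of them is enough.

```python
def session_id(request, rules, mode):
    """Return a session key, or None if the request cannot be tied to a session.

    rules: functions mapping a request to an identifier or None.
    mode:  "and" requires every rule to match; "or" takes the first match.
    """
    values = [rule(request) for rule in rules]
    if mode == "and":
        return tuple(values) if all(v is not None for v in values) else None
    for value in values:
        if value is not None:
            return value
    return None


# Hypothetical identifier rules: the CPSI cookie and a SiteMinder session.
def cpsi_cookie(request):
    return request["cookies"].get("CPSI")


def siteminder(request):
    return request.get("sm_session")


rules = [cpsi_cookie, siteminder]

public_page = {"cookies": {"CPSI": "abc"}, "sm_session": None}
authed_page = {"cookies": {"CPSI": "abc"}, "sm_session": "sm1"}

print(session_id(public_page, rules, "and"))  # no session for public pages
print(session_id(public_page, rules, "or"))   # cookie alone is enough
print(session_id(authed_page, rules, "or"))   # same key as the public page
```

With OR and the cookie rule listed first, the public and authenticated pages of one visit share a key, so a chain that crosses the login boundary stays together. The sketch does not explain why authenticated pages also failed under AND; in this model they would have matched.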

TLDR: If CEM seems to be reporting inaccurate times, multi-step transactions aren't working, or you find yourself marking required components as cacheable to get CEM to report success, I suggest checking the Session Identification configuration (Administration -> Business Applications -> {Your App}). Remember to consult (their help docs) and think hard about your app (authenticated vs. unauthenticated, etc) to select the proper config.

Tyler Hoppe