I am using a vendor-provided configuration of Apache Tomcat that uses JNDI realms to authenticate users against Active Directory. Part of the configuration specifies the `connectionURL` and an optional `alternateURL` as the targets for user authentication.

With the latest iteration of the software, bundled with Apache Tomcat v9.0.68.0 and running on Eclipse Adoptium 11.0.17+8, connection attempts are being directed to our cloud-based AD domain controllers to service the authentication request. There are no firewall rules to allow this traffic in or out of the corporate network. When we look at packet captures using Wireshark, we can see outbound traffic from the application server with a destination port of 636 (LDAPS) and a destination server of either of the local domain controllers defined in the `connectionURL` and/or `alternateURL` realm options.

When we look at the firewall logs using Panorama, we see the application server as the source and either our AWS or our Azure domain controllers as the destination, on port 636. These correspond to entries like the following in the application logs:

```
Aug 03, 2023 2:55:27 PM org.apache.catalina.realm.JNDIRealm authenticate
SEVERE: Exception performing authentication
javax.naming.PartialResultException [Root exception is javax.naming.CommunicationException: cpr.ca:636 [Root exception is java.net.SocketTimeoutException: connect timed out]]
```

I have worked with the vendor's R&D team as well as our Domain Admins and network/firewall team, and no one has been able to come up with a reason or a resolution to prevent this from happening. If nothing else, I would like to get more information out of the process, whether through debug logging, other options for the connector, engine, or realm, or black magic!

This is an excerpt of the server.xml file's JNDI Realm definition:

```xml
      <Realm className="com.bmc.bcan.catalina.realm.BNALockOutRealm"
             failureCount="5"
             lockOutTime="86400"
             cacheSize="1000"
             cacheRemovalWarningTime="3600">
        <Realm className="com.bmc.bcan.catalina.realm.BNAJNDIRealm"
               connectionURL="ldaps://<FQDN1>:636"
               alternateURL="ldaps://<FQDN2>:636"
               connectionName="CN=*****,OU=*****,OU=*****,OU=*****,OU=*****,OU=*****,DC=**,DC=**"
               connectionPassword="<password>"
               userBase="DC=**,DC=**"
               userSearch="(sAMAccountName={0})"
               userSubtree="true"
               referrals="follow"
         />
      </Realm>
      <Host name="localhost" appBase="webapps" unpackWARs="true" autoDeploy="false">
        <Valve className="org.apache.catalina.valves.ErrorReportValve" showReport="false" showServerInfo="false" />
      </Host>
</Engine>
```
Greg M
  • I imagine that neither `FQDN1` nor `FQDN2` is `cpr.ca`, right? If you are not validating certificates, you could try adding an entry to `/etc/hosts` to make cpr.ca resolve to FQDN1 and see in your LDAP logs what the request is. Maybe it is not authentication but the app trying to resolve its configuration from an LDAP server? – ixe013 Aug 04 '23 at 01:57
  • Yes, both FQDNs have cpr.ca as their domain. What I suspect is happening, but have not yet been able to prove, is that the application is doing an nslookup on the domain, receiving the list of domain controllers, resolving their IP addresses to FQDNs, sorting them alphabetically, and then using the first in the list, which would be a cloud-based DC. What I can't understand is why it would do that when the URL has a specific FQDN. – Greg M Aug 04 '23 at 18:56

1 Answer

Active Directory is often configured with round-robin DNS. You can verify this by running nslookup a few times in a row. You'll see that the order of the IP addresses returned changes every time.

This is usually not a problem for software using Microsoft's API, as it will automatically pick the next one in the list. Java does not work that way, and the reasoning is valid. Think about it:

  1. A Java application does a DNS lookup.
  2. It succeeds, but the IP address returned is not reachable. We don't know that yet.
  3. The application sends a TCP SYN packet on port 636 to that IP address.
  4. The connection times out

The problem here is that there is no way for the application to know that querying the DNS again will return a different address. As far as it is concerned, the lookup step was successful. The connection step failed.
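The two steps above can be checked separately from the application server. The following is a minimal sketch, not a reproduction of what Tomcat or JNDI does internally: the class name `DcProbe` and its method names are hypothetical, and it tests only the TCP handshake, with no TLS and no LDAP bind. It lists every address the resolver returns for a host and reports which ones accept a connection on the given port.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.UnknownHostException;
import java.util.List;
import java.util.Optional;

// Hypothetical diagnostic helper; not part of Tomcat or the vendor product.
class DcProbe {

    /** Step 1: returns every address the resolver hands back, in resolver order. */
    static List<InetAddress> resolveAll(String host) throws UnknownHostException {
        return List.of(InetAddress.getAllByName(host));
    }

    /** Steps 2-4: true only if the TCP handshake completes within the timeout. */
    static boolean canConnect(InetAddress address, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(address, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers timeouts, refused connections, and unroutable addresses.
            return false;
        }
    }

    /** Returns the first candidate that accepts a connection, if any. */
    static Optional<InetAddress> firstReachable(List<InetAddress> candidates,
                                                int port, int timeoutMillis) {
        for (InetAddress candidate : candidates) {
            if (canConnect(candidate, port, timeoutMillis)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws UnknownHostException {
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 636;
        for (InetAddress address : resolveAll(host)) {
            String state = canConnect(address, port, 3000) ? "reachable" : "unreachable";
            System.out.println(address.getHostAddress() + " -> " + state);
        }
    }
}
```

On Java 11 or later this can be run directly as `java DcProbe.java cpr.ca 636`, and again with each host name from the realm configuration. The JVM caches DNS answers for a while, so repeated lookups inside one run will not show the round-robin rotation; run the program several times to see it.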

This problem also manifests itself when a DC is patched. DNS will return an IP address that will only be valid again in a few minutes, once the server is done rebooting. I'm sure you've heard of impossible-to-reproduce connection problems? Look no further.

A quick fix is to add an entry that points to a DC that you can reach in /etc/hosts of the "computer" where your application is running. But it does not scale and it does not fix the patch-then-reboot issue.

The solution is to replace this poor man's load balancing solution (round robin DNS) with a proper load-balancer. But you (and your AD admin) won't want to mess with such a sensitive part of the infrastructure.

This worked for me in the past:

  1. Get a new hostname, like load-balanced-dc.cpr.ca
  2. Configure your load-balancer with that address
    • Put the domain controllers you can reach "behind" that load-balancer
    • Do not intercept TLS traffic; pass it through, and keep each client on the server that was selected for it.
  3. Issue a new certificate with the name load-balanced-dc.cpr.ca in the subject alt names, along with cpr.ca (or whatever hostname your DC has)
  4. Replace your domain controller's certificate with the one you just issued.
  5. Configure your application to use load-balanced-dc.cpr.ca

Active Directory is a number of things in the Microsoft environment, but it is also a plain old (but very good) LDAP server. That's all Tomcat needs. Depending on your load-balancer configuration and ability to load-balance itself, you might not need two entries in your application's configuration file.

As a bonus, you'll get kudos from every owner of an application that is not Microsoft native for setting that up!

ixe013
  • Since I missed this and did not respond soon enough, the question was marked as answered, though it isn't really answered yet. Regarding the DNS lookup, I have to ask why Tomcat would do an nslookup on the domain and not on the host defined in the connectionURL and/or alternateURL values of the JNDIRealm configuration. If the DNS query for the connectionURL host is successful and the Domain Controller it represents is available, why would it not attempt to use that server for authentication? – Greg M Aug 15 '23 at 18:38
  • Don't worry, it's always soon enough if the problem persists! I misunderstood your configuration and provided the valid answer to a different problem. Can you add the result of nslookup of the hosts you use in your configuration, if only to make sure they always return a single IP? The output of `grep cpr /etc/hosts` could also help. – ixe013 Aug 15 '23 at 19:41
  • The nslookup output is a listing of 10-12 IP addresses, including those of the cloud-based domain controllers. I can't share the actual IPs due to security and privacy regulations for the company. – Greg M Aug 17 '23 at 15:37
  • No need to share the IPs. They are probably in a 10/8 network anyway. So if `nslookup` for a host you put in your configuration file returns an IP that cannot be reached, you will need a load-balancer like I suggested. The other possibility is that your domain controller is returning an LDAP referral to the "bare" cpr.ca domain. Unlikely and harder to debug. Make sure `userBase` in your configuration is as precise as possible, something like `OU=users,DC=cpr,DC=ca`. – ixe013 Aug 21 '23 at 20:01
  • Thanks. I originally was using an F5 load-balanced option, but for the sake of troubleshooting, I opted for the statically assigned primary and alternate servers to eliminate the F5 as a potential problem. The unfortunate reality is our AD schema was not designed with forethought. Our users are spread across a few different OUs, and my LDAP-fu is not sufficiently developed to write that query properly. Regardless, it won't matter if the request is being sent to an off-premises DC that has no authority and cannot be reached through our firewalls. – Greg M Aug 24 '23 at 15:25
  • There is a risk that a query that is too high up the chain (with only `DC=` attributes) returns an LDAP referral, kind of like an HTTP 302. If it sends a referral, it will likely use the bare `cpr.ca` domain. As for networking, I'd be surprised if the F5 were the problem. Make sure you can reach them, with their own hostname, and you should never hear about the off-premises DC again. See https://stackoverflow.com/a/61958564/591064 and add `-Dcom.sun.jndi.ldap.connect.pool.debug=all` to your command line to get to the bottom of this. – ixe013 Aug 25 '23 at 15:56
  • Full disclosure, I am not a Java programmer. I am not 100% sure where I need to add this option to generate the output in my logs. We have a logging.properties file that is called by both the application and the JVM during startup. Is that the best place for it? Or do I need to add this elsewhere, like wherever the command to start the service is called from? BTW - if this gives me useful information, it will be the first major step forward in this process of troubleshooting I've had in months. Thanks! – Greg M Aug 25 '23 at 20:14
  • It is a flag you add to the JVM's command line, not to the code it's running. If you look at the running process with `ps -ef | grep java`, you will likely see something like `java -Daaa.bbb.ccc=1 -Dxxx.yyy.zzz=2 -jar something-maybe-tomcat.jar` and a lot of other parameters. You need to add another `-Dcom.sun.jndi.ldap.connect.pool.debug=all` before the `-jar` parameter. The `-D` parameters can be in any order. – ixe013 Aug 28 '23 at 15:12
  • I'm struggling with how to insert the logging option into the startup. The service runs on a Windows server and is installed using the `tomcat9 //IS//...` context. I added the `-Dcom.sun.jndi.ldap.connect.pool.debug=all` option to the Java 9 startup options and restarted the service, but I still am not seeing any logged output from it. – Greg M Aug 28 '23 at 22:31
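
The referral theory raised in the comments can also be tested outside Tomcat with plain JNDI. This is a minimal sketch under stated assumptions, not the realm's actual code path: the class name `ReferralProbe`, the `buildEnv` helper, and the `LDAP_BIND_PASSWORD` environment variable are hypothetical. It runs the same kind of subtree search as the realm's `userSearch`, but sets the referral mode to `throw` so that a referral is reported instead of being followed.

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.ReferralException;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;
import javax.naming.directory.SearchControls;
import javax.naming.directory.SearchResult;

// Hypothetical diagnostic helper; not part of Tomcat or the vendor product.
class ReferralProbe {

    /** Builds a JNDI environment that reports referrals as exceptions. */
    static Hashtable<String, Object> buildEnv(String url, String bindDn, String password) {
        Hashtable<String, Object> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, url);
        env.put(Context.SECURITY_AUTHENTICATION, "simple");
        env.put(Context.SECURITY_PRINCIPAL, bindDn);
        env.put(Context.SECURITY_CREDENTIALS, password);
        // "throw" raises a ReferralException instead of silently following it.
        env.put(Context.REFERRAL, "throw");
        // Fail fast rather than waiting on an unreachable server.
        env.put("com.sun.jndi.ldap.connect.timeout", "5000");
        return env;
    }

    public static void main(String[] args) throws NamingException {
        // Assumed environment variable, so the password stays off the command line.
        String password = System.getenv("LDAP_BIND_PASSWORD");
        if (args.length < 4 || password == null) {
            System.err.println("usage: ReferralProbe <ldapUrl> <bindDn> <userBase> <sAMAccountName>");
            return;
        }
        DirContext ctx = new InitialDirContext(buildEnv(args[0], args[1], password));
        try {
            SearchControls controls = new SearchControls();
            controls.setSearchScope(SearchControls.SUBTREE_SCOPE);
            NamingEnumeration<SearchResult> results =
                    ctx.search(args[2], "(sAMAccountName={0})", new Object[] {args[3]}, controls);
            while (results.hasMore()) {
                System.out.println("entry: " + results.next().getNameInNamespace());
            }
        } catch (ReferralException e) {
            // Shows where the server wanted the client to go next.
            System.out.println("referral: " + e.getReferralInfo());
        } finally {
            ctx.close();
        }
    }
}
```

If a search with `userBase` set to the domain root prints a referral that names the bare domain, while a search against a more specific OU does not, that would support the referral explanation. The sketch reports only the first referral it encounters.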