We are getting an NPE in production, which runs apache-tomcat-8.5.28 or 8.5.34 (I'm not sure which, though the jasper-el.jar entries in the trace below say 8.5.34), with the following stack:
java.lang.NullPointerException: while trying to invoke the method javax.el.ELResolver.getValue(javax.el.ELContext, java.lang.Object, java.lang.Object) of a null object loaded from an array (which itself was loaded from field javax.el.CompositeELResolver.resolvers of an object) with an index loaded from local variable 'i'
at javax.el.CompositeELResolver.getValue(CompositeELResolver.java:62) ~[el-api.jar:3.0.FR]
at com.sun.faces.el.FacesCompositeELResolver.getValue(FacesCompositeELResolver.java:72) ~[jsf-impl.jar:1.2_13-b02-FCS]
at org.apache.el.parser.AstValue.getValue(AstValue.java:169) ~[jasper-el.jar:8.5.34]
at org.apache.el.ValueExpressionImpl.getValue(ValueExpressionImpl.java:190) ~[jasper-el.jar:8.5.34]
at com.sun.faces.application.ValueBindingValueExpressionAdapter.getValue(ValueBindingValueExpressionAdapter.java:113) ~[jsf-impl.jar:1.2_13-b02-FCS]
...
The error does not always happen, though once it does happen it seems to happen a lot for that particular server node. I've spent the last few days digging into decompiled JARs and putting breakpoints to try to understand the error.
The class in question is FacesCompositeELResolver, which extends CompositeELResolver, and only 2 instances of it are ever created: one with chainType=Faces and one with chainType=JSP. The former (chainType=Faces) is created as part of the first call to FacesServlet.service(), which only happens the first time you hit the server.
The parent class, CompositeELResolver, implements getValue with code like this:
    // copied from CompositeELResolver.java
    public Object getValue(ELContext context, Object base, Object property) {
        context.setPropertyResolved(false);
        int sz = this.size;
        for (int i = 0; i < sz; ++i) {
            Object result = this.resolvers[i].getValue(context, base, property); // <---- line 62 NPE here
            if (context.isPropertyResolved()) {
                return result;
            }
        }
        return null;
    }
The key detail is that the error says the array this.resolvers contains a null element at index i. The following code is responsible for setting the values inside the this.resolvers array:
    // copied from CompositeELResolver.java
    public void add(ELResolver elResolver) {
        Objects.requireNonNull(elResolver); // <---- line 47
        if (this.size >= this.resolvers.length) {
            ELResolver[] nr = new ELResolver[this.size * 2];
            System.arraycopy(this.resolvers, 0, nr, 0, this.size);
            this.resolvers = nr;
        }
        this.resolvers[this.size++] = elResolver; // <---- line 54
    }
At face value, it looks impossible for the this.resolvers array to hold a null at an index below this.size, since line 47 rejects null arguments; however, it is possible in a multi-threaded environment where 2 threads share the same object reference. This is why it matters that FacesCompositeELResolver is a shared object, instantiated once and shared across all threads.
Suppose 2 threads call FacesServlet.service() at exactly the same time and both reach add(elResolver). Nothing in add is atomic, so the two calls can interleave on the same FacesCompositeELResolver instance. On line 54, this.size++ can run twice before either assignment to the array lands, so a reader can briefly see a size that covers an empty slot. Worse, if both threads enter the resize branch, the thread that assigns this.resolvers last replaces the array with a copy that lacks the other thread's element, leaving a slot below this.size permanently null. That is a race condition.
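To make the interleaving concrete, here is a minimal sketch that replays one unlucky ordering of two add() calls on a single thread, so the outcome is deterministic. It is not the real class: the names ResizeRaceSketch and replayInterleaving are made up for illustration, Strings stand in for ELResolver instances, and the starting capacity of 2 is an assumption chosen to force a resize.

```java
import java.util.Arrays;

public class ResizeRaceSketch {

    /** Replays, step by step, two add() calls that both hit the resize branch. */
    static String[] replayInterleaving() {
        // Stand-ins for the this.resolvers and this.size fields (assumed full array).
        String[] resolvers = {"r0", "r1"};
        int size = 2;

        // Threads A and B both pass "size >= resolvers.length" and copy the old array.
        String[] nrA = new String[size * 2];
        System.arraycopy(resolvers, 0, nrA, 0, size);
        String[] nrB = new String[size * 2];
        System.arraycopy(resolvers, 0, nrB, 0, size);

        // Thread A publishes its copy and stores its resolver (line 54).
        resolvers = nrA;
        resolvers[size++] = "a"; // slot 2, size becomes 3

        // Thread B publishes its stale copy, which never saw A's store.
        resolvers = nrB;
        resolvers[size++] = "b"; // slot 3, size becomes 4

        return resolvers;
    }

    public static void main(String[] args) {
        // prints [r0, r1, null, b]
        System.out.println(Arrays.toString(replayInterleaving()));
    }
}
```

Slot 2 ends up null even though both callers passed a non-null argument, which matches the state that getValue trips over on line 62.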
So I set a breakpoint in CompositeELResolver.add to see who calls it, and whether the calling code is protected by a synchronized method or block that would make it safe in a multi-threaded environment:
    javax.el.CompositeELResolver.add():54
    com.sun.faces.el.FacesCompositeELResolver.add():60
    com.sun.faces.el.ElUtils.buildFacesResolver():160
I have not yet reproduced the problem, but I suspect the stack above is to blame.
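One way to confirm the theory on an affected node, without reproducing the race, would be to inspect the shared instance for null slots. The sketch below does that with reflection, using the field names resolvers and size from the decompiled code above. NullSlotCheck, findField, findNullSlots and the StandIn class are hypothetical names for illustration; in a real check the argument would be the live FacesCompositeELResolver, and how to get hold of that reference is left open here.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

public class NullSlotCheck {

    /** Hypothetical stand-in that reuses the private field names of CompositeELResolver. */
    static class StandIn {
        private Object[] resolvers = {"r0", "r1", null, "b"};
        private int size = 4;
    }

    // Walks up the class hierarchy, because the fields are declared in a superclass.
    static Field findField(Class<?> type, String name) {
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            try {
                Field f = c.getDeclaredField(name);
                f.setAccessible(true);
                return f;
            } catch (NoSuchFieldException ignored) {
                // not declared here, try the parent class
            }
        }
        throw new IllegalArgumentException("no field '" + name + "' on " + type);
    }

    /** Returns every index below size whose slot is null. */
    static List<Integer> findNullSlots(Object composite) {
        try {
            Class<?> type = composite.getClass();
            Object[] resolvers = (Object[]) findField(type, "resolvers").get(composite);
            int size = findField(type, "size").getInt(composite);
            List<Integer> nullSlots = new ArrayList<>();
            for (int i = 0; i < size; i++) {
                if (resolvers[i] == null) {
                    nullSlots.add(i);
                }
            }
            return nullSlots;
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints [2]
        System.out.println(findNullSlots(new StandIn()));
    }
}
```

A non-empty result on a failing node and an empty one on a healthy node would support the race explanation.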
To summarize: ElUtils.buildFacesResolver, a static function, modifies a shared object of type FacesCompositeELResolver. If the server is "lucky" enough to run this same stack in 2 threads at startup, the this.resolvers array in the shared instance ends up containing a null, which then constantly throws NullPointerException until the server is bounced. That does seem to be precisely what we're seeing.
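For comparison, this is a minimal sketch of what a guarded version of the same add logic looks like, again on a stand-in class rather than the real one. GuardedResolverList, countNulls and fillConcurrently are hypothetical names, and the initial capacity of 2 is an assumption. It only illustrates that serializing the calls removes the null slots; it does not show how to patch jsf-impl.jar.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class GuardedResolverList {
    private Object[] resolvers = new Object[2];
    private int size = 0;

    // synchronized turns check, resize, store and increment into one atomic step.
    public synchronized void add(Object resolver) {
        Objects.requireNonNull(resolver);
        if (size >= resolvers.length) {
            Object[] nr = new Object[size * 2];
            System.arraycopy(resolvers, 0, nr, 0, size);
            resolvers = nr;
        }
        resolvers[size++] = resolver;
    }

    public synchronized int size() {
        return size;
    }

    /** Counts null slots below size; stays 0 as long as every add is guarded. */
    public synchronized int countNulls() {
        int nulls = 0;
        for (int i = 0; i < size; i++) {
            if (resolvers[i] == null) {
                nulls++;
            }
        }
        return nulls;
    }

    /** Hammers add() from several threads and returns the filled list. */
    static GuardedResolverList fillConcurrently(int threads, int perThread) {
        GuardedResolverList list = new GuardedResolverList();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    list.add(new Object());
                }
            });
            workers.add(worker);
            worker.start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return list;
    }

    public static void main(String[] args) {
        GuardedResolverList list = fillConcurrently(8, 1000);
        // prints size=8000 nulls=0
        System.out.println("size=" + list.size() + " nulls=" + list.countNulls());
    }
}
```

Removing the synchronized keyword from add turns this into a rough reproduction harness, though whether a given run actually loses an element depends on timing.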
But here's the thing: none of this belongs to the application code being run. It is part of the jsf-impl.jar we're using, which is at version 1.2_13-b02-FCS according to the manifest.mf file in that jar. I'm not sure where this version comes from, or how I could modify it if I needed to.
Can someone help me understand where I might be able to get support for these particular classes? Is this something that I can control?
Any other help or clues someone can provide on this would be helpful.