Over the last few days, old Jelastic environments that had been working for two or three years have started failing without explanation. They are Java/Spring apps running on old Tomcat versions (7.0.57 and 7.0.61).
In one case (Tomcat 7.0.57), some Tomcat- and application-specific configuration files disappeared with no explanation, and Tomcat then refused to restart even after we restored the missing files. Given the urgency of fixing that production environment, we were forced to recreate it from scratch (with Jelastic providing us with a fresh Tomcat 7.0.88 node).
In a second case (Tomcat 7.0.61), no Tomcat files appear to be missing, but Tomcat still refuses to start. We attempted a simple restart (as we have done hundreds of times), and Tomcat failed to come back up. The Tomcat: Actions window showed this:
Stopping tomcat [ OK ]
Starting tomcat [ FAILED ]
We had cleared or deleted the catalina.out log and other files (using the buttons available in the dashboard) before attempting the restart. After the restart attempt, nothing is printed in catalina.out (whether viewed in the dashboard or via SSH), so we have no indication of what is causing Tomcat to fail to start. We also tried stopping the environment altogether and starting it again, to no avail. We are sure that Tomcat is not running because the node is consuming one cloudlet, while normally it would use at least 7.
Then, as a final attempt, we tried to redeploy the WAR, which we had kept in the Deployment Manager (it is the one that worked on the original deploy, months ago). We got the following message:
Deploy to ROOT of App. Servers (<environment name>)
"<war file>" archive can't be deployed to "ROOT" context because this archive is damaged. Please recheck the archive and try again.
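To tell whether the WAR itself is damaged or whether the problem lies in the platform, the archive can be checked independently of the dashboard. The following is a minimal sketch, assuming a copy of the WAR can be downloaded to a machine with a JDK. The class name WarIntegrityCheck and the method findProblem are hypothetical names chosen for illustration. It only verifies ZIP-level integrity (every entry decompresses and matches its recorded CRC-32), which is not necessarily the same check the Jelastic dashboard performs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Enumeration;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of Jelastic or Tomcat.
class WarIntegrityCheck {

    /**
     * Returns null if the archive opens and every entry can be read in full
     * with a matching CRC-32; otherwise returns a description of the first
     * problem found.
     */
    static String findProblem(Path war) {
        byte[] buffer = new byte[8192];
        try (ZipFile zip = new ZipFile(war.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                try (CheckedInputStream in =
                         new CheckedInputStream(zip.getInputStream(entry), new CRC32())) {
                    // Drain the entry so the whole payload is decompressed and checksummed.
                    while (in.read(buffer) != -1) {
                        // nothing to do with the data itself
                    }
                    long expected = entry.getCrc(); // -1 means the CRC is not recorded
                    if (expected != -1 && expected != in.getChecksum().getValue()) {
                        return "CRC mismatch in entry " + entry.getName();
                    }
                } catch (IOException e) {
                    return "cannot read entry " + entry.getName() + ": " + e.getMessage();
                }
            }
            return null;
        } catch (IOException e) {
            // Covers truncated files, missing central directory, and non-ZIP content.
            return "cannot open archive: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java WarIntegrityCheck.java <path to war>");
            return;
        }
        String problem = findProblem(Paths.get(args[0]));
        System.out.println(problem == null ? "archive looks intact" : problem);
    }
}
```

If the same WAR passes this check locally but is still reported as damaged by the dashboard, that would suggest the stored copy or the storage behind the Deployment Manager is at fault rather than the original build artifact.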
Missing Tomcat and application files and corrupt WAR files are a troubling sign. Is it possible that a recent upgrade performed by our Jelastic provider to Jelastic Platform version 5.4 is causing trouble with older environments? Perhaps some migration or upgrade procedure has been applied that has left older environments in a fragile condition? Perhaps our provider's hardware is failing?
At this point, we are unable to understand what's happening. For fear that a restart or a redeploy will break our production environments, we are forced to stop updating our applications on older environments until we figure out what's going on. Any help is appreciated.