I was running Eucalyptus 4.0 - the environment is sound and had been up for a couple of years without issue. I went through the shutdown procedure (stop all instances, stop eucalyptus-cloud, stop eucalyptus-cc, stop each node) and recently shut down the environment for a move.
When I restored the environment, all of the services came back online but no instances would start - new, old, etc. I noticed some issues about IP allocation (the network has not changed in this process), so I released all of the addresses back to the cloud and then re-allocated them.
I then came across some information online while researching other errors I was observing, and ended up modifying two parameters:
euca-modify-property -p cloud.network.global_max_network_tag=2048
euca-modify-property -p cloud.network.global_min_network_tag=1024
Once this was done and I restarted the cloud again, I was able to successfully launch new instances. With no luck on the existing instances, I upgraded --> 4.0.1 --> 4.0.2. Everything appeared to upgrade without issue (my console still reports 4.0.0 but euca-version reports eucalyptus 4.0.2 with euca2ools 3.1.1/Omega).
However, I'm about 14 hours into it and I cannot start an old [EBS-backed] instance. It goes from stopped --> pending --> stopping --> stopped in a matter of seconds - and you can only tell that from the logs. I believe there is some extra data left over in the "metadata_extant_network" table (maybe something did not shut down properly?) but I cannot identify what, nor can I remove records manually due to FK constraints, and I don't want to risk corrupting the database. Here are my logs when I attempt to start an instance - there must be a "proper" way to do this ... :
cloud-exhaust.log
Tue Dec 9 10:04:29 2014 WARN [org.jboss.netty.channel.DefaultChannelPipeline:Eucalyptus.eucalyptus:Ephemeral
[bitronix.tm.twopc.Preparer:Eucalyptus.eucalyptus:EphemeralConfiguration:arn:euca:eucalyptus:::com.eucalyptus.network.DispatchingNetworkingService/.class java.util.concurrent.ThreadPoolExecutor$Worker#346] executing transaction with 0 enlisted resource
Tue Dec 9 10:04:30 2014 WARN [org.hibernate.engine.jdbc.spi.SqlExceptionHelper:Eucalyptus.eucalyptus:EphemeralConfiguration:arn:euca:eucalyptus:::com.eucalyptus.network.DispatchingNetworkingService/.class java.util.concurrent.ThreadPoolExecutor$Worker#346] SQL Error: 0, SQLState: 23503
Tue Dec 9 10:04:30 2014 ERROR [org.hibernate.engine.jdbc.spi.SqlExceptionHelper:Eucalyptus.eucalyptus:EphemeralConfiguration:arn:euca:eucalyptus:::com.eucalyptus.network.DispatchingNetworkingService/.class java.util.concurrent.ThreadPoolExecutor$Worker#346] ERROR: update or delete on table "metadata_extant_network" violates foreign key constraint "fk6a62681ed068841d" on table "metadata_network_group"
Detail: Key (id)=(c75a9938419237320141929ac6a02eea) is still referenced from table "metadata_network_group".
postgresql-Tue.log
ERROR: update or delete on table "metadata_extant_network" violates foreign key constraint "fk6a62681ed068841d" on table "metadata_network_group"
DETAIL: Key (id)=(c75a9938419237320141929ac6a02eea) is still referenced from table "metadata_network_group".
STATEMENT: delete from metadata_extant_network where id=$1 and version=$2
ERROR: update or delete on table "metadata_extant_network" violates foreign key constraint "fk6a62681ed068841d" on table "metadata_network_group"
DETAIL: Key (id)=(c75a9938419237320141929ac6a02eea) is still referenced from table "metadata_network_group".
STATEMENT: delete from metadata_extant_network where id=$1 and version=$2
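To illustrate what PostgreSQL is complaining about here, this is a minimal sketch that reproduces the same kind of foreign-key failure on a toy schema, using Python's built-in sqlite3 rather than the real Eucalyptus database. The table and column names (extant_network, network_group, extant_network_id) are hypothetical stand-ins, not the actual Eucalyptus schema, and whether clearing the reference is safe on a live cloud is not something this sketch establishes.

```python
import sqlite3


def demo_fk_violation():
    """Show that a parent row cannot be deleted while a child row references it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

    # Hypothetical stand-ins for metadata_extant_network / metadata_network_group.
    conn.execute("CREATE TABLE extant_network (id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE network_group ("
        " id TEXT PRIMARY KEY,"
        " extant_network_id TEXT REFERENCES extant_network(id))"
    )
    conn.execute("INSERT INTO extant_network VALUES ('net-1')")
    conn.execute("INSERT INTO network_group VALUES ('group-1', 'net-1')")
    conn.commit()

    # Deleting the referenced (parent) row first is rejected by the constraint.
    try:
        conn.execute("DELETE FROM extant_network WHERE id = 'net-1'")
        blocked = False
    except sqlite3.IntegrityError:
        blocked = True

    # Clearing the referencing (child) column first lets the delete go through
    # with the constraint still in place.
    conn.execute(
        "UPDATE network_group SET extant_network_id = NULL"
        " WHERE extant_network_id = 'net-1'"
    )
    conn.execute("DELETE FROM extant_network WHERE id = 'net-1'")
    conn.commit()

    remaining = conn.execute("SELECT COUNT(*) FROM extant_network").fetchone()[0]
    conn.close()
    return blocked, remaining
```

The point of the sketch is only the ordering: the row in the referencing table has to let go of the key before the referenced row can be removed.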
cloud-output.log
2014-12-09 10:04:30 ERROR | org.hibernate.exception.ConstraintViolationException: could not execute statement
2014-12-09 10:04:41 INFO | :1418144681687:Address:ADDRESS_STATE:TOP:Address 192.168.0.216 arn:aws:euare:000000000001:user/nobody available 0.0.0.0 AddressTransition system:unallocated->impending(true)
2014-12-09 10:04:41 ERROR | com.eucalyptus.cloud.util.MetadataException: org.hibernate.LazyInitializationException: could not initialize proxy - no Session
2014-12-09 10:04:41 WARN | Aborting resource token: ResourceToken:i-812D40D4:resources=TypedContext:{com.eucalyptus.util.TypedKey(NetworkResources)=[com.eucalyptus.compute.common.network.PrivateNetworkIndexResource(5), com.eucalyptus.compute.common.network.PublicIPResource()]}
cloud-debug.log
Tue Dec 9 10:04:30 2014 ERROR [NetworkGroups:Eucalyptus.eucalyptus:EphemeralConfiguration:arn:euca:eucalyptus:::com.eucalyptus.network.DispatchingNetworkingService/.class java.util.concurrent.ThreadPoolExecutor$Worker#346] org.hibernate.exception.ConstraintViolationException: could not execute statement
Tue Dec 9 10:04:41 2014 INFO [AdmissionControl:Compute.10] Found authorized clusters: [cc-192.168.0.150]
Tue Dec 9 10:04:41 2014 INFO [AdmissionControl:Compute.10] Availability: cc-192.168.0.150 -> 5
Tue Dec 9 10:04:41 2014 ERROR [ClusterAllocator:Eucalyptus.cluster:ClusterConfiguration:arn:euca:eucalyptus:cluster01:cluster:cc-192.168.0.150/.class java.util.concurrent.ThreadPoolExecutor$Worker#458] com.eucalyptus.cloud.util.MetadataException: org.hibernate.LazyInitializationException: could not initialize proxy - no Session
Tue Dec 9 10:04:41 2014 WARN [Allocations:Eucalyptus.cluster:ClusterConfiguration:arn:euca:eucalyptus:cluster01:cluster:cc-192.168.0.150/.class java.util.concurrent.ThreadPoolExecutor$Worker#458] Aborting resource token: ResourceToken:i-812D40D4:resources=TypedContext:{com.eucalyptus.util.TypedKey(NetworkResources)=[com.eucalyptus.compute.common.network.PrivateNetworkIndexResource(5), com.eucalyptus.compute.common.network.PublicIPResource()]}
cloud-error.log
Tue Dec 9 10:04:30 2014 ERROR [NetworkGroups:Eucalyptus.eucalyptus:EphemeralConfiguration:arn:euca:eucalyptus:::com.eucalyptus.network.DispatchingNetworkingService/.class java.util.concurrent.ThreadPoolExecutor$Worker#346] org.hibernate.exception.ConstraintViolationException: could not execute statement
Tue Dec 9 10:04:41 2014 ERROR [ClusterAllocator:Eucalyptus.cluster:ClusterConfiguration:arn:euca:eucalyptus:cluster01:cluster:cc-192.168.0.150/.class java.util.concurrent.ThreadPoolExecutor$Worker#458] [com.eucalyptus.cloud.run.ClusterAllocator.cleanupOnFailure(ClusterAllocator.java):274] com.eucalyptus.cloud.util.MetadataException: org.hibernate.LazyInitializationException: could not initialize proxy - no Session
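For anyone collecting the affected keys from logs like these, here is a small sketch that pulls the column, key value, and referencing table out of the PostgreSQL "is still referenced" detail lines. The function name referenced_keys is made up for illustration, and the pattern is only matched against the message format shown in this post.

```python
import re

# Matches the detail line PostgreSQL prints for a foreign-key violation,
# in the form seen in the logs in this post.
FK_DETAIL = re.compile(
    r'Key \((?P<column>\w+)\)=\((?P<value>[^)]+)\) '
    r'is still referenced from table "(?P<table>\w+)"'
)


def referenced_keys(log_lines):
    """Return unique (column, value, referencing_table) tuples in first-seen order."""
    seen = []
    for line in log_lines:
        match = FK_DETAIL.search(line)
        if match is None:
            continue
        entry = (match.group("column"), match.group("value"), match.group("table"))
        if entry not in seen:
            seen.append(entry)
    return seen
```

It could be fed the lines of postgresql-Tue.log or cloud-exhaust.log; repeated errors for the same key collapse to one entry.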
So then I logged into the PostgreSQL database directly, removed the FK constraints, and manually removed the rows identified in the logs:
ALTER TABLE metadata_extant_network DROP CONSTRAINT "fk45157a25f1ac537e";
ALTER TABLE metadata_network_group DROP CONSTRAINT "fk6a62681ed068841d";
DELETE FROM metadata_extant_network WHERE id='c75a9938419237320141929ac6a02eea';
The delete was successful, but after attempting to restart the instances I received a new error:
euca-start-instances: error (InternalFailure): Failed to allocate network tag for network: arn:aws:euca:eucalyptus:821881850233:security-group/ownCloud/: no network tags are free.
Tue Dec 9 11:04:23 2014 ERROR [org.mule.exception.DefaultMessagingExceptionStrategy:Compute.15]
********************************************************************************
Message : Component that caused exception is: DefaultJavaComponent{Compute.component}. Message payload is of type: StartInstancesType
Code : MULE_ERROR--2
--------------------------------------------------------------------------------
Exception stack is:
1. Failed to allocate network tag for network: arn:aws:euca:eucalyptus:821881850233:security-group/ownCloud/: no network tags are free. (com.eucalyptus.cloud.util.NotEnoughResourcesException)
com.eucalyptus.network.NetworkGroup:325 (null)
2. Failed to allocate network tag for network: arn:aws:euca:eucalyptus:821881850233:security-group/ownCloud/: no network tags are free. (com.eucalyptus.cloud.util.NotEnoughResourcesException)
com.eucalyptus.cloud.run.AdmissionControl$RunAdmissionControl:148 (null)
3. Failed to allocate network tag for network: arn:aws:euca:eucalyptus:821881850233:security-group/ownCloud/: no network tags are free. (java.lang.RuntimeException)
com.eucalyptus.util.Exceptions:255 (null)
4. Failed to allocate network tag for network: arn:aws:euca:eucalyptus:821881850233:security-group/ownCloud/: no network tags are free. (com.eucalyptus.util.EucalyptusCloudException)
com.eucalyptus.compute.service.ComputeService:69 (null)
5. Component that caused exception is: DefaultJavaComponent{Compute.component}. Message payload is of type: StartInstancesType (org.mule.component.ComponentException)
org.mule.component.DefaultComponentLifecycleAdapter:352 (http://www.mulesoft.org/docs/site/current3/apidocs/org/mule/component/ComponentException.html)
--------------------------------------------------------------------------------
Root Exception stack trace:
com.eucalyptus.cloud.util.NotEnoughResourcesException: Failed to allocate network tag for network: arn:aws:euca:eucalyptus:821881850233:security-group/ownCloud/: no network tags are free.
at com.eucalyptus.network.NetworkGroup.extantNetwork(NetworkGroup.java:325)
at com.eucalyptus.network.GenericNetworkingService$_prepareSecurityGroup_closure3_closure12.doCall(GenericNetworkingService.groovy:198)
at sun.reflect.GeneratedMethodAccessor770.invoke(Unknown Source)
+ 3 more (set debug level logging or '-Dmule.verbose.exceptions=true' for everything)
********************************************************************************
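The "no network tags are free" message reads like pool exhaustion. As a rough mental model only, here is a sketch that assumes each active security-group network holds one tag from an inclusive range bounded by the two properties set earlier (cloud.network.global_min_network_tag and cloud.network.global_max_network_tag). This is a guess at the behaviour, not Eucalyptus code, and allocate_tag and NoFreeTagError are hypothetical names.

```python
class NoFreeTagError(Exception):
    """Raised when every tag in the configured range is already held."""


def allocate_tag(min_tag, max_tag, in_use):
    """Hand out the lowest unused tag in [min_tag, max_tag] and record it in in_use.

    Assumption: the range is inclusive at both ends and one tag is held
    per active network. The real allocator may differ.
    """
    for tag in range(min_tag, max_tag + 1):
        if tag not in in_use:
            in_use.add(tag)
            return tag
    raise NoFreeTagError("no network tags are free")
```

Under this model, tags still recorded as held by networks that no longer exist would never return to the pool, and any security group whose tag falls outside a newly narrowed range would have nowhere to go, which would produce the same message even when the range looks large enough.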