I recently ran into a few resource allocation issues in YARN: my Hadoop MR app was not able to allocate new containers even though the cluster was almost free. I looked into the RM's scheduler stats (http:///ws/v1/cluster/scheduler) and found that some resources had negative values:

    <queue xsi:type="capacitySchedulerLeafQueueInfo">
        <capacity>19.0</capacity>
        <usedCapacity>-69.52686</usedCapacity>
        <maxCapacity>90.0</maxCapacity>
        <absoluteCapacity>19.0</absoluteCapacity>
        <absoluteMaxCapacity>90.0</absoluteMaxCapacity>
        <absoluteUsedCapacity>0.0</absoluteUsedCapacity>
        <numApplications>10</numApplications>
        <queueName>default</queueName>
        <state>RUNNING</state>
        <resourcesUsed>
            <memory>-152576</memory>
            <vCores>-41</vCores>
        </resourcesUsed>
        <hideReservationQueues>false</hideReservationQueues>
        <nodeLabels>*</nodeLabels>
        <allocatedContainers>24</allocatedContainers>
        <reservedContainers>0</reservedContainers>
        <pendingContainers>0</pendingContainers>
        <numActiveApplications>10</numActiveApplications>
        <numPendingApplications>0</numPendingApplications>
        <numContainers>-41</numContainers>
        <maxApplications>1900</maxApplications>
        <maxApplicationsPerUser>855</maxApplicationsPerUser>
        <maxActiveApplications>102</maxActiveApplications>
        <maxActiveApplicationsPerUser>10</maxActiveApplicationsPerUser>
        <userLimit>10</userLimit>
        ...
    </queue>

Is that OK from the capacity scheduler's point of view? I thought it might indicate reserved resources, but reservedContainers is 0.
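To see which counters are affected, a small script can list every numeric field below zero in the scheduler output. This is only a minimal sketch: `find_negative_fields` and `SAMPLE` are hypothetical names, the sample is a trimmed copy of the XML above, and I added an `xmlns:xsi` declaration so the fragment parses on its own (I'm assuming the full REST response declares that namespace itself).

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the queue element from the scheduler stats above.
# The xmlns:xsi declaration is added here so the fragment is parseable by itself.
SAMPLE = """
<queue xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:type="capacitySchedulerLeafQueueInfo">
    <usedCapacity>-69.52686</usedCapacity>
    <absoluteUsedCapacity>0.0</absoluteUsedCapacity>
    <queueName>default</queueName>
    <resourcesUsed>
        <memory>-152576</memory>
        <vCores>-41</vCores>
    </resourcesUsed>
    <allocatedContainers>24</allocatedContainers>
    <numContainers>-41</numContainers>
</queue>
"""


def find_negative_fields(xml_text):
    """Return (path, value) pairs for every element whose text is a number below zero."""
    root = ET.fromstring(xml_text)
    negatives = []

    def walk(elem, path):
        current = f"{path}/{elem.tag}" if path else elem.tag
        text = (elem.text or "").strip()
        if text:
            try:
                value = float(text)
            except ValueError:
                value = None  # non-numeric field such as queueName or state
            if value is not None and value < 0:
                negatives.append((current, value))
        for child in elem:
            walk(child, current)

    walk(root, "")
    return negatives


if __name__ == "__main__":
    for path, value in find_negative_fields(SAMPLE):
        print(f"{path} = {value}")
```

On the sample it flags usedCapacity, resourcesUsed/memory, resourcesUsed/vCores and numContainers, while allocatedContainers stays positive. That inconsistency is what makes me doubt these are legitimate values.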

Filipp
  • What version of Hadoop are you running? There are a lot of bugs due to synchronization issues that could cause negative values for resources. See https://issues.apache.org/jira/issues/?jql=project%20%3D%20YARN%20AND%20summary%20~%20negative%20ORDER%20BY%20key%20DESC – tk421 Jul 06 '17 at 05:56
  • I'm using Hadoop 2.6.0. Thanks for pointing me to Hadoop's JIRA! – Filipp Jul 07 '17 at 16:32
  • Do you know how long your cluster has been running without a restart? If it's been a while I'm guessing it's a known issue. – tk421 Jul 07 '17 at 21:28

0 Answers