OpenNMS / NMS-8975

Restarting OpenNMS while monitoring nodes via Minions may create erroneous outages

    Details

    • Sprint:
      Horizon - Dec 14th

      Description

      When restarting OpenNMS, invoking the monitor can fail with an error similar to:

      2017-01-06 10:48:18,087 ERROR [Poller-Thread-1-of-30] o.o.n.p.p.PollableServiceConfig: Unexpected exception while polling PollableService[location=HQ, interface=PollableInterface [PollableNode [3]:108.169.150.250], svcName=HTTPS]. Marking service as DOWN
      java.util.concurrent.ExecutionException: org.apache.camel.component.direct.DirectConsumerNotAvailableException: No consumers available on endpoint: Endpoint[direct://executeRpc]. Exchange[Message: [Body is not logged]]
              at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) ~[?:1.8.0_111]
              at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) ~[?:1.8.0_111]
              at org.opennms.netmgt.poller.pollables.PollableServiceConfig.poll(PollableServiceConfig.java:132) [opennms-services-19.0.0-SNAPSHOT.jar:?]
              at org.opennms.netmgt.poller.pollables.PollableService.poll(PollableService.java:190) [opennms-services-19.0.0-SNAPSHOT.jar:?]
              at org.opennms.netmgt.poller.pollables.PollableElement.poll(PollableElement.java:293) [opennms-services-19.0.0-SNAPSHOT.jar:?]
              at org.opennms.netmgt.poller.pollables.PollableContainer$5.run(PollableContainer.java:319) [opennms-services-19.0.0-SNAPSHOT.jar:?]
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_111]
              at org.opennms.netmgt.poller.pollables.PollableElement.withTreeLock(PollableElement.java:264) [opennms-services-19.0.0-SNAPSHOT.jar:?]
              at org.opennms.netmgt.poller.pollables.PollableElement.withTreeLock(PollableElement.java:250) [opennms-services-19.0.0-SNAPSHOT.jar:?]
              at org.opennms.netmgt.poller.pollables.PollableElement.withTreeLock(PollableElement.java:228) [opennms-services-19.0.0-SNAPSHOT.jar:?]
              at org.opennms.netmgt.poller.pollables.PollableContainer.poll(PollableContainer.java:326) [opennms-services-19.0.0-SNAPSHOT.jar:?]
              at org.opennms.netmgt.poller.pollables.PollableInterface.poll(PollableInterface.java:228) [opennms-services-19.0.0-SNAPSHOT.jar:?]
              at org.opennms.netmgt.poller.pollables.PollableContainer$5.run(PollableContainer.java:319) [opennms-services-19.0.0-SNAPSHOT.jar:?]
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_111]
              at org.opennms.netmgt.poller.pollables.PollableElement.withTreeLock(PollableElement.java:264) [opennms-services-19.0.0-SNAPSHOT.jar:?]
              at org.opennms.netmgt.poller.pollables.PollableElement.withTreeLock(PollableElement.java:250) [opennms-services-19.0.0-SNAPSHOT.jar:?]
              at org.opennms.netmgt.poller.pollables.PollableElement.withTreeLock(PollableElement.java:228) [opennms-services-19.0.0-SNAPSHOT.jar:?]
              at org.opennms.netmgt.poller.pollables.PollableContainer.poll(PollableContainer.java:326) [opennms-services-19.0.0-SNAPSHOT.jar:?]
              at org.opennms.netmgt.poller.pollables.PollableNode$3.run(PollableNode.java:331) [opennms-services-19.0.0-SNAPSHOT.jar:?]
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_111]
              at org.opennms.netmgt.poller.pollables.PollableElement.withTreeLock(PollableElement.java:264) [opennms-services-19.0.0-SNAPSHOT.jar:?]
              at org.opennms.netmgt.poller.pollables.PollableElement.withTreeLock(PollableElement.java:250) [opennms-services-19.0.0-SNAPSHOT.jar:?]
              at org.opennms.netmgt.poller.pollables.PollableElement.withTreeLock(PollableElement.java:228) [opennms-services-19.0.0-SNAPSHOT.jar:?]
              at org.opennms.netmgt.poller.pollables.PollableNode.doPoll(PollableNode.java:334) [opennms-services-19.0.0-SNAPSHOT.jar:?]
              at org.opennms.netmgt.poller.pollables.PollableElement.doPoll(PollableElement.java:184) [opennms-services-19.0.0-SNAPSHOT.jar:?]
              at org.opennms.netmgt.poller.pollables.PollableService.doPoll(PollableService.java:214) [opennms-services-19.0.0-SNAPSHOT.jar:?]
              at org.opennms.netmgt.poller.pollables.PollableService$PollRunner.run(PollableService.java:60) [opennms-services-19.0.0-SNAPSHOT.jar:?]
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_111]
              at org.opennms.netmgt.poller.pollables.PollableElement.withTreeLock(PollableElement.java:264) [opennms-services-19.0.0-SNAPSHOT.jar:?]
              at org.opennms.netmgt.poller.pollables.PollableElement.withTreeLock(PollableElement.java:250) [opennms-services-19.0.0-SNAPSHOT.jar:?]
              at org.opennms.netmgt.poller.pollables.PollableService.doRun(PollableService.java:404) [opennms-services-19.0.0-SNAPSHOT.jar:?]
              at org.opennms.netmgt.poller.pollables.PollableService.run(PollableService.java:379) [opennms-services-19.0.0-SNAPSHOT.jar:?]
              at org.opennms.netmgt.scheduler.Schedule.run(Schedule.java:142) [opennms-services-19.0.0-SNAPSHOT.jar:?]
              at org.opennms.netmgt.scheduler.Schedule$ScheduleEntry.run(Schedule.java:86) [opennms-services-19.0.0-SNAPSHOT.jar:?]
              at org.opennms.netmgt.scheduler.LegacyScheduler$1.run(LegacyScheduler.java:179) [opennms-services-19.0.0-SNAPSHOT.jar:?]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_111]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_111]
              at org.opennms.core.concurrent.LogPreservingThreadFactory$3.run(LogPreservingThreadFactory.java:124) [opennms-util-19.0.0-SNAPSHOT.jar:?]
              at java.lang.Thread.run(Thread.java:745) [?:1.8.0_111]
      Caused by: org.apache.camel.component.direct.DirectConsumerNotAvailableException: No consumers available on endpoint: Endpoint[direct://executeRpc]. Exchange[Message: [Body is not logged]]
              at org.apache.camel.component.direct.DirectProducer.process(DirectProducer.java:47) ~[camel-core-2.14.1.jar:2.14.1]
              at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:191) ~[camel-core-2.14.1.jar:2.14.1]
              at org.apache.camel.processor.UnitOfWorkProducer.process(UnitOfWorkProducer.java:74) ~[camel-core-2.14.1.jar:2.14.1]
              at org.apache.camel.impl.ProducerCache$2.doInProducer(ProducerCache.java:375) ~[camel-core-2.14.1.jar:2.14.1]
              at org.apache.camel.impl.ProducerCache$2.doInProducer(ProducerCache.java:343) ~[camel-core-2.14.1.jar:2.14.1]
              at org.apache.camel.impl.ProducerCache.doInProducer(ProducerCache.java:233) ~[camel-core-2.14.1.jar:2.14.1]
              at org.apache.camel.impl.ProducerCache.sendExchange(ProducerCache.java:343) ~[camel-core-2.14.1.jar:2.14.1]
              at org.apache.camel.impl.ProducerCache.send(ProducerCache.java:201) ~[camel-core-2.14.1.jar:2.14.1]
              at org.apache.camel.impl.DefaultProducerTemplate.send(DefaultProducerTemplate.java:128) ~[camel-core-2.14.1.jar:2.14.1]
              at org.apache.camel.impl.DefaultProducerTemplate$15.call(DefaultProducerTemplate.java:636) ~[camel-core-2.14.1.jar:2.14.1]
              at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_111]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_111]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_111]
              ... 1 more
      2017-01-06 10:48:18,102 INFO  [Poller-Thread-1-of-30] o.o.n.p.p.PollableElement: Changing status of PollableElement PollableService[location=HQ, interface=PollableInterface [PollableNode [3]:108.169.150.250], svcName=ICMP] from Up to Down
      

      This happens because the CamelContext responsible for the RPC calls is stopped and is rejecting the executions.

      This case should be caught, and we should return PollStatus.unknown() to indicate that we were unable to invoke the monitor.
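
      A minimal sketch of the proposed handling, not the actual fix: SketchStatus, ConsumerNotAvailableException and PollSketch are hypothetical stand-ins for PollStatus, Camel's DirectConsumerNotAvailableException and the polling code in PollableServiceConfig, so that the example compiles on its own. The idea is to inspect the cause of the ExecutionException and report "unknown" when the RPC could not be dispatched, instead of marking the service as DOWN.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

// Hypothetical stand-in for PollStatus; only the status names matter here.
enum SketchStatus { UP, DOWN, UNKNOWN }

// Hypothetical stand-in for Camel's DirectConsumerNotAvailableException, so the
// sketch compiles without Camel on the classpath.
class ConsumerNotAvailableException extends RuntimeException {
    ConsumerNotAvailableException(String message) {
        super(message);
    }
}

final class PollSketch {
    private PollSketch() {}

    // Waits for the monitor result and maps failures to a status.
    static SketchStatus poll(CompletableFuture<SketchStatus> future) {
        try {
            return future.get();
        } catch (ExecutionException e) {
            if (e.getCause() instanceof ConsumerNotAvailableException) {
                // The RPC was never dispatched, so nothing is known about the
                // service; the real code would return PollStatus.unknown() here.
                return SketchStatus.UNKNOWN;
            }
            // Any other failure keeps the behaviour shown in the log above.
            return SketchStatus.DOWN;
        } catch (InterruptedException e) {
            // Restore the interrupt flag; the poll did not complete.
            Thread.currentThread().interrupt();
            return SketchStatus.UNKNOWN;
        }
    }
}
```

      With this mapping, a future that fails because no consumer is available yields UNKNOWN, so a restart alone would not flip a service from Up to Down. Other exceptions are still treated as DOWN in this sketch.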

              People

              • Assignee:
                j-white Jesse White
                Reporter:
                j-white Jesse White
              • Votes:
                0
                Watchers:
                1