  1. OpenNMS
  2. NMS-8976

Restarting OpenNMS while performing SNMP data-collection via Minions may create dataCollectionFailed alarms

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 19.0.0
    • Fix Version/s: 19.0.0
    • Component/s: Data Collection - SNMP
    • Security Level: Default (Default Security Scheme)
    • Labels:
      None
    • Sprint:
      Horizon - Dec 14th

      Description

      Similar to NMS-8975, restarting OpenNMS while performing SNMP data-collection on remote nodes can trigger a number of dataCollectionFailed alarms.

      2017-01-06 10:50:30,639 WARN  [Collectd-Thread-30-of-50] o.o.n.c.CollectableService: run: failed collection for 120/172.20.10.1/SNMP/example1
      2017-01-06 10:50:30,639 WARN  [Collectd-Thread-30-of-50] o.o.n.c.CollectableService: org.opennms.core.rpc.api.RequestTimedOutException: org.apache.camel.ExchangeTimedOutException: The OUT message was not received within: 60000 millis due reply message with correlationID: Camel-ID-jw-dev-1-44009-1483717755100-0-124 not received. Exchange[Message: [Body is not logged]]
      org.opennms.netmgt.collectd.CollectionWarning: org.opennms.core.rpc.api.RequestTimedOutException: org.apache.camel.ExchangeTimedOutException: The OUT message was not received within: 60000 millis due reply message with correlationID: Camel-ID-jw-dev-1-44009-1483717755100-0-124 not received. Exchange[Message: [Body is not logged]]
              at org.opennms.netmgt.collectd.SnmpCollectionSet.collect(SnmpCollectionSet.java:390) ~[opennms-services-19.0.0-SNAPSHOT.jar:?]
              at org.opennms.netmgt.collectd.SnmpCollector.collect(SnmpCollector.java:333) ~[opennms-services-19.0.0-SNAPSHOT.jar:?]
              at org.opennms.netmgt.collectd.CollectionSpecification.collect(CollectionSpecification.java:274) ~[opennms-services-19.0.0-SNAPSHOT.jar:?]
              at org.opennms.netmgt.collectd.CollectableService.doCollection(CollectableService.java:395) ~[opennms-services-19.0.0-SNAPSHOT.jar:?]
              at org.opennms.netmgt.collectd.CollectableService.doRun(CollectableService.java:337) [opennms-services-19.0.0-SNAPSHOT.jar:?]
              at org.opennms.netmgt.collectd.CollectableService.access$200(CollectableService.java:69) [opennms-services-19.0.0-SNAPSHOT.jar:?]
              at org.opennms.netmgt.collectd.CollectableService$1.run(CollectableService.java:315) [opennms-services-19.0.0-SNAPSHOT.jar:?]
              at org.opennms.core.logging.Logging.withPrefix(Logging.java:71) [org.opennms.core.logging-19.0.0-SNAPSHOT.jar:?]
              at org.opennms.netmgt.collectd.CollectableService.run(CollectableService.java:304) [opennms-services-19.0.0-SNAPSHOT.jar:?]
              at org.opennms.netmgt.scheduler.LegacyScheduler$1.run(LegacyScheduler.java:179) [opennms-services-19.0.0-SNAPSHOT.jar:?]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_111]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_111]
              at org.opennms.core.concurrent.LogPreservingThreadFactory$3.run(LogPreservingThreadFactory.java:124) [opennms-util-19.0.0-SNAPSHOT.jar:?]
              at java.lang.Thread.run(Thread.java:745) [?:1.8.0_111]
      Caused by: org.opennms.core.rpc.api.RequestTimedOutException: org.apache.camel.ExchangeTimedOutException: The OUT message was not received within: 60000 millis due reply message with correlationID: Camel-ID-jw-dev-1-44009-1483717755100-0-124 not received. Exchange[Message: [Body is not logged]]
              at org.opennms.core.rpc.camel.CamelRpcClientFactory$1$1.onFailure(CamelRpcClientFactory.java:103) ~[org.opennms.core.ipc.rpc.camel-impl-19.0.0-SNAPSHOT.jar:?]
              at org.apache.camel.impl.DefaultProducerTemplate$15.call(DefaultProducerTemplate.java:643) ~[camel-core-2.14.1.jar:2.14.1]
              at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_111]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_111]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_111]
              ... 1 more
      Caused by: org.apache.camel.ExchangeTimedOutException: The OUT message was not received within: 60000 millis due reply message with correlationID: Camel-ID-jw-dev-1-44009-1483717755100-0-124 not received. Exchange[Message: [Body is not logged]]
              at org.apache.camel.component.jms.reply.ReplyManagerSupport.processReply(ReplyManagerSupport.java:133) ~[camel-jms-2.13.1.jar:2.13.1]
              at org.apache.camel.component.jms.reply.TemporaryQueueReplyHandler.onTimeout(TemporaryQueueReplyHandler.java:61) ~[camel-jms-2.13.1.jar:2.13.1]
              at org.apache.camel.component.jms.reply.CorrelationTimeoutMap.onEviction(CorrelationTimeoutMap.java:53) ~[camel-jms-2.13.1.jar:2.13.1]
              at org.apache.camel.component.jms.reply.CorrelationTimeoutMap.onEviction(CorrelationTimeoutMap.java:30) ~[camel-jms-2.13.1.jar:2.13.1]
              at org.apache.camel.support.DefaultTimeoutMap.purge(DefaultTimeoutMap.java:212) ~[camel-core-2.14.1.jar:2.14.1]
              at org.apache.camel.support.DefaultTimeoutMap.run(DefaultTimeoutMap.java:162) ~[camel-core-2.14.1.jar:2.14.1]
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_111]
              at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) ~[?:1.8.0_111]
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) ~[?:1.8.0_111]
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) ~[?:1.8.0_111]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_111]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_111]
              ... 1 more
      2017-01-06 10:50:30,645 DEBUG [Collectd-Thread-30-of-50] o.o.n.c.CollectableService: run: change in collection status, generating event.
      

      We should prevent alarms from being triggered in this case, and instead return a status/exception indicating that the collection could not be executed.
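
      A minimal sketch of that idea, not the actual fix: an RPC timeout anywhere in the cause chain is treated as "outcome unknown" rather than as a failed collection, so it never changes the collection status. All names here (CollectionOutcome, CollectionOutcomeClassifier) are hypothetical, and java.util.concurrent.TimeoutException is only a stand-in for org.opennms.core.rpc.api.RequestTimedOutException so that the example compiles on its own.

```java
import java.util.concurrent.TimeoutException;

// Hypothetical three-way result; these names are illustrative and not part of OpenNMS.
enum CollectionOutcome { SUCCEEDED, FAILED, UNKNOWN }

final class CollectionOutcomeClassifier {
    // Upper bound on cause-chain depth, to stay safe against cyclic causes.
    private static final int MAX_CAUSE_DEPTH = 32;

    private CollectionOutcomeClassifier() {}

    // Maps the error from one collection attempt (null when there was none) to an outcome.
    // TimeoutException stands in for the RPC timeout seen in the log above (assumption).
    static CollectionOutcome classify(Throwable error) {
        if (error == null) {
            return CollectionOutcome.SUCCEEDED;
        }
        Throwable t = error;
        for (int depth = 0; t != null && depth < MAX_CAUSE_DEPTH; depth++) {
            if (t instanceof TimeoutException) {
                return CollectionOutcome.UNKNOWN;
            }
            t = t.getCause();
        }
        return CollectionOutcome.FAILED;
    }

    // Only a transition into FAILED should raise a failure event;
    // an UNKNOWN outcome never does, so no alarm is created for it.
    static boolean shouldSendFailureEvent(CollectionOutcome previous, CollectionOutcome current) {
        return current == CollectionOutcome.FAILED && previous != CollectionOutcome.FAILED;
    }
}
```

      With this split, a wrapped timeout such as the one in the trace above would classify as UNKNOWN and shouldSendFailureEvent would return false, while a genuine collection error on a previously healthy service would still produce the event.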

              People

              • Assignee:
                j-white Jesse White
                Reporter:
                j-white Jesse White
              • Votes:
                0
                Watchers:
                4
