  1. OpenNMS
  2. NMS-8751

Poller Node Down without outages

    Details

      Description

      There are several situations in which the poller does not
      correctly track the status of a node that is down.

      First:

      a) Add SNMP and ICMP to nodeA, but set up active packages only for SNMP
      b) nodeA goes down (an outage is opened only for SNMP)
      c) Configure OpenNMS to poll nodeA/ICMP
      d) Restart OpenNMS
      e) No outage is created for ICMP, although the node is still down

      Second:

      a) Provision a node with ICMP and SNMP
      b) The node is down
      c) Import the node into the database...
      d) If the categoryMembership event is received before the nodeGainedService events,
      two outages are created, one for ICMP and one for SNMP
      e) If the categoryMembership event is received after the nodeGainedService events,
      an outage is created only for SNMP

      Both errors stem from the fact that, when a node is scheduled,
      any outage on that node that has no associated event is resolved.

      So in both cases the outage is resolved, but the state of the
      poller node is not made consistent with it.

      If an outage is resolved, the status of the node in DefaultPollContext should no longer be Down; it must be Up.
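
      The mismatch described above can be illustrated with a small Python model. This is only a sketch, not OpenNMS code: the names Outage, NodeState, schedule_node and the reconcile flag are hypothetical, and the real logic lives in the Java poller. It shows that resolving event-less outages without touching the node status leaves a node that is Down with no open outage.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Outage:
    service: str
    event_id: Optional[int] = None  # None models an outage with no event attached
    resolved: bool = False


@dataclass
class NodeState:
    status: str = "Up"
    outages: list = field(default_factory=list)

    def open_outages(self):
        return [o for o in self.outages if not o.resolved]


def schedule_node(node, reconcile):
    """Resolve outages that have no event; optionally reconcile the node status."""
    for outage in node.open_outages():
        if outage.event_id is None:
            outage.resolved = True
    # Hypothetical fix: a node with no open outage should not stay Down.
    if reconcile and not node.open_outages():
        node.status = "Up"


# Behaviour described in the report: outage resolved, node still Down.
buggy = NodeState(status="Down", outages=[Outage("SNMP")])
schedule_node(buggy, reconcile=False)

# Expected behaviour: resolving the outage also brings the node back to Up.
fixed = NodeState(status="Down", outages=[Outage("SNMP")])
schedule_node(fixed, reconcile=True)
```

      With reconcile=True the node returns to Up, so the next failed poll is seen as a status change again.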

      The first situation arises because, when the new ICMP service is initialized, it is scheduled and marked Up by default.
      On init, the poller always tries to inherit the status of the node, and because the node is Down (that is, it has an outage whose cause is a nodeDown event), the poller propagates that status to the monitored services and interfaces.
      The ICMP service is therefore also set to Down, but no outage is associated with it, because ICMP was never polled. The solution is to create an outage if the previous state of the service is Up.
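
      The proposed fix for the first situation can be sketched as follows. This is a simplified Python model under stated assumptions, not the actual poller implementation: Service, inherit_node_status and the open_missing_outage flag are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    status: str = "Up"            # a newly scheduled service starts as Up
    has_open_outage: bool = False


def inherit_node_status(node_status, services, open_missing_outage):
    """Propagate the node status to every service at poller init."""
    for svc in services:
        previous = svc.status
        svc.status = node_status
        # Hypothetical fix: a service forced from Up to Down needs its own outage.
        if (open_missing_outage and node_status == "Down"
                and previous == "Up" and not svc.has_open_outage):
            svc.has_open_outage = True


def demo(open_missing_outage):
    # SNMP was polled before the restart and already has an outage;
    # ICMP was only just configured and has never been polled.
    snmp = Service("SNMP", status="Down", has_open_outage=True)
    icmp = Service("ICMP")
    inherit_node_status("Down", [snmp, icmp], open_missing_outage)
    return icmp


buggy_icmp = demo(open_missing_outage=False)  # Down, but no outage
fixed_icmp = demo(open_missing_outage=True)   # Down, with an outage
```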

      The second problem is caused by a delay in updating the outage with the ID of the event that generated it.
      Because of this delay, a reschedule triggered by categoryMembershipChange (which also clears any outage without a valid event ID) clears the outage that has just been created, since that outage has not yet been updated with its event ID. So here too the node is down but the outage has been cleared.
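
      The ordering problem can be reproduced in a minimal Python model. It is a sketch only: Outage, reschedule, attach_event, run and the event ID 1001 are hypothetical, and the real system performs these steps asynchronously rather than from a list of step names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Outage:
    service: str
    event_id: Optional[int] = None  # filled in later, after the event is stored
    resolved: bool = False


def reschedule(outages):
    """Clear every open outage that has no valid event ID yet."""
    for outage in outages:
        if not outage.resolved and outage.event_id is None:
            outage.resolved = True


def attach_event(outage, event_id):
    """Delayed update that links the outage to the event that caused it."""
    outage.event_id = event_id


def run(steps):
    """Open an ICMP outage, then apply the given steps in order."""
    outage = Outage("ICMP")
    for step in steps:
        if step == "attach_event":
            attach_event(outage, 1001)
        elif step == "reschedule":
            reschedule([outage])
    return outage


# Reschedule wins the race: the fresh outage is cleared while the node is down.
late = run(["reschedule", "attach_event"])

# Event ID arrives first: the outage survives the reschedule.
in_time = run(["attach_event", "reschedule"])
```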

      In both cases, because the poller acts only when there is a status change, the result is a node that is Down with no associated outages.
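
      Why the missing outage is never recreated can be seen in a small Python model of change-driven polling. The class PolledService and its on_poll method are hypothetical names used only for this sketch; they are not part of OpenNMS.

```python
class PolledService:
    """Toy model of a service whose outages are driven by status changes."""

    def __init__(self, status, has_outage):
        self.status = status
        self.has_outage = has_outage

    def on_poll(self, new_status):
        """Return True if the poll changed the status; act only in that case."""
        if new_status == self.status:
            return False  # no change, so no outage is opened or closed
        self.status = new_status
        self.has_outage = (new_status == "Down")
        return True


# The service is already Down but its outage was cleared.
svc = PolledService("Down", has_outage=False)

# Every later poll also reports Down, so nothing ever changes.
changed = [svc.on_poll("Down") for _ in range(3)]
```

      In this model the service stays Down without an outage until it actually comes back up and fails again.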

            People

            • Assignee:
              rssntn67 Antonio Russo
              Reporter:
              rssntn67 Antonio Russo