OpenNMS / NMS-7231

The downtime model never removes nodes when it is instructed to do so




      Typically, the default downtime model for Pollerd is configured to remove a node that has been down for more than 5 days.

      Of course, this option makes more sense for discovered nodes than requisitioned nodes.

      I've created a simple test environment on a VM running 14.0.1 with the following configuration changes:

      1) Change the polling interval to 30 seconds for all the services in poller-configuration.xml

      2) Remove all the downtime model entries and add the following:

      <downtime interval="30000" begin="0" end="300000" /><!-- from 0 to 5 minutes, poll every 30 seconds -->
      <downtime begin="300000" delete="true" /><!-- delete after 5 minutes -->

      BTW, a "valid" downtime model must start at 0 (i.e. begin="0"). It can have several entries in the middle (each with begin/end/interval, where the "begin" attribute of each entry matches the "end" attribute of the entry above it). The last entry should either delete the node (as in the example) or keep checking the service at a fixed interval (i.e. an entry with "begin" and "interval" but no "end" or "delete"). If the model does not follow these rules, it is rejected and ignored.

      3) Start OpenNMS.

      4) Add a new node through the newSuspect event (i.e. discover a node). I've used another VM as the target node.

      5) Wait a few minutes to verify that the node is being monitored properly.

      6) Stop the VM that is being monitored.

      7) Wait more than 5 minutes.
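
      As a rough illustration of the downtime-model rules described under step 2, the checks can be sketched in Python. This is only a sketch based on the description in this ticket, not the validation code OpenNMS actually runs; the Downtime class and validate_downtime_model function are hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Downtime:
    """One <downtime> entry (hypothetical model); times are in milliseconds."""
    begin: int
    end: Optional[int] = None
    interval: Optional[int] = None
    delete: bool = False


def validate_downtime_model(entries: List[Downtime]) -> List[str]:
    """Return a list of problems; an empty list means the model looks valid."""
    if not entries:
        return ["model has no entries"]
    problems = []
    if entries[0].begin != 0:
        problems.append("first entry must have begin=0")
    # Every entry except the last needs begin/end/interval and must be
    # contiguous with the entry that follows it.
    for i, entry in enumerate(entries[:-1]):
        if entry.end is None or entry.interval is None or entry.delete:
            problems.append(f"entry {i} needs begin, end and interval (no delete)")
        elif entry.end != entries[i + 1].begin:
            problems.append(f"entry {i} end does not match begin of entry {i + 1}")
    # The last entry is open-ended: it either deletes or keeps polling.
    last = entries[-1]
    if last.end is not None:
        problems.append("last entry must not have an end")
    if last.delete == (last.interval is not None):
        problems.append("last entry needs either delete or interval, not both or neither")
    return problems


# The model used in this test environment.
model = [
    Downtime(begin=0, end=300000, interval=30000),
    Downtime(begin=300000, delete=True),
]
print(validate_downtime_model(model))
```

      Under these assumptions the model from step 2 passes every check, so it should not be one of the models that gets rejected and ignored.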

      Expected result:

      The node should be removed automatically from the database.

      Current result:

      The node is still in the DB after 15 minutes (it is never removed, or marked to be removed). All the monitored services have been removed as part of the downtime model, but the now-empty IP interface and the node itself are never removed from the DB (check the screenshot).

      In other words, it is partially working.

      Also, the time between the nodeDown and the service deletion is more than 5 minutes for some services, which is not expected either. Some services are requested to be removed 5 minutes after the nodeDown, but the rest of them are requested to be removed 10 minutes after the nodeDown.
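
      To make the expected timing concrete, here is a small Python sketch that walks a simplified poll schedule over the downtime model. It assumes each poll happens exactly one interval after the previous one and that deletion is requested at the first poll that lands in the delete entry; the real Pollerd scheduler may well behave differently, and expected_delete_time_ms is a hypothetical helper, not an OpenNMS API.

```python
def expected_delete_time_ms(model):
    """Return the downtime (ms) at which deletion would be requested.

    `model` is a list of dicts with the keys begin, end, interval and delete,
    all times in milliseconds. Returns None if the model never deletes.
    Assumes a valid, contiguous model with positive intervals.
    """
    elapsed = 0
    while True:
        # Find the entry whose [begin, end) window covers the elapsed downtime.
        entry = next(
            e for e in model
            if e["begin"] <= elapsed and (e.get("end") is None or elapsed < e["end"])
        )
        if entry.get("delete"):
            return elapsed
        if entry.get("end") is None:
            return None  # open-ended polling entry: keeps polling forever
        elapsed += entry["interval"]


# The model used in this test environment.
model = [
    {"begin": 0, "end": 300000, "interval": 30000},
    {"begin": 300000, "delete": True},
]
print(expected_delete_time_ms(model))
```

      Under this simplified model every service would be requested for removal 300000 ms (5 minutes) after the nodeDown, which is why the 10-minute removals look wrong.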




            ranger Benjamin Reed
            agalue Alejandro Galue