OpenNMS / NMS-5266

File based Provisioning Groups nodes lose historic Service Outage information after manual Synchronization for services added with detectors.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.10.1
    • Fix Version/s: 14.0.0
    • Security Level: Default (Default Security Scheme)
    • Environment:

      Description

      Adding this report at Jeff's suggestion on IRC. I am looking for a way to keep historical service outages for nodes in the setup I have, if possible. I don't know whether this is a bug. If it is not, I am looking for an alternative provisioning method that keeps historical service outages for nodes.

      Trying to look back on outage history for a customer, we discovered that only recent outages were available. This seemed strange, so I checked the DB events table and vacuumd-configuration.xml to see whether the events maintenance was deleting them. That did not seem to be the case: some events linked to outages went back to this server's inception (8 months earlier), and the default events maintenance in vacuumd-configuration.xml doesn't delete events linked to outages. I checked the outages table, and again there were outages going back to inception, 6136 entries in total. I noticed that the Postgres statistics for the outages table reported 76629 entries, 75606 updates and 75351 deletions. I am not sure whether this means anything, but I didn't think there should be that many outage deletions (I thought outages remain unless a node is deleted), and it made me think about a curious thing I had been noticing but not overly worrying about: when adding, updating or deleting nodes in a Provisioning Group, service outages reported on the main page for some other nodes in the same Provisioning Group cleared after I manually synchronized the DB.

      I decided to check this further using a Provisioning Group we have called ProductionServers. Five servers in that group had historic service outages. I ran an SQL query against the outages table for each node's ID to list its outages. Then I added a new node to the Provisioning Group and manually synchronized the group. All the service outages for those nodes were removed from the database, except for those of one SNMPMonitor service that was explicitly provisioned on the node's interface: MailQ on beaker.airspeed.ie, shown below.

      All other services are discovered using detectors for the Provisioning Group. Here are the Provisioning Group's foreign-source and imports files:

      <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
      <foreign-source date-stamp="2012-01-25T09:19:35.459Z" name="ProductionServers" xmlns="http://xmlns.opennms.org/xsd/config/foreign-source">
        <scan-interval>1d</scan-interval>
        <detectors>
          <detector class="org.opennms.netmgt.provision.detector.snmp.SnmpDetector" name="SNMP"/>
          <detector class="org.opennms.netmgt.provision.detector.icmp.IcmpDetector" name="ICMP">
            <parameter value="2" key="retries"/>
            <parameter value="2000" key="timeout"/>
          </detector>
          <detector class="org.opennms.netmgt.provision.detector.jmx.Jsr160Detector" name="OpenNMS-JVM">
            <parameter value="18980" key="port"/>
            <parameter value="rmi" key="protocol"/>
            <parameter value="/jmxrmi" key="urlPath"/>
            <parameter value="3000" key="timeout"/>
            <parameter value="2" key="retries"/>
            <parameter value="default" key="type"/>
          </detector>
          <detector class="org.opennms.netmgt.provision.detector.simple.HttpDetector" name="HTTP">
            <parameter value="80" key="port"/>
            <parameter value="3000" key="timeout"/>
          </detector>
          <detector class="org.opennms.netmgt.provision.detector.simple.HttpsDetector" name="HTTPS">
            <parameter value="443" key="port"/>
            <parameter value="4000" key="timeout"/>
          </detector>
          <detector class="org.opennms.netmgt.provision.detector.datagram.DnsDetector" name="DNS">
            <parameter value="53" key="port"/>
            <parameter value="3000" key="timeout"/>
            <parameter value="localhost" key="lookup"/>
          </detector>
          <detector class="org.opennms.protocols.dhcp.detector.DhcpDetector" name="DHCP"/>
          <detector class="org.opennms.netmgt.provision.detector.datagram.NtpDetector" name="NTP">
            <parameter value="123" key="port"/>
            <parameter value="3000" key="timeout"/>
          </detector>
          <detector class="org.opennms.netmgt.provision.detector.simple.SmtpDetector" name="SMTP">
            <parameter value="25" key="port"/>
            <parameter value="3000" key="timeout"/>
          </detector>
          <detector class="org.opennms.netmgt.provision.detector.simple.ImapDetector" name="IMAP">
            <parameter value="143" key="port"/>
            <parameter value="3000" key="timeout"/>
          </detector>
          <detector class="org.opennms.netmgt.provision.detector.ssh.SshDetector" name="SSH">
            <parameter value="SSH" key="banner"/>
            <parameter value="22" key="port"/>
            <parameter value="3000" key="timeout"/>
          </detector>
          <detector class="org.opennms.protocols.radius.detector.RadiusAuthDetector" name="RADIUS">
            <parameter value="1812" key="authPort"/>
            <parameter value="chap" key="authType"/>
            <parameter value="airspeed2" key="password"/>
            <parameter value="airspeed2" key="user"/>
            <parameter value="2000" key="timeout"/>
          </detector>
        </detectors>
        <policies>
          <policy class="org.opennms.netmgt.provision.persist.policies.NodeCategorySettingPolicy" name="Hughes AB EMS">
            <parameter value="HughesEMS" key="category"/>
            <parameter value="ALL_PARAMETERS" key="matchBehavior"/>
            <parameter value="~^\.1\.3\.6\.1\.4\.1\.303\.3\.3\.16\.4" key="sysObjectId"/>
          </policy>
          <policy class="org.opennms.netmgt.provision.persist.policies.NodeCategorySettingPolicy" name="Linux Net SNMP">
            <parameter value="Linux" key="category"/>
            <parameter value="ALL_PARAMETERS" key="matchBehavior"/>
            <parameter value="~^\.1\.3\.6\.1\.4\.1\.8072\.3\.2\.10" key="sysObjectId"/>
          </policy>
          <policy class="org.opennms.netmgt.provision.persist.policies.NodeCategorySettingPolicy" name="Sun Fire">
            <parameter value="Sun" key="category"/>
            <parameter value="ALL_PARAMETERS" key="matchBehavior"/>
            <parameter value="~^\.1\.3\.6\.1\.4\.1\.303\.3\.3\.16\.4" key="sysObjectId"/>
          </policy>
          <policy class="org.opennms.netmgt.provision.persist.policies.NodeCategorySettingPolicy" name="Default Policy">
            <parameter value="Server" key="category"/>
            <parameter value="ALL_PARAMETERS" key="matchBehavior"/>
          </policy>
          <policy class="org.opennms.netmgt.provision.persist.policies.NodeCategorySettingPolicy" name="Allot Netenforcer">
            <parameter value="Allot" key="category"/>
            <parameter value="ALL_PARAMETERS" key="matchBehavior"/>
            <parameter value="~^\.1\.3\.6\.1\.4\.1\.2603" key="sysObjectId"/>
          </policy>
          <policy class="org.opennms.netmgt.provision.persist.policies.MatchingSnmpInterfacePolicy" name="Collect Data ifType6">
            <parameter value="ENABLE_COLLECTION" key="action"/>
            <parameter value="ALL_PARAMETERS" key="matchBehavior"/>
          </policy>
        </policies>
      </foreign-source>

      <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
      <model-import last-import="2012-03-23T15:12:01.113Z" foreign-source="ProductionServers" date-stamp="2012-03-23T15:12:01.072Z" xmlns="http://xmlns.opennms.org/xsd/config/model-import">
        <node node-label="nwmanager.airspeed.ie" foreign-id="1332515455875" building="ProductionServers">
          <interface status="1" snmp-primary="P" ip-addr="77.75.103.166" descr="Infocaster Management"/>
        </node>
        <node node-label="manage.airspeed.ie" foreign-id="1329483173406" building="ProductionServers">
          <interface status="1" snmp-primary="P" ip-addr="77.75.103.162" descr=""/>
        </node>
        <node node-label="statler.airspeed.ie" foreign-id="1321699352613" building="ProductionServers">
          <interface status="1" snmp-primary="P" ip-addr="77.75.103.81" descr=""/>
        </node>
        <node node-label="financ.airspeed.ie" foreign-id="1316444783458" building="ProductionServers">
          <interface status="1" snmp-primary="P" ip-addr="77.75.103.165" descr=""/>
        </node>
        <node node-label="waldorf.airspeed.ie" foreign-id="1314962448751" building="ProductionServers">
          <interface status="1" snmp-primary="P" ip-addr="77.75.103.67" descr=""/>
        </node>
        <node node-label="beaker.airspeed.ie" foreign-id="1314962384142" building="ProductionServers">
          <interface status="1" snmp-primary="P" ip-addr="77.75.103.66" descr="">
            <monitored-service service-name="SNMP"/>
            <monitored-service service-name="ICMP"/>
            <monitored-service service-name="MailQ"/>
          </interface>
        </node>
        <node node-label="alvaristar.airspeed.ie" foreign-id="1314962363115" building="ProductionServers">
          <interface status="1" snmp-primary="P" ip-addr="172.16.4.74" descr=""/>
        </node>
        <node node-label="allot.airspeed.ie" foreign-id="1314962320782" building="ProductionServers">
          <interface status="1" snmp-primary="P" ip-addr="172.16.4.30" descr=""/>
        </node>
        <node node-label="aaa.airspeed.ie" foreign-id="1314962299582" building="ProductionServers">
          <interface status="1" snmp-primary="P" ip-addr="77.75.103.91" descr=""/>
        </node>
        <node node-label="nms2.airspeed.ie" foreign-id="1307717670581" building="ProductionServers">
          <interface status="1" snmp-primary="P" ip-addr="127.0.0.1" descr="lo"/>
        </node>
        <node node-label="ems.hughes.airspeed.ie" foreign-id="1307717602203" building="ProductionServers">
          <interface status="1" snmp-primary="P" ip-addr="172.16.4.71" descr="bge0"/>
        </node>
      </model-import>
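
      To make the split between explicitly provisioned and detector-discovered services easier to see, here is a minimal Python sketch that reads both kinds of file and reports which services are listed directly on an interface, and which of those have no detector of the same name. It uses trimmed-down copies of the XML above. The helper names (detector_names, explicit_services, explicit_only) are made up for illustration and are not part of OpenNMS, and matching a detector to a service purely by name is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

# Namespaces taken from the xmlns attributes of the two files above.
FS_NS = "{http://xmlns.opennms.org/xsd/config/foreign-source}"
MI_NS = "{http://xmlns.opennms.org/xsd/config/model-import}"

# Trimmed-down copies of the real files, kept short for the example.
FOREIGN_SOURCE = """
<foreign-source name="ProductionServers" xmlns="http://xmlns.opennms.org/xsd/config/foreign-source">
  <detectors>
    <detector class="org.opennms.netmgt.provision.detector.snmp.SnmpDetector" name="SNMP"/>
    <detector class="org.opennms.netmgt.provision.detector.icmp.IcmpDetector" name="ICMP"/>
    <detector class="org.opennms.netmgt.provision.detector.simple.HttpDetector" name="HTTP"/>
  </detectors>
</foreign-source>
"""

REQUISITION = """
<model-import foreign-source="ProductionServers" xmlns="http://xmlns.opennms.org/xsd/config/model-import">
  <node node-label="statler.airspeed.ie" foreign-id="1321699352613">
    <interface status="1" snmp-primary="P" ip-addr="77.75.103.81" descr=""/>
  </node>
  <node node-label="beaker.airspeed.ie" foreign-id="1314962384142">
    <interface status="1" snmp-primary="P" ip-addr="77.75.103.66" descr="">
      <monitored-service service-name="SNMP"/>
      <monitored-service service-name="ICMP"/>
      <monitored-service service-name="MailQ"/>
    </interface>
  </node>
</model-import>
"""


def detector_names(foreign_source_xml):
    """Return the set of detector names defined in a foreign-source file."""
    root = ET.fromstring(foreign_source_xml)
    return {d.get("name") for d in root.iter(FS_NS + "detector")}


def explicit_services(requisition_xml):
    """Map node-label -> services listed directly in the imports file."""
    root = ET.fromstring(requisition_xml)
    found = {}
    for node in root.iter(MI_NS + "node"):
        names = [s.get("service-name")
                 for s in node.iter(MI_NS + "monitored-service")]
        if names:
            found[node.get("node-label")] = names
    return found


def explicit_only(foreign_source_xml, requisition_xml):
    """Explicit services with no detector of the same name (assumption:
    a detector and the service it adds share a name)."""
    detected = detector_names(foreign_source_xml)
    result = {}
    for label, names in explicit_services(requisition_xml).items():
        extra = [n for n in names if n not in detected]
        if extra:
            result[label] = extra
    return result


if __name__ == "__main__":
    print(explicit_services(REQUISITION))
    print(explicit_only(FOREIGN_SOURCE, REQUISITION))
```

      With the sample input, only beaker.airspeed.ie has explicit services, and MailQ is the only one that no detector covers.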

      Looking at the event sequence of any of these nodes, I can see that during synchronization the services are deleted from the interface, and then the services are rediscovered for the primary interface. When a node has more than one interface, for each interface discovered by SNMP scanning the services are deleted from the interface, the interface is deleted, the interface is rediscovered, and the services are rediscovered.

      When I run my SQL queries on the outages table again for these nodes, all the service outages have been removed except for the MailQ service on beaker.airspeed.ie (this server had no ICMP outages, so I cannot say whether they would also have stayed or been deleted).
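
      A before/after comparison like the one described can be sketched in Python, assuming the query output is exported to CSV before and after the synchronization. The column names (nodelabel, servicename, iflostservice), the helper names and the sample rows are illustrative assumptions, not the contents of the attached CSV files.

```python
import csv
import io
from collections import Counter

# Hypothetical exports of the outage query, before and after a manual sync.
BEFORE = """nodelabel,servicename,iflostservice
beaker.airspeed.ie,HTTP,2012-02-01 10:00:00
beaker.airspeed.ie,HTTP,2012-03-01 11:00:00
beaker.airspeed.ie,MailQ,2012-02-10 09:00:00
"""

AFTER = """nodelabel,servicename,iflostservice
beaker.airspeed.ie,MailQ,2012-02-10 09:00:00
"""


def count_by_service(csv_text):
    """Count outage rows per (node label, service name)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter((row["nodelabel"], row["servicename"]) for row in reader)


def lost_outages(before_csv, after_csv):
    """Return how many outage rows disappeared for each (node, service)."""
    before = count_by_service(before_csv)
    after = count_by_service(after_csv)
    return {key: n - after[key] for key, n in before.items() if after[key] < n}


if __name__ == "__main__":
    for (node, service), missing in lost_outages(BEFORE, AFTER).items():
        print(f"{node} {service}: {missing} outage rows missing after sync")
```

      Counting rows per node and service, rather than comparing totals, makes it visible which services lost their history and which kept it.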

      Unfortunately, as I write this the provisiond.log.* files are already overwriting the time period in which I ran this, but I can run the same sequence again shortly, after first causing some service outages on the servers, and get the logs for that period. I'm posting some screenshots and SQL output from this. If I should be collecting any other debug logs, please let me know.

      Thanks
      P

        Attachments

        1. beaker.csv (6 kB)
        2. beaker.png (109 kB)
        3. beaker-after.png (101 kB)
        4. emshughes-after.png (29 kB)
        5. hughesems.csv (2 kB)
        6. hughesems.png (87 kB)
        7. pgstats.png (121 kB)

              People

              • Assignee: ranger Benjamin Reed
              • Reporter: ptuite@airspeed.ie Patrick Tuite
              • Votes: 0
              • Watchers: 2