OpenNMS / NMS-14321

Kafka-Producer Alarm Resync Failing Post Entire Kafka Cluster Outage

Details

    • Horizon - May 26 - June 9
    • 1061

    Description

      Issue:

      When the Kafka-producer feature is enabled and the entire Kafka cluster is brought down, alarms generated during the outage are not re-synced to the alarms topic.
      These alarms are visible in OpenNMS as active, but when the Kafka cluster is brought back up, the active alarms are not resynced. More information below.

      Setup:
      I've set up OpenNMS and Minions over ActiveMQ and enabled the Kafka-producer feature on OpenNMS to send alarms, events, and node information to Kafka, the same setup as the client's.
      OpenNMS Version: 2019.1.2 

       

      Steps Performed:

      • Stopped the Kafka cluster for 2 hours.
      • Generated some alarms 1 hour into the Kafka outage and confirmed that they arrived in OpenNMS.
      • When the Kafka outage was resolved (after 2 hours), the following was observed:
        • The alarms generated 1 hour into the outage were still active in OpenNMS, but they were never sent to the alarms topic after the outage was resolved.
        • I ran a manual sync-alarms from Karaf; that did not work either, and it returned the message "Reduction Keys added to Ktable".
        • That message appeared on every sync-alarms run. Ideally, I think, once the reduction keys are added to the KTable and there is no state change to the alarm, there is no need to keep adding the same reduction keys. Either it is unable to add them or something else is happening [please correct me here].

      Also, when I run "list-alarms" from Karaf, it shows only 2 alarms/reduction keys, but there are 4 alarms in the Alarms GUI and the DB.
      Why the mismatch? Could this be the reason the other 2 active reduction keys/alarms are not sent to the alarms topic during "sync-alarms"?
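
      To make the expected behavior concrete, here is a minimal Python sketch of the reconciliation I would expect a resync to perform. It is not the OpenNMS implementation: the function name plan_alarm_resync, the reduction keys (rk-1 to rk-4), and the severity values are all hypothetical, and the KTable is modeled as a plain dictionary.

```python
def plan_alarm_resync(db_alarms, ktable_alarms):
    """Compare active alarms in the DB with the alarms held in the KTable.

    Both arguments map reduction key -> alarm state (any comparable value).
    Returns (to_publish, to_tombstone) as sorted lists of reduction keys.
    Hypothetical helper for illustration only, not an OpenNMS API.
    """
    # Alarms that are active in the DB but missing or outdated in the KTable.
    to_publish = sorted(
        key for key, state in db_alarms.items()
        if key not in ktable_alarms or ktable_alarms[key] != state
    )
    # Alarms still in the KTable that no longer exist in the DB.
    to_tombstone = sorted(key for key in ktable_alarms if key not in db_alarms)
    return to_publish, to_tombstone


# Hypothetical data mirroring the counts in this report:
# 4 active alarms in the DB, only 2 reduction keys known to the KTable.
db_alarms = {"rk-1": "MAJOR", "rk-2": "MINOR", "rk-3": "MAJOR", "rk-4": "CRITICAL"}
ktable_alarms = {"rk-1": "MAJOR", "rk-2": "MINOR"}

to_publish, to_tombstone = plan_alarm_resync(db_alarms, ktable_alarms)
print(to_publish)    # ['rk-3', 'rk-4']
print(to_tombstone)  # []

# Once the missing alarms are in the KTable, a second run has nothing to do.
ktable_alarms.update({key: db_alarms[key] for key in to_publish})
print(plan_alarm_resync(db_alarms, ktable_alarms))  # ([], [])
```

      Under these assumptions, the first sync should send the 2 missing alarms, and any later sync with no state change should add nothing, which is the opposite of what I am seeing.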

      Note: I have not added "alarmSync" to org.opennms.features.kafka.producer.cfg; its default is true, so I am running the default setup.

      Running "list-alarms" from Karaf

      List of Active Alarms from GUI

       

      Running Manual "sync-alarms" from Karaf

       

      I believe this behavior is a bug: the alarms should be resynced from the OpenNMS DB once the Kafka outage has been resolved.

       

          People

            amay Alex May
            Sriraag Sridhar