OpenNMS / NMS-14321

Kafka Producer Alarm Resync Fails After a Full Kafka Cluster Outage



    • Horizon - May 26 - June 9
    • 1061



      When the Kafka producer feature is enabled and the entire Kafka cluster is brought down, alarms generated during the outage are not resynced to the alarm topic.
      The alarms generated during the Kafka outage show up in OpenNMS as active, but when the Kafka cluster is brought back up, those active alarms are not resynced. More information below.

      I've set up OpenNMS and Minions over ActiveMQ and enabled the Kafka producer feature on OpenNMS to send alarms, events, and node information to Kafka, using the same setup as the client.
      OpenNMS Version: 2019.1.2


      Steps Performed:

      • Stopped the Kafka cluster for 2 hours.
      • Generated some alarms 1 hour into the Kafka outage; those alarms appeared in OpenNMS.
      • When the Kafka outage was resolved (after 2 hours), I noticed the following:
        • The alarms generated 1 hour into the outage were still active in OpenNMS, but they were never sent to the alarms topic after the outage was resolved.
        • I ran a manual sync-alarms from Karaf, but that didn't work either. sync-alarms returned the message "Reduction Keys added to Ktable".
        • That message appeared on every sync-alarms run. Ideally, once the reduction keys have been added to the KTable and the alarm state has not changed, there should be no need to keep adding the same reduction keys. Either it is unable to add them or something else is happening [please correct me here].

      Also, I noticed that when I run "list-alarms" from Karaf, it shows only 2 alarms/reduction keys, but there are 4 alarms in the alarms GUI and the DB.
      Why the mismatch? Could this be the reason the other 2 active alarms/reduction keys are not sent to the alarms topic during "sync-alarms"?
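      For reference, the behaviour the reporter expects from sync-alarms can be sketched as an idempotent diff keyed by reduction key: alarms present in the DB but missing from the KTable (like the 2 of 4 above), or whose state changed, are sent; keys in the KTable that are no longer active in the DB are deleted; unchanged alarms produce no work on repeated runs. This is a minimal sketch under stated assumptions, not the OpenNMS implementation: the class AlarmSyncSketch, the method reconcile, the "state fingerprint" strings, and the key names are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of an idempotent alarm resync (not the OpenNMS code).
final class AlarmSyncSketch {

    // Result of one reconciliation pass; TreeSet keeps the output ordered.
    static final class SyncPlan {
        final Set<String> toSend = new TreeSet<>();   // missing from, or stale in, the KTable
        final Set<String> toDelete = new TreeSet<>(); // in the KTable but not active in the DB
    }

    // dbAlarms / ktableAlarms: reduction key -> hypothetical state fingerprint
    static SyncPlan reconcile(Map<String, String> dbAlarms, Map<String, String> ktableAlarms) {
        SyncPlan plan = new SyncPlan();
        for (Map.Entry<String, String> e : dbAlarms.entrySet()) {
            String key = e.getKey();
            // Send only when the key is absent or its state differs,
            // so an unchanged alarm is skipped on every later run.
            boolean missing = !ktableAlarms.containsKey(key);
            if (missing || !e.getValue().equals(ktableAlarms.get(key))) {
                plan.toSend.add(key);
            }
        }
        for (String key : ktableAlarms.keySet()) {
            if (!dbAlarms.containsKey(key)) {
                plan.toDelete.add(key);
            }
        }
        return plan;
    }

    public static void main(String[] args) {
        // Hypothetical data mirroring the report: 4 alarms in the DB, 2 in the KTable.
        Map<String, String> db = new HashMap<>();
        db.put("key-1", "s1");
        db.put("key-2", "s1");
        db.put("key-3", "s1");
        db.put("key-4", "s1");
        Map<String, String> ktable = new HashMap<>();
        ktable.put("key-1", "s1");
        ktable.put("key-2", "s1");

        SyncPlan first = reconcile(db, ktable);
        System.out.println("toSend=" + first.toSend + " toDelete=" + first.toDelete);
        // prints: toSend=[key-3, key-4] toDelete=[]

        // After the KTable catches up, a second pass has nothing to do.
        SyncPlan second = reconcile(db, new HashMap<>(db));
        System.out.println("toSend=" + second.toSend + " toDelete=" + second.toDelete);
        // prints: toSend=[] toDelete=[]
    }
}
```

      Under this model, a "Reduction Keys added" message on every run would only be expected if the KTable never actually reflected the sent records.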

      Note: I have not added "alarmSync" to org.opennms.features.kafka.producer.cfg, and it defaults to true, so this is running with the default setup.

      [Screenshot: running "list-alarms" from Karaf]

      [Screenshot: list of active alarms in the GUI]


      [Screenshot: running a manual "sync-alarms" from Karaf]


      I believe this behavior is a bug, since the active alarms should be resynced from the OpenNMS DB once the Kafka outage has been resolved.
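      The expectation above amounts to triggering a full resync on the transition from "Kafka unreachable" to "Kafka reachable". A minimal sketch, assuming some periodic connectivity probe exists (the probe itself is not shown, and the names ResyncOnRecovery and onHealthCheck are hypothetical, not OpenNMS or Kafka APIs):

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: runs a resync callback when Kafka becomes reachable again.
final class ResyncOnRecovery {
    private final AtomicBoolean lastAvailable;
    private final Runnable resync;

    ResyncOnRecovery(boolean initiallyAvailable, Runnable resync) {
        this.lastAvailable = new AtomicBoolean(initiallyAvailable);
        this.resync = resync;
    }

    // Feed in the result of one connectivity probe.
    // Returns true if this call triggered a resync.
    boolean onHealthCheck(boolean availableNow) {
        boolean wasAvailable = lastAvailable.getAndSet(availableNow);
        if (!wasAvailable && availableNow) {
            resync.run(); // down -> up transition: push the DB alarm state again
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        AtomicInteger resyncs = new AtomicInteger();
        ResyncOnRecovery watcher = new ResyncOnRecovery(true, resyncs::incrementAndGet);
        // Simulated probes: up, outage, still down, recovered, steady.
        boolean[] probes = {true, false, false, true, true};
        for (boolean up : probes) {
            watcher.onHealthCheck(up);
        }
        System.out.println("resyncs=" + resyncs.get()); // prints: resyncs=1
    }
}
```

      Firing only on the transition, rather than on every probe, keeps a healthy cluster from being flooded with repeated syncs; in a real deployment the probes would likely be driven by a scheduler such as ScheduledExecutorService.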





            Alex May
            Sriraag Sridhar