  OpenNMS / NMS-10378

Alarm processing is very slow when Kafka producer is enabled and Kafka is unavailable


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 22.0.0
    • Fix Version/s: 23.0.0
    • Component/s: None
    • Security Level: Default (Default Security Scheme)
    • Labels: None
    • Sprint: Horizon - October 10th 2018

      Description

      When the Kafka producer is enabled but Kafka is unavailable, every attempt to push an alarm to the Kafka topic blocks for 1 minute by default.

      The call in OpenNMSKafkaProducer:sendRecord() blocks on producer.send() when Kafka metadata cannot be obtained, because the Kafka client's send() method waits for metadata with a default timeout of 1 minute (see http://kafka.apache.org/090/documentation.html, "max.block.ms").
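
      For illustration only, the timeout in question is an ordinary producer property. The sketch below builds a producer configuration with a shorter "max.block.ms"; the class name, method name, and the 5000 ms value are hypothetical and not taken from OpenNMS. Lowering the timeout would only shorten each stall, not remove it.

```java
import java.util.Properties;

// Hypothetical helper, not OpenNMS code: builds Kafka producer properties
// with an explicit max.block.ms instead of the 1 minute default.
final class ProducerTimeoutSketch {

    static Properties producerProps(String bootstrapServers, long maxBlockMs) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        // Upper bound on how long send() may block, e.g. while waiting for metadata.
        props.put("max.block.ms", Long.toString(maxBlockMs));
        return props;
    }
}
```

      For example, producerProps("localhost:9092", 5000L) would cap each blocked send() at about 5 seconds, assuming the resulting properties are passed to the producer.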

      I reproduced this issue with a misconfigured "ADVERTISED_HOST" environment variable set for my Kafka container. I suspect there are other ways to reproduce it; simply stopping Kafka may have the same result.

      The alarms are eventually processed, but serially, each one only after waiting out its own 1 minute timeout.

      One potential fix would be to change sendRecord so that it pushes each record onto a bounded queue, with a separate thread sending records from that queue to Kafka, so that the OpenNMS alarmd thread is never blocked.
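
      A minimal sketch of the bounded queue idea, under stated assumptions: QueuedRecordSender and blockingSend are hypothetical names, blockingSend stands in for whatever currently calls producer.send(), and dropping records when the queue is full is just one possible policy. This is not necessarily how the fix was implemented.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

// Hypothetical sketch, not OpenNMS code: decouples the caller from a
// potentially blocking send by handing records to a worker thread.
final class QueuedRecordSender<T> implements AutoCloseable {

    private final BlockingQueue<T> queue;
    private final Thread worker;
    private volatile boolean running = true;

    QueuedRecordSender(int capacity, Consumer<T> blockingSend) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.worker = new Thread(() -> {
            while (running) {
                try {
                    // Waits for the next record, then performs the send,
                    // which may block for a long time if Kafka is unavailable.
                    T record = queue.take();
                    blockingSend.accept(record);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                } catch (RuntimeException e) {
                    // Sketch only: a failed send is dropped; real code would log it.
                }
            }
        }, "kafka-sender-sketch");
        this.worker.setDaemon(true);
        this.worker.start();
    }

    // Never blocks the caller. Returns false if the queue is full,
    // in which case the record is not queued.
    boolean enqueue(T record) {
        return queue.offer(record);
    }

    @Override
    public void close() {
        running = false;
        worker.interrupt();
    }
}
```

      With this shape, the alarm processing thread would call enqueue() and return immediately, while only the worker thread waits on Kafka. The queue capacity bounds memory use during an outage, at the cost of losing records once it fills.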


            People

            • Assignee: Matthew Brooks (mbrooks)
            • Reporter: Matthew Brooks (mbrooks)
