When the Kafka producer is enabled but Kafka is unavailable, every attempt to push an alarm to the Kafka topic blocks for 1 minute by default.
The call in OpenNMSKafkaProducer:sendRecord() ends up blocking on producer.send() if Kafka metadata cannot be obtained. It blocks because the Kafka client's send() method waits for metadata with a default timeout of 1 minute (see "max.block.ms" in http://kafka.apache.org/090/documentation.html).
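As a stopgap, the blocking window could be shortened by overriding "max.block.ms" in the properties handed to the Kafka producer. The snippet below is only a sketch: the class name ProducerTimeoutConfig and the helper withMaxBlockMs are hypothetical, and how OpenNMS actually builds its producer properties is not shown here. Only the "max.block.ms" key comes from the Kafka documentation.

```java
import java.util.Properties;

public class ProducerTimeoutConfig {
    // Returns a copy of the given producer properties with "max.block.ms" overridden.
    // The input properties are left unmodified.
    public static Properties withMaxBlockMs(Properties base, long maxBlockMs) {
        Properties props = new Properties();
        props.putAll(base);
        // Upper bound on how long send() may block, e.g. while waiting for metadata.
        props.setProperty("max.block.ms", Long.toString(maxBlockMs));
        return props;
    }
}
```

This only shortens each stall; the calling thread is still blocked for the configured time on every alarm while Kafka is unreachable.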
I reproduced this issue by setting a misconfigured "ADVERTISED_HOST" environment variable for my Kafka container. I suspect there are other ways to reproduce it; simply stopping Kafka may have the same result.
The alarms are eventually processed, but serially, each one after a 1 minute wait.
One potential fix would be to change sendRecord so that it pushes the record onto a bounded queue, with a separate thread sending records from that queue to Kafka, so that the OpenNMS alarmd thread is never blocked.
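A rough sketch of the bounded-queue idea follows. Everything in it is hypothetical: BoundedRecordForwarder is not an existing OpenNMS class, and the "sender" callback stands in for whatever would actually call producer.send(). It assumes that dropping records when the queue is full is acceptable, which may not be the right policy for alarms.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

// Hypothetical helper: decouples the caller from a potentially blocking send.
public class BoundedRecordForwarder<T> implements AutoCloseable {
    private final BlockingQueue<T> queue;
    private final Thread worker;

    public BoundedRecordForwarder(int capacity, Consumer<T> sender) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.worker = new Thread(() -> {
            try {
                while (true) {
                    // Waits for the next record; only this thread ever blocks on the sender.
                    T record = queue.take();
                    try {
                        sender.accept(record);
                    } catch (RuntimeException e) {
                        // A real implementation would log and possibly retry.
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // exit on shutdown
            }
        }, "kafka-record-forwarder");
        this.worker.setDaemon(true);
        this.worker.start();
    }

    // Never blocks: returns false (record dropped) when the queue is full.
    public boolean offer(T record) {
        return queue.offer(record);
    }

    @Override
    public void close() {
        worker.interrupt();
    }
}
```

With something like this, sendRecord would call offer() and return immediately, and the 1 minute stall would be confined to the forwarder thread. The queue capacity and what to do with rejected records would still need to be decided.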