With the completion of NMS-13274, minion health is exposed via a ReST API.
This is a great step forward, and MAS has adopted it.
From a scalability standpoint, however, some of the health checks are rather heavyweight:
- Broker health:
  - Every invocation creates a new instance of the broker client. In the case of Kafka, all topics are also listed.
- OpenNMS ReST health:
  - Results in a ReST request to OpenNMS, which retrieves the SNMPv3 credentials. Per my understanding, this will be going away soon...
Since there can be hundreds to thousands of minions making these calls frequently (every 30 seconds), this adds unnecessary load on OpenNMS.
The ask in this story is to add new lightweight/passive health checks for the broker and for OpenNMS.
Some thoughts... leverage the existing minion-ONMS heartbeat (if a message is received within the expected time, mark as healthy). This can be used to report ONMS health.
Since MAS reports broker and ONMS separately, it would be nice to have a lightweight check for the broker as well. Perhaps something as simple as checking if the broker client is active? The intent is to assess the health of the minion-broker link.
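
To make the passive idea concrete, here is a minimal sketch of a check that never contacts the remote side and only remembers when traffic was last observed. The class name `PassiveLinkHealth`, its methods, and the injected clock are all hypothetical and do not exist in OpenNMS today. It also assumes the minion has some existing code path where a successful heartbeat (or successful broker activity) can be observed.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/**
 * Hypothetical passive health tracker (not an existing OpenNMS class).
 * It issues no requests; it only records when activity was last seen.
 */
final class PassiveLinkHealth {
    private final Duration maxSilence;        // longest tolerated gap between observations
    private final Supplier<Instant> clock;    // injected so the check can be tested
    private final AtomicReference<Instant> lastActivity = new AtomicReference<>();

    PassiveLinkHealth(Duration maxSilence, Supplier<Instant> clock) {
        this.maxSilence = maxSilence;
        this.clock = clock;
    }

    /** Assumed to be called from an existing traffic path, e.g. a confirmed heartbeat. */
    void markActivity() {
        lastActivity.set(clock.get());
    }

    /** Healthy only if activity was recorded within the allowed window. */
    boolean isHealthy() {
        Instant last = lastActivity.get();
        if (last == null) {
            return false; // nothing observed yet
        }
        Duration silence = Duration.between(last, clock.get());
        return silence.compareTo(maxSilence) <= 0;
    }
}
```

One instance could back the ONMS check, fed by the heartbeat path, and a second instance could back the broker check, fed by the already-running broker client whenever it sends or receives successfully. The health endpoint would then only read `isHealthy()`, so polling it every 30 seconds costs nothing on OpenNMS or the broker. The window passed as `maxSilence` would presumably be some multiple of the heartbeat interval; the right value is an open question.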