OpenNMS / NMS-12685

Improve health-check to be more aligned with Kubernetes Probes

    Details

    • Type: Enhancement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 26.0.0
    • Fix Version/s: Next
    • Component/s: Minion, Sentinel
    • Security Level: Default (Default Security Scheme)
    • Labels:
      None
    • HB Backlog Status:
      Backlog

      Description

      When using Kubernetes, it is essential to define readiness probes (which tell when the application is ready to start receiving requests) and liveness probes (which tell whether the application is running), as described here:

      https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/

      The intention of these probes is to check all the internal features of the application in question and tell whether everything is running well.

      If the application relies on external dependencies, the health check should not cover or be affected by the availability of those external dependencies, as those should have their own health checks.

      That matters because if a liveness probe fails because Kafka, for example, is not running, the kubelet will kill the container and restart it, which interrupts the service for no reason.

      Currently, the health-check for Minion and Sentinel verifies that they can connect to the OpenNMS ReST API and to the Broker (regardless of the implementation). That is a very good use case for operators who want to know whether everything is configured and running. But for Kubernetes, Docker, Docker Swarm, and other container orchestrators, that implementation is not useful as a probe and can lead to undesired side effects.

      I propose enhancing the health-check implementation with a flag to perform only local checks, that is, to verify that the internal components or features are running correctly regardless of the status of the external dependencies, so that we can use it as a readiness or liveness probe in Kubernetes (or the chosen container orchestrator).

      For instance, passing "--local" to the health-check commands could verify the internal components only. Without it, we get the current behavior (which includes checking external dependencies).
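
      To make the proposal concrete, here is a minimal Python sketch of the idea, not the actual Minion or Sentinel implementation. The names HealthCheck, run_health_checks, main, and the example check names are all hypothetical; only the "--local" flag comes from this ticket. It assumes each check can be tagged as local or external, and it returns a non-zero exit status on failure, which is what a Kubernetes exec probe treats as a failed probe.

```python
import argparse
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class HealthCheck:
    """One named check (hypothetical type); `local` is False when the
    check depends on an external service such as the broker."""
    name: str
    probe: Callable[[], bool]
    local: bool


def run_health_checks(checks: Sequence[HealthCheck],
                      local_only: bool = False) -> List[str]:
    """Run the selected checks and return the names of those that failed."""
    failed = []
    for check in checks:
        if local_only and not check.local:
            continue  # skip external dependencies (broker, ReST API)
        try:
            ok = check.probe()
        except Exception:
            ok = False  # a check that raises counts as a failure
        if not ok:
            failed.append(check.name)
    return failed


def main(checks: Sequence[HealthCheck],
         argv: Optional[Sequence[str]] = None) -> int:
    """Parse the command line and return a process exit status:
    0 when every selected check passed, 1 otherwise."""
    parser = argparse.ArgumentParser(description="health-check sketch")
    parser.add_argument("--local", action="store_true",
                        help="run only checks with no external dependency")
    args = parser.parse_args(argv)
    failed = run_health_checks(checks, local_only=args.local)
    for name in failed:
        print(f"FAILED: {name}")
    return 1 if failed else 0


# Hypothetical checks; the broker check simulates an outage.
EXAMPLE_CHECKS = [
    HealthCheck("bundles-started", lambda: True, local=True),
    HealthCheck("rest-api-reachable", lambda: True, local=False),
    HealthCheck("broker-reachable", lambda: False, local=False),
]
```

      With these example checks, main(EXAMPLE_CHECKS, ["--local"]) returns 0 because the simulated broker outage is skipped, while main(EXAMPLE_CHECKS, []) returns 1, matching the current behavior that an operator would still want.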

      That would satisfy both audiences: the orchestrator, and a user who wants to see whether Minion or Sentinel is running fine.

              People

              Assignee: Unassigned
              Reporter: Alejandro Galue (agalue)
              Votes: 0
              Watchers: 4
