  1. OpenNMS
  2. NMS-5710

HostResourceSwRunMonitor doesn't work well with processes like cron (with many forks)



    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.11.3, 1.10.7
    • Fix Version/s: 1.10.9
    • Security Level: Default (Default Security Scheme)
    • None


      Rod Ormon from the users list provided a very good explanation of the problem:

      The logic in the HostResourceSwRunMonitor.java code is somewhat flawed.

      If you check the code, you will see that TWO separate polls are made to the SNMP agent of the server using the hrSWRun Table: first an SNMP get for hrSWRunName and then a second SNMP get for hrSWRunStatus. The hrSWRunName values are then searched for the service-name that we are looking for and, if it is found, the corresponding status is checked in the hrSWRunStatus results.

      But because the poller makes TWO separate SNMP gets to the hrSWRun Table, the results "may" be different (and as it turns out, they often are!). A service that is returned as running in the Name list may no longer be running when the SNMP get occurs against the Status list. As a result, I frequently see a failure with STATUS=NULL, which of course would be correct because there is no Status value in the Status list: the process had ended by the time the second SNMP get was made.

      This is especially noticeable for processes like crond because it forks so many other crond processes. A forked process "may" be there in the first poll (hrSWRunName), but it is short-lived and is no longer there in the second poll (hrSWRunStatus).
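
      The race described above can be illustrated with a small, self-contained sketch. This is not the actual HostResourceSwRunMonitor code and not necessarily how the fix was implemented: the class name, the method names, the use of plain maps in place of SNMP results, and the rule that a status of running(1) or runnable(2) counts as "up" are all assumptions made for illustration. The two maps stand in for the results of the two polls, keyed by the hrSWRun table index.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch only; names and logic are hypothetical, not OpenNMS code. */
public class SwRunRaceSketch {

    // HOST-RESOURCES-MIB hrSWRunStatus: running(1), runnable(2), notRunnable(3), invalid(4).
    // Assumption for this sketch: values up to runnable(2) count as "up".
    static final int MAX_UP_STATUS = 2;

    /** Simplified model of the flawed logic: the first name match decides,
     *  and a missing status (process gone before the second poll) is a failure. */
    public static boolean naiveCheck(Map<Integer, String> names,
                                     Map<Integer, Integer> statuses,
                                     String service) {
        for (Map.Entry<Integer, String> e : names.entrySet()) {
            if (e.getValue().equals(service)) {
                Integer status = statuses.get(e.getKey()); // null if the process exited between polls
                return status != null && status <= MAX_UP_STATUS;
            }
        }
        return false;
    }

    /** One possible tolerant variant: skip matches whose status row has vanished
     *  and keep looking for another instance of the same service. */
    public static boolean tolerantCheck(Map<Integer, String> names,
                                        Map<Integer, Integer> statuses,
                                        String service) {
        for (Map.Entry<Integer, String> e : names.entrySet()) {
            if (!e.getValue().equals(service)) {
                continue;
            }
            Integer status = statuses.get(e.getKey());
            if (status == null) {
                continue; // short-lived fork that ended before the second poll
            }
            if (status <= MAX_UP_STATUS) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // First poll (names): a short-lived crond fork (index 950) and the long-running daemon (1201).
        // The fork being visited first is an assumption of this demo.
        Map<Integer, String> names = new LinkedHashMap<>();
        names.put(950, "crond");
        names.put(1201, "crond");

        // Second poll (statuses): the fork has already exited, so index 950 is absent.
        Map<Integer, Integer> statuses = new LinkedHashMap<>();
        statuses.put(1201, 2);

        System.out.println("naive: " + naiveCheck(names, statuses, "crond"));       // naive: false
        System.out.println("tolerant: " + tolerantCheck(names, statuses, "crond")); // tolerant: true
    }
}
```

      In this sketch the naive check reports crond as down even though the daemon itself is still running, which matches the STATUS=NULL failures described above. Another option, not shown here, would be to retrieve the name and status for each row together so that both values come from the same snapshot of the table.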

      I have had confirmation of this behaviour from OpenNMS Support.




            agalue Alejandro Galue