Details
Bug
Status: Resolved
Critical
Resolution: Fixed
24.1.2
Security Level: Default (Default Security Scheme)
None
Seen on CentOS 7, but probably distribution independent
Horizon 2019 - September 11th
Description
Steps to reproduce:
1) Start OpenNMS
2) Find the actual PID of the OpenNMS JVM via the ps command:
[vagrant@horizon-24-1-2 opennms]$ ps -ef | grep 'java.*opennms.*bootstra[p]'
root 7021 7019 3 16:15 ? 00:03:59 /usr/lib/jvm/java-1.8.0-openjdk/bin/java -Djava.endorsed.dirs=/opt/opennms/lib/endorsed -Dopennms.home=/opt/opennms -Xmx1024m -XX:+HeapDumpOnOutOfMemoryError -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.login.config=opennms -Dcom.sun.management.jmxremote.access.file=/opt/opennms/etc/jmxremote.access -DisThreadContextMapInheritable=true -Dgroovy.use.classvalue=true -XX:MaxMetaspaceSize=512m -Djava.io.tmpdir=/opt/opennms/data/tmp -XX:+StartAttachListener -jar /opt/opennms/lib/opennms_bootstrap.jar start
3) Compare against the value in /var/log/opennms/opennms.pid:
[vagrant@horizon-24-1-2 opennms]$ cat /var/log/opennms/opennms.pid
7019
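Expected result: the PID in the file matches the PID in column 2 of the output of step 2 (7021 in this case)
Actual result: the PID in the file differs (7019, which is the parent PID shown in column 3 of the same output)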
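
The comparison in steps 2 and 3 can be scripted. The following is a minimal sketch, not part of OpenNMS: the function name pid_file_matches is hypothetical, and the pgrep pattern in the usage comment is an assumption modelled on the grep pattern above.

```shell
#!/usr/bin/env bash
# pid_file_matches PID_FILE ACTUAL_PID
# Hypothetical helper: succeeds only if the PID recorded in PID_FILE
# equals ACTUAL_PID. Returns 2 if the file cannot be read.
pid_file_matches() {
    local recorded
    recorded=$(tr -d '[:space:]' < "$1") || return 2
    [ "$recorded" = "$2" ]
}

# Usage on a live system (paths taken from the report, pattern assumed):
#   actual=$(pgrep -f 'java.*opennms.*bootstrap')
#   pid_file_matches /var/log/opennms/opennms.pid "$actual" \
#       || echo "opennms.pid does not match the JVM PID $actual"
```

With the values from this report, the check would fail, because the file contains 7019 while the JVM runs as 7021.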
Notes:
- The value in the PID file seems to be consistently equal to the parent PID, i.e. the PID of the OPENNMS_HOME/bin/opennms shell script
- The value in karaf.pid is the correct value that should be in opennms.pid
- I think the introduction of the runCmd function in the start / stop script (commit 24221ac2d0d92c839878209e328477fec4265c76) is the proximate cause, having changed the semantics of the "last background process"
- This bug doesn't seem to break the control script, perhaps because it uses the Attach API, but it does break tools like generate-opennms-thread-dump which read opennms.pid
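
The suspected change in the meaning of the "last background process" can be illustrated in isolation. The sketch below is an assumption about the mechanism, not a copy of the actual start script: run_cmd is a hypothetical stand-in for runCmd, and the sh/sleep command stands in for the JVM. When a shell function is put in the background, bash runs it in a subshell, so $! is the PID of that subshell and the real command becomes its child.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper standing in for the script's runCmd function.
# The real runCmd may differ; this only demonstrates the $! effect.
run_cmd() {
    "$@"
    local rc=$?
    return "$rc"
}

tmp=$(mktemp -d)

# Case 1: background the command directly. $! is the command's own PID.
sh -c 'echo $$ > "$1"; exec sleep 2' _ "$tmp/direct.pid" &
direct_bang=$!

# Case 2: background the wrapper. $! is the subshell running run_cmd,
# and the real command is a child of that subshell.
run_cmd sh -c 'echo $$ > "$1"; exec sleep 2' _ "$tmp/wrapped.pid" &
wrapped_bang=$!

sleep 1  # give both children time to record their PIDs
direct_real=$(cat "$tmp/direct.pid")
wrapped_real=$(cat "$tmp/wrapped.pid")

echo "direct:  \$!=$direct_bang real=$direct_real"
echo "wrapped: \$!=$wrapped_bang real=$wrapped_real"

wait
rm -rf "$tmp"
```

In the direct case the two PIDs agree; in the wrapped case $! is the parent of the real process, which matches the symptom described above, where opennms.pid holds the parent PID of the JVM.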