We upgraded our OpenNMS installation to 18.0.1. Since then, OpenNMS stops working about once a week and throws a “java.lang.OutOfMemoryError: GC overhead limit exceeded” error.
Unfortunately the logs say nothing about the cause, only about the effects of the crashes.
We monitor 510 nodes and have set the Java heap size to 4048, so the problem should not be too little or too much heap space (we tested this).
To trace the error we deactivated one service after another, and I analysed a heap dump from a crash with the Eclipse Memory Analyzer. It found two memory leak suspects, and both point to
org.opennms.netmgt.model.BridgeMacLink objects (see attached PDF).
A look in the database shows 54 entries in bridgestplink and 0 in bridgebridgelink and bridbridgelink.
After we deactivated bridge discovery in enlinkd-configuration.xml, OpenNMS stopped crashing.
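A minimal sketch of what such a change might look like, assuming the stock layout of enlinkd-configuration.xml with one switch per discovery protocol. The thread count and interval values are illustrative placeholders and are not taken from our installation; only the use-bridge-discovery attribute is relevant here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: values other than use-bridge-discovery are assumed defaults. -->
<enlinkd-configuration threads="5"
                       initial_sleep_time="60000"
                       rescan_interval="86400000"
                       use-cdp-discovery="true"
                       use-bridge-discovery="false"
                       use-lldp-discovery="true"
                       use-ospf-discovery="true"
                       use-isis-discovery="true"/>
```

The other discovery protocols stay enabled, so only the bridge topology data is no longer collected.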
I attach two screenshots of the resource graph ‘System Memory Stats’ of the OpenNMS server. They show the used buffer increasing while the bridge information is being discovered and staying relatively constant after bridge discovery was deactivated.