Cancel dialog is slow and Topology Map crashes if vertex has many edges (20+)

Description

OpenNMS crashes whenever we use the topology view, and the browser freezes completely in this view.

After a while a message may be displayed (see screenshot), but the only way to recover is to close the browser and restart it.

Acceptance / Success Criteria

None

Attachments

3

Activity

Markus von Rüden June 28, 2016 at 9:25 AM
Edited

I investigated the problem described here.
I found the following:

  • Once the BSM Admin page has loaded, clicking the "refresh" button takes several minutes. The refresh operation is also executed when leaving a dialog. This can be observed in the provided video. This issue has been fixed. The refresh operation now takes ~500ms with the provided database dump.

  • In the Topology UI, selecting the Business Service "HFR892094_APs" leads to an Out of Memory Exception (after a while). The cause is that the service has 51 reduction keys assigned. For each selected Business Service, all impacting vertices are calculated (see GraphAlgorithms.calculateImpacting(Graph, GraphVertex)). To do so, all combinations of the 51 edges are calculated. This is a 51! (n!) operation and uses up all available memory and CPU. For now, I disabled the calculation of the impacting vertices for more than 10 edges and created a follow-up issue.
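To illustrate why the 51-edge case cannot complete and what an edge-count guard looks like, here is a minimal sketch. It is not the code from the pull request: the class name ImpactingGuard, the constant MAX_EDGES, and both methods are hypothetical and only mirror the "more than 10 edges" cut-off described above.

```java
import java.math.BigInteger;

public class ImpactingGuard {

    // Hypothetical threshold, mirroring the "more than 10 edges" cut-off
    // described in the comment above; not taken from the OpenNMS sources.
    static final int MAX_EDGES = 10;

    // Computes n! exactly, to show how quickly an n! operation grows.
    static BigInteger factorial(int n) {
        BigInteger result = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }

    // Returns true only if the vertex has few enough edges for the
    // impacting-vertices calculation to be attempted.
    static boolean shouldCalculateImpacting(int edgeCount) {
        return edgeCount <= MAX_EDGES;
    }

    public static void main(String[] args) {
        for (int edges : new int[] {5, 10, 20, 51}) {
            System.out.println(edges + " edges: " + edges + "! = " + factorial(edges)
                    + ", calculate impacting: " + shouldCalculateImpacting(edges));
        }
    }
}
```

With a guard like this, a service with 51 edges skips the calculation entirely, while a service with 10 edges or fewer is still processed.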

The pull request can be found here: https://github.com/OpenNMS/opennms/pull/881

Ronny Trommer June 21, 2016 at 10:39 AM

The "GC overhead limit exceeded" error indicates that the garbage collector is running almost all the time and the Java program is making very slow progress. The necessary action is to increase the heap size for the JVM.

Ronny Trommer June 21, 2016 at 10:34 AM

I've seen in the logs that the problem occurred today as well.

Pedro Silva June 20, 2016 at 1:07 PM
Edited

Just pinging you to let you know that there was another crash in the topology view, while pressing the arrows (the up and down buttons at the bottom right) to show more layers.

The red loading animation ran non-stop until a communication loss message eventually popped up with a “click here to continue” prompt.

The retrieved error page is attached as opennms_error.doc.

When checking the last entry in output.log:

I am also adding the outbound email with several screenshots and notes/verifications.

Thank you
Best regards
Pedro

Fixed

Created June 9, 2016 at 10:55 AM
Updated June 30, 2016 at 7:23 AM
Resolved June 30, 2016 at 3:23 AM