Measure and improve performance of Interface loading and mapping
Description
Acceptance / Success Criteria
Lucidchart Diagrams
Activity

Patrick Schweizer December 1, 2018 at 1:06 AM
I implemented a strategy similar to the other XXTopologyEntitys, and it shows a significant improvement:
a random cdp topology with 20000 Nodes, 20000 Elements, 20000 Links, 360000 SnmpInterfaces, 40000 IpInterfaces:
Before the improvement:
ipinterfaces took 572ms for 40000 interfaces
snmpinterfaces took 7803ms for 360000 interfaces
After the improvement:
ipinterfaces took 299ms for 40000 interfaces
snmpinterfaces took 1088ms for 360000 interfaces

Patrick Schweizer November 29, 2018 at 4:23 PM
I took another good look at the code and it is unfortunately not as simple as it first seemed. The loops through the interfaces are also used to determine other attributes such as ipAddress, targetIfIndex, targetIfName... (see below for my notes).
What we could do, however, is apply the same principle as we did for the nodes and links. If we are worried that this uses too much memory, we can also skip the cache and just optimize the reading via the lightweight objects. I will try that and see what impact it makes.
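The "lightweight objects" idea could look roughly like the sketch below: instead of loading full Hibernate entities via findAll(), read only the columns the topology mapping needs into a small immutable value object and group them by node. The class and field names here are illustrative stand-ins, not the actual OpenNMS API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical lightweight projection of an SNMP interface: only the fields
// the topology provider actually reads (node id, ifIndex, ifName, speed).
public class SnmpInterfaceTopologyEntity {
    private final int nodeId;
    private final int ifIndex;
    private final String ifName;
    private final long speed;

    public SnmpInterfaceTopologyEntity(int nodeId, int ifIndex, String ifName, long speed) {
        this.nodeId = nodeId;
        this.ifIndex = ifIndex;
        this.ifName = ifName;
        this.speed = speed;
    }

    public int getNodeId() { return nodeId; }
    public int getIfIndex() { return ifIndex; }
    public String getIfName() { return ifName; }
    public long getSpeed() { return speed; }

    // Group the lightweight objects by node id, mirroring m_nodeToOnmsSnmpMap:
    // nodeId -> (ifIndex -> interface).
    public static Map<Integer, Map<Integer, SnmpInterfaceTopologyEntity>> groupByNode(
            List<SnmpInterfaceTopologyEntity> interfaces) {
        Map<Integer, Map<Integer, SnmpInterfaceTopologyEntity>> byNode = new HashMap<>();
        for (SnmpInterfaceTopologyEntity iface : interfaces) {
            byNode.computeIfAbsent(iface.getNodeId(), k -> new HashMap<>())
                  .put(iface.getIfIndex(), iface);
        }
        return byNode;
    }
}
```

The win comes from hydrating four primitive-ish fields per row instead of a full entity graph, which is consistent with the ~7x speedup reported for snmpinterfaces above.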
Based on stats I ran another test with a random cdp topology with 20000 Nodes, 20000 Elements, 20000 Links, 360000 SnmpInterfaces, 40000 IpInterfaces:
ipinterfaces took 572ms for 40000 interfaces
snmpinterfaces took 7803ms for 360000 interfaces
Notes
m_ipInterfaceDao.findAll()
fill m_nodeToOnmsIpPrimaryMap => Vertex: {tooltip: managed; ipAddress;}
ipToOnmsIpMap => used to fill m_macToNodeXXMap
m_snmpInterfaceDao.findAll()
fill m_nodeToOnmsSnmpMap => used to fill m_macToOnmsSnmpMap; Edge: {targetIfIndex; targetIfName; speed}
m_ipNetToMediaDao.findAll()
fill m_macToNodeidMap => Vertex: {protocolSupported: [ProtocolSupported.BRIDGE]}; connectVertices()
fill m_macToOnmsSnmpMap => Vertex: {protocolSupported: [ProtocolSupported.BRIDGE]}; connectVertices()
fill m_macToOnmsIpMap => Vertex: {protocolSupported: [ProtocolSupported.BRIDGE]}; connectVertices()
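The MAC-keyed maps in the notes above boil down to a simple index built in one pass over the DAO results, so that connectVertices() can resolve bridge links by MAC address. A minimal sketch, with a hypothetical Row class standing in for the actual OnmsIpNetToMedia type:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of filling m_macToNodeidMap: each IpNetToMedia-style
// row maps a MAC address to the node that owns it. Row and its fields are
// hypothetical stand-ins, not the actual OpenNMS types.
public class MacToNodeIndex {

    public static class Row {
        final String macAddress;
        final int nodeId;

        public Row(String macAddress, int nodeId) {
            this.macAddress = macAddress;
            this.nodeId = nodeId;
        }
    }

    // One pass over the rows; on duplicate MACs the later row wins.
    public static Map<String, Integer> build(Iterable<Row> rows) {
        Map<String, Integer> macToNodeId = new HashMap<>();
        for (Row row : rows) {
            macToNodeId.put(row.macAddress, row.nodeId);
        }
        return macToNodeId;
    }
}
```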

Markus von Rüden November 26, 2018 at 10:50 AM
I also agree. If it is not used, rip it out (-:
If we were to lazy-load the tooltip, each lookup must happen in a transaction, which will probably make everything slower rather than faster unless we can open a single transaction for all lookups. Maybe some kind of `Supplier<String> tooltipSupplier` thingy.
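The `Supplier<String>` idea could be sketched as below: the edge holds a supplier and only runs the (potentially expensive) lookup the first time the tooltip is actually rendered, caching the result. Class and method names are hypothetical, not the real LinkdEdge API.

```java
import java.util.function.Supplier;

// Illustrative lazy-tooltip sketch: the lookup is deferred until
// getTooltipText() is first called, then cached.
public class LazyTooltipEdge {
    private final Supplier<String> tooltipSupplier;
    private String cachedTooltip; // computed at most once

    public LazyTooltipEdge(Supplier<String> tooltipSupplier) {
        this.tooltipSupplier = tooltipSupplier;
    }

    public String getTooltipText() {
        if (cachedTooltip == null) {
            // In the real provider this lookup would need to run inside a
            // transaction; opening one per edge might cost more than it saves,
            // which is the concern raised in the comment above.
            cachedTooltip = tooltipSupplier.get();
        }
        return cachedTooltip;
    }
}
```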

Jesse White November 25, 2018 at 11:53 PM
In terms of element counts, per https://stats.opennms.org/, there are on average 1.8 IP interfaces and 18 SNMP interfaces per node. So for 50k nodes, we would expect about 100k IP interfaces and 1m SNMP interfaces.
For #1 - I agree, if it's not used anywhere, let's remove it.
For #2 - making the tooltips lazy-load would be a great solution, if we can find a way to make that work.

Patrick Schweizer November 25, 2018 at 12:24 PM (edited)
Steps 2 & 3
I created a random cdp topology with 50000 Nodes, 50000 Elements, 100000 Links, 50000 SnmpInterfaces, 50000 IpInterfaces and measured the time it takes to load and map the interfaces:
The results are:
ipinterfaces took 1602 ms for 50000 interfaces
snmpinterfaces took 2046 ms for 50000 interfaces
=> It seems we can save a couple of seconds by applying the same logic as we did for the nodes and links.
I also checked what the information is used for:
It seems OnmsIpInterface is used to determine the managed attribute of LinkdVertex.
It seems OnmsSnmpInterface is used to determine the speed attribute of LinkdEdge.
The question arises whether we really need to precompute all these attributes or whether we could wait until the attributes are actually needed.
It seems LinkdVertex.getManaged() is never called (at least IntelliJ didn't find any callers).
LinkdEdge.getSpeed() is used in LinkdEdge.getTooltipText().
My suggestion would be to:
remove the computation of OnmsIpInterfaces
I am not sure if there is an easy way to lazy-load the tooltip and thus remove the precomputation of OnmsSnmpInterface. It is also questionable whether we would gain much from that improvement, since the overall loading and computation time doesn't seem that high. What are your thoughts on this?
Details
Assignee: Patrick Schweizer
Reporter: Patrick Schweizer
Sprint: None
Fix versions:
Priority: Major
A potential bottleneck exists when there are many IP and/or SNMP interfaces.
The following code blocks:
https://github.com/OpenNMS/opennms/blob/240444e85630249e28290e931a47368e2e2b3cd3/features/topology-map/plugins/org.opennms.features.topology.plugins.topo.linkd/src/main/java/org/opennms/features/topology/plugins/topo/linkd/internal/LinkdTopologyProvider.java#L863
https://github.com/OpenNMS/opennms/blob/240444e85630249e28290e931a47368e2e2b3cd3/features/topology-map/plugins/org.opennms.features.topology.plugins.topo.linkd/src/main/java/org/opennms/features/topology/plugins/topo/linkd/internal/LinkdTopologyProvider.java#L889
might be slow if the DAO calls return a lot of objects.
Step 1: enhance Topology Generator to be able to generate interfaces: https://github.com/opennms-forge/opennms-topology-generator
Step 2: measure potential bottleneck
Step 3: evaluate and discuss possible solutions if needed
Step 4: implement improvement
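The Step 2 measurement amounts to timing a bulk load and reporting milliseconds per interface count, in the format the comments above quote ("ipinterfaces took 572ms for 40000 interfaces"). A minimal sketch, where the loader is a stand-in for a DAO's findAll():

```java
import java.util.List;
import java.util.function.Supplier;

// Minimal timing harness for the bulk-load measurement. The Supplier stands
// in for a DAO call such as m_snmpInterfaceDao.findAll().
public class LoadTimer {
    public static <T> long timeLoad(Supplier<List<T>> loader, String label) {
        long start = System.currentTimeMillis();
        List<T> result = loader.get();
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(label + " took " + elapsed + "ms for "
                + result.size() + " interfaces");
        return elapsed;
    }
}
```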