Measure and improve performance of Interface loading and mapping
Description
Acceptance / Success Criteria
Lucidchart Diagrams
Activity

Patrick Schweizer December 1, 2018 at 1:06 AM
I implemented a strategy similar to the other XXTopologyEntitys, and it shows a significant improvement:
a random cdp topology with 20000 Nodes, 20000 Elements, 20000 Links, 360000 SnmpInterfaces, 40000 IpInterfaces:
Before the improvement:
ipinterfaces took 572ms for 40000 interfaces
snmpinterfaces took 7803ms for 360000 interfaces
After the improvement:
ipinterfaces took 299ms for 40000 interfaces
snmpinterfaces took 1088ms for 360000 interfaces

Patrick Schweizer November 29, 2018 at 4:23 PM
I took another good look at the code and it is unfortunately not as simple as it first seemed. The loops through the interfaces are also used to determine other attributes such as ipAddress, targetIfIndex, targetIfName... (see below for my notes).
What we could do, however, is apply the same principle as we did for the nodes and links. If we are worried that this uses too much memory, we can also skip the cache and just optimize the reading via the lightweight objects. I will try that and see what impact it makes.
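The "lightweight objects" idea could look roughly like the sketch below: instead of loading full Hibernate entities via findAll(), read only the columns the topology mapping needs into a small immutable value object and group them by node. The class and field names here are illustrative stand-ins, not the actual OpenNMS API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical lightweight projection of an SNMP interface: only the fields
// the topology provider actually reads (node id, ifIndex, ifName, speed).
public class SnmpInterfaceTopologyEntity {
    private final int nodeId;
    private final int ifIndex;
    private final String ifName;
    private final long speed;

    public SnmpInterfaceTopologyEntity(int nodeId, int ifIndex, String ifName, long speed) {
        this.nodeId = nodeId;
        this.ifIndex = ifIndex;
        this.ifName = ifName;
        this.speed = speed;
    }

    public int getNodeId() { return nodeId; }
    public int getIfIndex() { return ifIndex; }
    public String getIfName() { return ifName; }
    public long getSpeed() { return speed; }

    // Group the lightweight objects by node id, mirroring m_nodeToOnmsSnmpMap:
    // nodeId -> (ifIndex -> interface).
    public static Map<Integer, Map<Integer, SnmpInterfaceTopologyEntity>> groupByNode(
            List<SnmpInterfaceTopologyEntity> interfaces) {
        Map<Integer, Map<Integer, SnmpInterfaceTopologyEntity>> byNode = new HashMap<>();
        for (SnmpInterfaceTopologyEntity iface : interfaces) {
            byNode.computeIfAbsent(iface.getNodeId(), k -> new HashMap<>())
                  .put(iface.getIfIndex(), iface);
        }
        return byNode;
    }
}
```

The win comes from hydrating four primitive-ish fields per row instead of a full entity graph, which is consistent with the ~7x speedup reported for snmpinterfaces above.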
Based on stats I ran another test with a random cdp topology with 20000 Nodes, 20000 Elements, 20000 Links, 360000 SnmpInterfaces, 40000 IpInterfaces:
ipinterfaces took 572ms for 40000 interfaces
snmpinterfaces took 7803ms for 360000 interfaces
Notes
m_ipInterfaceDao.findAll()
fill m_nodeToOnmsIpPrimaryMap => Vertex: {tooltip: managed; ipAddress;}
ipToOnmsIpMap => used to fill m_macToNodeXXMap
m_snmpInterfaceDao.findAll()
fill m_nodeToOnmsSnmpMap => used to fill m_macToOnmsSnmpMap; Edge: {targetIfIndex; targetIfName; speed}
m_ipNetToMediaDao.findAll()
fill m_macToNodeidMap => Vertex: {protocolSupported: [ProtocolSupported.BRIDGE]}; connectVertices()
fill m_macToOnmsSnmpMap => Vertex: {protocolSupported: [ProtocolSupported.BRIDGE]}; connectVertices()
fill m_macToOnmsIpMap => Vertex: {protocolSupported: [ProtocolSupported.BRIDGE]}; connectVertices()
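The MAC-keyed maps in the notes above boil down to a simple index built in one pass over the DAO results, so that connectVertices() can resolve bridge links by MAC address. A minimal sketch, with a hypothetical Row class standing in for the actual OnmsIpNetToMedia type:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of filling m_macToNodeidMap: each IpNetToMedia-style
// row maps a MAC address to the node that owns it. Row and its fields are
// hypothetical stand-ins, not the actual OpenNMS types.
public class MacToNodeIndex {

    public static class Row {
        final String macAddress;
        final int nodeId;

        public Row(String macAddress, int nodeId) {
            this.macAddress = macAddress;
            this.nodeId = nodeId;
        }
    }

    // One pass over the rows; on duplicate MACs the later row wins.
    public static Map<String, Integer> build(Iterable<Row> rows) {
        Map<String, Integer> macToNodeId = new HashMap<>();
        for (Row row : rows) {
            macToNodeId.put(row.macAddress, row.nodeId);
        }
        return macToNodeId;
    }
}
```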

Markus von Rüden November 26, 2018 at 10:50 AM
I also agree. If it is not used, rip it out (-:
If we were to lazy-load the tooltip, each lookup must happen in a transaction, which will probably make everything slower rather than faster unless we can open a single transaction for all lookups. Maybe some kind of `Supplier<String> tooltipSupplier` thingy.
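The `Supplier<String>` idea could be sketched as below: the edge holds a supplier and only runs the (potentially expensive) lookup the first time the tooltip is actually rendered, caching the result. Class and method names are hypothetical, not the real LinkdEdge API.

```java
import java.util.function.Supplier;

// Illustrative lazy-tooltip sketch: the lookup is deferred until
// getTooltipText() is first called, then cached.
public class LazyTooltipEdge {
    private final Supplier<String> tooltipSupplier;
    private String cachedTooltip; // computed at most once

    public LazyTooltipEdge(Supplier<String> tooltipSupplier) {
        this.tooltipSupplier = tooltipSupplier;
    }

    public String getTooltipText() {
        if (cachedTooltip == null) {
            // In the real provider this lookup would need to run inside a
            // transaction; opening one per edge might cost more than it saves,
            // which is the concern raised in the comment above.
            cachedTooltip = tooltipSupplier.get();
        }
        return cachedTooltip;
    }
}
```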

Jesse White November 25, 2018 at 11:53 PM
In terms of element counts, per https://stats.opennms.org/, there are on average 1.8 IP interfaces and 18 SNMP interfaces per node. So for 50k nodes, we would expect about 100k IP interfaces and 1m SNMP interfaces.
For #1 - I agree, if it's not used anywhere, let's remove it.
For #2 - making the tooltips lazy-load would be a great solution, if we can find a way to make that work.

Patrick Schweizer November 25, 2018 at 12:24 PM (edited)
Steps 2 & 3
I created a random cdp topology with 50000 Nodes, 50000 Elements, 100000 Links, 50000 SnmpInterfaces, 50000 IpInterfaces and measured the time it takes to load and map the interfaces:
The results are:
ipinterfaces took 1602 ms for 50000 interfaces
snmpinterfaces took 2046 ms for 50000 interfaces
=> It seems we can save a couple of seconds by applying the same logic as we did for the nodes and links.
I also checked what the information is used for:
It seems OnmsIpInterface is used to determine the managed attribute of LinkdVertex.
It seems OnmsSnmpInterface is used to determine the speed attribute of LinkdEdge.
The question arises whether we really need to precompute all these attributes or whether we could wait until the attributes are actually needed.
It seems LinkdVertex.getManaged() is never called (at least IntelliJ didn't find any callers).
LinkdEdge.getSpeed() is used in LinkdEdge.getTooltipText().
My suggestion would be to:
remove the computation of OnmsIpInterfaces
I am not sure if there is an easy way to lazy-load the tooltip and thus remove the precomputation of OnmsSnmpInterface. It is also questionable whether we would gain much from that improvement, since the overall loading and computation time doesn't seem that high. What are your thoughts on this?
Details
Assignee: Patrick Schweizer
Reporter: Patrick Schweizer
Sprint: None
Fix versions:
Priority: Major
A potential bottleneck exists when there are many IP and/or SNMP interfaces.
The following code blocks:
https://github.com/OpenNMS/opennms/blob/240444e85630249e28290e931a47368e2e2b3cd3/features/topology-map/plugins/org.opennms.features.topology.plugins.topo.linkd/src/main/java/org/opennms/features/topology/plugins/topo/linkd/internal/LinkdTopologyProvider.java#L863
https://github.com/OpenNMS/opennms/blob/240444e85630249e28290e931a47368e2e2b3cd3/features/topology-map/plugins/org.opennms.features.topology.plugins.topo.linkd/src/main/java/org/opennms/features/topology/plugins/topo/linkd/internal/LinkdTopologyProvider.java#L889
might be slow if the DAO calls return a lot of objects.
Step 1: enhance Topology Generator to be able to generate interfaces: https://github.com/opennms-forge/opennms-topology-generator
Step 2: measure potential bottleneck
Step 3: evaluate and discuss possible solutions if needed
Step 4: implement improvement
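The Step 2 measurement amounts to timing a bulk load and reporting milliseconds per interface count, in the format the comments above quote ("ipinterfaces took 572ms for 40000 interfaces"). A minimal sketch, where the loader is a stand-in for a DAO's findAll():

```java
import java.util.List;
import java.util.function.Supplier;

// Minimal timing harness for the bulk-load measurement. The Supplier stands
// in for a DAO call such as m_snmpInterfaceDao.findAll().
public class LoadTimer {
    public static <T> long timeLoad(Supplier<List<T>> loader, String label) {
        long start = System.currentTimeMillis();
        List<T> result = loader.get();
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(label + " took " + elapsed + "ms for "
                + result.size() + " interfaces");
        return elapsed;
    }
}
```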