Optimize Performance of Timeseries Integration Layer

Description

The Timeseries Integration Layer is feature complete. However, to be used in production it must also be performant enough.

OpenNMS installations receive around 40,000 data points (metric + timestamp + value) per second.

This task is to examine the bottlenecks of the integration layer and remove them. The current implementation is based on the OpenNMS Newts code and is optimized for Newts. We can now take the next step and optimize for the needs of the integration layer.

Acceptance / Success Criteria

None

Attachments

Subtasks

100% Done

Linked issues

related to

NMS-12759

Optimize Performance of InfluxDb Plugin

Activity

Patrick Schweizer June 5, 2020 at 6:22 PM

I moved the different bottlenecks to subtasks and made this ticket the umbrella for all optimizations.

Patrick Schweizer June 2, 2020 at 7:54 PM
Edited

Talking to ,

the goal is to make sure the integration layer is as performant as possible. The current structure is optimized for Newts / Cassandra, not for tag-based storage. We want to be able to write 40k samples / second.

In order to stress test and profile the system we can use the following commands:
One off:
admin@opennms> collect --persist --node 2 org.opennms.netmgt.collectd.SnmpCollector 127.0.0.1
Stress:
admin@opennms> stress-metrics --interfaces 5 --strings 2 --interval 30 --nodes 10000 --threads 10 --groups 5 --attributes 5

Patrick Schweizer June 2, 2020 at 2:40 PM

I looked into the different caches we have in the TimeseriesIntegrationLayer:

SearchableResourceMetadataCache: used by TimeseriesResourceStorageDao. Stores
TimeseriesSearcher.metricsUnderResource: (new) caches all Metrics that can be found under a resource (by a wildcard search). Caches results coming from TimeseriesStorage implementation.
TimeSeriesMetaDataDao.cache: caches all attributes associated with a resourceId. Caches results from the database.
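
To illustrate the read-side caching described above, here is a minimal sketch of a read-through cache. It is not the actual OpenNMS code: the class name ReadThroughCache, the loader function, and the load counter are all hypothetical and only show the pattern of asking the slow backend (a wildcard search or a database query) once per key.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical read-through cache in front of a slow lookup (not OpenNMS code). */
class ReadThroughCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    // Stand-in for the expensive call, e.g. a wildcard search in the storage backend.
    private final Function<K, V> loader;
    // Counts how often the backend was actually asked, to make the effect visible.
    private final AtomicInteger loads = new AtomicInteger();

    ReadThroughCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        // computeIfAbsent invokes the loader only when the key is not cached yet.
        return cache.computeIfAbsent(key, k -> {
            loads.incrementAndGet();
            return loader.apply(k);
        });
    }

    int getLoads() {
        return loads.get();
    }
}
```

A real cache would also need a size bound and an invalidation strategy for when new metrics appear under a resource. Both are left out here.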

Patrick Schweizer June 1, 2020 at 8:30 PM

TimeSeriesMetaDataDao.storeMetadata populated the cache only on reads, not on writes. I fixed that. We should now see far fewer writes, I suppose...
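
A minimal sketch of the idea, assuming the fix amounts to updating the cache on every store and skipping the database when the value is unchanged. The class CachingMetaDataStore, its key format, and the write counter are hypothetical and do not reflect the real TimeSeriesMetaDataDao signature.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical metadata store that skips redundant database writes (not OpenNMS code). */
class CachingMetaDataStore {
    // Key is "resourceId/name"; value is the last value handed to the database.
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    // Counts simulated database writes so the effect of the cache is visible.
    private final AtomicInteger dbWrites = new AtomicInteger();

    void storeMetadata(String resourceId, String name, String value) {
        String key = resourceId + "/" + name;
        // put() returns the previously cached value, or null if there was none.
        String previous = cache.put(key, value);
        if (!Objects.equals(previous, value)) {
            // Placeholder for the real INSERT/UPDATE against the metadata table.
            dbWrites.incrementAndGet();
        }
    }

    int getDbWrites() {
        return dbWrites.get();
    }
}
```

With this sketch, storing the same attribute twice results in one simulated write, and a changed value results in a second one. Note that the cache is updated before the write happens, so a real implementation would have to evict the entry if the database write fails.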

Jesse White May 22, 2020 at 7:30 PM

With respect to caching, it appears that there is also room for improvement in the TimeSeriesMetaDataDao. When stressing the system with metrics, a lot of time is spent inserting into the timeseries_meta table:

Fixed

Details
Assignee
Patrick Schweizer
Reporter
Jesse White
Labels
timeseries
Sprint
None
Fix versions
26.1.3
Priority
Minor

Created May 22, 2020 at 2:44 PM

Updated July 15, 2020 at 1:45 PM

Resolved July 15, 2020 at 1:45 PM
