Optimize Performance of Timeseries Integration Layer
Description
Activity

Patrick Schweizer June 5, 2020 at 6:22 PM
I moved the different bottlenecks to subtasks and made this ticket the umbrella for all optimizations.

Patrick Schweizer June 2, 2020 at 7:54 PM (edited)
Talking to ,
the goal is to make sure the integration layer is as performant as possible. The current structure is optimized for Newts / Cassandra, not for tag-based storage. We want to be able to write 40k samples per second.
To stress test and profile the system, we can use the following commands:
One off:
admin@opennms> collect --persist --node 2 org.opennms.netmgt.collectd.SnmpCollector 127.0.0.1
Stress:
admin@opennms> stress-metrics --interfaces 5 --strings 2 --interval 30 --nodes 10000 --threads 10 --groups 5 --attributes 5
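As a rough sanity check, the load generated by the stress command can be estimated against the 40k samples per second target. The sketch below assumes the flags nest multiplicatively (each node has --interfaces interfaces, each interface has --groups groups, each group has --attributes numeric attributes, all collected once per --interval seconds) and ignores the string attributes. The class and method names are made up for illustration and are not part of OpenNMS.

```java
// Back-of-the-envelope estimate of the sample rate produced by stress-metrics.
// Assumption: nodes, interfaces, groups and attributes multiply, and every
// numeric attribute yields one sample per interval. String attributes are ignored.
final class StressRateEstimate {

    static double samplesPerSecond(long nodes, long interfaces, long groups,
                                   long attributes, long intervalSeconds) {
        long samplesPerInterval = nodes * interfaces * groups * attributes;
        return (double) samplesPerInterval / intervalSeconds;
    }

    public static void main(String[] args) {
        // Values taken from the stress-metrics invocation above.
        double rate = samplesPerSecond(10_000, 5, 5, 5, 30);
        System.out.printf("~%.0f numeric samples per second%n", rate);
    }
}
```

Under these assumptions the invocation produces 1,250,000 numeric samples every 30 seconds, roughly 41,700 per second, which is in line with the 40k target.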

Patrick Schweizer June 2, 2020 at 2:40 PM
I looked into the different caches we have in the TimeseriesIntegrationLayer:
SearchableResourceMetadataCache: used by TimeseriesResourceStorageDao. Stores
TimeseriesSearcher.metricsUnderResource: (new) caches all Metrics that can be found under a resource (by a wildcard search). Caches results coming from TimeseriesStorage implementation.
TimeSeriesMetaDataDao.cache: caches all attributes associated with a resourceId. Caches results from the database.

Patrick Schweizer June 1, 2020 at 8:30 PM
TimeseriesMetaDataDao.storeMetadata: the cache was populated only on reads, not on writes. I fixed that. We should now see far fewer writes, I suppose...
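The idea of populating the cache on writes as well as reads can be sketched as follows. This is a minimal illustration, not the actual TimeSeriesMetaDataDao code: the class, the method names and the "database" (a simple counter) are all hypothetical, and the cache is unbounded, which a real implementation would not want.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical write-through cache: remembers what was last persisted so that
// repeated stores of an unchanged attribute do not hit the database again.
final class MetaDataWriteCache {

    // Key: resourceId + "/" + attribute name. Value: last persisted value.
    private final Map<String, String> persisted = new ConcurrentHashMap<>();
    private final AtomicInteger databaseWrites = new AtomicInteger();

    /** Stores one attribute. Returns true only if the database was written. */
    boolean store(String resourceId, String name, String value) {
        String key = resourceId + "/" + name;
        if (value.equals(persisted.get(key))) {
            return false; // already persisted with the same value, skip the write
        }
        // Two threads may race here and both write; that is redundant but harmless.
        writeToDatabase(resourceId, name, value);
        persisted.put(key, value); // update the cache on writes, not only on reads
        return true;
    }

    // Stand-in for the real insert into the metadata table.
    private void writeToDatabase(String resourceId, String name, String value) {
        databaseWrites.incrementAndGet();
    }

    int databaseWrites() {
        return databaseWrites.get();
    }

    public static void main(String[] args) {
        MetaDataWriteCache cache = new MetaDataWriteCache();
        cache.store("snmp/1/eth0", "ifAlias", "uplink"); // written
        cache.store("snmp/1/eth0", "ifAlias", "uplink"); // skipped
        cache.store("snmp/1/eth0", "ifAlias", "backup"); // written, value changed
        System.out.println("database writes: " + cache.databaseWrites());
    }
}
```

Since string attributes rarely change between collection intervals, a check like this would turn most repeated stores into cache hits.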

Jesse White May 22, 2020 at 7:30 PM
With respect to caching, it appears that there is also room for improvement in the TimeSeriesMetaDataDao. When stressing the system with metrics, a lot of time is spent inserting into the timeseries_meta table.
Details
Assignee: Patrick Schweizer
Reporter: Jesse White
Sprint: None
Priority: Minor

The Timeseries Integration Layer is feature complete. However, in order to use it in production, it must be performant enough.
OpenNMS installations receive around 40,000 data points (metric + timestamp + value) per second.
This task is to examine the bottlenecks of the integration layer and to remove them. The current implementation is based on the OpenNMS Newts code and optimized for Newts. We can now take the next step and optimize for the needs of the integration layer.