Optimize Performance of Timeseries Integration Layer

Description

The Timeseries Integration Layer is feature complete. However, before it can be used in production, it must perform well enough.

OpenNMS installations receive around 40,000 data points (metric + timestamp + value) per second.
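To put that target in perspective, here is a quick calculation assuming a steady, sustained rate. If samples were handled one at a time on a single thread, the layer would have on average about 25 µs per data point, and it would handle roughly 3.5 billion data points per day:

```latex
t_{\text{sample}} = \frac{1\ \text{s}}{40\,000} = 25\ \mu\text{s},
\qquad
N_{\text{day}} = 40\,000 \times 86\,400 = 3.456 \times 10^{9}
```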

This task is to examine the bottlenecks of the integration layer and remove them. The current implementation is based on the OpenNMS Newts code and is optimized for Newts. We can now take the next step and optimize it for the needs of the integration layer.

Acceptance / Success Criteria

None

Attachments

1
100% Done

Activity

Patrick Schweizer June 5, 2020 at 6:22 PM

I moved the different bottlenecks to subtasks and made this ticket the umbrella for all optimizations.

Patrick Schweizer June 2, 2020 at 7:54 PM

Talking to ,

the goal is to make sure the integration layer is as performant as possible. The current structure is optimized for Newts / Cassandra, not for tag-based storage. We want to be able to write 40k samples/second.

To stress test and profile the system, we can use the following commands:
One-off:
admin@opennms> collect --persist --node 2 org.opennms.netmgt.collectd.SnmpCollector 127.0.0.1
Stress:
admin@opennms> stress-metrics --interfaces 5 --strings 2 --interval 30 --nodes 10000 --threads 10 --groups 5 --attributes 5
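As a rough sanity check, the stress parameters above can be turned into an expected sample rate. The sketch below assumes that `stress-metrics` produces one numeric sample per node × interface × group × attribute in each interval, and that the `--strings` attributes are not written as numeric samples. Both are assumptions about the command's behavior, and the class and method names are hypothetical. Under those assumptions, the command lands at about 41.7k samples/second, just above the 40k target.

```java
// Hypothetical back-of-the-envelope estimate. ASSUMPTION: stress-metrics emits
// one numeric sample per node x interface x group x attribute per interval,
// and string attributes (--strings) are not counted as samples.
public class StressRateEstimate {

    static double samplesPerSecond(int nodes, int interfaces, int groups,
                                   int attributes, int intervalSeconds) {
        // Use long to avoid int overflow for larger node counts.
        long samplesPerInterval = (long) nodes * interfaces * groups * attributes;
        return (double) samplesPerInterval / intervalSeconds;
    }

    public static void main(String[] args) {
        // Parameters copied from the stress-metrics command above.
        double rate = samplesPerSecond(10_000, 5, 5, 5, 30);
        System.out.printf("~%.0f samples/second%n", rate);
    }
}
```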

Patrick Schweizer June 2, 2020 at 2:40 PM

I looked into the different caches we have in the TimeseriesIntegrationLayer:

  • SearchableResourceMetadataCache: used by TimeseriesResourceStorageDao. Stores

  • TimeseriesSearcher.metricsUnderResource: (new) caches all metrics that can be found under a resource (via a wildcard search). Caches results coming from the TimeseriesStorage implementation.

  • TimeSeriesMetaDataDao.cache: caches all attributes associated with a resourceId. Caches results from the database.
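The idea behind a cache like TimeseriesSearcher.metricsUnderResource can be sketched as memoizing the wildcard search per resource, so the storage backend is queried at most once per resource until the entry is invalidated. This is a minimal illustration, not the actual OpenNMS code: the class name is hypothetical, metrics are represented as plain strings instead of the real metric type, and the search is passed in as a function standing in for the TimeseriesStorage call.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch: memoizes wildcard-search results per resource.
// Metrics are plain strings here; the real code uses its own metric type.
public class MetricsUnderResourceCache {
    private final Map<String, List<String>> cache = new ConcurrentHashMap<>();
    // Stand-in for the (expensive) wildcard search against the storage backend.
    private final Function<String, List<String>> wildcardSearch;

    public MetricsUnderResourceCache(Function<String, List<String>> wildcardSearch) {
        this.wildcardSearch = wildcardSearch;
    }

    public List<String> metricsUnderResource(String resourceId) {
        // computeIfAbsent only runs the search on a cache miss.
        return cache.computeIfAbsent(resourceId, wildcardSearch);
    }

    public void invalidate(String resourceId) {
        // Needed when new metrics appear under a resource.
        cache.remove(resourceId);
    }

    public static void main(String[] args) {
        AtomicInteger backendCalls = new AtomicInteger();
        MetricsUnderResourceCache c = new MetricsUnderResourceCache(id -> {
            backendCalls.incrementAndGet();
            return List.of(id + "/ifInOctets", id + "/ifOutOctets");
        });
        c.metricsUnderResource("node[1]");
        c.metricsUnderResource("node[1]");
        System.out.println("backend calls: " + backendCalls.get()); // backend calls: 1
    }
}
```

The map is unbounded for brevity. A production cache would also need a size limit or expiry, for example via Guava's or Caffeine's cache builders.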

Patrick Schweizer June 1, 2020 at 8:30 PM

TimeseriesMetaDataDao.storeMetadata cached only on reads, not on writes. I fixed that. We should now see far fewer writes, I suppose...
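Caching on writes as well as reads can be sketched as follows. Before inserting an attribute, check whether the same value was already written for that resource and skip the database round trip if so. This is a hedged illustration of the general technique, not the actual fix: `MetaDataWriteCache` and the `MetaDataStore` interface are hypothetical stand-ins for the DAO and its database insert.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: remember what was written so unchanged metadata
// does not trigger another database insert.
public class MetaDataWriteCache {

    /** Hypothetical stand-in for the real database insert. */
    public interface MetaDataStore {
        void insert(String resourceId, String name, String value);
    }

    // Key: resourceId + attribute name; value: last value written (non-null).
    private final Map<String, String> written = new ConcurrentHashMap<>();
    private final MetaDataStore store;

    public MetaDataWriteCache(MetaDataStore store) {
        this.store = store;
    }

    /** Returns true if a database write was issued. */
    public boolean storeMetadata(String resourceId, String name, String value) {
        String key = resourceId + '\u0000' + name;
        if (value.equals(written.get(key))) {
            return false; // unchanged: skip the database round trip
        }
        // Write first, then cache, so a failed insert is retried next time.
        // Concurrent callers may occasionally write the same value twice,
        // which is redundant but harmless.
        store.insert(resourceId, name, value);
        written.put(key, value);
        return true;
    }

    public static void main(String[] args) {
        List<String> inserts = new ArrayList<>();
        MetaDataWriteCache c = new MetaDataWriteCache(
                (r, n, v) -> inserts.add(r + ":" + n + "=" + v));
        c.storeMetadata("node[1]", "ifName", "eth0");
        c.storeMetadata("node[1]", "ifName", "eth0"); // skipped
        c.storeMetadata("node[1]", "ifName", "eth1");
        System.out.println(inserts.size() + " inserts"); // 2 inserts
    }
}
```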

Jesse White May 22, 2020 at 7:30 PM

With respect to caching, it appears that there is also room for improvement in the TimeSeriesMetaDataDao. When stressing the system with metrics, a lot of time is spent inserting into the timeseries_meta table:


Fixed

Details

Created May 22, 2020 at 2:44 PM
Updated July 15, 2020 at 1:45 PM
Resolved July 15, 2020 at 1:45 PM