Performance of time series integration layer
Description
Activity
Patrick Schweizer May 18, 2022 at 6:25 PM
Overall conclusion: for the same number of events in the ring buffer, ts uses ~4x as much memory as Newts.
=> closing this ticket
Freddy Chu May 17, 2022 at 9:17 PM
The last commit includes your fix.
I also ran the test once without
event.setSamples(null);
The results look similar to yours.
ring_buffer_size=131072
Freddy Chu May 17, 2022 at 2:07 AM
I am using the same config as yours, with buffer sizes 131072 and 262144.
My results seem quite different from yours. One major difference is that my cloud plugin is connected to a real Cortex.
I am using Flight Recorder for the profiling, with OpenNMS at Xmx 2g. Each test ran for only 5 minutes.
Karaf command used: stress-metrics -n 600 -i 20 -t 8
It seems Cortex even consumes less memory, and the throughput is similar.
I have also attached the JFR files for more details.
cortex 131072
-- Meters ----------------------------------------------------------------------
numeric-attributes-generated
count = 4050500
mean rate = 14998.91 events/second
1-minute rate = 14998.79 events/second
5-minute rate = 14958.66 events/second
15-minute rate = 14925.51 events/second
string-attributes-generated
count = 810100
mean rate = 2999.76 events/second
1-minute rate = 2999.76 events/second
5-minute rate = 2991.73 events/second
15-minute rate = 2985.10 events/second
-- Timers ----------------------------------------------------------------------
batches
count = 13
mean rate = 0.05 calls/second
1-minute rate = 0.05 calls/second
5-minute rate = 0.03 calls/second
15-minute rate = 0.01 calls/second
min = 19978.07 milliseconds
max = 20011.63 milliseconds
mean = 19998.27 milliseconds
stddev = 3.65 milliseconds
median = 19999.75 milliseconds
75% <= 20000.30 milliseconds
95% <= 20002.27 milliseconds
98% <= 20002.40 milliseconds
99% <= 20011.63 milliseconds
99.9% <= 20011.63 milliseconds
newts 131072
-- Meters ----------------------------------------------------------------------
numeric-attributes-generated
count = 4050500
mean rate = 14997.93 events/second
1-minute rate = 14998.79 events/second
5-minute rate = 14958.66 events/second
15-minute rate = 14925.51 events/second
string-attributes-generated
count = 810100
mean rate = 2999.51 events/second
1-minute rate = 2999.76 events/second
5-minute rate = 2991.73 events/second
15-minute rate = 2985.10 events/second
-- Timers ----------------------------------------------------------------------
batches
count = 13
mean rate = 0.05 calls/second
1-minute rate = 0.05 calls/second
5-minute rate = 0.03 calls/second
15-minute rate = 0.01 calls/second
min = 19971.69 milliseconds
max = 20003.18 milliseconds
mean = 19999.39 milliseconds
stddev = 2.89 milliseconds
median = 19999.50 milliseconds
75% <= 19999.77 milliseconds
95% <= 20003.18 milliseconds
98% <= 20003.18 milliseconds
99% <= 20003.18 milliseconds
99.9% <= 20003.18 milliseconds
cortex 262144
-- Meters ----------------------------------------------------------------------
numeric-attributes-generated
count = 4050150
mean rate = 14997.42 events/second
1-minute rate = 14998.83 events/second
5-minute rate = 14958.63 events/second
15-minute rate = 14925.48 events/second
string-attributes-generated
count = 810050
mean rate = 2999.53 events/second
1-minute rate = 2999.76 events/second
5-minute rate = 2991.73 events/second
15-minute rate = 2985.10 events/second
-- Timers ----------------------------------------------------------------------
batches
count = 13
mean rate = 0.05 calls/second
1-minute rate = 0.05 calls/second
5-minute rate = 0.03 calls/second
15-minute rate = 0.01 calls/second
min = 19965.51 milliseconds
max = 20039.18 milliseconds
mean = 19998.13 milliseconds
stddev = 25.01 milliseconds
median = 20001.46 milliseconds
75% <= 20002.18 milliseconds
95% <= 20039.18 milliseconds
98% <= 20039.18 milliseconds
99% <= 20039.18 milliseconds
99.9% <= 20039.18 milliseconds
newts 262144
-- Meters ----------------------------------------------------------------------
numeric-attributes-generated
count = 4050500
mean rate = 14998.59 events/second
1-minute rate = 14998.79 events/second
5-minute rate = 14958.66 events/second
15-minute rate = 14925.51 events/second
string-attributes-generated
count = 810100
mean rate = 2999.70 events/second
1-minute rate = 2999.76 events/second
5-minute rate = 2991.73 events/second
15-minute rate = 2985.10 events/second
-- Timers ----------------------------------------------------------------------
batches
count = 13
mean rate = 0.05 calls/second
1-minute rate = 0.05 calls/second
5-minute rate = 0.03 calls/second
15-minute rate = 0.01 calls/second
min = 19970.43 milliseconds
max = 20003.29 milliseconds
mean = 19999.57 milliseconds
stddev = 2.95 milliseconds
median = 19999.36 milliseconds
75% <= 20000.45 milliseconds
95% <= 20003.29 milliseconds
98% <= 20003.29 milliseconds
99% <= 20003.29 milliseconds
99.9% <= 20003.29 milliseconds
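As a sanity check on the meter output above, the expected generation rates can be derived from the stress command. This is only a sketch: it reads -n as the node count and -i as the interval in seconds, and the per-node fan-out (10 interfaces, 5 groups per interface, 10 numeric and 2 string attributes per group) is an assumption that is not stated in this ticket. The class and method names are made up for illustration.

```java
final class StressRateCheck {
    // Assumed fan-out per node; these values are NOT stated in the ticket.
    static final int INTERFACES_PER_NODE = 10;
    static final int GROUPS_PER_INTERFACE = 5;
    static final int NUMERIC_ATTRS_PER_GROUP = 10;
    static final int STRING_ATTRS_PER_GROUP = 2;

    /** Attributes generated per second for the given node count and interval. */
    static double expectedRate(int nodes, int attrsPerGroup, int intervalSeconds) {
        long perInterval = (long) nodes * INTERFACES_PER_NODE
                * GROUPS_PER_INTERFACE * attrsPerGroup;
        return (double) perInterval / intervalSeconds;
    }

    /** Approximate run time implied by a meter's count and mean rate. */
    static double elapsedSeconds(long count, double meanRatePerSecond) {
        return count / meanRatePerSecond;
    }

    public static void main(String[] args) {
        // stress-metrics -n 600 -i 20 -t 8
        System.out.printf("expected numeric rate: %.1f/s%n",
                expectedRate(600, NUMERIC_ATTRS_PER_GROUP, 20));
        System.out.printf("expected string rate:  %.1f/s%n",
                expectedRate(600, STRING_ATTRS_PER_GROUP, 20));
        // Values taken from the "cortex 131072" meter output.
        System.out.printf("implied run time:      %.1f s%n",
                elapsedSeconds(4050500L, 14998.91));
    }
}
```

Under these assumptions the expected rates come out at 15000/s and 3000/s, in line with the measured ~14999/s and ~2999.8/s, and the implied run time of roughly 270 s fits the stated 5-minute test window.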
Patrick Schweizer May 14, 2022 at 10:37 AM (edited)
To find out the difference in heap usage between Newts and ts, we ran another test with the following parameters:
ring_buffer_size=16
ring_buffer_size=32768
ring_buffer_size=65536
ring_buffer_size=131072
ring_buffer_size=262144
ring_buffer_size=524288
ring_buffer_size=1048576
=> TS crashes with a ring buffer holding 4x fewer entries than Newts
=> the ring buffer memory footprint of TS is ~4x that of Newts
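The relationship between the two conclusions above can be sketched with simple arithmetic: if a full ring retains one event per slot, a 4x per-event footprint hits the same heap budget at a ring size 4x smaller. The byte counts and the heap budget below are hypothetical placeholders, not measurements from this ticket, and the class is made up for illustration.

```java
final class RingFootprintEstimate {
    /** Estimated heap held by a full ring: one retained event per slot. */
    static long footprintBytes(int ringSize, long bytesPerEvent) {
        return (long) ringSize * bytesPerEvent;
    }

    /**
     * Largest power-of-two ring size whose estimated footprint fits the budget.
     * Assumes at least one event fits; returns 1 otherwise.
     */
    static int largestFittingSize(long heapBudgetBytes, long bytesPerEvent) {
        int size = 1;
        while (size <= (1 << 29)
                && footprintBytes(size * 2, bytesPerEvent) <= heapBudgetBytes) {
            size *= 2;
        }
        return size;
    }

    public static void main(String[] args) {
        long budget = 1L << 30;          // hypothetical: 1 GiB available to the ring
        long newtsBytes = 2048;          // hypothetical per-event footprint for Newts
        long tsBytes = 4 * newtsBytes;   // ~4x, as observed in this ticket

        int newtsMax = largestFittingSize(budget, newtsBytes);
        int tsMax = largestFittingSize(budget, tsBytes);
        System.out.println("Newts max ring size: " + newtsMax);
        System.out.println("TS max ring size:    " + tsMax);
        System.out.println("ratio:               " + (newtsMax / tsMax));
    }
}
```

Whatever the real per-event sizes are, the ratio of the two maximum ring sizes stays at 4 as long as the per-event footprint ratio is 4 and both sizes are powers of two.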
Patrick Schweizer May 10, 2022 at 12:06 AM (edited)
I am trying to assess the difference in heap usage between Newts and TSS. I ran two load scenarios: one with Newts and one with the time series integration layer (ts). The goal was to get a sense of how much more heap ts uses for the same throughput.
I ran the following scenario:
stress command at 15k/s: stress-metrics -n 600 -i 20 -t 8
ring buffer size was the default: ~8k
I rolled back the code change that sets the samples to null => every event in the ring buffer should hold samples after a short amount of time
After running for >10 min I took heap dumps and analyzed them.
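Why rolling back the null-setting change fills the ring can be illustrated with a minimal sketch of a ring of preallocated, recycled events. The classes below are hypothetical stand-ins written for this note, not the OpenNMS or ring buffer library classes; only the setSamples(null) idea is taken from this ticket.

```java
import java.util.List;

final class RingRetentionDemo {
    /** Hypothetical stand-in for a recycled ring buffer event. */
    static final class BatchEvent {
        private List<String> samples;
        List<String> getSamples() { return samples; }
        void setSamples(List<String> samples) { this.samples = samples; }
    }

    /** Minimal fixed-size ring of preallocated, reused events (illustration only). */
    static final class RecyclingRing {
        private final BatchEvent[] slots;
        private long sequence = 0;

        RecyclingRing(int size) {
            if (size <= 0 || Integer.bitCount(size) != 1) {
                throw new IllegalArgumentException("size must be a power of two");
            }
            slots = new BatchEvent[size];
            for (int i = 0; i < size; i++) {
                slots[i] = new BatchEvent();
            }
        }

        /** Puts a batch into the next slot and "consumes" it immediately. */
        void publishAndConsume(List<String> batch, boolean clearAfterUse) {
            BatchEvent event = slots[(int) (sequence++ & (slots.length - 1))];
            event.setSamples(batch);
            // A real consumer would persist event.getSamples() here.
            if (clearAfterUse) {
                event.setSamples(null); // drop the reference so the batch can be collected
            }
        }

        /** Number of slots still holding a reference to a batch. */
        int retainedBatches() {
            int retained = 0;
            for (BatchEvent event : slots) {
                if (event.getSamples() != null) {
                    retained++;
                }
            }
            return retained;
        }
    }

    static int run(int ringSize, int batches, boolean clearAfterUse) {
        RecyclingRing ring = new RecyclingRing(ringSize);
        for (int i = 0; i < batches; i++) {
            ring.publishAndConsume(List.of("sample-" + i), clearAfterUse);
        }
        return ring.retainedBatches();
    }

    public static void main(String[] args) {
        System.out.println("retained without clearing: " + run(8, 100, false));
        System.out.println("retained with clearing:    " + run(8, 100, true));
    }
}
```

Without clearing, every slot keeps its most recent batch reachable once the ring has wrapped, so a heap dump shows the worst-case ring footprint. That is the state this scenario was meant to measure.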
Ring Buffer:
=> It looks like the TS memory footprint is ~10x
However, what doesn't add up for me is the number of SampleBatchEvents. It is ~8k for Newts but ~40k for TS. I would expect the same number (since they are recycled):
I tried to verify this number by looking at the ring buffer itself in the debugger:
But this looks fine.
We also need to be careful: the numbers don't really add up (I think due to multiple references):
Looking at the footprint of a SampleBatchEvent:
It looks like the footprint of Newts sample vs. Ts sample is roughly 4x.
Looking at the overall heap:
Details
Assignee: Patrick Schweizer
Reporter: Patrick Schweizer
Fix versions: none listed
Priority: Minor

did performance tests and saw that the ts layer doesn't perform as well as Newts. It should show similar performance.
Test setup:
stress command:
Configuration of OpenNMS:
set the log level in log4j.xml to ERROR
adjust the following settings in: etc/opennms.properties.d/ts.properties: