Performance of time series integration layer
Description
Acceptance / Success Criteria
Activity

Patrick Schweizer May 18, 2022 at 6:25 PM
Overall conclusion: for the same number of events in the ring buffer, ts uses ~4x the memory of Newts.
=> closing this ticket

Freddy Chu May 17, 2022 at 9:17 PM
The last commit includes your fix.
I also ran the test once without
event.setSamples(null);
The results look similar to yours.
ring_buffer_size=131072
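
For readers unfamiliar with why event.setSamples(null) matters: ring buffer entries are preallocated and recycled, so any batch an entry still references stays reachable until that slot is overwritten. The following is a minimal sketch of that effect only. BatchEventSketch and RingBufferClearSketch are hypothetical stand-ins written for this illustration; they are not the OpenNMS or Disruptor classes, and the real fix may differ in detail.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a recycled ring buffer entry (not the OpenNMS class).
final class BatchEventSketch {
    private List<double[]> samples;

    void setSamples(List<double[]> samples) { this.samples = samples; }

    List<double[]> getSamples() { return samples; }
}

// Hypothetical, single-threaded ring of preallocated entries.
final class RingBufferClearSketch {
    private final BatchEventSketch[] slots;
    private long nextSequence = 0;

    RingBufferClearSketch(int size) {
        if (size <= 0 || Integer.bitCount(size) != 1) {
            throw new IllegalArgumentException("size must be a power of two");
        }
        slots = new BatchEventSketch[size];
        for (int i = 0; i < size; i++) {
            slots[i] = new BatchEventSketch(); // entries are created once and reused
        }
    }

    // Producer side: reuse the next slot instead of allocating a new event.
    void publish(List<double[]> batch) {
        int index = (int) (nextSequence++ & (slots.length - 1));
        slots[index].setSamples(batch);
    }

    // Consumer side: process the slot, then optionally drop the reference
    // so the batch can be garbage collected before the slot is reused.
    int consume(long sequence, boolean clearAfterUse) {
        BatchEventSketch event = slots[(int) (sequence & (slots.length - 1))];
        List<double[]> batch = event.getSamples();
        int processed = batch == null ? 0 : batch.size();
        if (clearAfterUse) {
            event.setSamples(null);
        }
        return processed;
    }

    // Number of slots that still keep a batch reachable.
    int slotsHoldingSamples() {
        int count = 0;
        for (BatchEventSketch event : slots) {
            if (event.getSamples() != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        RingBufferClearSketch ring = new RingBufferClearSketch(8);
        for (int i = 0; i < 8; i++) {
            List<double[]> batch = new ArrayList<>();
            batch.add(new double[] {i});
            ring.publish(batch);
            ring.consume(i, true);
        }
        System.out.println("slots still holding samples: " + ring.slotsHoldingSamples());
    }
}
```

Without the clearing step, every slot keeps its last batch alive, which is the "all events in the ring buffer are full" situation measured in the heap dumps.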

Freddy Chu May 17, 2022 at 2:07 AM
I am using the same config as you, for buffer sizes 131072 and 262144.
The results seem quite different from yours. One major difference is that my cloud plugin is connected to a real Cortex.
I am using Flight Recorder for the profiling, with OpenNMS at Xmx 2g. Each test ran for only 5 minutes.
Using the Karaf command: stress-metrics -n 600 -i 20 -t 8
Cortex seems to consume even less memory, and the throughput is similar.
Finally, I have also attached the jfr files for more details.
cortex 131072
– Meters ----------------------------------------------------------------------
numeric-attributes-generated
count = 4050500
mean rate = 14998.91 events/second
1-minute rate = 14998.79 events/second
5-minute rate = 14958.66 events/second
15-minute rate = 14925.51 events/second
string-attributes-generated
count = 810100
mean rate = 2999.76 events/second
1-minute rate = 2999.76 events/second
5-minute rate = 2991.73 events/second
15-minute rate = 2985.10 events/second
– Timers ----------------------------------------------------------------------
batches
count = 13
mean rate = 0.05 calls/second
1-minute rate = 0.05 calls/second
5-minute rate = 0.03 calls/second
15-minute rate = 0.01 calls/second
min = 19978.07 milliseconds
max = 20011.63 milliseconds
mean = 19998.27 milliseconds
stddev = 3.65 milliseconds
median = 19999.75 milliseconds
75% <= 20000.30 milliseconds
95% <= 20002.27 milliseconds
98% <= 20002.40 milliseconds
99% <= 20011.63 milliseconds
99.9% <= 20011.63 milliseconds
newts 131072
– Meters ----------------------------------------------------------------------
numeric-attributes-generated
count = 4050500
mean rate = 14997.93 events/second
1-minute rate = 14998.79 events/second
5-minute rate = 14958.66 events/second
15-minute rate = 14925.51 events/second
string-attributes-generated
count = 810100
mean rate = 2999.51 events/second
1-minute rate = 2999.76 events/second
5-minute rate = 2991.73 events/second
15-minute rate = 2985.10 events/second
– Timers ----------------------------------------------------------------------
batches
count = 13
mean rate = 0.05 calls/second
1-minute rate = 0.05 calls/second
5-minute rate = 0.03 calls/second
15-minute rate = 0.01 calls/second
min = 19971.69 milliseconds
max = 20003.18 milliseconds
mean = 19999.39 milliseconds
stddev = 2.89 milliseconds
median = 19999.50 milliseconds
75% <= 19999.77 milliseconds
95% <= 20003.18 milliseconds
98% <= 20003.18 milliseconds
99% <= 20003.18 milliseconds
99.9% <= 20003.18 milliseconds
cortex 262144
– Meters ----------------------------------------------------------------------
numeric-attributes-generated
count = 4050150
mean rate = 14997.42 events/second
1-minute rate = 14998.83 events/second
5-minute rate = 14958.63 events/second
15-minute rate = 14925.48 events/second
string-attributes-generated
count = 810050
mean rate = 2999.53 events/second
1-minute rate = 2999.76 events/second
5-minute rate = 2991.73 events/second
15-minute rate = 2985.10 events/second
– Timers ----------------------------------------------------------------------
batches
count = 13
mean rate = 0.05 calls/second
1-minute rate = 0.05 calls/second
5-minute rate = 0.03 calls/second
15-minute rate = 0.01 calls/second
min = 19965.51 milliseconds
max = 20039.18 milliseconds
mean = 19998.13 milliseconds
stddev = 25.01 milliseconds
median = 20001.46 milliseconds
75% <= 20002.18 milliseconds
95% <= 20039.18 milliseconds
98% <= 20039.18 milliseconds
99% <= 20039.18 milliseconds
99.9% <= 20039.18 milliseconds
newts 262144
– Meters ----------------------------------------------------------------------
numeric-attributes-generated
count = 4050500
mean rate = 14998.59 events/second
1-minute rate = 14998.79 events/second
5-minute rate = 14958.66 events/second
15-minute rate = 14925.51 events/second
string-attributes-generated
count = 810100
mean rate = 2999.70 events/second
1-minute rate = 2999.76 events/second
5-minute rate = 2991.73 events/second
15-minute rate = 2985.10 events/second
– Timers ----------------------------------------------------------------------
batches
count = 13
mean rate = 0.05 calls/second
1-minute rate = 0.05 calls/second
5-minute rate = 0.03 calls/second
15-minute rate = 0.01 calls/second
min = 19970.43 milliseconds
max = 20003.29 milliseconds
mean = 19999.57 milliseconds
stddev = 2.95 milliseconds
median = 19999.36 milliseconds
75% <= 20000.45 milliseconds
95% <= 20003.29 milliseconds
98% <= 20003.29 milliseconds
99% <= 20003.29 milliseconds
99.9% <= 20003.29 milliseconds
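
As a sanity check on the rates above, the expected load can be derived from the stress command. The sketch below assumes that -n is the node count and -i the collection interval in seconds, and that the generator's other dimensions (10 interfaces per node, 5 groups per interface, 10 numeric and 2 string attributes per group) were left at their defaults. Those default values are an assumption and are not stated in this ticket; StressRateSketch is a hypothetical helper written for this illustration.

```java
// Hypothetical helper: expected attribute rate for "stress-metrics -n 600 -i 20 -t 8".
final class StressRateSketch {
    // Assumed generator defaults (NOT taken from this ticket).
    static final int INTERFACES_PER_NODE = 10;
    static final int GROUPS_PER_INTERFACE = 5;
    static final int NUMERIC_ATTRS_PER_GROUP = 10;
    static final int STRING_ATTRS_PER_GROUP = 2;

    // Attributes generated per second for a given node count and interval.
    static double attributesPerSecond(int nodes, int intervalSeconds, int attrsPerGroup) {
        if (intervalSeconds <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        long perInterval = (long) nodes * INTERFACES_PER_NODE * GROUPS_PER_INTERFACE * attrsPerGroup;
        return (double) perInterval / intervalSeconds;
    }

    public static void main(String[] args) {
        double numeric = attributesPerSecond(600, 20, NUMERIC_ATTRS_PER_GROUP);
        double string = attributesPerSecond(600, 20, STRING_ATTRS_PER_GROUP);
        System.out.printf("expected numeric attributes/s: %.0f%n", numeric);
        System.out.printf("expected string attributes/s:  %.0f%n", string);
    }
}
```

Under these assumptions the expected rates are 15000 numeric and 3000 string attributes per second, which is in line with the reported mean rates (~14998 and ~2999.7 events/second) for both cortex and newts at both buffer sizes.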

Patrick Schweizer May 14, 2022 at 10:37 AM (edited)
To find out the difference in heap usage between newts and ts we ran another test with the following parameters:
ring_buffer_size=16
ring_buffer_size=32768
ring_buffer_size=65536
ring_buffer_size=131072
ring_buffer_size=262144
ring_buffer_size=524288
ring_buffer_size=1048576
=> TS crashes at a ring buffer size with 4x fewer entries than the size at which Newts crashes
=> the ring buffer memory footprint of TS is ~4x that of Newts
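
The two conclusions follow from each other: if the heap available to the ring buffer is fixed, a ~4x larger per-event footprint means the largest workable ring buffer has ~4x fewer entries. A minimal sketch of that arithmetic follows. The heap budget and byte counts are hypothetical illustration values, not measurements from this ticket, and RingBufferSizingSketch is a made-up helper; only the ~4x ratio comes from the test above.

```java
// Hypothetical helper relating per-event footprint to the largest ring buffer that fits.
final class RingBufferSizingSketch {
    // Largest power-of-two entry count whose events fit into the heap budget.
    // Returns 0 if not even one event fits.
    static long largestPowerOfTwoSize(long heapBudgetBytes, long bytesPerEvent) {
        if (heapBudgetBytes <= 0 || bytesPerEvent <= 0) {
            throw new IllegalArgumentException("budget and event size must be positive");
        }
        long maxEvents = heapBudgetBytes / bytesPerEvent;
        return Long.highestOneBit(maxEvents);
    }

    public static void main(String[] args) {
        long heapBudget = 1L << 30;              // hypothetical: 1 GiB left for the ring buffer
        long newtsBytesPerEvent = 1024;          // hypothetical per-event footprint
        long tsBytesPerEvent = 4 * newtsBytesPerEvent; // ~4x, the ratio observed in the ticket

        System.out.println("Newts max entries: " + largestPowerOfTwoSize(heapBudget, newtsBytesPerEvent));
        System.out.println("TS max entries:    " + largestPowerOfTwoSize(heapBudget, tsBytesPerEvent));
    }
}
```

With these illustration values the TS limit comes out exactly 4x below the Newts limit, mirroring the relationship between the crash points in the test.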

Patrick Schweizer May 10, 2022 at 12:06 AM (edited)
I am trying to assess the heap usage of Newts vs. TSS. I ran 2 load scenarios: one with Newts and one with the time series integration layer (ts). The goal was to get a sense of how much more heap ts uses for the same throughput.
I ran the following scenario:
stress command with 15k/s: stress-metrics -n 600 -i 20 -t 8
ring buffer size was default: ~8k
I rolled back the "setting to null" code change => all events in the ring buffer should be full after a short amount of time
After running for >10 min I took heap dumps and analyzed them.
Ring Buffer:
=> It looks like the TS memory footprint is ~10x that of Newts
However, what doesn't add up for me is the number of SampleBatchEvents. It is ~8k for Newts but ~40k for TS. I would expect the same number (since they are recycled):
I tried to verify this number by looking at ring buffer itself in the debugger:
But this seems to look good.
We also need to be careful: the numbers don't really add up (I think due to multiple references):
Looking at the footprint of a SampleBatchEvent:
It looks like the footprint of a TS sample is roughly 4x that of a Newts sample.
Looking at the overall heap:
did performance tests where he saw that the ts layer doesn't perform as well as Newts. It should show similar performance.
Test setup:
stress command:
Configuration of OpenNMS:
set debug level in log4j.xml to error
adjust the following settings in: etc/opennms.properties.d/ts.properties: