Investigate and implement performance improvements for the ALEC DBScan engine.
Currently, memory usage and tick time scale linearly with the total number of vertices in the graph. As graphs get very large, memory pressure may occur depending on the configured heap size, and tick time will become very long. Improving the performance of this engine will allow us to process larger graphs more quickly and with a smaller memory footprint.
Based on what I've seen, a significant portion of the engine's time is spent calculating the spatial distance between two alarmed vertices using Dijkstra's algorithm.
One method that looks promising for increasing performance is to avoid using the entire graph as input to Dijkstra's algorithm when we calculate the distance between two alarmed vertices. Instead, we can prune vertices we know are dead from the graph, which will reduce the total set of vertices used in the shortest path calculations. This will affect both the time and memory usage of the shortest path algorithm linearly with the number of vertices pruned from the graph: if we can prune 50% of the vertices, the tick should process roughly twice as fast.
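A minimal sketch of the pruning idea, in Python for illustration (ALEC's actual graph API will differ; the adjacency-map representation, the `prune_dead_vertices` helper, and the `is_dead` predicate are all assumptions made for this example):

```python
import heapq


def dijkstra(graph, source, target):
    """Standard Dijkstra over an adjacency map {vertex: {neighbor: weight}}."""
    dist = {source: 0}
    heap = [(0, source)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d  # first pop of the target is the shortest distance
        if u in visited:
            continue
        visited.add(u)
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")


def prune_dead_vertices(graph, is_dead):
    """Drop dead vertices (and edges touching them) before any shortest
    path calculations, shrinking the input Dijkstra has to explore."""
    return {
        u: {v: w for v, w in nbrs.items() if not is_dead(v)}
        for u, nbrs in graph.items()
        if not is_dead(u)
    }
```

The key point is that the pruning happens once per tick, while Dijkstra runs once per pair of alarmed vertices, so the smaller input pays off many times over.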
We could also look for opportunities to parallelize the engine to save time. I'm not sure what the parallelization factor currently is, but if the DBScan algorithm is only requesting distance calculations serially, then I suspect that could be improved.
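The pairwise distance calculations are independent of one another, so they are a natural candidate for fanning out across a pool. A Python sketch of the shape (the `hop_distance` stand-in and `pairwise_distances` helper are hypothetical; in the real engine the worker would call the actual Dijkstra routine, and on the JVM a thread pool would give true parallelism rather than Python's GIL-limited threading):

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def hop_distance(graph, source, target):
    """Unweighted BFS hop count; stands in for the real distance call."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        u, d = queue.popleft()
        if u == target:
            return d
        for v in graph.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append((v, d + 1))
    return float("inf")


def pairwise_distances(graph, alarmed, distance_fn=hop_distance, workers=4):
    """Compute every pairwise distance between alarmed vertices
    concurrently; each pair is an independent task."""
    pairs = list(combinations(alarmed, 2))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda p: distance_fn(graph, *p), pairs)
    return dict(zip(pairs, results))
```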
Another area that may be worth looking into is using a different implementation of Dijkstra's algorithm, or writing our own; there may or may not be performance gains to be had there.
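One concrete reason a custom implementation could win: DBScan only needs to know whether two alarmed vertices are within epsilon of each other, so the search can be abandoned as soon as the frontier distance exceeds that bound, rather than settling the whole graph. A sketch of that idea (the `within_eps` name and the adjacency-map input are assumptions for this example, not ALEC's API):

```python
import heapq


def within_eps(graph, source, target, eps):
    """Dijkstra variant that stops once every remaining frontier
    distance exceeds eps, since DBScan only needs a yes/no answer."""
    heap = [(0, source)]
    best = {source: 0}
    while heap:
        d, u = heapq.heappop(heap)
        if d > eps:
            return False  # everything left in the heap is farther still
        if u == target:
            return True
        for v, w in graph.get(u, {}).items():
            nd = d + w
            # never enqueue vertices already known to be beyond eps
            if nd <= eps and nd < best.get(v, float("inf")):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return False
```

On a large graph with a small epsilon this explores only a small neighborhood around the source, which is where the potential gain over an off-the-shelf all-distances implementation would come from.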
Some performance gains in the DBScan engine may translate to better performance in other engines as well. For instance, if the deep learning engine is also using Dijkstra's algorithm, it will benefit from the improvements we make there.