When calculating the "Other" part, i.e. the traffic that is not covered by the top-K hosts, the total volume used to determine that part has to be doubled, because each flow is counted for two hosts.
Assume the following flows are reported for an exporter/interface:
The total volume for that exporter/interface is:
When deriving host aggregations, the observed traffic is assigned to both the source and the destination host. For the example flows, the derived host aggregations are:
When the host aggregations are summed up the total is:
In case of a top-1 aggregation the result including the "Other" part is:
| Other | 2*(b1+b3) - (b1+b3) = b1+b3 | 2*(b2+b4) - (b2+b4) = b2+b4 |
In case of a top-2 aggregation (assuming b1+b2 > b3+b4) the result including the "Other" part is:
| Other | 2*(b1+b3) - (b1+b3) - b1 = b3 | 2*(b2+b4) - (b2+b4) - b2 = b4 |
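
The calculation above can be sketched in Python. This is only an illustration under assumptions: the two flows (`h1` to `h2` with `b1`/`b2`, `h1` to `h3` with `b3`/`b4`), the byte values, and the function names are hypothetical and chosen so that the results match the "Other" rows shown above. Ranking hosts by the sum of ingress and egress bytes is also an assumption.

```python
from collections import defaultdict

# Hypothetical flows: (src host, dst host, ingress bytes, egress bytes).
FLOWS = [
    ("h1", "h2", 100, 40),  # b1 = 100, b2 = 40
    ("h1", "h3", 30, 10),   # b3 = 30,  b4 = 10
]


def host_aggregations(flows):
    """Assign each flow's bytes to both its source and its destination host."""
    hosts = defaultdict(lambda: [0, 0])
    for src, dst, ingress, egress in flows:
        for host in (src, dst):
            hosts[host][0] += ingress
            hosts[host][1] += egress
    return {host: tuple(volume) for host, volume in hosts.items()}


def top_k_with_other(flows, k):
    """Return the top-k hosts (by ingress + egress) and the "Other" remainder."""
    total_in = sum(flow[2] for flow in flows)
    total_out = sum(flow[3] for flow in flows)
    hosts = host_aggregations(flows)
    top = sorted(hosts.items(), key=lambda kv: sum(kv[1]), reverse=True)[:k]
    # Each flow is counted for two hosts, so the remainder is taken
    # from twice the exporter/interface total.
    other_in = 2 * total_in - sum(volume[0] for _, volume in top)
    other_out = 2 * total_out - sum(volume[1] for _, volume in top)
    return dict(top), (other_in, other_out)
```

With these values, `top_k_with_other(FLOWS, 1)` leaves `b1+b3` and `b2+b4` for "Other", and `top_k_with_other(FLOWS, 2)` leaves `b3` and `b4`.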
There are two alternative ways in which traffic volumes could be assigned to hosts:
- ingress bytes and egress bytes could each be divided by two and assigned to both hosts as before
- ingress bytes could be assigned to the source host and egress bytes to the destination host
The first alternative has the advantage that host volumes would sum up to the total exporter/interface volume. The disadvantage is that, for each host, the ingress/egress bytes are then only half of the values reported in the flows.
The second alternative also has the advantage that host volumes sum up to the total exporter/interface volume. In addition, the ingress/egress numbers separate the traffic for which a host was the source from the traffic for which it was the destination. The disadvantage is that the current aggregation scheme, which first calculates aggregations for conversations, must be changed. Conversations do not distinguish between source and destination host and therefore cannot be used as the basis for deriving host aggregations.
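
The two alternatives can be compared with a small Python sketch. The flows, byte values, and function names below are hypothetical and are not taken from the actual aggregation code. The sketch only illustrates that, in both cases, the per-host volumes add up to the exporter/interface total without doubling.

```python
from collections import defaultdict

# Hypothetical flows: (src host, dst host, ingress bytes, egress bytes).
FLOWS = [
    ("h1", "h2", 100, 40),
    ("h1", "h3", 30, 10),
]


def split_evenly(flows):
    """Alternative 1: both hosts of a flow get half of its ingress and egress bytes."""
    hosts = defaultdict(lambda: [0.0, 0.0])
    for src, dst, ingress, egress in flows:
        for host in (src, dst):
            hosts[host][0] += ingress / 2
            hosts[host][1] += egress / 2
    return {host: tuple(volume) for host, volume in hosts.items()}


def split_by_direction(flows):
    """Alternative 2: ingress bytes go to the source host, egress bytes to the destination host."""
    hosts = defaultdict(lambda: [0, 0])
    for src, dst, ingress, egress in flows:
        hosts[src][0] += ingress
        hosts[dst][1] += egress
    return {host: tuple(volume) for host, volume in hosts.items()}


def totals(hosts):
    """Sum ingress and egress bytes over all hosts."""
    return (
        sum(volume[0] for volume in hosts.values()),
        sum(volume[1] for volume in hosts.values()),
    )
```

Here `totals(split_evenly(FLOWS))` and `totals(split_by_direction(FLOWS))` both equal the flow totals, so "Other" could be computed as the plain total minus the top-K hosts. Note that `split_by_direction` needs the source and destination of each flow, which is the information a conversation aggregation no longer carries.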