Episode 12: Optimising the Effects Data Structure

Julien Truffaut

17 December 2025

In the previous episode, I explored ways to speed up the search for the best possible gear arrangements by pruning the gear dataset. We saw that with simple filters, we can reduce the runtime from billions of years to just a few months.

Today, I’d like to explore another avenue for improvement: speeding up build effect calculations. When we combine gears together, we need to add up their bonuses. For example, if a sword gives +10 Vitality and +5 Strength, and a necklace gives +5 Vitality, then together they give +15 Vitality and +5 Strength.

Calculating these combined bonuses is required to evaluate each build, and it turns out to be the most computationally intensive part of the tool. That makes it a perfect candidate for optimisation.

Vec-based implementation

I like to start with the simplest possible implementation so that we can use it as a baseline for comparison. In this case, the most straightforward representation of gear effects is a Vec:

struct EffectsVec {
    values: Vec<Effect>,
}

struct Effect {
    kind: CharacteristicType,
    value: i32,
}

We mainly need two public operations:

  • get: retrieve the value for a given characteristic (e.g. effects.get(Vitality))
  • add: combine two Effects together

impl EffectsVec {
    fn get(&self, characteristic_type: &CharacteristicType) -> i32 {
        self.values
            .iter()
            .find(|e| e.kind == *characteristic_type)
            .map(|e| e.value)
            .unwrap_or(0)
    }

    fn add(&mut self, other: &Self) {
        for effect in &other.values {
            if let Some(existing) =
                self.values.iter_mut().find(|e| e.kind == effect.kind)
            {
                existing.value += effect.value;
            } else {
                self.values.push(effect.clone());
            }
        }
    }
}

The add method iterates over all the effects in other. If the same CharacteristicType already exists, it adds the values together; otherwise, it appends a new Effect to the vector.
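To make the behaviour concrete, here is a condensed, self-contained sketch reproducing the sword/necklace example from the introduction (a two-variant enum stands in for the full CharacteristicType):

```rust
// Condensed sketch: a two-variant enum stands in for the full
// CharacteristicType (the real one has ~50 variants).
#[derive(Debug, Clone, Copy, PartialEq)]
enum CharacteristicType {
    Vitality,
    Strength,
}

#[derive(Clone)]
struct Effect {
    kind: CharacteristicType,
    value: i32,
}

struct EffectsVec {
    values: Vec<Effect>,
}

impl EffectsVec {
    fn empty() -> Self {
        Self { values: Vec::new() }
    }

    fn get(&self, characteristic_type: &CharacteristicType) -> i32 {
        self.values
            .iter()
            .find(|e| e.kind == *characteristic_type)
            .map(|e| e.value)
            .unwrap_or(0)
    }

    fn add(&mut self, other: &Self) {
        for effect in &other.values {
            if let Some(existing) =
                self.values.iter_mut().find(|e| e.kind == effect.kind)
            {
                existing.value += effect.value;
            } else {
                self.values.push(effect.clone());
            }
        }
    }
}

fn main() {
    // Sword: +10 Vitality, +5 Strength.
    let sword = EffectsVec {
        values: vec![
            Effect { kind: CharacteristicType::Vitality, value: 10 },
            Effect { kind: CharacteristicType::Strength, value: 5 },
        ],
    };
    // Necklace: +5 Vitality.
    let necklace = EffectsVec {
        values: vec![Effect { kind: CharacteristicType::Vitality, value: 5 }],
    };

    let mut build = EffectsVec::empty();
    build.add(&sword);
    build.add(&necklace);

    // Combined: +15 Vitality, +5 Strength.
    assert_eq!(build.get(&CharacteristicType::Vitality), 15);
    assert_eq!(build.get(&CharacteristicType::Strength), 5);
}
```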

Benchmarking

To compare different implementations reliably, we need a good benchmarking tool. I was recommended Criterion, which can be added by updating Cargo.toml:

[dev-dependencies]
criterion = "0.5"

Then I defined the benchmark in benches/effect_bench.rs:

use criterion::{criterion_group, criterion_main, Criterion};
use dofus_opti_build::model::EffectsVec;

fn bench_effects_addition(c: &mut Criterion) {
    let effects_vec1 = random_effects_vec();
    let effects_vec2 = random_effects_vec();

    c.bench_function("EffectsVec add", |b| {
        b.iter(|| {
            let mut sum = EffectsVec::empty();
            sum.add(&effects_vec1);
            sum.add(&effects_vec2);
            sum
        })
    });
}

criterion_group!(benches, bench_effects_addition);
criterion_main!(benches);

I also defined a random_effects_vec helper that generates an EffectsVec with 5–15 random characteristics. This makes runs slightly noisier, but I prefer it over hard-coded values that the compiler might optimise too aggressively.
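Roughly, the helper looks something like this (a sketch, not the actual code: a tiny hand-rolled LCG stands in for a proper RNG such as the rand crate, and effects are modelled as (characteristic index, value) pairs to keep it self-contained):

```rust
// Sketch of a random_effects_vec-style helper. A hand-rolled LCG
// replaces a real RNG so the example runs without dependencies.
struct Lcg(u64);

impl Lcg {
    fn next(&mut self) -> u64 {
        // Knuth's MMIX LCG constants.
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0 >> 33
    }
}

fn random_effects(seed: u64) -> Vec<(usize, i32)> {
    let mut rng = Lcg(seed);
    let len = 5 + (rng.next() % 11) as usize; // 5–15 effects
    (0..len)
        .map(|_| {
            let kind = (rng.next() % 50) as usize; // ~50 characteristics
            let value = (rng.next() % 100) as i32 + 1; // bonus of 1–100
            (kind, value)
        })
        .collect()
}

fn main() {
    let effects = random_effects(42);
    assert!((5..=15).contains(&effects.len()));
}
```

Note that this sketch may produce duplicate characteristic kinds; the real helper would avoid that.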

If you’re a benchmarking wizard, I’d love to hear your thoughts!

Running the benchmark produces something like:

cargo bench -p dofus-opti-build

EffectsVec add   time:   [97.944 ns 98.227 ns 98.529 ns]
                 change: [-1.6919% -1.1894% -0.7274%]

Criterion tells us that EffectsVec::add takes roughly 98 nanoseconds. The absolute value varies a bit between runs (likely due to randomness and running on a laptop), but what matters most is the relative comparison with other implementations.

Smart merge with ordering

The current add implementation is not very efficient: for each element in the second vector, it scans the entire first vector, which makes the algorithm O(n × m).

However, if we can guarantee that both vectors are ordered by CharacteristicType, we can merge them in a single pass — similar to the merge step in merge sort.

First, we need to make CharacteristicType orderable using the derive macro:

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub enum CharacteristicType {
    AbilityPoint,
    AbilityPointParry,
    ...
}

Then we can implement a merge-based add_ordered method:

fn add_ordered(&mut self, other: &Self) {
    let mut i = 0;
    let mut j = 0;

    let mut result = Vec::with_capacity(ALL_CHARACTERISTIC_TYPES.len());

    while i < self.values.len() && j < other.values.len() {
        let left = &self.values[i];
        let right = &other.values[j];

        if left.kind == right.kind {
            result.push(Effect {
                kind: left.kind,
                value: left.value + right.value,
            });
            i += 1;
            j += 1;
        } else if left.kind < right.kind {
            result.push(left.clone());
            i += 1;
        } else {
            result.push(right.clone());
            j += 1;
        }
    }

    result.extend_from_slice(&self.values[i..]);
    result.extend(other.values[j..].iter().cloned());

    self.values = result;
}

This approach is a bit more involved but only requires a single pass over both vectors. I also pre-allocate the result vector to avoid resizing during the merge.

Let’s run the benchmark with both implementations:

Effects add/vec unordered time: [121.55 ns 121.92 ns 122.31 ns]
Effects add/vec ordered   time: [71.408 ns 71.690 ns 71.983 ns]

The ordered implementation is about 40% faster than the unordered one. A nice improvement!

Array-based implementation

Since the number of CharacteristicType values is fixed (around 50), we can go even further and use a fixed-size array. Each index corresponds to a specific characteristic.

struct EffectsArray {
    values: [i32; CharacteristicType::COUNT],
}

This gives us:

  • O(1) access for get
  • A simple linear loop for add

impl EffectsArray {
    fn empty() -> Self {
        Self {
            values: [0; CharacteristicType::COUNT],
        }
    }

    fn get(&self, characteristic_type: &CharacteristicType) -> i32 {
        self.values[characteristic_type.index()]
    }

    fn add(&mut self, other: &EffectsArray) {
        for i in 0..ALL_CHARACTERISTIC_TYPES.len() {
            self.values[i] += other.values[i];
        }
    }
}

I only needed to define COUNT and index on CharacteristicType:

impl CharacteristicType {
    const COUNT: usize = CharacteristicType::Wisdom.index() + 1;

    const fn index(&self) -> usize {
        match self {
            CharacteristicType::AbilityPoint => 0,
            CharacteristicType::AbilityPointParry => 1,
            ...
        }
    }
}

Let’s rerun the benchmarks:

Effects add/vec unordered time: [93.980 ns 94.377 ns 94.838 ns]
Effects add/vec ordered   time: [69.211 ns 69.651 ns 70.071 ns]
Effects add/array         time: [13.403 ns 13.414 ns 13.428 ns]

The array-based implementation is 5–7× faster than the vector-based ones — and the logic is actually simpler. Win-win!

Struct-based implementation

Out of curiosity, I also tried two struct-based implementations:

struct EffectsStruct {
    ability_point: i32,
    ability_point_parry: i32,
    ...
}

struct EffectsStructOpt {
    ability_point: Option<i32>,
    ability_point_parry: Option<i32>,
    ...
}
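For a sense of scale, here is a truncated, runnable sketch of the plain-i32 variant, cut down to three fields — add has to touch every field by name:

```rust
// Truncated sketch with three fields; the real struct has one
// field per characteristic (~50 in total).
struct EffectsStruct {
    ability_point: i32,
    ability_point_parry: i32,
    vitality: i32,
    // ...one field per remaining characteristic
}

impl EffectsStruct {
    fn add(&mut self, other: &EffectsStruct) {
        // One line per field: tedious to write, and easy to get
        // wrong when a characteristic is added or renamed.
        self.ability_point += other.ability_point;
        self.ability_point_parry += other.ability_point_parry;
        self.vitality += other.vitality;
        // ...
    }
}

fn main() {
    let mut a = EffectsStruct { ability_point: 10, ability_point_parry: 5, vitality: 0 };
    let b = EffectsStruct { ability_point: 5, ability_point_parry: 0, vitality: 3 };
    a.add(&b);
    assert_eq!(a.ability_point, 15);
    assert_eq!(a.vitality, 3);
}
```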

As you might expect, the add implementation is extremely verbose. Still, the benchmark results are interesting:

Effects add/array         time: [14.276 ns 14.385 ns 14.502 ns]
Effects add/struct        time: [14.237 ns 14.337 ns 14.458 ns]
Effects add/struct option time: [39.886 ns 40.706 ns 41.621 ns]

A struct of raw i32s performs about as well as the array. Using Option<i32> is 2–3× slower, but still faster than the vector-based approaches.

Conclusion

Today we explored several implementations of the Effects data structure and compared different algorithms to combine them. While the speed-up isn’t as dramatic as pruning the search space, choosing the right representation still yields a 5–7× performance improvement.

I’ll likely go with the array-based implementation: it’s fast, simple, and predictable. Importantly, I won’t expose its internals so I can swap it out later if needed.
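A sketch of what that encapsulation could look like: a public Effects type whose array field stays private, so only a small API is exposed and the representation can change later without breaking callers. The usize index and COUNT constant here are stand-ins for CharacteristicType::index() and CharacteristicType::COUNT, and set is a hypothetical helper:

```rust
// Sketch: the array is a private implementation detail behind a
// small public API. `COUNT` and the `usize` index stand in for
// `CharacteristicType::COUNT` and `CharacteristicType::index()`.
const COUNT: usize = 50;

pub struct Effects {
    values: [i32; COUNT], // private: callers never touch the array
}

impl Effects {
    pub fn empty() -> Self {
        Self { values: [0; COUNT] }
    }

    pub fn set(&mut self, index: usize, value: i32) {
        self.values[index] = value;
    }

    pub fn get(&self, index: usize) -> i32 {
        self.values[index]
    }

    pub fn add(&mut self, other: &Effects) {
        for i in 0..COUNT {
            self.values[i] += other.values[i];
        }
    }
}

fn main() {
    let mut a = Effects::empty();
    a.set(0, 10);
    let mut b = Effects::empty();
    b.set(0, 5);
    a.add(&b);
    assert_eq!(a.get(0), 15);
    assert_eq!(a.get(1), 0);
}
```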

All implementations and benchmark code are available here. If you have ideas for further optimisations or better benchmarking practices, I’d love to hear them!