|
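This version is slower than the previous one (1911 us vs. 841 us mean,
about 2.3x), but it fixes a bug that occurs when stride isn't a multiple
of 8. It is still about 2.1x faster than the naive version (4006 us).
It can be improved further by setting `auto& byte` all at once instead
of setting individual bits of multiple bytes in the innermost loop.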
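
A rough sketch of that follow-up idea, not the code from this change: it
assumes a flat sequence of bits packed LSB-first, and the names
`pack_per_bit` and `pack_per_byte` are made up for illustration. The
first function touches the output once per input bit; the second builds
each byte in a local and stores it through `auto& byte` once, including
the partial last byte when the bit count isn't a multiple of 8.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical baseline: one read-modify-write of the output per input bit.
std::vector<std::uint8_t> pack_per_bit(const std::vector<bool>& bits) {
    std::vector<std::uint8_t> mask((bits.size() + 7) / 8, 0);
    for (std::size_t i = 0; i < bits.size(); ++i) {
        if (bits[i]) {
            mask[i / 8] |= static_cast<std::uint8_t>(1u << (i % 8));
        }
    }
    return mask;
}

// Hypothetical variant: accumulate up to 8 bits in a local value and
// write the output byte exactly once.
std::vector<std::uint8_t> pack_per_byte(const std::vector<bool>& bits) {
    std::vector<std::uint8_t> mask((bits.size() + 7) / 8, 0);
    for (std::size_t b = 0; b < mask.size(); ++b) {
        const std::size_t first = b * 8;
        // The last byte may hold fewer than 8 valid bits.
        const std::size_t count =
            std::min<std::size_t>(8, bits.size() - first);
        assert(count >= 1 && count <= 8);
        std::uint8_t value = 0;
        for (std::size_t k = 0; k < count; ++k) {
            if (bits[first + k]) {
                value |= static_cast<std::uint8_t>(1u << k);
            }
        }
        auto& byte = mask[b];
        byte = value;  // single store per output byte
    }
    return mask;
}
```

Both functions should produce identical masks, so the per-bit one can
serve as a reference when benchmarking the per-byte one.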
------------------------------------------------------------
New version              Time             CPU     Iterations
------------------------------------------------------------
Bitmask_mean          1911 us         1893 us              5
Bitmask_median        1911 us         1885 us              5
Bitmask_stddev        2.00 us         18.7 us              5
Bitmask_cv             0.10 %          0.99 %              5
------------------------------------------------------------

------------------------------------------------------------
Buggy version            Time             CPU     Iterations
------------------------------------------------------------
Bitmask_mean           841 us          841 us              5
Bitmask_median         839 us          837 us              5
Bitmask_stddev        3.29 us         7.80 us              5
Bitmask_cv             0.39 %          0.93 %              5
------------------------------------------------------------

------------------------------------------------------------
Naive version            Time             CPU     Iterations
------------------------------------------------------------
Bitmask_mean          4006 us         3997 us             10
Bitmask_median        4006 us         3997 us             10
Bitmask_stddev        2.29 us        0.000 us             10
Bitmask_cv             0.06 %          0.00 %             10
------------------------------------------------------------
|