Reductions on vectors

The virtual row method works for any stride, including 1. But stride-1 reduction, that is, reduction of a vector, is important enough to merit some extra effort.

By setting the virtual row length equal to the number of elements that fit in a vector register, we can fold each virtual row into the running result with a single vector instruction.

This increases the speed at which a vector can be processed, and it also drastically reduces the fixed overhead paid for each vector.
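The following is a minimal C sketch of this idea. It assumes the virtual row method views the data as rows of a chosen length and combines whole rows elementwise. The function name `sum_virtual_rows` and the lane count `LANES` are hypothetical. A plain array stands in for the vector register so the example stays portable; a real implementation would use intrinsics or compiler vector extensions. The sketch uses integer addition because it is associative, so reordering the sum does not change the result (this would not hold exactly for floating point). It also assumes the sum fits in an `int32_t`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lane count: 8 x int32 fills one 256-bit vector register. */
#define LANES 8

/* Sum n int32 values by treating x as a matrix whose virtual rows are
   LANES elements long. The acc array stands in for one vector register:
   the inner loop over j corresponds to a single vector add of a whole
   virtual row, and compilers typically auto-vectorize it. */
int32_t sum_virtual_rows(const int32_t *x, size_t n)
{
    int32_t acc[LANES] = {0};
    size_t i = 0;

    /* Fold each full virtual row into the accumulator, lane by lane. */
    for (; i + LANES <= n; i += LANES)
        for (size_t j = 0; j < LANES; j++)
            acc[j] += x[i + j];

    /* Leftover elements form a partial row; add them to the first lanes. */
    for (size_t j = 0; i + j < n; j++)
        acc[j] += x[i + j];

    /* Horizontal step: reduce the LANES partial sums to one value.
       Pairwise halving mirrors the log2(LANES) shuffle-and-add steps
       a SIMD implementation would typically use. */
    for (size_t width = LANES / 2; width > 0; width /= 2)
        for (size_t j = 0; j < width; j++)
            acc[j] += acc[j + width];

    return acc[0];
}
```

In this sketch, the main loop performs one row combination per `LANES` elements. The only work that does not scale with the input is the partial tail row and the log2(`LANES`) horizontal steps at the end.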