What about booleans?

Our short-stride strategy extends rows until they are at least 256 bytes.

For booleans, we reduce this limit to 64 bytes and also keep combining rows until each virtual row is a multiple of 8 bits, i.e. a whole number of bytes.
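
As a rough illustration of the rule above, the sketch below computes how many consecutive rows would be merged into one virtual row. It assumes booleans are packed one bit per element and that rows are contiguous in memory; the function name `rows_per_virtual_row` and its parameters are hypothetical, not taken from the actual implementation.

```python
from math import gcd


def rows_per_virtual_row(row_bits: int, min_bytes: int = 64) -> int:
    """Smallest number of rows k such that k * row_bits is a multiple
    of 8 bits and the virtual row is at least min_bytes long.

    Assumption: one bit per boolean, rows stored back to back.
    """
    if row_bits <= 0:
        raise ValueError("row_bits must be positive")
    # k * row_bits is divisible by 8 exactly when k is a multiple of step.
    step = 8 // gcd(row_bits, 8)
    # Fewest rows needed to reach the size limit (ceiling division).
    k = -(-(8 * min_bytes) // row_bits)
    # Round up to the next multiple of step to restore byte alignment.
    return -(-k // step) * step


for bits in (3, 8, 12, 1000):
    k = rows_per_virtual_row(bits)
    print(f"{bits:>4}-bit rows: combine {k} rows -> {k * bits // 8} bytes")
```

With 3-bit rows this combines 176 rows into a 66-byte virtual row; rows that are already byte-aligned and at least 64 bytes long are left alone.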

Benchmark setup: matrices of random booleans containing at least 4e5 elements, run on an Intel Kaby Lake i7-7500U.