The state of summation (18.0)

Floats      Integers          Booleans
Fast        Fast              Fast
Slow        Slow (for now)    Fast
Not fast    Fast              Fast

Floating-point reductions are hard or impossible to speed up given Dyalog's ordering requirements.
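Why ordering matters: floating-point addition is not associative, so an algorithm that adds the same numbers in a different order can return a different result. The sketch below is plain Python, not Dyalog's implementation; the function names and the three-element input are made up for illustration. It compares a strict right-to-left reduction, which is how APL defines a reduction, with a left-to-right one.

```python
def sum_right_to_left(xs):
    """Strict right-to-left reduction: x0 + (x1 + (x2 + ...))."""
    acc = xs[-1]
    for x in reversed(xs[:-1]):
        acc = x + acc
    return acc


def sum_left_to_right(xs):
    """The same additions in the opposite order: ((x0 + x1) + x2) + ..."""
    acc = xs[0]
    for x in xs[1:]:
        acc = acc + x
    return acc


# Near 1e16 adjacent doubles are 2 apart, so adding 1.0 on its own is lost.
xs = [1e16, 1.0, 1.0]

right = sum_right_to_left(xs)  # the two 1.0s combine to 2.0 first and survive
left = sum_left_to_right(xs)   # each 1.0 is rounded away in turn

print(right == left)  # False: the results differ by 2.0
```

Any reordering, including the multiple-accumulator sums that vector instructions favour, runs into the same problem, which is why a result tied to one fixed order leaves little room for speedups.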

However, replacing +⌿A with 1⊥A in your code frees us to use faster algorithms!

Use 1(⊥⍤r)A to sum along non-leading axes of A.
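Why base-1 decode is a sum: decode evaluates its argument as digits in a given base, and in base 1 every digit has weight 1. The sketch below models this in plain Python using Horner's rule. The `decode` helper and the nested-list matrix are hypothetical stand-ins, and nothing here is meant to describe how Dyalog computes the result.

```python
def decode(base, digits):
    """Evaluate digits as a number in the given base (Horner's rule)."""
    acc = 0
    for d in digits:
        acc = acc * base + d
    return acc


# A 2-by-3 matrix stored as a list of rows.
A = [[1, 2, 3],
     [4, 5, 6]]

# Sum along the first axis (down each column): the analogue of 1⊥A.
first_axis = [decode(1, col) for col in zip(*A)]

# Sum along the last axis (across each row): the analogue of 1(⊥⍤1)A.
last_axis = [decode(1, row) for row in A]

print(first_axis)  # [5, 7, 9]
print(last_axis)   # [6, 15]
```

With any other base the weights differ, for example `decode(10, [1, 2, 3])` gives 123, so only base 1 collapses to a plain sum.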