Performance

Caveat: The factors given on this page were obtained from micro-benchmarks of specific primitive functions; in real applications, the overall factor depends on the mix of primitives used.

All benchmark tests were performed on 64-bit interpreters on Linux/Microsoft Windows operating systems.

Internal Benchmarks

Internal benchmarking was performed on the initial release of Dyalog version 16.0 and the results compared with the initial release of Dyalog version 15.0.

The benchmarking process comprises over 13,000 benchmarks in more than 130 groups. For each group, the geometric mean of the timing ratios is calculated, and the group means are plotted in sorted order. The vertical axis of the graph shows each ratio as a percentage change: negative values (shown in blue) indicate a performance enhancement, and positive values (shown in red) indicate a deterioration in performance.
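
As an illustration of how a group's geometric mean timing ratio maps to a percentage change, the following is a minimal sketch. The timings are invented for the example (they are not benchmark results), the names old, new, ratio, gmean and pct are hypothetical, and the ratio is assumed to be the version 16.0 time divided by the version 15.0 time, which is consistent with negative values indicating an enhancement.

```apl
      ⍝ Hypothetical timings in seconds for one benchmark group (not real data)
      old←0.50 0.20 1.00 0.08      ⍝ assumed version 15.0 timings
      new←0.40 0.20 0.50 0.10      ⍝ assumed version 16.0 timings
      ratio←new÷old                ⍝ per-benchmark timing ratio; below 1 means faster
      gmean←{*(+/⍟⍵)÷≢⍵}           ⍝ geometric mean, via the mean of the logarithms
      pct←100×¯1+gmean ratio       ⍝ percentage change; negative is an improvement
```

With these invented numbers the ratios are 0.8 1 0.5 1.25, their geometric mean is about 0.84, and pct is about ¯15.9. The geometric mean is used rather than the arithmetic mean so that a halving and a doubling of time cancel each other out.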

Results showed that core interpreter performance in Dyalog version 16.0 has an average improvement of 6% over Dyalog version 15.0.

Specific Speed-Ups in Dyalog Version 16.0

The following table lists speed-ups to specific primitive functions made in Dyalog version 16.0. The improvement factors given in this table are usually obtained on large arguments (thousands or millions of items) measured by cmpx on version 16.0 compared with version 15.0.
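
A measurement of this kind might look like the sketch below. The expression and argument size are assumptions chosen for illustration, not the ones used to produce the table. The cmpx function is copied from the dfns workspace supplied with Dyalog, and the variable name b is hypothetical.

```apl
      'cmpx' ⎕CY 'dfns'          ⍝ copy cmpx from the dfns workspace
      b←1=?1000 1000⍴2           ⍝ a large Boolean matrix (size chosen arbitrarily)
      cmpx '⍉b'                  ⍝ report the time taken by monadic transpose
```

Running the same lines under version 15.0 and version 16.0 on the same machine, and dividing the first timing by the second, gives an improvement factor comparable to those in the table.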

| Expression | Improvement Factor | Notes |
|---|---|---|
| Transpose (monadic ⍉) | 5-20 | for Boolean arrays |
| Reshape (dyadic ⍴) | unlimited* | for arrays that are not shared and only when there is no rank increase |
| | ≈1.5 | when the left argument is |
| Catenate (dyadic ,) | unlimited* | when appending a few elements to a large array that is not shared |
| | up to 5 | when laminating a non-Boolean array along the last axis |
| | 5-10 | when laminating a Boolean array along the last axis and the last axis length of the resultant array is less than 64 |
| Catenate First (dyadic ⍪) | unlimited* | when appending a few elements to a large array that is not shared |
| Take (dyadic ↑) | 2-10 | when performing an overtake on the last axis of a Boolean array such that the last axis length of the resultant array is less than 64 |
| Enlist (monadic ∊) | ≈2 | for any nested array comprising small simple arrays (not mixed type) |
| Unique (monadic ∪), Membership (dyadic ∊), Find (dyadic ⍷), Without (dyadic ~), Union (dyadic ∪), Intersection (dyadic ∩) | up to 2 | for any nested array comprising small simple arrays (not mixed type) |
| Expand (dyadic \), Expand First (dyadic ⍀) | up to 20 | when the left argument is a Boolean array |
| | up to 5 | when the left argument is a non-Boolean array |
| Encode (dyadic ⊤) | up to 4 | when converting to base-2 |
| | up to 6 | general case |
| Decode (dyadic ⊥) | up to 4 | when converting from base-2 |
| | up to 2.5 | general case |
| Index of (dyadic ⍳) | 2-18 | when left and right arguments are different numeric data types |
| dyadic ⊣¨ and ⊢¨ | 500-1000 | general case |
| | unlimited* | when right argument has a single element, making them equivalent toand |
| monadic ⊂¨ and ⊃¨ | unlimited* | for simple array right arguments (no-ops) |
| Rotate (dyadic ⌽) | up to 12 | operations most improved are: rotating an array of large types about the first axis; rotating an array of small types (that is, short integers/characters) about the last axis |
| Replicate (dyadic /), Replicate First (dyadic ⌿) | 4-5 | when the left argument is a Boolean array and the processor supports the BMI2 instruction set (for Intel this starts with Haswell in 2013) |
| | up to 5 | when the left argument is a non-Boolean array |
| monadic =\ and ≠\ | 4-7 | for a Boolean array that has a last axis length greater than 32 |
| Reverse (monadic ⌽) | up to 2 | for arrays that are not shared |
| | up to 4 | for a Boolean array that has a last axis length less than 64 |

* The speed-up depends on the size of the arguments and increases as the argument size increases.

In addition:

  • ⍉⍤2 now uses the same algorithm as transpose with a rank two argument, so it's at least as fast to batch matrices then transpose them as to transpose them individually.
  • Transposing row and column vectors is now much faster.
  • Operations that copy Boolean data between arrays, for example, reshape and catenate, are up to 6 times faster.
  • The :For control structure is up to 1.5 times faster.
  • On the Microsoft Windows operating system, Execute (monadic ⍎) is approximately 25 times faster on very long strings containing lots of numbers. This means that the ]IN user command is much faster at loading workspace files containing lots of numeric data.
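
The point about ⍉⍤2 in the list above can be checked with a sketch along the following lines. The array shape and the names a and m are assumptions made for illustration, and cmpx is again copied from the dfns workspace. No timings are shown because they depend on the machine and the interpreter version.

```apl
      'cmpx' ⎕CY 'dfns'          ⍝ copy cmpx from the dfns workspace
      a←?100 200 300⍴1000        ⍝ 100 matrices batched into one rank-3 array
      m←⊂⍤2⊢a                    ⍝ the same data as a vector of 100 separate matrices
      (⍉⍤2⊢a)≡↑⍉¨m               ⍝ both approaches produce the same values
1
      cmpx '⍉⍤2⊢a' '⍉¨m'         ⍝ compare batched transpose with transposing each matrix
```

The first expression transposes every matrix in the batch in one call; the second transposes the enclosed matrices one at a time.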