{rslt} ← {larg} (fun ##.ticks) rarg ⍝ Sample Dfn execution clock ticks.
During code execution, ⎕AI's "compute time" clock ticks every so-many millisec-
onds. Operator [ticks] records which line in its operand [fun]ction is executing
as each tick occurs and reports the percentage of ticks that fall on each line.
In order to obtain a significant tick-sample, the function is applied repeatedly
to (or between) its argument(s) until the clock has ticked 1,000 times. Repeated
application means that any side effects (⎕nappend, etc) will also be perpetrated
repeatedly, which may be undesirable.
At the end of the time period, the function source is displayed in the session
together with, for each line:
- The number of times the line was "visited".
- The percentage of clock ticks recorded for the line.
┌────────────── visits
│ ┌─────────── percent
│ │ ┌──────── source
│ │ │
5 14 sieve←{ ⍝ Sieve of Eratosthenes.
5 15 ⍺←⍬ ⍝ Default no primes yet.
5 14 nxt←1↑⍵ ⍝ Next prime, and
5 14 msk←×nxt|⍵ ⍝ ... mask of non-multiples.
5 17 ^/1↓msk:⍺,⍵ ⍝ All non multiples - finished.
4 26 (⍺,nxt)∇ msk/⍵ ⍝ Sieve remainder.
0 0 }
Where:
- Visits: This is (intended to be) an accurate account of the number visits to
each line. A line containing a local function is deemed to be visited both at
definition and application time. This behaviour differs from that of operator
→mdf←, which reports only the definition count in such cases. See Bugs[6]
below.
- Percent: Approximate indication of the proportion of clock ticks occuring in
each line. Owing to timing granularity, the column total may not be 100; see
below.
- Source: As much of the subject function's source code as will fit across the
screen.
Unlike ⎕MONITOR used with traditional functions, the time attributed to a line
_excludes_ that spent in called subfunctions. This is a _good_thing_.
⎕AI's clock resolution varies: on some systems, the clock ticks every 10 milli-
seconds and on others, every 16 (or so). In any case, many lines of code may be
executed between ticks. The best we can do is to run the function for a period
of time and notice on which lines the ticks occur.
The following diagram illustrates the problem: each ┼────┼ represents the real
and varying execution time of successive lines of code, and ↑tick shows the reg-
ular clock ticks occuring.
─┼───┼──┼─┼────┼────┼───┼─┼──┼─┼─┼────────┼──────────┼──────┼─────┼─┼───┼
↑ ↑ ↑ ↑
tick tick tick tick
To interpret the result as an indication of relative processing time for each
line, we must suppose that, in general, lines that take longer to execute will
sustain proportionally more ticks.
Bugs:
[1] The time taken to execute the monitoring code often vastly exceeds the time
taken to execute the code it is monitoring. This would be OK if the monitor
took _exactly_ the same time to execute on each occasion, in which case, we
could calculate and subtract the overhead. However, in Dyalog APL, the heap
manager may take a different number of cycles to satisfy a request depending
on the state of fragmentation of the heap. This means that attempts to
account for the execution time overhead of the monitoring code can be only
approximate.
[2] If the (copy of the) subject function, with its embedded monitoring code,
happens to run in a time that is an exact multiple of the system clock's
resolution, all bets are off as ticks would tend to accumulate in the same
few lines.
[3] The generated monitoring function (⍙_f) will fail with a SYNTAX ERROR if any
local function call does not produce a result. For example:
loop←{
0<⍵:∇ ⍵-1
}
[4] During execution of the subject function, tail-calling is disabled. This
means that a "highly iterative" function such as the following will produce
misleading results:
loop←{
⍵=0:⍵ ──── comparision with 0 _should_ be
∇ ⍵-1 ──── commensurate with subtraction of 1.
}
loop ticks 1e3
1001 15 loop←{
1001 14 ⍵=0:⍵
1000 71 ∇ ⍵-1
0 0 }
[5] As the monitoring code is merged into a copy of the subject function, there
is potential for a name clash. The presence of any of the following names in
the subject function will lead to unpredictable results:
⍙_a ⍙_e ⍙_f ⍙_l
⍙_m ⍙_n ⍙_q ⍙_s
⍙_t ⍙_x ⍙_z
⍙_ _⍙
[6] The line visits counter is incremented whenever the entry monitor code de-
tects that the line number (⎕LC) has changed. Such is the case when a local
function is invoked from a line other than its definition line. Compare the
following examples:
1 ·· dup1←{
3 ·· dup←{⍵ ⍵} ──── 3 visits: 1 definition +
1 ·· a←dup ⍵ ──── 1 call +
1 ·· dup a ──── 1 more call.
0 ·· }
1 ·· dup2←{
1 ·· dup←{⍵ ⍵} ⋄ a←dup ⍵ ⋄ dup a ─── 1 visit to line.
0 ·· }
1 ·· dup3←{
2 ·· dup←{⍵ ⍵} ──── 2 visits: 1 definition +
1 ·· dup dup ⍵ ──── 1 (double) call.
0 ·· }
1 ·· dup4←{
1 ·· {⍵ ⍵}{⍵ ⍵}⍵ ──── 1 visit to line.
0 ·· }
1 ·· dup5←{
2 ·· { ──── 2 visits: 1 definition +
1 ·· ⍵ ⍵ ──── 1 call.
1 ·· }{
1 ·· ⍵ ⍵
0 ·· }⍵
0 ·· }
1 ·· dup6←{
1 ·· {⍵ ⍵}¨⍵ ⍵ ──── 1 visit to line.
0 ·· }
[7] When the execution path through the subject function is determined by fact-
ors other than its arguments, the first (visits) column of the result may
contain non-integers. This might be the case, for example, if ⎕TS or ⎕RL
were used within the function, as with:
maze ticks 10 20 ⍝ reports non-integral line visits count.
Such a result might be interpreted as an _average_ number of line visits.
Technical notes:
The coding of [ticks] is fairly ghastly for the same reason as →mdf←. See the
technical notes in →mdf← for details.
Monitoring is achieved in several stages:
- A copy of the subject function is injected with entry (→⍵) and exit (⍺←) mark-
ers at the start of each line and at the entry and exit points of each sub-
function. These markers are subsequently replaced with calls to the entry and
exit monitoring sequences:
⍙_ ⍵: ⋄ ··· ⍝ entry sequence.
⍺ _⍙ ··· ⍝ exit sequence,
where ⍺ and ⍵ are replaced with the number of the function line in which the
sequence appears. Monitoring function ⍙_ returns 0 (so that its guard is ign-
ored) and _⍙ returns its right argument as the result of the (sub-) function.
- A small loop is executed to determine ⎕AI's clock resolution and therefore the
time needed to endure 1,000 clock ticks.
- The doctored function is executed repeatedly for this time period, during
which sampling information is accumulated in the operator's local variables:
⍙_t time accumulated per line.
⍙_n number of times line is visited (first column of resulting display).
⍙_e number of times entry monitor is called.
⍙_x .. .. exit .. ..
The copy (⍙_f) of the subject function is applied under ⍎ in order to defy
Dfns' "strictly local to the capsule" rule that would otherwise put the accum-
ulating variables beyond its scope.
- An attempt is made to calculate the time overhead spent within the monitoring
code itself. This is effected by timing loops with embedded entry and exit
sequences. Note that these loops are "compiled" by using ⍎, so that they bene-
fit from the same interpreter performance improvements, if any (such as idiom
recognition), as the monitored subject function.
- Finally, this approximate monitoring overhead is subtracted from the monitored
times and the result displayed using ⎕←. The result of the subject function
application is returned as a shy result, so that [ticks] may easily be embedd-
ed within any expression; see the tube.dws example below.
Examples:
⍝ Ackermann's function is a challenging case, as the processing time is
⍝ _negligible_ compared with the monitoring overhead. Repeated monitoring
⍝ of this function often shows quite disparate timing.
3 ack ticks 3
2432 23 ack←{
2432 22 ⍺=0:⍵+1
1244 14 ⍵=0:(⍺-1)∇ 1
1187 41 (⍺-1)∇ ⍺ ∇ ⍵-1
0 0 }
3 ack ticks 3
2432 22 ack←{
2432 24 ⍺=0:⍵+1
1244 16 ⍵=0:(⍺-1)∇ 1
1187 38 (⍺-1)∇ ⍺ ∇ ⍵-1
0 0 }
⍝ However, for more substatial functions, the display can be illuminating. Here
⍝ is the output from a graph searching function in tube.dws. Inserting a call on
⍝ ticks into the trip function, shows that around 80% of its time is spent in
⍝ determining which vertices are free.
⍝
⍝ trip[19] find←⍺.graph{⍺⍺ path ticks ⍺ ⍵} ⍝ find one leg of the route.
⍝ ¯¯¯¯¯
london trip 'Hammersmith' 'Mornington'
1 0 path←{ ⍝ Shortest path from/to ⍵ in graph ⍺.
1 0 graph←⍺ ⍝ ⍺ is graph vector.
1 0 fm to←⍵ ⍝ starting and target vertex(ices).
1 1 tree←¯2+(⍳⍴⍺)∊fm ⍝ initial spanning tree.
573 80 free←{(¯2=tree[⍵])/⍵} ⍝ free vertices in ⍵.
20 1 ⍬{ ⍝ path accumulator.
19 1 ⍵<0:(⍵=¯2)↓⍺ ⍝ root or unvisited vertex: finished.
18 3 (⍵,⍺)∇ ⍵⊃tree ⍝ otherwise: prefix previous (parent) vertex
18 1 }{ ⍝ find partial spanning tree:
18 1 ⍵≡⍬:¯1 ⍝ no vertices left: stitched.
18 1 1∊to∊⍵:1↑⍵∩to ⍝ found target: finished.
17 1 next←free¨graph[⍵] ⍝ next vertices to visit.
17 4 back←↑,/⍵+0×next ⍝ back links.
17 1 wave←↑,/next ⍝ vertex wave front.
17 1 tree[wave]←back ⍝ set back links in tree.
17 3 ∇∪wave ⍝ advance wave front.
0 0 }fm ⍝ from starting vertex.
0 0 }
Hammersmith
Hammersmith Piccadilly
Barons Court Piccadilly
Earl's Court Piccadilly
Gloucester Road Piccadilly
South Kensington Piccadilly
Knightsbridge Piccadilly
Hyde Park Corner Piccadilly
Green Park Piccadilly
Green Park
Green Park Victoria
Oxford Circus Victoria
Warren Street Victoria
Euston Victoria
Euston
Euston Northern
Mornington Crescent Northern
Mornington Crescent
⍝ This example uses Hungarian Assignment on a 100×100 matrix. Note that just
⍝ four of the lines account for around 60% of processing time:
assign ticks ?100 100⍴1000
1 0 assign←{⎕ML←0 ⍝ Hungarian method cost assi
1 0
2 0 step0←{step1(⌽⌈\⌽⍴⍵)↑⍵} ⍝ 0: at least as many rows a
1 0
2 0 step1←{step2↑(↓⍵)-⌊/⍵} ⍝ 1: reduce rows by minimum
1 0
2 0 step2←{ ⍝ 2: mark independent zeros.
4 0 stars←{ ⍝ independent zeros.
3 0 ~1∊⍵:⍺ ⍝ no more zeros: done.
2 0 next←<\<⍀⍵ ⍝ more independent zeros.
2 0 mask←(rows next)∨cols next ⍝ mask of dependent rows and
2 0 (⍺∨next)∇ ⍵>mask ⍝ ⍺-accumulated star matrix.
0 0 } ⍝
2 0 zeros←{⍵+0 stars ⍵}⍵=0 ⍝ 1=>zero, 2=>independent ze
1 0 step3 ⍵ zeros ⍝ next step: 3.
0 0 }
1 0
41 0 step3←{costs zeros←⍵ ⍝ 3: cover cols with starred
40 0 stars←zeros=2 ⍝ starred zeros.
40 0 covers←2×cols stars ⍝ covered cols.
40 0 ~0∊covers:stars ⍝ all cols covered: solution
39 0 step4 costs zeros covers ⍝ next step: 4.
0 0 }
1 0
828 1 step4←{costs zeros covers←⍵ ⍝ 4: adjust covering lines.
827 3 mask←covers=0 ⍝ mask of uncovered elements
827 5 open←1=mask×zeros ⍝ uncovered zeros.
827 5 ~1∊open:(⌊/(,mask)/,costs)step6 ⍵ ⍝ no uncovered zeros, next s
767 1 prime←first open ⍝ choose first uncovered zer
767 1 prow←rows prime ⍝ row containing prime.
767 6 star←2=zeros×prow ⍝ star in row containing pri
806 6 ~1∊star:prime step5{ ⍝ no star in row, next step
39 0 costs ⍵ prime ⍝ adjusted zeros matrix,
0 0 }zeros+2×prime ⍝ new primed zero (3).
728 1 cnext←covers+prow-2×cols star ⍝ adjusted covers.
728 6 znext←zeros⌈3×prime ⍝ primed zero.
728 4 ∇ costs znext cnext ⍝ adjusted zeros and covers
0 0 }
1 0
109 0 step5←{costs zeros prime←⍵ ⍝ 5: exchange starred zeros.
108 1 star←(cols prime)^zeros=2 ⍝ next star.
147 0 ~1∊star:step3 ⍺{ ⍝ no stars: next step :3.
117 1 {costs ⍵}{⍵-2×⍵=3}⍵-⍺^⍵>1 ⍝ unstarred stars; starred p
0 0 }zeros ⍝ adjusted zero markers.
69 0 pnext←(rows star)^zeros=3 ⍝ next prime.
69 0 (⍺∨pnext∨star)∇ costs zeros pnext ⍝ ⍺-accumulated prime-star-·
0 0 }
1 0
61 0 step6←{costs zeros covers←⍵ ⍝ 6: adjust cost matrix.
60 20 cnext←costs+⍺ׯ1 1+.×0 3∘.=covers ⍝ add and subtract minimum v
60 1 znext←zeros+(×costs)-×cnext ⍝ amended zeros marker.
60 0 step4 cnext znext covers ⍝ next step: 4.
0 0 }
1 0
839 15 rows←{(⍴⍵)⍴(⊃⌽⍴⍵)/∨/⍵} ⍝ row propagation.
879 12 cols←{(⍴⍵)⍴∨⌿⍵} ⍝ column propagation.
768 12 first←{(⍴⍵)⍴<\,⍵} ⍝ first 1 in bool matrix.
1 0
1 0 (⍴⍵)↑step0 ⍵ ⍝ start with step 0.
0 0 }
See also: mdf cmpx time
Back to: contents
Back to: Dyalog APL
Trouble seeing APL font?