How many changes has ⎕R made?

Postby Budgie on Wed Sep 07, 2016 2:47 pm

Is there a nice easy way to determine how many changes have been made by a call to ⎕R?
Jane

Re: How many changes has ⎕R made?

Postby ArrayMac227 on Wed Sep 07, 2016 3:11 pm

Looking in the help file for ⎕R (Replace) I found:

('.at' ⎕S {⍵.((1↑Offsets),1↑Lengths)}) 'The cat sat on the mat'
4 3 8 3 19 3 ⍝ 3 items

Does this help?

Re: How many changes has ⎕R made?

Postby Phil Last on Thu Sep 08, 2016 9:58 am

Never used it but -

You would have to do the search for the original substring before the replacement. Then you would know how many were about to be replaced. Looking for the frequency of the replacement substring after the change might find instances that were already in the source.

Of course others may know if further information regarding the number of changes is available during the actual call to ⎕R as apparently we can specify text and/or other data to be returned.

Re: How many changes has ⎕R made?

Postby ArrayMac227 on Thu Sep 08, 2016 1:22 pm

It is worthwhile to read the documentation on ⎕R and ⎕S. It is the first system function I've seen that works with data items outside the 'usual' numeric, character, and enclosed domains.

≢⎕←'.at'⎕S⊢'The cat sat on the mat'
#.[⎕S match info] #.[⎕S match info] #.[⎕S match info]
3

Essentially, regular expressions are not only a new sub-language all on their own, but the Dyalog interface breaks some new ground.

Re: How many changes has ⎕R made?

Postby Morten|Dyalog on Fri Sep 09, 2016 7:41 am

Not the first system function, the first system OPERATOR (of any description). In languages with strong support for REGEX, it appeared to us (well, to me anyway :-)) that regular expressions were used as a control structure, invoking a block of code for each match. The closest equivalent to that in APL is an operator with a user-defined function. So rather than go for a classical ⎕SS style function, we decided to make ⎕R/⎕S the first "system operators".

Re: How many changes has ⎕R made?

Postby Budgie on Fri Sep 09, 2016 12:13 pm

ArrayMac227 wrote:
> It is worthwhile to read the documentation on ⎕R and ⎕S. It is the first system function I've seen that works with data items outside the 'usual' numeric, character, and enclosed domains.
>
> ≢⎕←'.at'⎕S⊢'The cat sat on the mat'
> #.[⎕S match info] #.[⎕S match info] #.[⎕S match info]
> 3
>
> Essentially, regular expressions are not only a new sub-language all on their own, but the Dyalog interface breaks some new ground.


I have read the documentation, which is why I am asking here. In my application the variable being processed is a vector of (typically 20,000) character vectors. Can you guarantee that what is returned by ⎕S is exactly what will be processed by ⎕R in all circumstances?
Jane

Re: How many changes has ⎕R made?

Postby ArrayMac227 on Fri Sep 09, 2016 12:41 pm

> I have read the documentation, which is why I am asking here. In my application the
> variable being processed is a vector of (typically 20,000) character vectors. Can you
> guarantee that what is returned by ⎕S is exactly what will be processed by ⎕R in all
> circumstances?

@Budgie I'm not sure whether you're asking anything beyond whether ⎕R has bugs. The guarantees are whatever comes with the software, I presume. This is always something you can test.

Re: How many changes has ⎕R made?

Postby Richard|Dyalog on Fri Sep 09, 2016 12:48 pm

It is not clear to me what is meant by the number of changes that have been made. For example, both of these examples will change 'Hello' to 'HELLO', but the first will do this with one change and the second will do it with five smaller ones:

('.+' ⎕r '\u0')'Hello'
('.' ⎕r '\u0')'Hello'

If you consider this to be one change in both cases then the only way to determine this would be to do some clever analysis of the before and after text; the following assumes you want to know how many times the pattern matched the supplied text and thus how many separate changes were made.
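To make the difference concrete, the matches can be counted with ⎕S rather than replaced. This is only a minimal sketch: it assumes the numeric transformation code 0 (the offset of each match), so the tally of the result is the number of matches.

```apl
≢('.+' ⎕S 0)'Hello'   ⍝ one match covering the whole word
1
≢('.' ⎕S 0)'Hello'    ⍝ one match per character
5
```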

Both ⎕R and ⎕S have the same form, shown here for ⎕S:

R ← (patterns ⎕S transformations ⍠ options) document

Both look in the document for matches to the pattern(s), and when they find them they generate either replacement text (⎕R) or a single element of the result (⎕S) using the given transformation.

With identical patterns and options, there should be exactly the same number of elements in the result of ⎕S as there are replacements to the document by ⎕R. There are some options which can only be used with either ⎕R or ⎕S, so providing identical options to each may not be possible, plus this would be quite an inefficient way of determining the number of replacements. But for a non-time-critical analysis it may well be adequate.
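A rough sketch of this two-pass approach on a vector of character vectors follows. The names `lines` and `pat` are made up for illustration, and the transformation code 0 (match offset) is used only to get one result element per match:

```apl
lines←'The cat sat' 'on the mat' 'nothing here'
pat←'.at'
≢(pat ⎕S 0)lines          ⍝ how many matches ⎕R would replace
3
new←(pat ⎕R 'dog')lines   ⍝ the replacement itself, same pattern
```

The document is scanned twice, so this is the simple rather than the fast option.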

Alternatively, the transformations supported by ⎕R are (1) a character vector containing replacement text and simple patterns which can reference the matched text, or (2) a more powerful function call out which can do anything it wants to generate replacement text. Thus you could count the number of replacements by using the function call out to generate the text, and update a count on each invocation - although if you are currently using the non-function form you would need to implement the APL function. If you went this route you would also be able to do additional analysis about the length and positions of the matches, if you wished.
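A minimal sketch of the counting callout, assuming it is typed into the session so that the dfn can see the global: `count` and `fix` are hypothetical names, and the fixed replacement 'dog' stands in for whatever text you really want to generate (the matched text is available to the function as ⍵.Match).

```apl
count←0                      ⍝ reset before each ⎕R call
fix←{count+←1 ⋄ 'dog'}       ⍝ bump the global counter, return replacement text
('.at' ⎕R fix)'The cat sat on the mat'
The dog dog on the dog
count
3
```

The modified assignment `count+←1` updates the global rather than creating a local, which is what lets the tally survive after ⎕R returns.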

Re: How many changes has ⎕R made?

Postby Budgie on Fri Sep 09, 2016 1:11 pm

This is supposed to be a simple application of searching and replacing text that has been generated by OCR. As you know, OCR is not perfect, and there are lots of false readings, perhaps influenced by a built-in spelling dictionary. When the language you are OCR-ing from is not the same as the spelling dictionary, and when the OCR has been done by somebody else so you don't have any control over it, you get even more trouble. What I am trying to do is correct large numbers of these errors automatically, before going through a proof-reading stage and manually correcting those that still exist. Knowing how many hits have been made for a particular replace operation gives an indication of how useful that particular change will be next time round.
Jane

Re: How many changes has ⎕R made?

Postby DanB|Dyalog on Sat Sep 10, 2016 1:32 pm

I don't know if this can help here but have you looked into ]locate?

]locate can perform replacements and tell how many changes were made and show you, if you wish, where the changes were made.