﻿:Class regexMatch
⍝ This class creates instances that wrap a compiled .NET Regex object
    :include listsUtils    
   ⍝∇:require =\listsUtils
    :include partScan      
   ⍝∇:require =\partScan
    :include textUtils     
   ⍝∇:require =\textUtils

    ⎕io←0 ⋄ ⎕ml←1 ⋄ ⎕wx←3 ⋄ (LF CR)←2↓4↑⎕av

    ∇ genpat pattern
      :Implements constructor
      :Access public
    ⍝ Constructor: compiles 'pattern' with the default option (Multiline)
      cPattern←'m'compileRegex pattern
      ⎕DF pattern
    ∇

    ∇ genpatopt(pattern options)
      :Implements constructor
      :Access public
    ⍝ Constructor: compiles 'pattern' with the given 'options' (see compileRegex)
      cPattern←options compileRegex pattern
      ⎕DF pattern
    ∇
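    ⍝ Usage sketch for the two constructors above. This is an illustration only:
    ⍝ 'rx' is a hypothetical name, and it assumes that this class, the namespaces
    ⍝ it includes and .NET are all available in the session.
    ⍝     rx←⎕NEW regexMatch '\d+\.\d+'        ⍝ pattern only: handled by genpat
    ⍝     rx←⎕NEW regexMatch('[a-z]+' 'i')    ⍝ (pattern options): handled by genpatopt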

    ∇ letters←extraNameFormingChars;r
    ⍝ Returns the characters, other than a-z and A-Z, that can be used in APL names.
    ⍝ The set varies from one APL to another, but ⎕NC=0 always means
    ⍝ "available" and in Dyalog an invalid name has a ⎕NC of ¯1
      r←0≤⎕NC 256 1⍴⎕AV ⍝ check validity
      letters←(r/⎕AV)~,⎕AV[(⎕AV⍳'Aa')∘.+⍳26] ⍝ assume alphabets together
    ∇

    ∇ rx←{options}compileRegex pattern;⎕USING;rxopt;opt
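    ⍝ This function returns a .NET Regex object compiled from a pattern.
    ⍝ 'options' is a string of letters (either case):
    ⍝ M: Multiline search, S: Singleline search, I: case Insensitive
    ⍝ Multiline is assumed when none of these letters is given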
     
      ⎕USING←'System.Text.RegularExpressions,system.dll'
     
    ⍝ Determine the options
      :If 0=⎕NC'options' ⋄ options←'' ⋄ :EndIf
      opt←opt∨(⍴opt)↑~∨/opt←∨⌿2 3⍴'msiMSI'∊options ⍝ Multiline default
      rxopt←+/opt/RegexOptions.(Multiline Singleline IgnoreCase)
      rx←⎕NEW Regex,⊂(APLPattern pattern)rxopt
    ∇

    ⍝ ------- Instance methods -------

    ∇ pattern←APLPattern pattern;⎕USING;⎕IO;e;sl;nobsl;w;nfc;cut
    ⍝ This function translates an APL-flavoured pattern into a .NET regex pattern
     
      ⎕IO←0 ⋄ cut←sl>¯1↓0,sl←'\'=pattern
      nobsl←¯1↓1,sl⍲cut pUes sl         ⍝ where there is no unescaped \ before
      e←(⍴,pattern)↑'∧'=1↑pattern       ⍝ BOL anchor
     
    ⍝ There is a known problem with APL's ∧, which looks like the ASCII caret (^):
    ⍝ the caret means "start of string" at the beginning of a pattern or "NOT" right
    ⍝ after an opening bracket, so in those positions ∧ must be converted to its ASCII counterpart
      ((e∨¯1⌽nobsl∧'[∧'⍷pattern)/pattern)←⎕AV[235] ⍝ map ∧ to ASCII caret if no translation done
     
    ⍝ This version accepts ⍺ and ⍵ as shortcuts for "token" (name) and "number".
    ⍝ Like any metacharacter, they can be disabled with a preceding '\'
      w←nobsl∧'⍵'=pattern ⍝ find numbers (⍵) before ⍺ tokens
      nfc←'a-zA-Z',extraNameFormingChars
      :If 1∊e←nobsl∧'⍺'=pattern
     ⍝ in Dyalog a name must start with a letter followed by 0 or more alphanums
     ⍝ or it can be ⍺, ⍺⍺, ⍵ or ⍵⍵. It must not be preceded by : either (e.g. :IF)
          (e/pattern)←⊂'(?>(?<!\s:)(?<![⎕0-9',nfc,'])[',nfc,'][',nfc,'0-9]*|⍺⍺|⍺|⍵⍵|⍵)'
      :EndIf
    ⍝ See 'GroupingConstructs' for details
      (w/pattern)←⊂'(?>(?<![',nfc,'0-9])¯?(?>\d+\.?\d*|\d*\.?\b\d+)(?>[eE]¯?\d+)?)' ⍝ atomic grouping
      pattern←0⊃,,/pattern,⊂'' ⍝ cover empty case
      pattern←(~'\⍺'⍷e)/e←(~'\⍵'⍷e)/e←pattern ⍝ remove unnecessary \s
    ∇

    ∇ r←{options}findMatch string;e;mo;np;msg
    ⍝ This function finds where the compiled pattern matches in a string.
    ⍝ It returns the start index (0-based) and length of each match and of
    ⍝ each of its groups: a 3D result of shape (matches, groups, 2).
     
      :Access public
      {}⎕FX,⊂'z←options z'
     
      :Trap 90
        ⍝ Change CR into NL (needed for search)
          mo←cPattern.Match⊂string charReplace CR LF
          r←0⍴⊂0 2⍴0
        ⍝ Execute the expression until nothing found
          :While mo.Success
              np←mo.Groups.Count
              r←r,⊂↑mo.Groups[⍳np].(Index Length)
              mo←mo.NextMatch
          :EndWhile
     
          r←↑r ⍝ disclose results
      :Else
          msg←{256>⍴⍵:⍵ ⋄ '...',¯252↑⍵}(e⍳CR)↑e←⍕⎕EXCEPTION
          msg ⎕SIGNAL 11
      :EndTrap
    ∇
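    ⍝ Usage sketch for findMatch. This is an illustration only: 'rx' and 'm' are
    ⍝ hypothetical names, and it assumes the class and its included namespaces are loaded.
    ⍝     rx←⎕NEW regexMatch '(\w+)@(\w+)'
    ⍝     m←rx.findMatch 'joe@home ann@work'
    ⍝     ⍴m                 ⍝ (matches, groups, 2)
    ⍝     (0 0⌷m)            ⍝ with ⎕IO←0: index and length of the first whole match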

    ∇ text←showMatches string
    ⍝ Shows where the instance's pattern matches in 'string' (see displayMatch)
      :Access public
      text←displayMatch cPattern string
    ∇

    ∇ r←{options}displayMatch(pattern string);⎕USING;⎕IO;⎕ML;⎕WX;e;mo;msg;cpattern;⎕TRAP;lno;eachline;offset;marks;lines;startpos;ind;len;lel;move;dec;hits;n;mask;tl
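    ⍝ This function shows where a pattern is found in a string.
    ⍝ It displays each matching line with ∧ marks under the matched characters
    ⍝ and, when 'options' contains L (the default), prefixes each line with its number.
    ⍝ This fn is ⎕IO independent.
    ⍝ 'pattern' may be a character string or an already compiled Regex object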
     
      ⎕ML←3 ⋄ ⎕IO←0 ⋄ ⎕WX←3
      :If 0=⎕NC'options' ⋄ options←'l' ⋄ :EndIf
     
    ⍝ Normalize line ends to CR so that the lines can be located and split
      string←intoCR string
      startpos←0,1+e/⍳⍴e←string=CR        ⍝ where each line starts
      lel←1+∊⍴¨eachline←string splitOn CR ⍝ eachline and its length
     
    ⍝ Compile the pattern if it is still a character string
      ⎕TRAP←90 'C' '→err90'
      :If 2 0∨.=10|⎕DR,cpattern←pattern
          cpattern←options compileRegex pattern
      :EndIf
      mo←cpattern.Match⊂string charReplace CR LF
      lines←marks←r←⍴hits←0
    ⍝ Execute the expression until nothing found
      :While mo.Success
          (ind len)←mo.Groups[0].(Index Length)
          :If len>0 ⍝ no point tracking 0 length matches
              lines←lines,lno←¯1+⊃fromTo/+⌿startpos∘.≤ind+0,len
              tl←len+offset←ind-startpos[lno[0]] ⍝ # spaces before
              mask←(0 1/⍨offset,len),⍳tl=¯1↑n←+\lel[¯1↓lno]
              marks←marks,mask splitAt n
          :EndIf
          mo←mo.NextMatch
          hits+←1
      :EndWhile
     
    ⍝ All lines # and their marks have been gathered
      marks←\∘'∧'¨∨⌿∘⊃¨(1+lines)⊂marks ⍝ 'lines' must be >0
      tl←marks∨.≠¨' '
      lno←tl/∪lines
      eachline←eachline[lno]
      move←''
      :If ∨/'lL'∊options
          dec←'[',⊃,∘'] '¨⍕¨lno ⍝ add line #s?
          move←(¯1↑⍴dec)↑''
          eachline←eachline,¨⍨↓dec
      :EndIf
      r←r,∊CR,¨eachline,[0.2]move∘,¨tl/marks
     
    ⍝ Add number found
      r←((⍕hits),' match',(2×hits=1)↓'es found'),r
      →0
     
     err90:
      msg←{256>⍴⍵:⍵ ⋄ '...',¯252↑⍵}(e⍳CR)↑e←⍕⎕EXCEPTION
      msg ⎕SIGNAL 11
    ∇

    ∇ r←Replace(text by)
    ⍝ Same as the .NET Regex.Replace method: replaces each match in 'text' with 'by'
      :Access public
      r←cPattern.Replace text by
    ∇
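    ⍝ Usage sketch for Replace. This is an illustration only: 'rx' is a hypothetical
    ⍝ name. The argument is a 2-element vector (text by), and 'by' is given as a
    ⍝ string of 2 or more characters so that it reaches .NET as a String.
    ⍝     rx←⎕NEW regexMatch '\s+'
    ⍝     rx.Replace('a   b    c' ', ')    ⍝ each run of blanks is replaced by ', '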

:EndClass ⍝ regexMatch  $Revision: 1002 $ 