segs ← tag ##.htx html                      ⍝ Extract html segments.

Extracts [tag]-tagged segments from character array [html].

Right argument [html] may be:

    - a character vector, possibly containing linefeed characters, or
    - a character matrix, or
    - a vector of character vectors (as delivered by →getfile←).
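
As a rough illustration of the three forms, the same two lines of text can be
built each way and passed to htx. The names lf, vec, mat and vov, and the sample
text itself, are assumptions made for this sketch only.

```apl
lf←⎕UCS 10                          ⍝ linefeed character.
vec←'<b>one</b>',lf,'<b>two</b>'    ⍝ character vector containing a linefeed.
mat←↑'<b>one</b>' '<b>two</b>'      ⍝ character matrix, one line per row.
vov←'<b>one</b>' '<b>two</b>'       ⍝ vector of character vectors.

'b' htx vec                         ⍝ each form is an acceptable right argument.
'b' htx mat
'b' htx vov
```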

If [tag] starts with a '<' character, the <begin> and </end> tags are themselves
included in the result; otherwise they are omitted. For aesthetic reasons, a
closing '>' may also be included in [tag]; it is ignored.
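
The tag convention can be sketched with the system operator ⎕S. The function
htxre below is hypothetical and is not how htx itself is coded: it is a
regular-expression approximation that assumes a simple character vector without
linefeeds, tags without attributes, and a tag name free of regular-expression
metacharacters.

```apl
htxre←{                                 ⍝ hypothetical approximation, not htx.
    name←⍺~'<>'                         ⍝ bare tag name: any '<' or '>' is dropped.
    pat←'<',name,'>(.*?)</',name,'>'    ⍝ non-greedy match between begin and end tags.
    '<'=⊃⍺:(pat ⎕S'&')⍵                 ⍝ leading '<': whole match, tags included.
    (pat ⎕S'\1')⍵                       ⍝ otherwise: only the enclosed text.
}
```

Under these assumptions, 'b' htxre bold and '<b>' htxre bold are intended to
mirror the first two examples below, and '<b' and '<b>' are treated alike
because name discards both angle brackets.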

Technical notes:

The coding is an example of "programming with functions". Notice that nearly all
of the local names refer to functions, rather than to data arrays.
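
A minimal sketch of this style, using a hypothetical function wrap that is not
taken from htx: each of the local names sans, open and close is bound to a
function, and the result is formed by applying them to the arguments.

```apl
wrap←{                          ⍝ hypothetical: ⍺ tag, ⍵ text to enclose.
    sans←~∘'<>'                 ⍝ local function: drop any angle brackets.
    open←{'<',⍵,'>'}            ⍝ local function: build a begin tag.
    close←{'</',⍵,'>'}          ⍝ local function: build an end tag.
    (open sans ⍺),⍵,close sans ⍺
}

'<b>' wrap 'this'               ⍝ → <b>this</b>
```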

Examples:

    bold←'<b>this</b> and <b>that</b>'

    disp   'b' htx bold             ⍝ extract <b> (bold) text.
┌→───┬────┐ 
│this│that│
└───→┴───→┘ 

    disp '<b>' htx bold             ⍝ .. including tags.
┌→──────────┬───────────┐ 
│<b>this</b>│<b>that</b>│
└──────────→┴──────────→┘ 

    htm                             ⍝ character vector (with linefeeds).
<html>
  <body>
    <table>
      <tr><td>%</td><td>Eye Poke</td><td>Kumquat</td></tr>
      <tr><td>Guys</td><td>60</td><td>40</td></tr>
      <tr><td>Dolls</td><td>20</td><td>80</td></tr>
    </table>
  </body>
</html>

    disp 'table'htx htm             ⍝ extract table.
┌→────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ 
│ <tr><td>%</td><td>Eye Poke</td><td>Kumquat</td></tr> <tr><td>Guys</td><td>60</td><td>40</td></tr> <tr><td>Dolls</td><td>20</td><td>80</td></tr> │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────→┘ 

    disp '<table>'htx htm           ⍝ extract table with tags.
┌→───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ 
│<table> <tr><td>%</td><td>Eye Poke</td><td>Kumquat</td></tr> <tr><td>Guys</td><td>60</td><td>40</td></tr> <tr><td>Dolls</td><td>20</td><td>80</td></tr> </table>│
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────→┘ 

    disp 'tr'htx htm                ⍝ extract table rows.
┌→──────────────────────────────────────────┬───────────────────────────────────┬────────────────────────────────────┐ 
│<td>%</td><td>Eye Poke</td><td>Kumquat</td>│<td>Guys</td><td>60</td><td>40</td>│<td>Dolls</td><td>20</td><td>80</td>│
└──────────────────────────────────────────→┴──────────────────────────────────→┴───────────────────────────────────→┘ 

    disp 'td'htx htm                ⍝ extract table data.
┌→┬────────┬───────┬────┬──┬──┬─────┬──┬──┐ 
│%│Eye Poke│Kumquat│Guys│60│40│Dolls│20│80│
└→┴───────→┴──────→┴───→┴─→┴─→┴────→┴─→┴─→┘ 

    disp ↑'td'∘htx¨'tr'htx htm      ⍝ extract table data per row.
┌→────┬────────┬───────┐ 
↓  %  │Eye Poke│Kumquat│
├────→┼───────→┼──────→┤ 
│Guys │   60   │  40   │
├────→┼───────→┼──────→┤ 
│Dolls│   20   │  80   │
└────→┴───────→┴──────→┘ 

See also: Line_vectors html getfile
