RegEx: identifying list markers
I have the following problem: I want to count white space characters and list identifiers for list items. The following table shows input and desired result:
- Code: Select all
2 ← '* Item'
3 ← ' * Item'
4 ← ' * Item'
5 ← ' * Item '
⍬ ← ' ** Item' ⍝ No hit: invalid
⍬ ← ' # Item' ⍝ No hit: invalid
3 ← ' + Item'
3 ← ' - Item'
3 ← '1. Item'
4 ← '11. Item'
5 ← ' 11) Item'
6 ← ' 11) Item'
⍬ ← ' 1234567890) Item' ⍝ No hit: max 9 digits
For ordered lists this works:
- Code: Select all
⍬≡'\s*?\b\d{1,9}\b[.)]\s+' ⎕S 1⊢' 1234567890. Item'
1
'\s*?\b\d{1,9}\b[.)]\s+' ⎕S 1⊢'  123.   Item'
9
For unordered lists I expected this to work:
- Code: Select all
'\s*?([-*+]{1})\s+'⎕S 1⊢'  *   Item'
6
⍬≡ '\s*?([-*+]{1})\s+'⎕S 1⊢' # Item'
1
That's fine but this is not:
- Code: Select all
'\s*?([-*+]{1})\s+'⎕S 1⊢'  **   Item'
4
It seems that the {1} is ignored.
It's not a missing feature: it works for ordered lists, restricting the number of digits to a minimum of 1 and a maximum of 9.
I wonder what is going on here...
- kai
- Posts: 137
- Joined: Thu Jun 18, 2009 5:10 pm
- Location: Hillesheim / Germany
Re: RegEx: identifying list markers
The {1} is unnecessary: any single character (or set) already means exactly ONE, so these two patterns are equivalent:
- Code: Select all
'\s*?([-*+]{1})\s+'
- Code: Select all
'\s*?([-*+])\s+'
The engine worked fine. When looking at '  **   Item' it matched the spaces from the beginning, then a star, then looked for whitespace, and that failed because another star came right after the first one.
It then tried matching from the 2nd character, and that failed for the same reason.
It then tried from the 3rd character (the 1st star), and that too failed for the same reason.
At the 4th character (the 2nd star) the engine matched NO spaces, a single star and the following spaces. A match was found: the star plus the 3 spaces after it (4 characters in total).
I think what you wanted was a single match FROM THE BEGINNING, and for that you should anchor the pattern with ^ at the start, as in
- Code: Select all
'^\s*?[-*+]\s+'
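To see where a match starts as well as how long it is, you can ask ⎕S for the offset (code 0) alongside the length (code 1). This is only a sketch: the names txt, pat and anch are made up for the illustration, and the sample strings assume 2 leading blanks and 3 blanks after the marker, as in the walk-through above.

```apl
txt←'  **   Item'             ⍝ assumed spacing: 2 blanks, 2 stars, 3 blanks
pat←'\s*?([-*+])\s+'          ⍝ unanchored pattern, {1} dropped
pat ⎕S 0⊢txt                  ⍝ offset of the match, counted from 0: 3 (the 2nd star)
pat ⎕S 1⊢txt                  ⍝ length of the match: 4 (the star plus 3 blanks)

anch←'^\s*?[-*+]\s+'          ⍝ same pattern, anchored at the start
⍬≡anch ⎕S 1⊢txt               ⍝ 1: no hit, so '**' is rejected
anch ⎕S 1⊢'  *   Item'        ⍝ 6: 2 blanks, the star, 3 blanks
```

A non-zero offset from the unanchored pattern is the tell-tale sign that the engine skipped ahead to find a match instead of matching from the first character.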
- DanB|Dyalog
Re: RegEx: identifying list markers
How many years does it take to master this stuff?
Thanks a lot Dan.
- kai
- Posts: 137
- Joined: Thu Jun 18, 2009 5:10 pm
- Location: Hillesheim / Germany
Re: RegEx: identifying list markers
How long does it take to master APL? :-)
After several years of often scratching my head I'm reasonably comfortable with regex: you have to be if you're fiddling with tools which generate web pages or scan for spam, which I seem to have done a lot recently. Fortunately there are a lot of resources covering regular expressions on the Internet.
Once I got over the fact that this isn't a linear set of symbols like an APL function, it became much easier.
- crishog
- Posts: 61
- Joined: Mon Jan 25, 2010 9:52 am
Re: RegEx: identifying list markers
Just read this in an excellent book ("Regular Expressions Cookbook" by Jan Goyvaerts & Steven Levithan):
When confronted with a problem a guy thought "I am going to solve this with Regular Expressions".
Now he has two problems.
- kai
- Posts: 137
- Joined: Thu Jun 18, 2009 5:10 pm
- Location: Hillesheim / Germany
Re: RegEx: identifying list markers
I've been using regular expressions for over 25 years; I feel I have only scratched the surface of what is possible. Indeed, one of the early bug reporting systems used for Dyalog APL was written as a small number of UNIX scripts, each of which had one or two highly complex uses of regular expressions and the UNIX command sed.
What I have learned is that often it is more advantageous to use multiple regular expressions to massage the data in stages, rather than trying to use one more complex expression to do everything; it might be slightly slower, but it's considerably easier for anyone (including me) to work out what is going on. In Kai's example I might be inclined to use regular expressions to either extract valid list items, or filter out bad ones, and then use APL to calculate the correct result.
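A rough sketch of that staged approach, under stated assumptions: MarkerLen is a hypothetical name, the combined validation pattern is made up for this illustration and was not posted in this thread, and the APL stage counts blanks only, whereas \s in the pattern would also accept tabs. The regular expression only decides whether the line starts with a valid marker; plain APL then does the counting.

```apl
MarkerLen←{
    ⍝ Stage 1: the regex only validates. One marker char or 1-9 digits
    ⍝ plus . or ), then at least one white space character.
    valid←0<≢'^\s*([-*+]|\d{1,9}[.)])\s' ⎕S 0⊢⍵
    ~valid:⍬                      ⍝ invalid marker: no hit
    ⍝ Stage 2: plain APL does the counting (blanks only, an assumption)
    lead←+/∧\' '=⍵                ⍝ number of leading blanks
    rest←lead↓⍵
    mark←+/∧\' '≠rest             ⍝ length of the marker itself
    lead+mark++/∧\' '=mark↓rest   ⍝ plus the blanks that follow the marker
}
```

For example, MarkerLen '  *  Item ' counts 2 blanks, the star and 2 more blanks, while a line starting with '**' or with a 10-digit number fails stage 1 and gives ⍬.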
- AndyS|Dyalog
- Posts: 257
- Joined: Tue May 12, 2009 6:06 pm