
To help learners of Braille, I want to filter a list of words to find only those that contain both the letters 'd' and 'n'. I am using the regex engine present in BBEdit 10.5.13. I have a file containing a list of words, one word per line.

Here is a regex which matches every line, which is of course not what I want.

\w*?(d)?(n)?(?(1)\w*?n|(?(2)\w*?d))\w*

The logic that I imagine is:

\w*?   Match all the letters before the first 'd' or 'n', if there are any
(d)?   If there is a 'd' before the first 'n', capture it
(n)?   If there is an 'n' before the first 'd', capture it
(?(1)  If a 'd' was captured...
\w*?n  ... then match all characters up to the first 'n'
|(?(2) Else if an 'n' was captured...
\w*?d  ... then match all characters up to the first 'd'
))\w*  Continue the match until the end of the word

Obviously, my logic and the logic of my regex are different, since this matches every word whether it contains a 'd' or an 'n' or not. Any help with correcting my logic will be greatly appreciated.
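
The behaviour is easy to reproduce outside BBEdit. The sketch below is a minimal check, assuming Python's `re` module (which accepts the same `(?(1)...|...)` conditional syntax) is an acceptable stand-in for BBEdit's engine; `PATTERN`, `WORDS` and `matched_words` are names made up for illustration. It suggests where the logic slips: the lazy `\w*?` initially matches nothing, so `(d)?` and `(n)?` are tried at the very start of the word, and both are optional. When neither captures, both conditionals take their empty branch and the final `\w*` consumes the whole word.

```python
import re

# The pattern from the question, unchanged.
PATTERN = re.compile(r"\w*?(d)?(n)?(?(1)\w*?n|(?(2)\w*?d))\w*")

# The extract from the word list in the question.
WORDS = ["bald", "balding", "bale", "baling", "balk", "balked",
         "balking", "balm", "bam", "ban", "band", "bane"]

def matched_words(words):
    """Return the words that the pattern matches from start to end."""
    return [w for w in words if PATTERN.fullmatch(w)]

if __name__ == "__main__":
    # Every word is accepted, with or without 'd' and 'n'.
    print(matched_words(WORDS) == WORDS)
    # Neither optional group captured anything for "bale".
    print(PATTERN.fullmatch("bale").groups())
```

Because nothing in the pattern is mandatory, a word with no 'd' and no 'n' is still a valid match.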

Here's a short extract from the list, containing the 2 desired matches: "balding" and "band".

bald
balding
bale
baling
balk
balked
balking
balm
bam
ban
band
bane
James Newton

2 Answers

1

One simple way:

\w*[Dd]\w*[Nn]\w*|\w*[Nn]\w*[Dd]\w*

This simple regex should work in any flavor due to the left-most match rule. It should highlight the whole word.

If _ and digits are present in the text, change all \w to [a-zA-Z].
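
A minimal sketch of applying this pattern to a word list outside the editor, assuming Python's `re` module and one word per line; `BOTH_LETTERS` and `filter_words` are illustrative names, not part of any library. `fullmatch` is used so that each line has to be matched as a whole word.

```python
import re

# Either a 'd' followed later by an 'n', or an 'n' followed later by a 'd'.
BOTH_LETTERS = re.compile(r"\w*[Dd]\w*[Nn]\w*|\w*[Nn]\w*[Dd]\w*")

def filter_words(lines):
    """Keep only the words that contain both 'd' and 'n' (in any order)."""
    words = (line.strip() for line in lines)
    return [w for w in words if BOTH_LETTERS.fullmatch(w)]

if __name__ == "__main__":
    sample = ["bald", "balding", "bale", "baling", "balk", "balked",
              "balking", "balm", "bam", "ban", "band", "bane"]
    print(filter_words(sample))
```

The first alternative handles words such as "balding" (d before n); the second handles words such as "band" (n before d).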

nhahtdh
  • Simple solutions are so neat! Actually, since my list is all in lowercase, this works for me too: `\w*d\w*n\w*|\w*n\w*d\w*`. However, I'd really like to understand how to use conditionals correctly, so that I can write more complex regexes for other circumstances. – James Newton Nov 03 '14 at 18:24
  • @JamesNewton: Forcing a conditional onto such a simple problem is not really a good way to learn about conditional regexes, IMHO. I don't find myself using conditional regex much. But you can search around on SO for real examples where a conditional regex is necessary. – nhahtdh Nov 03 '14 at 18:29
  • I appreciate your position. My attitude is: if I can understand how conditionals work on a simple problem, it will be easier to apply them to a more complex problem. – James Newton Nov 03 '14 at 18:43
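
For anyone who does want to see a conditional at work on this problem, here is one possible sketch. It is not taken from either answer, and it is written for Python's `re` module on the assumption that the `(?(1)yes|no)` syntax behaves the same way in BBEdit. The idea is to make the 'd' or 'n' mandatory rather than optional, capture only the 'd', and let the conditional demand the other letter. `CONDITIONAL` and `has_d_and_n` are illustrative names.

```python
import re

# (?:(d)|n)   a 'd' or an 'n' is required here; only a 'd' is captured
# (?(1)n|d)   if group 1 captured a 'd', require an 'n'; otherwise require a 'd'
CONDITIONAL = re.compile(r"^\w*?(?:(d)|n)\w*?(?(1)n|d)\w*$")

def has_d_and_n(word):
    """True if the lowercase word contains both 'd' and 'n'."""
    return CONDITIONAL.match(word) is not None

if __name__ == "__main__":
    for word in ["bald", "balding", "ban", "band"]:
        print(word, has_d_and_n(word))
```

Unlike the pattern in the question, this one cannot succeed without consuming one of the two letters, so the conditional always has something to test.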
1

This will match exactly what you're looking for.

^([a-nA-N]*[Dd][a-nA-N]*[Nn][a-nA-N]*|[a-nA-N]*[Nn][a-nA-N]*[Dd][a-nA-N]*)$
#Or for lowercase just:
^([a-n]*[Dd][a-n]*[Nn][a-n]*|[a-n]*[Nn][a-n]*[Dd][a-n]*)$

Here's a screenshot of it working.
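
As a rough illustration of what the anchors and the `[a-n]` class add, here is a sketch using Python's `re` module with `re.MULTILINE`, so that `^` and `$` match at line boundaries as they would when searching a file with one word per line (an assumption about the editor's settings). The words `donut` and `island` are not in the original list; they are hypothetical examples of words that contain both letters but also letters beyond 'n'. `A_TO_N`, `TEXT` and `words_a_to_n` are illustrative names.

```python
import re

# The first pattern from this answer, applied line by line.
A_TO_N = re.compile(
    r"^([a-n]*[Dd][a-n]*[Nn][a-n]*|[a-n]*[Nn][a-n]*[Dd][a-n]*)$",
    re.MULTILINE,
)

# Part of the question's list plus two hypothetical extra words.
TEXT = "\n".join(["bald", "balding", "ban", "band", "bane", "donut", "island"])

def words_a_to_n(text):
    """Return the lines that contain 'd' and 'n' and only letters a to n."""
    return [m.group(0) for m in A_TO_N.finditer(text)]

if __name__ == "__main__":
    print(words_a_to_n(TEXT))
```

The two extra words are rejected because 'o', 'u', 't' and 's' fall outside the `[a-n]` class.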

isaacsloan
  • Question title was worded like this: "Regex to find all words which contain both 'd' and 'n' where other letters are from a to n" – isaacsloan Nov 03 '14 at 19:08
  • Braille uses 6 dots to create a pattern. The first 10 letters (a-j) use the top 4 dots. The next 10 letters (k-t) repeat the first 10 patterns with an extra dot in position 5. I am making an application to help learn the Braille alphabet. I want to choose words to practise each new letter. When you are learning the letter 'n' (which is like 'd' with an extra dot), you will only know the Braille patterns for the letters a to n. (I have used 'd' and 'n' as an example of the regex that I need. I also use 'e' and 'o', 'f' and 'p', and so on). – James Newton Nov 03 '14 at 20:09
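
Since the same filter is needed for other pairs ('e' and 'o', 'f' and 'p', and so on), the pattern can be generated instead of written by hand. A minimal sketch in Python, assuming lowercase words and that the allowed letters run from 'a' up to the newly learned letter; `practice_pattern` is a hypothetical helper name, and the sample words in the usage lines are illustrative rather than taken from the list.

```python
import re

def practice_pattern(old_letter, new_letter):
    """Compile a regex for whole words that contain both letters and use
    only the letters from 'a' up to new_letter (lowercase assumed)."""
    allowed = f"[a-{new_letter}]*"
    either_order = (
        f"{allowed}{old_letter}{allowed}{new_letter}{allowed}"
        f"|{allowed}{new_letter}{allowed}{old_letter}{allowed}"
    )
    return re.compile(f"^(?:{either_order})$")

if __name__ == "__main__":
    d_n = practice_pattern("d", "n")
    e_o = practice_pattern("e", "o")
    print(bool(d_n.match("band")), bool(d_n.match("bald")))
    print(bool(e_o.match("bone")), bool(e_o.match("note")))
```

With this helper, moving on to the next Braille letter only means changing the two arguments.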