0

I have an MS Access DB along with a function (thanks @MPękalski) which is called by a query in the join

create table OSS_File(
    id number,
    Kw text(255));

insert into OSS_File(ID,kw) values(1,"black or white");
insert into OSS_File(ID,kw) values(2,"green");
insert into OSS_File(ID,kw) values(3,"blue berry");
insert into OSS_File(ID,kw) values(4,"blueberry");
insert into OSS_File(ID,kw) values(5,"bluegreen");
insert into OSS_File(ID,kw) values(6,"yellow");

create table restricted_words(
    restricted_word text(255));    

insert into restricted_words(restricted_word) values("\bblack\b");
insert into restricted_words(restricted_word) values("^green$");
insert into restricted_words(restricted_word) values("^blue[ ]?berry$");
insert into restricted_words(restricted_word) values("yellow");

Function my_regexp(ByRef sIn As String, ByVal mypattern As String) As String
   Dim r As New RegExp
    Dim colMatches As MatchCollection
    With r
        .Pattern = mypattern
        .IgnoreCase = True
        .Global = False
        .MultiLine = False
        Set colMatches = .Execute(sIn)
    End With
    If colMatches.Count > 0 Then
        my_regexp = colMatches(0).Value
    Else
        my_regexp = ""
    End If
End Function

SELECT restricted_words.restricted_word, OSS_File.ID, OSS_File.kw
FROM restricted_words INNER JOIN OSS_File 
ON restricted_words.restricted_word=my_regexp(nz(OSS_File.kw),restricted_words.restricted_word);

I would have expected to get all but ID #5 back however I only get #6 back. If however I remove the ^,$, or \b I rewrite the function like

.Pattern = "^" & mypattern & "$" 
or 
.Pattern = "\b" & mypattern & "\b"

I can get back results (except for "blue[ ]?berry" due to the [ ]?).

Why when the entire regular expression is stored in the table does the mach fail?

Community
  • 1
  • 1
Aba
  • 584
  • 6
  • 11
  • "\b" might be the backspace character. In general, you have to escape all escapes, or is Vba different? And what is the empty char class `[ ]` ? –  Jan 16 '14 at 19:51
  • Try inserting a `Msgbox mypattern` before the `with` block in order to double-check what it contains. – Bernard Saucier Jan 16 '14 at 19:58
  • @sln [ ]? is a single non-required space to match both blueberry and blue berry (I don't write the specs I just have to program them :)) – Aba Jan 16 '14 at 20:04
  • @sln \\bblack\\b isn't any better. – Aba Jan 16 '14 at 20:10
  • @BernardSaucier I used Debug.Print r.Pattern and it was correct – Aba Jan 16 '14 at 20:11
  • @Aba - Well if you mean to have backspace `"\b"`'s in the string, then "\\bblack\\b" isn't what you want. –  Jan 17 '14 at 17:53

2 Answers2

3

Your join's ON clause ...

ON
restricted_words.restricted_word
=
my_regexp(nz(OSS_File.kw),restricted_words.restricted_word);

restricted_word is a regular expression pattern such as "^blue[ ]?berry$"

On the right side of that join, you're asking for the function's return value, such as "blue berry".

The pattern does not match the function's return value based on that pattern.

Consider a different approach.

FROM restricted_words AS r, OSS_File AS o
WHERE my_regexp_test(o.kw, r.restricted_word) = True

Here is that function tested in the Immediate window.

? my_regexp_test("blue berry", "^blue[ ]?berry$")
True
? my_regexp_test("blueberry", "^blue[ ]?berry$")
True
? my_regexp_test("strawberry", "^blue[ ]?berry$")
False

Function my_regexp_test(ByRef sIn As String,
        ByVal mypattern As String) As Boolean

   Dim r As New RegExp
   With r
        .pattern = mypattern
        .IgnoreCase = True
        .Global = False
        .Multiline = False
    End With
    my_regexp_test = r.Test(sIn)
End Function
HansUp
  • 95,961
  • 11
  • 77
  • 135
  • SELECT DISTINCT o.ID, o.kw FROM restricted_words as r, OSS_File as o WHERE (((my_regexp_test(nz(o.kw),r.restricted_word))=True)); works as advertised. However, I want which pattern flagged the keyword (kw) SELECT r.restricted_word, o.ID, o.kw FROM restricted_words r, OSS_File o WHERE (((my_regexp(nz(o.kw),r.restricted_word))=True)); returns a Cartesian product so I need to keep working at it. Thanks for setting me on the right direction – Aba Jan 16 '14 at 21:36
1

As @HansUp pointed out both sides of my "ON" never matched so what I did is that if there was a match I returned the original sIn (OSS_File.kw) string if not I returned "" (except if sIn was "" to begin with so I returned x) then I put OSS_File.kw on the left side of the on.

Function my_regexp(ByRef sIn As String, ByVal mypattern As String) As String
    Dim r As New RegExp
    Dim colMatches As MatchCollection
    With r
        .Pattern = mypattern
        .IgnoreCase = True
        .Global = False
        .MultiLine = False
        Set colMatches = .Execute(sIn)
    End With
    If colMatches.Count > 0 Then
        my_regexp = sIn
    ElseIf sIn = "" Then
        my_regexp = "x"
    Else
        my_regexp = ""
    End If
End Function

SELECT restricted_words.restricted_word, OSS_File.ID, OSS_File.kw
FROM restricted_words INNER JOIN OSS_File ON OSS_File.kw=my_regexp(nz(OSS_File.kw),restricted_words.restricted_word);

Thanks again everyone

Aba
  • 584
  • 6
  • 11