I am trying to use the Levenshtein distance in my join condition.
Since SQLAlchemy doesn't provide an implementation in its `func` module, I assigned the function `stringdist.rdlevenshtein_norm` to `func.rdlevenshtein_norm` and used it in my SQLAlchemy join, like below:
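```python
import stringdist
import sqlalchemy as sa

# Attach the Python implementation to sa.func under the same name
sa.func.rdlevenshtein_norm = stringdist.rdlevenshtein_norm

# rel_a, rel_b, str_col_a, str_col_b, outer and full come from my surrounding code
on_cond = [sa.func.rdlevenshtein_norm(rel_a[str_col_a].column, rel_b[str_col_b].column) < 0.2]
sa.join(rel_a.table, rel_b.table, sa.and_(*on_cond),
        isouter=outer, full=full)
```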
But `stringdist.rdlevenshtein_norm` expects two strings as input, so SQLAlchemy can't use it to compare all the values in the two joining columns. The error I get when building `on_cond` is `TypeError('argument 1 must be str, not Column',)`.
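For context, a minimal sketch of what `sa.func` normally does (assuming SQLAlchemy 1.4 or 2.x): accessing an attribute on `sa.func` does not call anything in Python; it builds a SQL function-call expression that is rendered into the query text and evaluated by the database. Setting `sa.func.rdlevenshtein_norm` to a Python function means that attribute now returns the plain Python function, which is called immediately with `Column` objects, hence the `TypeError`. The tables `a` and `b` are hypothetical, and `levenshtein()` is assumed to exist on the database side (for example PostgreSQL's `fuzzystrmatch` extension, which computes plain Levenshtein rather than the restricted Damerau-Levenshtein that `rdlevenshtein` refers to).

```python
import sqlalchemy as sa

metadata = sa.MetaData()

# Hypothetical tables standing in for rel_a.table / rel_b.table.
a = sa.Table("a", metadata,
             sa.Column("id", sa.Integer, primary_key=True),
             sa.Column("name", sa.String))
b = sa.Table("b", metadata,
             sa.Column("id", sa.Integer, primary_key=True),
             sa.Column("name", sa.String))

# sa.func.<name>(...) only builds SQL; the database must provide levenshtein().
dist = sa.func.levenshtein(a.c.name, b.c.name)
longest = sa.func.greatest(sa.func.length(a.c.name), sa.func.length(b.c.name))

# Normalized distance < 0.2, written without division to avoid integer-division quirks.
on_cond = dist < 0.2 * longest
j = sa.join(a, b, on_cond)

print(on_cond)  # SQL text containing levenshtein(a.name, b.name) ...
```

Because `on_cond` is a SQL expression rather than a Python value, it can be passed to `sa.join` and evaluated per row pair by the database.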
What am I doing wrong here? How can I make SQLAlchemy use this function while performing the join?
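One way to make a Python function usable inside a join, sketched here with the standard-library `sqlite3` module rather than SQLAlchemy, is to register it with the database connection so SQL can call it for each candidate pair of rows. `levenshtein_norm` below is a hypothetical pure-Python stand-in (plain Levenshtein distance divided by the longer string's length), not `stringdist`'s restricted Damerau-Levenshtein, and the tables and data are made up.

```python
import sqlite3


def levenshtein_norm(s1, s2):
    """Hypothetical stand-in: Levenshtein distance / length of the longer string."""
    if s1 is None or s2 is None:
        return None  # propagate SQL NULLs
    if not s1 and not s2:
        return 0.0
    prev = list(range(len(s2) + 1))  # distances from "" to each prefix of s2
    for i, c1 in enumerate(s1, start=1):
        cur = [i]
        for j, c2 in enumerate(s2, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (c1 != c2)))   # substitution
        prev = cur
    return prev[-1] / max(len(s1), len(s2))


conn = sqlite3.connect(":memory:")
# Register the Python function so SQL can call it with column values.
conn.create_function("levenshtein_norm", 2, levenshtein_norm)
conn.executescript("""
CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE b (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO a (name) VALUES ('kitten'), ('database'), ('python');
INSERT INTO b (name) VALUES ('kitten'), ('databases'), ('java');
""")

# The database evaluates the function for every (a, b) row pair.
rows = conn.execute("""
SELECT a.name, b.name
FROM a JOIN b ON levenshtein_norm(a.name, b.name) < 0.2
ORDER BY a.id
""").fetchall()
print(rows)
```

With SQLAlchemy on SQLite, the same registration could be done in a `connect` event listener (`sqlalchemy.event.listens_for(engine, "connect")`, calling `dbapi_connection.create_function(...)` inside it), after which `sa.func.levenshtein_norm(col_a, col_b) < 0.2` renders a call the database can resolve. Either way the function runs once per row pair, so no index can help and the join cost grows with the product of the two table sizes.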