I am trying to use the Levenshtein distance in a join condition.

Since SQLAlchemy doesn't provide an implementation in its func module, I assigned stringdist.rdlevenshtein_norm to sa.func.rdlevenshtein_norm and used it in my SQLAlchemy join, as shown below.

import stringdist
import sqlalchemy as sa

sa.func.rdlevenshtein_norm = stringdist.rdlevenshtein_norm

on_cond = [sa.func.rdlevenshtein_norm(rel_a[str_col_a].column, rel_b[str_col_b].column) < 0.2]

sa.join(rel_a.table, rel_b.table, sa.and_(*on_cond),
        isouter=outer, full=full)

However, stringdist.rdlevenshtein_norm expects two strings as input, so it does not work with SQLAlchemy when comparing all the values in the two join columns.

The error I get when building on_cond is: TypeError('argument 1 must be str, not Column',)

What am I doing wrong here? How can I make SQLAlchemy use this function when performing the join?

Vinay
  • Can't I create a generic function (sa.func.rdlevenshtein_norm) to implement rdlevenshtein_norm? – Vinay Jul 26 '19 at 18:50
  • No, it doesn't. – Vinay Jul 26 '19 at 19:10
  • Update: I created a generic function, `class rdlevenshtein_norm(GenericFunction): type = stringdist.rdlevenshtein_norm`, and then called it as `func.rdlevenshtein_norm(rel_a[str_col_a].column, rel_b[str_col_b].column)`. This gives me the error `TypeError('function takes exactly 2 arguments (0 given)',)`. – Vinay Jul 26 '19 at 19:16
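
A note on why both attempts fail, with a sketch of one possible workaround. `sa.func.<name>` only renders a SQL function call by that name; it never runs Python code on column values. Assigning a Python callable to an attribute of `sa.func` therefore just calls that callable directly with `Column` objects, which would explain the first TypeError. For the join to work, the function has to exist inside the database. The sketch below assumes a SQLite backend and uses only the standard-library `sqlite3` module. `levenshtein_norm` is a hypothetical pure-Python stand-in (plain Levenshtein, not the restricted Damerau variant that `stringdist.rdlevenshtein_norm` computes), and the table contents are made up for illustration.

```python
import sqlite3


def levenshtein_norm(a, b):
    """Plain Levenshtein distance divided by the longer length (0.0 to 1.0).

    Hypothetical stand-in for stringdist.rdlevenshtein_norm.
    """
    if a is None or b is None:
        return None  # SQL NULL in, SQL NULL out
    if not a and not b:
        return 0.0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1] / max(len(a), len(b))


conn = sqlite3.connect(":memory:")
# Register the Python function under a SQL name so SQLite can call it
# for each pair of rows it evaluates in the join condition.
conn.create_function("levenshtein_norm", 2, levenshtein_norm)

# Illustrative data only; table and column names are assumptions.
conn.executescript("""
    CREATE TABLE rel_a (name TEXT);
    CREATE TABLE rel_b (name TEXT);
    INSERT INTO rel_a VALUES ('sqlalchemy'), ('postgres');
    INSERT INTO rel_b VALUES ('sqlalchemi'), ('oracle');
""")

rows = conn.execute("""
    SELECT a.name, b.name
    FROM rel_a AS a
    JOIN rel_b AS b
      ON levenshtein_norm(a.name, b.name) < 0.2
""").fetchall()
print(rows)
```

With SQLAlchemy on SQLite, the same registration could presumably be done inside a "connect" event listener on the engine, after which `sa.func.levenshtein_norm(col_a, col_b) < 0.2` should render a call to the registered function. On other backends the function would have to be provided by the database itself; PostgreSQL, for example, ships a `levenshtein` function in its fuzzystrmatch extension. As for the `GenericFunction` attempt, its `type` attribute is meant to hold the SQL return type rather than a Python implementation, which is likely why the callable ends up invoked with zero arguments.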

0 Answers