I have just started learning Python and am working on a project that cleans up text from an actual English dictionary and loads the data into an SQL table with four columns: id, word, definition, and type. Here, type is the classification of the word as a verb, noun, adjective, and so on. This is the kind of data I have in the text file:

Aback - adv. take aback surprise, disconcert. [old english: related to *a2]
Abm - abbr. Anti-ballistic missile
Abnormal - adj. Deviating from the norm; exceptional.  abnormality n. (pl. -ies). Abnormally adv. [french: related to *anomalous]

From this text file, after a bit of cleaning up, I arrived at the SQL table shown in this image.

SQL table I am currently working with and whose type column I am trying to fill

From the definition column, I used a regex to pick out the tokens ending with a "." and then manually went through a Python dictionary of the most frequently repeated ones to identify those that describe a word's type (see: Regular expression not finding all the results). I then loaded them into another SQL table in the same DB and also have them as a Python list: lst = ['v.', 'adj.', 'adv.', 'n.', 'abbr.', 'prefix', 'suffix', 'prep.'].

Now I am trying to find the elements of this list in the definition column of the SQL table and, when one is found, add it to the type column of the same row, stored not as v. or adj. but as its primary key from the second table.

I am unable to match the elements of this list against the contents of the definition column. I also don't know how, after a match, to insert the matched type's primary key into the type column (I haven't tried that part yet). The final table should look something like this:

ID        WORD        DEFINITION        TYPE
1        abandon      Blah blah        1, 3, 5

In the type column, 1 would represent verb (v.), 3 would represent adjective (adj.), and so on.
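To make the target concrete, here is a minimal sketch of the mapping from abbreviations to keys. Only 1 for v. and 3 for adj. come from the example above; the other key values, the name TYPE_IDS, and the helper type_column are assumptions made for illustration.

```python
# Hypothetical primary keys: only 1 (v.) and 3 (adj.) are taken from the
# example table; the remaining numbers are placeholders.
TYPE_IDS = {"v.": 1, "n.": 2, "adj.": 3, "adv.": 4, "abbr.": 5}


def type_column(abbreviations):
    """Return the comma-separated key string for a list of abbreviations."""
    ids = sorted({TYPE_IDS[a] for a in abbreviations})
    return ", ".join(str(i) for i in ids)


print(type_column(["v.", "adj.", "abbr."]))  # prints: 1, 3, 5
```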

Here's what I have tried so far.

Approach 1: Using a function and regex

import re

# Defining the function
def typ_fun(defin, regex):
    p = re.compile('[regex]$')
    match = re.search(p, defin)
    if match:
        print(regex)

# cur_1 is the cursor connected to the SQL DB
cur_1.execute('''SELECT id, definition FROM Words''')
for row in cur_1:
    for a in lst:
        typ_fun(row[1], a)

The result, in this case, is that the whole list is printed, in order, three times.
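A likely cause is that '[regex]$' is a literal string: it never uses the value of the regex argument. As a pattern it is a character class that matches any definition ending in r, e, g or x, so for such a row every element of the list gets printed. The sketch below builds the pattern from the argument instead. It assumes the abbreviations can occur anywhere in the definition (so there is no $ anchor), and the function name find_types is made up for illustration.

```python
import re


def find_types(definition, abbreviations):
    """Return the abbreviations that occur as whole tokens in definition."""
    found = []
    for abbr in abbreviations:
        # re.escape makes the "." literal. The lookarounds reject a match
        # glued to other letters, so "v." is not found inside "adv.".
        pattern = r"(?<![A-Za-z])" + re.escape(abbr) + r"(?![A-Za-z])"
        if re.search(pattern, definition):
            found.append(abbr)
    return found


lst = ["v.", "adj.", "adv.", "n.", "abbr.", "prefix", "suffix", "prep."]
definition = (
    "adj. Deviating from the norm; exceptional.  abnormality n. "
    "(pl. -ies). Abnormally adv. [french: related to *anomalous]"
)
print(find_types(definition, lst))
```

For the Abnormal entry this reports adj., adv. and n., in list order. A plain substring test would also report v., because adv. ends with it.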

Approach 2: Using if and any (from Stack Overflow)

cur_1.execute('''SELECT id,definition FROM Words''')
for row in cur_1:
    if any(x in row[1] for x in lst):
        print(x)

This gives the error

NameError: name 'x' is not defined

While I do not know the exact solution, I think the second approach is the less suitable one, because with it I would be losing the id of the definition, which I need in order to map the result back to the appropriate row in the table.
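Two observations may help here. The NameError arises because x exists only inside the generator expression passed to any(), and any() returns just True or False, not the matching element. The id is also not lost: each row is a tuple, so it is available as row[0]. Below is a minimal end-to-end sketch, assuming the database is SQLite (accessed through the standard sqlite3 module) and using an in-memory stand-in for the real Words table. The ids, the key values in TYPE_IDS, and the helper matching_ids are hypothetical.

```python
import re
import sqlite3

# Hypothetical keys; in the real database they would be read from the
# table that stores the abbreviations.
TYPE_IDS = {"v.": 1, "n.": 2, "adj.": 3, "adv.": 4, "abbr.": 5}


def matching_ids(definition):
    """Return the keys of every abbreviation found as a whole token."""
    return [
        type_id
        for abbr, type_id in TYPE_IDS.items()
        if re.search(
            r"(?<![A-Za-z])" + re.escape(abbr) + r"(?![A-Za-z])", definition
        )
    ]


conn = sqlite3.connect(":memory:")  # stand-in for the real database
conn.execute(
    "CREATE TABLE Words "
    "(id INTEGER PRIMARY KEY, word TEXT, definition TEXT, type TEXT)"
)
conn.executemany(
    "INSERT INTO Words (id, word, definition) VALUES (?, ?, ?)",
    [
        (1, "Aback", "adv. take aback surprise, disconcert. "
                     "[old english: related to *a2]"),
        (2, "Abm", "abbr. Anti-ballistic missile"),
        (3, "Abnormal", "adj. Deviating from the norm; exceptional.  "
                        "abnormality n. (pl. -ies). Abnormally adv. "
                        "[french: related to *anomalous]"),
    ],
)

# fetchall() finishes the SELECT before any UPDATE runs on the connection.
rows = conn.execute("SELECT id, definition FROM Words").fetchall()
for word_id, definition in rows:
    ids = matching_ids(definition)
    if ids:
        # Placeholders (?) let sqlite3 handle quoting of the values.
        conn.execute(
            "UPDATE Words SET type = ? WHERE id = ?",
            (", ".join(str(i) for i in ids), word_id),
        )
conn.commit()

result = conn.execute("SELECT id, word, type FROM Words ORDER BY id").fetchall()
print(result)
```

With these placeholder keys, the three sample rows end up with the type values '4', '5' and '2, 3, 4'. Storing several keys in one text column follows the layout shown in the target table; a separate link table with one row per (word id, type id) pair would be the more conventional relational design.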

  • can you print row and show its output here – Abhishek Verma May 22 '20 at 07:50
  • @AbhishekVerma Sure. `(944, 'n. belief that inanimate and natural phenomena have souls. \x7f animist n. animistic adj.') (945, 'n. (pl. -ies) spirit or feeling of hostility. ') (946, 'n. animosity, ill feeling. ') (947, 'n. negatively charged ion. \x7f anionic adj. ')` – Ctrl May 22 '20 at 08:04
  • Unable to edit my comment as it has been more than 5 mins, but each tuple is a new line – Ctrl May 22 '20 at 08:11
