I have just started learning Python and am working on a project that cleans up text from an actual English dictionary and loads it into an SQL table with four columns: id, word, definition, and type. Here, type is the classification of the word (verb, noun, adjective, and so on). This is the kind of data I had in the text file:
Aback - adv. take aback surprise, disconcert. [old english: related to *a2]
Abm - abbr. Anti-ballistic missile
Abnormal - adj. Deviating from the norm; exceptional. abnormality n. (pl. -ies). Abnormally adv. [french: related to *anomalous]
After a bit of cleaning up, I loaded the above text file into an SQL table, as shown in this image.
From the definition column, I used a regex to pick out words ending with a "." and built a Python dictionary of the most repeated ones. I then went through it manually to find the words that describe a word's type (see: Regular expression not finding all the results). I loaded those into another SQL table in the same DB and also have them as a list (list = [v., adj., adv., n., abbr., prefix, suffix, prep.]).
Now I am trying to find the elements of this list in the 'definition' column of the SQL table and, when one is found, add it to the 'type' column of the same row, not as v. or adj. but as its primary key.
I am unable to match the elements of this list against the contents of the 'definition' column. I also don't know how, after a match, to insert the primary key of the word type into the 'type' column; I haven't tried that part yet. The final table should look something like this:
ID WORD DEFINITION TYPE
1 abandon Blah blah 1, 3, 5
In the type column, 1 would represent verb (v.), 3 would represent adjective (adj.), and so on.
Here's what I have tried so far.
Approach 1: Using a function and regex
# Defining the function
import re

def typ_fun(defin, regex):
    p = re.compile('[regex]$')
    match = re.search(p, defin)
    if match:
        print(regex)

# cur_1 is the cursor connected to the SQL DB
cur_1.execute('''SELECT id,definition FROM Words''')
for row in cur_1:
    for a in list:
        typ_fun(row[1], a)
The result in this case is that the whole 'list' is printed, in its original order, 3 times.
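One likely reason for this output: in '[regex]$' the name regex sits inside a string literal, so the pattern is always the character class [regex] (the letters r, e, g, x) anchored at the end of the string, no matter which abbreviation is passed in. Below is a minimal sketch of building the pattern from the argument instead, assuming the abbreviations should only match as standalone tokens. The name find_types and the sample definition are illustrative, not part of the original code.

```python
import re

def find_types(definition, abbreviations):
    """Return the abbreviations that occur as standalone tokens in definition."""
    found = []
    for abbr in abbreviations:
        # re.escape makes the trailing "." literal instead of "any character".
        # The lookarounds stop "v." from matching inside "adv." and similar.
        pattern = re.compile(r'(?<![A-Za-z])' + re.escape(abbr) + r'(?![A-Za-z])')
        if pattern.search(definition):
            found.append(abbr)
    return found

# Illustrative list, taken from the abbreviations named in the question
types = ['v.', 'adj.', 'adv.', 'n.', 'abbr.', 'prefix', 'suffix', 'prep.']
sample = ('adj. Deviating from the norm; exceptional. '
          'abnormality n. (pl. -ies). Abnormally adv.')
print(find_types(sample, types))
```

The function returns the matches rather than printing them, so the caller can pair them with the row id.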
Approach 2: Using if and any (from Stack Overflow)
cur_1.execute('''SELECT id,definition FROM Words''')
for row in cur_1:
    if any(x in row[1] for x in lst):
        print(x)
This gives the error
NameError: name 'x' is not defined
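The NameError arises because x exists only inside the generator expression passed to any(), and any() itself returns just True or False. Here is a minimal sketch that collects the matching items with a list comprehension and keeps the row id alongside them. The rows list is hypothetical sample data standing in for the cursor.

```python
lst = ['v.', 'adj.', 'adv.', 'n.', 'abbr.', 'prefix', 'suffix', 'prep.']

# Hypothetical (id, definition) rows, in the shape the SELECT would return
rows = [
    (1, 'adv. take aback surprise, disconcert.'),
    (2, 'abbr. Anti-ballistic missile'),
]

matches_by_id = {}
for row_id, definition in rows:
    # The comprehension keeps every matching item, not just a yes/no answer
    matched = [x for x in lst if x in definition]
    if matched:
        matches_by_id[row_id] = matched

print(matches_by_id)
```

Note that a plain substring test also reports 'v.' for row 1, because "v." occurs inside "adv.", so some stricter token match is probably needed on top of this.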
While I do not know the exact solution, I think the second approach would be unfavourable because it loses the 'id' of the definition, so I could not map a match back to the appropriate row in the table.
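For reference, the end-to-end step might be sketched as follows. This is only a sketch under several assumptions: the DB is SQLite accessed through the sqlite3 module (an in-memory database stands in for the real one), the second table is called Types with columns id and abbr (hypothetical names), and the type column holds a comma-separated string of primary keys as in the table above.

```python
import re
import sqlite3

conn = sqlite3.connect(':memory:')  # assumption: SQLite, in-memory stand-in
cur = conn.cursor()
cur.execute('CREATE TABLE Types (id INTEGER PRIMARY KEY, abbr TEXT)')
cur.execute('CREATE TABLE Words (id INTEGER PRIMARY KEY, word TEXT, '
            'definition TEXT, type TEXT)')
cur.executemany('INSERT INTO Types (abbr) VALUES (?)',
                [(a,) for a in ['v.', 'adj.', 'adv.', 'n.', 'abbr.',
                                'prefix', 'suffix', 'prep.']])
cur.executemany('INSERT INTO Words (word, definition) VALUES (?, ?)', [
    ('aback', 'adv. take aback surprise, disconcert.'),
    ('abm', 'abbr. Anti-ballistic missile'),
    ('abnormal', 'adj. Deviating from the norm; exceptional. '
                 'abnormality n. (pl. -ies). Abnormally adv.'),
])

# Map each abbreviation to its primary key and to a compiled token pattern
type_ids = {abbr: type_id
            for type_id, abbr in cur.execute('SELECT id, abbr FROM Types')}
patterns = {abbr: re.compile(r'(?<![A-Za-z])' + re.escape(abbr) + r'(?![A-Za-z])')
            for abbr in type_ids}

# Fetch all rows first, so the same cursor can be reused for the updates
rows = cur.execute('SELECT id, definition FROM Words').fetchall()
updates = []
for word_id, definition in rows:
    ids = [str(type_ids[abbr]) for abbr, p in patterns.items()
           if p.search(definition)]
    updates.append((', '.join(ids), word_id))

# Parameterised UPDATE keyed on the row id writes the keys back
cur.executemany('UPDATE Words SET type = ? WHERE id = ?', updates)
conn.commit()

result = cur.execute('SELECT id, word, type FROM Words ORDER BY id').fetchall()
print(result)
```

The ids written here are whatever the Types table assigns, so they will not necessarily line up with the 1 and 3 used in the example table. A separate link table with one row per (word id, type id) pair would be an alternative to the comma-separated string.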