I have a DataFrame that currently has 5 rows (it will have more in the future). The names column holds 5 values. If all 5 names are the same (their fuzz.ratio scores are close to each other), everything is fine and no changes are needed. But there are other cases: 4 values are good (their fuzz.ratio scores are close) and 1 value is different (bad); 3 values are good and 2 are bad; 3 values are good, 1 is bad and another 1 is bad; 2 values are the same, another 2 are the same, and 1 is different (bad); 2 values are the same and the remaining 3 values are each different (bad). So I need DataFrames in which at least 2 rows are the same; 3 is better, 4 is good, and 5 is best. Here is a simple example. The real Series will of course have a row index, which should make it easier to select the needed rows.

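# Assumption: fuzz and process come from fuzzywuzzy (thefuzz exposes the same names)
from fuzzywuzzy import fuzz, process

# The variable suffix describes the expected grouping, e.g. 4_1 = four similar names + one outlier
fruits_4_1 = ['banana', 'bananas', 'bananos', 'banandos', 'cherry']
fruits_3_2 = ['tomato', 'tamato', 'tomatos', 'apple', 'apples']
fruits_3_1_1 = ['orange', 'orangad', 'orandges', 'ham', 'beef']
fruits_2_2_1 = ['kiwi', 'kiwiss', 'mango', 'mangas', 'grapes']
fruits_2_1_1_1 = ['kiwi', 'kiwiss', 'mango', 'apples', 'beefs']

# Score every name in fruits_4_1 against every name in fruits_2_1_1_1
for f in fruits_4_1:
    score_1 = process.extract(f, fruits_2_1_1_1, limit=10, scorer=fuzz.ratio)
    print(score_1)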

I need to implement logic that checks a DataFrame's Series and determines which type it is (4+1, 3+2, etc.), and then, based on that, creates new DataFrames containing only the similar rows. How do I do that?
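
One possible way to approach this, as a rough sketch rather than a definitive solution: link every pair of names whose score reaches a threshold, treat the linked names as one group, and read the type (4+1, 3+2, ...) off the group sizes. So that it runs without extra packages, this sketch uses difflib.SequenceMatcher from the standard library as a stand-in for fuzz.ratio. The threshold of 70 and the helper names similarity, group_similar, pattern and similar_rows are assumptions made up for illustration and would need tuning on real data.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Stand-in for fuzz.ratio: a 0-100 score from the standard library."""
    return 100 * SequenceMatcher(None, a, b).ratio()


def group_similar(names, threshold=70):
    """Group positions whose names are linked by a score >= threshold.

    Names are linked transitively: if a~b and b~c, all three share a group.
    The threshold of 70 is an assumed value, not taken from the question.
    """
    parent = list(range(len(names)))  # union-find forest, one node per name

    def find(i):
        # Walk up to the root, compressing the path on the way
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(names)), 2):
        if similarity(names[i], names[j]) >= threshold:
            parent[find(j)] = find(i)

    groups = {}
    for i in range(len(names)):
        groups.setdefault(find(i), []).append(i)
    # Largest group first; groups of equal size keep their first-seen order
    return sorted(groups.values(), key=len, reverse=True)


def pattern(groups):
    """Describe the grouping as a string such as '4+1' or '2+2+1'."""
    return "+".join(str(len(g)) for g in groups)


def similar_rows(groups, min_size=2):
    """Keep only the groups that contain at least min_size positions."""
    return [g for g in groups if len(g) >= min_size]


fruits_2_2_1 = ['kiwi', 'kiwiss', 'mango', 'mangas', 'grapes']
groups = group_similar(fruits_2_2_1)
print(pattern(groups))
print(similar_rows(groups))
```

The lists returned by similar_rows hold positions, so with a real DataFrame each list could be passed to df.iloc[...] to build one new DataFrame per group of similar rows. Swapping similarity for fuzz.ratio should only require changing that one function, although the scores (and therefore a sensible threshold) may differ slightly.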

Andrii

0 Answers