Avoiding redundancy when working with multiple similar functions

Question

This question might sound completely stupid, but since I have no experience with this kind of problem, I think it is worth asking.

The situation I am facing is that I have multiple functions, each of which takes one dataframe and then creates several more dataframes to perform some operations. Yes, I could perhaps use OOP, but I don't think that would make things more readable in my specific case.

The code below is an example I made to show what I mean; it has no purpose other than to illustrate the problem. Naturally, the situation I am facing is more complicated and involves many more dataframes, which is why assigning each partitioned dataframe to a variable and passing it as an argument to the function is not an option.

import pandas as pd

dfdata = pd.DataFrame({
    'Column A': [300, 300, 450, 500, 500, 750, 600, 300, 150],
    'Column B': [1, 1, 0, 1, 0, 1, 0, 0, 1],
    'Column C': ['R', 'C', 'R', 'C', 'Q', 'C', 'R', 'Z', 'Z'],
})

def foo1(df):
    df_1 = df.loc[df['Column B'] == 1]
    df_0 = df.loc[df['Column B'] == 0]
    df_x = df_1['Column B'] * 2
    return df_x

def foo2(df):
    df_1 = df.loc[df['Column B'] == 1]
    df_0 = df.loc[df['Column B'] == 0]
    df_y = df_0['Column B'] * 2
    return df_y

def foo3(df):
    df_1 = df.loc[df['Column B'] == 1]
    df_0 = df.loc[df['Column B'] == 0]
    df_z = df_1['Column B'] * 3
    return df_z

So, to sum up: any ideas on how to make things less repetitive and smarter without applying OOP?

Your code does not make much sense. Because of how you construct `df_1`, its `Column B` always contains only `1` in each row (in each of your `foo` functions). Hence, `df_1['Column B']*3` overcomplicates things a bit. Also, `foo1()` and `foo2()` are equal. So probably, your code is buggy and not doing what you want to do at all. Could you describe what you actually want to achieve in your functions? — Jonathan Scholbach, Nov 05 '19 at 15:18
@jonathan.scholbach Yes, it is indeed weird. As I mentioned in the question, the code is an example I made to show what I mean; it has no purpose other than to illustrate the problem. — Marc Schwambach, Nov 05 '19 at 15:20

Jonathan Scholbach · Answer 1 · 2019-11-05T15:37:07.657

It is good that you are looking for ways to avoid or remove repetition in your code. This is a very important principle in programming, one that even has its own acronym: DRY ("Don't Repeat Yourself"). And you are right, this has nothing to do with object-oriented programming. :)

When trying to get rid of repetition, it is generally a good strategy to identify the variable parts of the repetitive code, i.e. the parts that differ between your otherwise identical code blocks. Then write a function that accepts these variable parts as parameters. That way you generalise your functions, and each original function becomes a special case of the more general one, obtained by fixing its parameters. For instance, working with your example, and keeping the parts of it that do not make much sense:

def foo(df, column_name="Column B", factor=2, filter_value=1):
    df_1 = df.loc[df[column_name] == filter_value]
    return df_1[column_name] * factor
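
To make the "special case" idea concrete, here is a minimal sketch of how the three functions from the question could be recovered from the general `foo`. Using `functools.partial` is just one option I am assuming here; calling `foo` directly with keyword arguments works equally well. The `dfdata` frame is the one from the question.

```python
from functools import partial

import pandas as pd


def foo(df, column_name="Column B", factor=2, filter_value=1):
    # Keep only the rows where the column equals filter_value, then scale.
    df_1 = df.loc[df[column_name] == filter_value]
    return df_1[column_name] * factor


# The original functions as special cases of foo (parameters fixed up front).
foo1 = partial(foo, factor=2, filter_value=1)
foo2 = partial(foo, factor=2, filter_value=0)
foo3 = partial(foo, factor=3, filter_value=1)

dfdata = pd.DataFrame({
    "Column A": [300, 300, 450, 500, 500, 750, 600, 300, 150],
    "Column B": [1, 1, 0, 1, 0, 1, 0, 0, 1],
    "Column C": ["R", "C", "R", "C", "Q", "C", "R", "Z", "Z"],
})

print(foo1(dfdata).tolist())  # [2, 2, 2, 2, 2]
print(foo2(dfdata).tolist())  # [0, 0, 0, 0]
print(foo3(dfdata).tolist())  # [3, 3, 3, 3, 3]
```

If your real functions differ in more than a filter value and a factor, the same approach applies: each difference becomes one more parameter, and the shared filtering code lives in a single place.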
