This isn't a specific issue with code so much as an open, conceptual question, so I hope it's in the right place.
I have a pandas DataFrame that I often subset by a bounding time range and, optionally, by other variables (here, frequency). Frequency takes discrete values, so I can select data from a single channel or from a range of channels. The function I have looks something like this:
def subset_data(data, times, freq=None):
    # Keep rows whose time lies within the (start, end) bounds, inclusive.
    sub_data = data.loc[data['time'].between(*times), :]
    if freq is not None:
        if isinstance(freq, int):
            # Single channel: exact match.
            sub_data = sub_data.loc[sub_data['frequency'] == freq, :]
        elif isinstance(freq, tuple):
            # Range of channels: (low, high), inclusive.
            sub_data = sub_data.loc[sub_data['frequency'].between(*freq), :]
    return sub_data
I wanted to change the isinstance(freq, int) check into a more general check for any numeric type, and I found this question: "What is the most pythonic way to check if an object is a number?" The accepted answer made me question my approach here and its validity in general. The last point, in particular:
If you are more concerned about how an object acts rather than what it is, perform your operations as if you have a number and use exceptions to tell you otherwise.
I interpret this as implying I should do something like this
def subset_data(data, times, freq=None):
    sub_data = data.loc[data['time'].between(*times), :]
    if freq is not None:
        try:
            if isinstance(freq, tuple):
                sub_data = sub_data.loc[sub_data['frequency'].between(*freq), :]
            elif isinstance(freq, int):
                sub_data = sub_data.loc[sub_data['frequency'] == freq, :]
        except TypeError:
            print('sub_data filtered on time only. freq must be numeric.')
    return sub_data
or
if isinstance(freq, tuple):
    sub_data = sub_data.loc[sub_data['frequency'].between(*freq), :]
elif isinstance(freq, int):
    sub_data = sub_data.loc[sub_data['frequency'] == freq, :]
else:
    raise TypeError('freq must be tuple or numeric')
but would be interested to know if that's anything close to the consensus.
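For comparison, the more general numeric check I originally had in mind would look roughly like the sketch below. It assumes the standard-library numbers.Real ABC is an acceptable definition of "numeric" for a frequency; the helper name is_real_number is made up for illustration.

```python
from numbers import Real


def is_real_number(value):
    """Return True for int, float and other real-number types.

    bool is excluded explicitly, because bool is a subclass of int
    and would otherwise pass the check.
    """
    return isinstance(value, Real) and not isinstance(value, bool)


print(is_real_number(3))       # True
print(is_real_number(2.5))     # True
print(is_real_number(True))    # False
print(is_real_number("3"))     # False
print(is_real_number((1, 2)))  # False
```

As far as I know, NumPy scalar types are registered with the numbers ABCs as well, so values pulled out of a DataFrame column should also pass, but I haven't relied on that here.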
The original also omits some validation. I'm too lazy to write it in my own code, and it feels like unnecessary clutter when I'm the only user and already know the types. If that were not the case, I would include:
if isinstance(freq, int):
    sub_data = sub_data.loc[sub_data['frequency'] == freq, :]
elif isinstance(freq, tuple) and len(freq) == 2:
    if isinstance(freq[0], int) and isinstance(freq[1], int):
        sub_data = sub_data.loc[sub_data['frequency'].between(*freq), :]
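If I did write the validation out, one way to keep it from cluttering the function might be to pull it into a small helper that turns freq into a (low, high) pair. This is only a sketch: normalize_freq is a hypothetical name, and treating a single number n as the range (n, n) is my own assumption about how the two cases could be merged.

```python
from numbers import Real


def _is_number(value):
    # bool is a subclass of int, so reject it explicitly.
    return isinstance(value, Real) and not isinstance(value, bool)


def normalize_freq(freq):
    """Return a (low, high) pair for freq, or None if no filter was requested.

    Hypothetical helper: a single number n becomes (n, n), so the caller
    needs only one code path (an inclusive range check).
    """
    if freq is None:
        return None
    if _is_number(freq):
        return (freq, freq)
    if (
        isinstance(freq, tuple)
        and len(freq) == 2
        and all(_is_number(f) for f in freq)
    ):
        return freq
    raise TypeError("freq must be a number or a (low, high) tuple of numbers")


print(normalize_freq(None))     # None
print(normalize_freq(5))        # (5, 5)
print(normalize_freq((1, 10)))  # (1, 10)
```

Inside subset_data, the frequency branch would then shrink to a single sub_data['frequency'].between(*bounds) call guarded by a check that bounds is not None. Since Series.between is inclusive at both ends by default, (n, n) should behave like the == n comparison.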
Is the practice of checking explicitly for an object's type and attributes, and this approach to validation in general, appropriate in Python, or is my knowledge lacking somewhere? Maybe everything could be written more concisely, and I'm missing something. There are technically two questions in this post, but I hope the general, overarching concept is clear enough to be useful for others and allow for some answers.