Questions tagged [boolean-indexing]

27 questions
0 votes, 1 answer

Pandas dataframe data validation methods

We are currently using a DataFrame to store information on data we've collected. Before submitting the data, we need to validate it against a list of rules. We are trying to set up these validations in Python, and part of the problem is readability vs.…
Sea Bacon
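A minimal sketch of one readable way to express validation rules with boolean indexing. It assumes each rule can be written as a row-wise boolean mask. The column names, values, and rule names below are hypothetical, because the excerpt does not show the real data or rule list.

```python
import pandas as pd

# Hypothetical collected data; column names are illustrative only.
df = pd.DataFrame({
    "sample_id": [1, 2, 3, 4],
    "temperature": [21.5, -300.0, 19.0, None],
    "site": ["A", "B", "Z", "A"],
})

# Each validation rule is a named boolean mask (True = row passes the rule),
# which keeps the rule list readable and easy to extend.
rules = {
    "temperature_present": df["temperature"].notna(),
    "temperature_physical": df["temperature"] > -273.15,  # NaN compares as False
    "site_known": df["site"].isin(["A", "B", "C"]),
}

# One column per rule; a row is valid only if it passes every rule.
checks = pd.DataFrame(rules)
valid = checks.all(axis=1)

# Rows that failed, with the rule-by-rule breakdown showing which rules broke.
failures = checks[~valid]
print(failures)
print(df[valid])
```

Keeping the masks in a dictionary means the rule names also appear as column headers in the failure report.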
0 votes, 1 answer

Pandas Categorical Ignores Boolean Slicing? (remove "unused" Categories)

Oftentimes I have to convert even continuous data into a categorical dtype, since it helps my statistical analysis. When I apply boolean indexing (values < 11) to categorical columns, they are not sliced as expected: import matplotlib.pyplot as…
markur
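A short sketch of the behaviour the title describes. It uses made-up data, since the excerpt is truncated before the original code. A categorical Series keeps its full category list after boolean indexing, because the categories belong to the dtype rather than to the remaining rows. `Series.cat.remove_unused_categories()` drops the categories that no longer occur.

```python
import pandas as pd

values = pd.Series(range(1, 21))
cat = values.astype("category")

# Boolean indexing keeps only the matching rows, but the categories are part
# of the dtype, so all 20 categories survive the slice.
subset = cat[values < 11]
print(len(subset), len(subset.cat.categories))

# Drop categories that no longer appear in the data.
subset = subset.cat.remove_unused_categories()
print(len(subset.cat.categories))
```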
0 votes, 2 answers

Compare 2 DataFrames and drop rows that do not contain corresponding ID variables

I need to compare 2 DataFrames and drop rows in either that do not contain the corresponding IDs. As an example, consider df1 and df2. df1 = pd.DataFrame({'ID':[1,2,3,4], 'Food':['Ham','Cheese','Egg','Bacon',], …
John Conor
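A minimal sketch of the ID filter using `Series.isin` as a boolean mask. `df1` uses only the columns visible in the excerpt. `df2` is hypothetical, because the excerpt is cut off before it is defined.

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [1, 2, 3, 4],
                    "Food": ["Ham", "Cheese", "Egg", "Bacon"]})
# Hypothetical second frame; the question's df2 is not shown in the excerpt.
df2 = pd.DataFrame({"ID": [2, 4, 5],
                    "Drink": ["Tea", "Juice", "Milk"]})

# Keep only rows whose ID also appears in the other frame.
df1_common = df1[df1["ID"].isin(df2["ID"])]
df2_common = df2[df2["ID"].isin(df1["ID"])]
print(df1_common)
print(df2_common)
```

Each frame keeps its own columns and original index. An inner `merge` on `ID` would instead combine both frames into one.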
0 votes, 2 answers

How to speed up pandas boolean indexing with multiple string conditions

I have a 73-million-row dataset, and I need to filter out rows that match any of a few conditions. I have been doing this with boolean indexing, but it's taking a really long time (~30 minutes), and I want to know if I can make it faster (e.g. fancy…
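One common rewrite of "match any of several strings" is a single `isin` call instead of a chain of `==` comparisons joined with `|`. This is a sketch on a tiny hypothetical column, not a benchmark. Whether it is faster on a 73-million-row frame would need to be measured, and the column name and values are assumptions.

```python
import pandas as pd

# Small hypothetical frame standing in for the large dataset.
df = pd.DataFrame({"status": ["ok", "error", "timeout", "ok", "retry"]})

# Chained equality tests build one temporary mask per condition...
mask_chained = ((df["status"] == "error")
                | (df["status"] == "timeout")
                | (df["status"] == "retry"))

# ...while isin() expresses the same filter as a single membership test.
unwanted = ["error", "timeout", "retry"]
mask_isin = df["status"].isin(unwanted)

assert mask_chained.equals(mask_isin)
filtered = df[~mask_isin]  # drop rows matching any unwanted value
print(filtered)
```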
0 votes, 1 answer

Advanced boolean indexing

I want to select values with a mask and change those values using a mask array. Code: import numpy as np a = np.zeros((2, 2), dtype=(np.uint8, 3)) x = np.arange(4, dtype=int).reshape((2, 2)) mask = np.logical_and(a1 < 3, a1 > 0) a[mask] = (1, x[mask], 2) I…
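A sketch of one way to make the excerpt's assignment work. It assumes the undefined `a1` was meant to be `x`. A tuple that mixes scalars with an array cannot be assigned to the masked cells in one step, so each of the three channels is filled separately. The boolean mask selects cells on the first two axes, and the integer picks the channel.

```python
import numpy as np

# Three uint8 channels per cell: the same (2, 2, 3) array that
# np.zeros((2, 2), dtype=(np.uint8, 3)) produces in the question.
a = np.zeros((2, 2, 3), dtype=np.uint8)
x = np.arange(4, dtype=int).reshape((2, 2))

# The excerpt builds the mask from `a1`, which it never defines; `x` is assumed here.
mask = np.logical_and(x < 3, x > 0)

# Fill each channel of the selected cells separately.
a[mask, 0] = 1
a[mask, 1] = x[mask]
a[mask, 2] = 2
print(a)
```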
0 votes, 1 answer

Insert a customized series as a new column in a DataFrame with Pandas

Given this DataFrame with columns category, Year, and Profit: data = {'category':pd.Series(['A','A','A','A','A','A']), 'Year':pd.Series([1,1,3,3,3,4]), 'Profit':pd.Series([10,11,5,6,30,31])} df = pd.DataFrame(data) display(df) how…
Howard
0 votes, 1 answer

Star (*) within Pandas boolean indexing

Because of a typo, I happened upon some Pandas DataFrame boolean indexing syntax that I'm not familiar with, and I can't find any information describing what is actually happening. I was trying to retrieve a dataframe based on two conditions with an…
jb1225
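A small sketch, with made-up data, of what `*` does between two boolean masks. Element-wise, `True * True` is `True` and anything times `False` is `False`, and the product keeps a boolean dtype. The result therefore selects the same rows as `&`. `&` remains the conventional spelling.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 5, 7, 2], "b": [10, 3, 8, 9]})

cond1 = df["a"] > 3
cond2 = df["b"] > 5

# Multiplying two boolean Series acts like element-wise logical AND,
# so both expressions select the same rows.
star = df[cond1 * cond2]
amp = df[cond1 & cond2]
print(star)
```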
0 votes, 2 answers

Boolean Indexing numpy Array with or logical operator

I was trying to do boolean indexing with a logical OR on a NumPy array, but I cannot find a good way. The AND operator & works properly, like: X = np.arange(25).reshape(5, 5) # We print X print() print('Original X = \n', X) print() X[(X > 10) & (X < 17)]…
Zioalex
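The element-wise counterpart of `&` is `|`, with the same need for parentheses around each comparison, because `|` binds more tightly than `<` and `>`. The sketch below reuses the `X` from the excerpt. The thresholds are arbitrary.

```python
import numpy as np

X = np.arange(25).reshape(5, 5)

# `|` is element-wise OR; each comparison is parenthesized.
outer = X[(X < 5) | (X > 20)]
print(outer)  # [ 0  1  2  3  4 21 22 23 24]

# np.logical_or is the function form of the same operation.
same = X[np.logical_or(X < 5, X > 20)]
```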
0 votes, 1 answer

pandas boolean indexing of dataframe in dictionary of data frames

So, this is probably a really simple problem, but I've not found a solution yet. I apologize for my stupidity. (I'm guessing my ignorance of the terminology has impeded my searching here.) I have a dictionary of dataframes (showing 2 in here, but it…
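A minimal sketch, assuming the goal is to apply the same boolean filter to every DataFrame stored in a dictionary. The keys, column name, and threshold are hypothetical, because the excerpt stops before the actual frames and condition.

```python
import pandas as pd

# Hypothetical dictionary of DataFrames; keys and columns are illustrative.
frames = {
    "site_a": pd.DataFrame({"value": [1, 8, 3]}),
    "site_b": pd.DataFrame({"value": [9, 2, 7]}),
}

# Apply the same boolean mask to each frame, building a new dictionary.
filtered = {name: df[df["value"] > 5] for name, df in frames.items()}
print(filtered["site_b"])
```

The mask is built from each frame inside the comprehension, so it always matches that frame's own index.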
0 votes, 2 answers

Numpy: Overlay Boolean Array on "True"s of other boolean array

I have a boolean 2D array A whose number of True values matches the dimensions of boolean 2D array B. A = np.array([[False, True, True, False, True],[False, False, False, False, False],[False, True, True, False, True]]) B = np.array([[True, False, True],[True,…
Jonas Jo
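A sketch of one reading of the question. It assumes B holds exactly one value per True entry of A, taken in row-major order. `A` is copied from the excerpt. `B` is hypothetical, since the excerpt cuts it off. Assigning through a boolean mask fills the masked positions in C (row-major) order.

```python
import numpy as np

A = np.array([[False, True, True, False, True],
              [False, False, False, False, False],
              [False, True, True, False, True]])
# Hypothetical B: the excerpt is truncated, so these values are made up.
# Assumption: B has exactly as many elements as A has True entries (6 here).
B = np.array([[False, True, False],
              [True, True, False]])

assert A.sum() == B.size

# Start from all False, then write B's values into A's True positions.
result = np.zeros_like(A)
result[A] = B.ravel()
print(result)
```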
0 votes, 2 answers

Boolean indexing, trying to search by label with two conditions but boolean and, bitwise &, and numpy logical_and all return errors

I am trying to return the rows of a dataframe in pandas that correspond to the label I choose. For example, in my function Female, it returns all the rows in which the patient is female. For AgeRange, I have run into issues satisfying both…
user10448598
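A minimal sketch of combining two conditions on a DataFrame. Python's `and` raises an error on Series because their truth value is ambiguous. `&` needs parentheses around each comparison because it binds more tightly than `>=` and `==`. The column names and age bounds are hypothetical, since the excerpt does not show the `Female` or `AgeRange` functions.

```python
import pandas as pd

# Hypothetical patient table; column names are illustrative.
df = pd.DataFrame({"sex": ["F", "M", "F", "F"],
                   "age": [25, 40, 67, 33]})

# Parenthesize every comparison before combining with &.
in_range = (df["age"] >= 20) & (df["age"] <= 40)
female_20_40 = df[(df["sex"] == "F") & in_range]
print(female_20_40)

# Series.between is inclusive on both ends by default and gives the same mask.
assert in_range.equals(df["age"].between(20, 40))
```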
-1 votes, 1 answer

How do I capture all complying values using a mask in Pandas?

Running value_counts on one specific DataFrame column shows that there are 441 values lower than 10. When I apply a mask (boolean indexing) to access those values, it only returns 12 of the 441. I thought it was a datatype issue.…
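The excerpt stops before the cause is identified, so this is only a hedged first check. It assumes the column may not be purely numeric, for example text values read from a file. The sketch converts the column explicitly with `pd.to_numeric` and counts with the same mask that does the filtering, so the count and the selection cannot disagree. The sample values are hypothetical.

```python
import pandas as pd

# Hypothetical column stored as text (object dtype), including an unparseable entry.
raw = pd.Series(["3", "12", "7.5", "n/a", "9", "40"])

# Convert to numbers first; entries that cannot be parsed become NaN.
numeric = pd.to_numeric(raw, errors="coerce")

# Build the mask on the numeric values and count with that same mask.
mask = numeric < 10
below_ten = numeric[mask]
print(mask.sum(), numeric.isna().sum())
```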