
In my capacity as a design engineer, I needed to size equipment to feed compressed air to a set of dust collectors. These dust collectors require roughly 10 minutes of "pulsing" air every few hours to clean the filter medium. Each pulse lasts 0.1 s, and pulses occur every 10 s while the pulsing lasts. The volume of air used during each pulse varies by dust collector. The equipment I am designing will feed many different types of dust collectors in different areas. For example, in one area, equipment might feed 2 dust collectors that each require 3 scf* per pulse, and also feed 1 dust collector that requires 5 scf per pulse. In another area, those numbers will be totally different. I would like to be sure that I design the equipment in every area to handle 99% or 99.9% of cases.
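The per-collector probability used in the code below follows from this duty cycle; a quick sketch, assuming (as the code's comment does) about 10 minutes of pulsing per hour:

```python
# Estimate the probability that one dust collector is pulsing at any instant.
# Assumed numbers: 0.1 s pulses every 10 s while pulsing, and roughly
# 10 minutes of pulsing per hour.
pulse_duration = 0.1            # s per pulse
pulse_interval = 10.0           # s between pulse starts while pulsing
pulsing_fraction = 10.0 / 60.0  # fraction of each hour spent pulsing

p = (pulse_duration / pulse_interval) * pulsing_fraction
print(round(p, 4))  # → 0.0017
```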

For each area, the power set of the collectors' per-pulse flows enumerates every flow case I might see. In the simple example above with three dust collectors, the power set is (), (3,), (3,), (5,), (3, 3), (3, 5), (3, 5), (3, 3, 5); the duplicates arise because two collectors share the same per-pulse flow. I wrote the following Python code. It calculates the probability of each case in the power set, then selects the total flow below which 99% (or 99.9%) of flow cases will probably fall. Since I could not figure out an analytical way to solve for the probability of each case, I ended up using a double for-loop over the power set to remove the probability of each case overlapping with another. This gets really slow with more than about 5 dust collectors.
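The power-set enumeration for the three-collector example can be reproduced with itertools; a minimal sketch (the helper name `powerset` is mine):

```python
from itertools import chain, combinations

def powerset(iterable):
    """All combinations of all lengths, keeping duplicate flows distinct."""
    s = list(iterable)
    return list(chain.from_iterable(combinations(s, r) for r in range(len(s) + 1)))

cases = powerset([3, 3, 5])
print(cases)
# → [(), (3,), (3,), (5,), (3, 3), (3, 5), (3, 5), (3, 3, 5)]
```

Note that `combinations` treats the two 3 scf collectors as distinct positions, which is exactly what the probability calculation needs.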

import numpy as np
import pandas as pd
from numpy.random import rand
from itertools import chain, combinations

# placeholder array to simulate dust collector flows in an area (fixed at 7
# collectors here; the commented line gives a random number of collectors)
# flows = np.round(rand(int(np.round(rand() * 10, 0))), 2)
flows = np.round(rand(7), 2)

# approximate probability that one dust collector will be pulsing at any time
p = 0.0017 # based on 6 pulse/min while pulsing, 10 min pulsing/hr, 0.1 s/pulse

# function to produce a numpy array containing the powerset of actual values
def numPowerset(iterable):
    "numPowerset([1,2,3]) --> [[], [1], [2], [3], [1,2], [1,3], [2,3], [1,2,3]]"
    s = list(iterable)
    c = list(chain.from_iterable(combinations(s, r) for r in range(len(s) +1)))
    return np.array(c, dtype='object')

# function to produce numpy array containing unique values to represent actuals
def idPowerset(iterable):
    "idPowerset([1,2,3]) --> [[], [0], [1], [2], [0,1], [0,2], [1,2], [0,1,2]]"
    s = np.arange(len(list(iterable)))
    c = list(chain.from_iterable(combinations(s, r) for r in range(len(s) + 1)))
    return np.array(c, dtype='object')

# creating the dataframe where I want to see my results 
df = pd.DataFrame(numPowerset(flows), columns=['Num. Combination'])
df['ID Combination'] = idPowerset(flows)

# calculating the probability of each event, inclusive of the probability that 
# the event occurs as part of another event (overlapping events)
df['Probability'] = p**df['ID Combination'].str.len()

# removing the probability of overlapping events
for i in reversed(df.index):
    for j in df.index:
        if (np.size(df.loc[j, 'ID Combination'])
                <= np.size(df.loc[i, 'ID Combination'])):
            pass
        # subtract the probability of every strictly larger combination that
        # contains combination i (a direct subset test on the id tuples)
        elif set(df.loc[i, 'ID Combination']).issubset(df.loc[j, 'ID Combination']):
            df.loc[i, 'Probability'] -= df.loc[j, 'Probability']
                
# totalling flows
df['Total Flow'] = list(map(sum, df['Num. Combination']))

# summing probabilities for each possible flow rate
df = df.groupby('Total Flow').sum(numeric_only=True)

# normalizing probability (because I don't really care about the zero flow case)
df.drop(index = 0, inplace=True)
df['Normalized Probability'] = df['Probability']/df['Probability'].sum()

# getting cumulative normalized probability and flows that account for more than most cases
df = df.sort_values('Total Flow').reset_index() # just making sure I get the right value on top
df['Cumulative N. Prob.'] = df['Normalized Probability'].cumsum()
design99 = df[df['Cumulative N. Prob.']>0.99].reset_index().at[0,'Total Flow']    
design999 = df[df['Cumulative N. Prob.']>0.999].reset_index().at[0,'Total Flow']

# viewing results
print()
print(df)
print('Sum of probabilities is: %f'%sum(df['Probability']))
print('Sum of normalized probs. is: %f'%sum(df['Normalized Probability']))
print('Design 99% flow is:', design99)
print('Design 99.9% flow is:', design999)

I am sure that an analytical solution to this statistical problem would fix the speed issues here. Solutions that improve performance without an analytical solution would also be welcome.

*60°F and 1 atm reference conditions
