I need to find all possible combinations of the following variables, each of which has a given number of observations (X):

Variable Obs

  • Black 1
  • Pink 2
  • Yellow 6
  • Red 15
  • Green 17

e.g. (black, pink), (black, pink, yellow), (black, pink, yellow, red), (red, green), ... Order is not important, so combinations that contain the same elements, such as (black, pink) and (pink, black), must be counted only once.

Finally, I need to calculate the total number of observations for each combination.

What is the fastest method that is also the least error-prone?

I read about tuples, but I am not able to write the code myself.

user19745561
  • Please explain `X`. Do you mean that black can appear up to one time, but pink can appear up to two times? (i.e. black,pink,pink is also valid?) – langtang Aug 12 '22 at 13:15
  • I think I know what you mean... After you get all the combinations, you will evaluate the sum of X across the color(s) in the combination.. So, (black, pink) will be =3, while (green) will be 17, and (green, red, pink) will be 34 – langtang Aug 12 '22 at 17:39

1 Answer

You can use tuples (install it with `ssc install tuples`), as in the example below. Note that I use postfile with a temporary name for the handle and a temporary file for the results. After the loop is complete, I open the temporary file colors and use gsort to sort by the sum in descending order.

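* Build every non-empty combination; tuples stores them in the locals
* tuple1, tuple2, ... and their number in the local ntuples
tuples black pink yellow red green
* Number of observations for each colour
scalar black=1
scalar pink=2
scalar yellow=6
scalar red=15
scalar green=17

* Open a temporary results file with one row per combination
tempname colors_handle
tempfile colors
postfile `colors_handle' str40 colors cnt using `colors', replace
forvalues i = 1/`ntuples' {
    * Add up the observations of every colour in combination i
    scalar sum = 0
    foreach n of local tuple`i' {
        scalar sum = sum + `n'
    }
    post `colors_handle' ("`tuple`i''") (sum)
}
postclose `colors_handle'
* Load the results and sort by total, largest first
use `colors', clear
gsort -cnt
list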

Output:

                            colors   cnt  
  1.   black pink yellow red green    41  
  2.         pink yellow red green    40  
  3.        black yellow red green    39  
  4.              yellow red green    38  
  5.          black pink red green    35  
  6.                pink red green    34  
  7.               black red green    33  
  8.                     red green    32  
  9.       black pink yellow green    26  
 10.             pink yellow green    25  
 11.         black pink yellow red    24  
 12.            black yellow green    24  
 13.                  yellow green    23  
 14.               pink yellow red    23  
 15.              black yellow red    22  
 16.                    yellow red    21  
 17.              black pink green    20  
 18.                    pink green    19  
 19.                   black green    18  
 20.                black pink red    18  
 21.                         green    17  
 22.                      pink red    17  
 23.                     black red    16  
 24.                           red    15  
 25.             black pink yellow     9  
 26.                   pink yellow     8  
 27.                  black yellow     7  
 28.                        yellow     6  
 29.                    black pink     3  
 30.                          pink     2  
 31.                         black     1 
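
As a cross-check that does not depend on the SSC package, the same table can be sketched with a bitmask: five colours give 2^5 - 1 = 31 non-empty subsets, which matches the 31 rows above. This is only an illustrative alternative, not how tuples works internally. The local names `names`, `counts` and `pick` are my own, and it assumes you run it as a do-file with no data in memory that you want to keep.

```stata
* Sketch only: enumerate non-empty subsets with a bitmask instead of tuples
clear
local names  black pink yellow red green
* observations per colour, in the same order as names (from the question)
local counts 1 2 6 15 17
local k : word count `names'
* number of non-empty subsets: 2^k - 1
local n = 2^`k' - 1
set obs `n'
generate str40 colors = ""
generate int cnt = 0
forvalues j = 1/`k' {
    local name : word `j' of `names'
    local obs  : word `j' of `counts'
    * row _n includes colour j when bit (j-1) of _n is set
    local pick mod(floor(_n / 2^(`j' - 1)), 2) == 1
    replace colors = trim(colors + " `name'") if `pick'
    replace cnt = cnt + `obs' if `pick'
}
* ascending order of the total; use gsort -cnt for descending
sort cnt colors
list, noobs
```

Because every row number from 1 to 31 has a distinct bit pattern, each combination appears exactly once, so (black, pink) and (pink, black) cannot both occur. The totals should agree with the listing above, only in ascending order.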
langtang
  • Is there a way to order the output based on the sum? For example in ascending order, so that the combinations with the highest sums come last? Or first? – user19745561 Aug 21 '22 at 09:40
  • See my edit. In this example, I use `postfile` (see `help postfile` for details) – langtang Aug 21 '22 at 18:24