I am participating in a data science competition, and my final predictions will be scored using a Gini index. It is a regression problem. I have the source code for the calculation in SAS, but I don't know SAS and can't follow what it does.

I want to implement the same calculation in Python. Any help would be appreciated, especially if someone already has Python code for this.

    *define GINI;       
    %macro  gini(input=, output=, y=, py=, filter=, split_ind = );
    data indsn;
        set &input.;
        _random=ranuni (123456789);
        w=1;
        if &split_ind.="&filter.";
    run;

    proc sort data=indsn;by &py _random;run;
        /*accumulate w to calculate Gini    */
        data test;
            set indsn;
            if _N_ = 1 then do;
                cumm_w0=0;
            end;
            retain cumm_w0
            ;
            cumm_w0=cumm_w0+w;
        run;

        /*calculate Gini */
        proc sql noprint;
            create table &output
            as
            select 1-2/(sum(w)-1)*(sum(w)-sum(&y.*cumm_w0*w)/sum(&y.*w)) as gini
            from test;
        quit;


        proc print data=&output;
            title " GINI on &filter.";run;
    %mend;
nEO
  • SAS UE can be installed for free so you can follow the calculations. Since w=1 then cumm_w0 will be a running total of the randomly sorted data. – Reeza Oct 26 '16 at 23:56
  • @Reeza - any idea what the parameters are? – nEO Oct 27 '16 at 00:02
  • No. Where did you get this code from that you can't get documentation or help? – Reeza Oct 27 '16 at 00:41
  • there is a prediction competition at my university and they shared this with the participants – nEO Oct 27 '16 at 19:05

1 Answer

This looks like an implementation of the bottom formula in this section of the Wikipedia article on the Gini coefficient:

https://en.wikipedia.org/wiki/Gini_coefficient#Alternate_expressions
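Reading the macro step by step, it keeps the rows where `split_ind` equals `filter`, sorts them by the prediction `py` (ties broken by a random number), assigns each row a running count `cumm_w0` (1, 2, ..., n, since `w` is always 1), and then evaluates `1 - 2/(n-1) * (n - sum(y * rank) / sum(y))`. A minimal Python sketch of that calculation follows. It assumes NumPy is available and that the rows have already been filtered; the function name `gini` and the `seed` parameter are illustrative choices, not part of the SAS code. NumPy's random generator does not reproduce the SAS `ranuni` stream, so results can differ slightly from SAS when predictions are tied.

```python
import numpy as np


def gini(y, py, seed=123456789):
    """Sketch of the SAS macro's Gini calculation with all weights equal to 1.

    y    : actual values (assumed already filtered to the rows of interest)
    py   : predicted values, used only to order the rows
    seed : hypothetical parameter; seeds the random tie-breaker
    """
    y = np.asarray(y, dtype=float)
    py = np.asarray(py, dtype=float)
    n = len(y)

    # Random tie-breaker, mirroring _random = ranuni(...) in the macro.
    # The actual random values will not match those produced by SAS.
    rng = np.random.default_rng(seed)
    tie_breaker = rng.random(n)

    # np.lexsort treats the LAST key as the primary one:
    # sort by py ascending, then by tie_breaker ascending.
    order = np.lexsort((tie_breaker, py))
    y_sorted = y[order]

    # cumm_w0 in the macro: running total of w = 1, i.e. the rank 1..n.
    rank = np.arange(1, n + 1)

    # Same expression as the PROC SQL step, with sum(w) = n.
    return 1.0 - 2.0 / (n - 1) * (n - np.sum(y_sorted * rank) / np.sum(y_sorted))
```

For example, `gini([1, 2, 3, 4], [1, 2, 3, 4])` orders the actual values exactly as the predictions do and evaluates to 1/3, while reversing the predictions gives -1/3. Competitions often report this value divided by `gini(y, y)`, but the macro as posted does not do that normalization.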

user667489