I was doing all this in Python, but I'm leaving the company and I want to leave behind a process that someone even less tech-y than I can manage.
I've written instructions on how to consolidate several hundred Excel worksheets using Power Query (Get & Transform Data) in Excel. I then have them 'close & load' into an Excel worksheet...the data consists of about 9000 rows with >20 columns. One column, called 'Name', = colleague name (there's about 20 unique names).
Now I need to get a random 10% of each colleague's rows. (In Python I used groupby & sample, here.) I can do this in the Excel worksheet by adding a 'helper row' called 'Rand No' that contains the formula =RAND()
, and then using this array formula (Control+Shift+Enter): =IF([@[Rand No]]>=LARGE((--([@Name]=[Name])*[Rand No]),COUNTIF([Name],[@Name])/10),"Randomised","")
.
But I was just wondering how you could do this randomisation in PowerQuery, since we already had the PQ editor open when we consolidated all those sheets. I'm guessing it would be more complicated than just doing it with an ordinary Excel formula--but am I wrong?