I am building an index of user quality, defined as a sum of (often correlated) continuous variables representing user activity. The index is well calibrated and serves the purpose of my analysis, but it is tricky to communicate to my co-workers, particularly because outlier activities cause extremely tenacious users to score very highly on the activity index.
For 97% of users, the index is distributed near-normally between 0 and 100, with a right tail of 3% of hyper-active users with an index > 100. Index-values beyond 200 should be extremely rare but are theoretically possible.
I'm looking to scale the tail back into a 0-100 span, but not linearly, since I would like the 3% tail to be represented as small variations within the top range of the 0-100 index. What I'm looking for is a non-linear formula f(x) to scale my index,
so that the lower tier of the unscaled index remains close to the scaled one, while for high index values the two diverge and the scaled value never reaches 100 as my index goes towards infinity. In other words, f(0) = 0 and f(x) ≈ x for low x, but f(140) ≈ 99 or something similar.
I'll implement the scaling in R, Python and BigQuery.
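To make the requirements concrete, here is a minimal Python sketch of one family of functions that appears to have the shape described above: f(x) = x / (1 + (x/cap)^p)^(1/p). The name `soft_cap` and the parameters `cap` and `p` are illustrative placeholders, and p = 8 is only an assumed starting value chosen so that f(140) lands near 99; it is not a calibrated or established choice.

```python
def soft_cap(x: float, cap: float = 100.0, p: float = 8.0) -> float:
    """Compress a non-negative index so it approaches `cap` from below.

    Assumed form (one option among many): x / (1 + (x / cap) ** p) ** (1 / p).
    For x well below `cap` the result stays close to x; for large x it
    flattens out towards `cap`. A larger `p` gives a sharper bend.
    """
    if x < 0:
        raise ValueError("this sketch assumes a non-negative index")
    return x / (1.0 + (x / cap) ** p) ** (1.0 / p)


if __name__ == "__main__":
    # Print a few raw values next to their scaled counterparts.
    for raw in (0, 25, 50, 75, 100, 140, 200, 400):
        print(raw, round(soft_cap(raw), 2))
```

With these assumed defaults, values around 100 are already compressed noticeably (the bend is not confined to the 3% tail), so `p` would need tuning against the real distribution. The same expression only needs a power function, so it should carry over to R and BigQuery without much change.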