1

In Google Sheets, I've been working on data sets of varying prices to compare them to actual costs and excluding values outside of the normal ranges. I've had help previously, but I am unable to design more formulas based on the original formulas I've had help with.

In my data sets, I have very large expanding data sets I've typed manually where multiple numbers occur over again. I've notated each value with multiple values as value (x) where as the X represents the quantity or total amount of occurrences within the data set to make it smaller; as I need to type each manually

The formula to calculate the Average is below:

=ARRAYFORMULA(SUM(QUERY(SPLIT(TRANSPOSE(SPLIT(QUERY(TRANSPOSE(QUERY(TRANSPOSE(
 IF(ISNUMBER(A19:AP19), "♦"&A19:AP19&" 1", IF((A19:AP19<>"")*(NOT(ISNUMBER(A19:AP19))), "♦"&REGEXREPLACE(A19:AP19, "[()]", ), ))),,99^99)),,99^99), "♦")), " "), 
 "select Col1*Col2 label Col1*Col2 ''", 0))/SUM(QUERY(SPLIT(TRANSPOSE(SPLIT(QUERY(TRANSPOSE(QUERY(TRANSPOSE(
 IF(ISNUMBER(A19:AP19), "♦"&A19:AP19&" 1", IF((A19:AP19<>"")*(NOT(ISNUMBER(A19:AP19))), "♦"&REGEXREPLACE(A19:AP19, "[()]", ), ))),,99^99)),,99^99), "♦")), " "), 
 "select Col2", 0)))

This works extremely well. I've also have a formula that calculates the Min and Max of the data set.

I've attempted to edit the code to give myself for mode (which will not read value (x) values), and IQR (Interquartile Range) with absolutely no success.

I want to be able to find the Standard Deviation, Mean Deviation, and IQR for datasets while using values that are notated using values (x)

Below is a link of a sample of a data set: https://docs.google.com/spreadsheets/d/15-UY7salQ8fI011twnIKSjEJQWCSVmfV1wErE28gvKw/edit?usp=sharing

Kyle Monti
  • 704
  • 2
  • 10
  • 21
  • Issue: Formulas do not work when Range is larger than data set. Formulas show incorrect numbers when data is split to fill part of the empty area. https://docs.google.com/spreadsheets/d/1byR6Aw3lTImxNHDynTm5S1MWjpxmCuMyNl5WT9PcC1E/edit#gid=0 – Kyle Monti Mar 08 '20 at 23:35
  • I've spliced each formula to split each column before calculating Median, Deviations. `=ARRAYFORMULA(MEDIAN(TRANSPOSE(SPLIT(QUERY(REPT( INDEX(IF(B14:B="",,SPLIT(IF(ISNUMBER(B14:B)...` For each column: `TRANSPOSE(SPLIT(QUERY(REPT( INDEX(IF(B14:B="",,SPLIT(IF(ISNUMBER(B14:B)..., TRANSPOSE(SPLIT(QUERY(REPT( INDEX(IF(C14:C="",,SPLIT(IF(ISNUMBER(C14:C)` Unsure why IQR will not follow the same process as this. – Kyle Monti Mar 09 '20 at 18:56

2 Answers2

1

Solution

Google Sheets has custom formulas for these statistical varibales. Use these as follow:

I hope this has helped you. Let me know if you need anything else or if you did not understood something. :)

Mateo Randwolf
  • 2,823
  • 1
  • 6
  • 17
1

base formula (which takes your range B14:B and expands all values which are in format value (x))

=ARRAYFORMULA(TRANSPOSE(SPLIT(QUERY(REPT(
 INDEX(IF(B14:B="",,SPLIT(IF(ISNUMBER(B14:B), B14:B&" 1", B14:B), " ()"))*1,,1)&"♦", 
 INDEX(IF(B14:B="",,SPLIT(IF(ISNUMBER(B14:B), B14:B&" 1", B14:B), " ()"))*1,,2)),,999^99), "♦")))

0


AVERAGE (B3)

=ARRAYFORMULA(SUM(QUERY(SPLIT(TRANSPOSE(SPLIT(QUERY(TRANSPOSE(QUERY(TRANSPOSE(
 IF(ISNUMBER(B14:B), "♦"&B14:B&" 1", IF((B14:B<>"")*(NOT(ISNUMBER(B14:B))), 
 "♦"&REGEXREPLACE(B14:B, "[()]", ), ))),,99^99)),,99^99), "♦")), " "), 
 "select Col1*Col2 label Col1*Col2 ''", 0))/
 SUM(QUERY(SPLIT(TRANSPOSE(SPLIT(QUERY(TRANSPOSE(QUERY(TRANSPOSE(
 IF(ISNUMBER(B14:B), "♦"&B14:B&" 1", IF((B14:B<>"")*(NOT(ISNUMBER(B14:B))), 
 "♦"&REGEXREPLACE(B14:B, "[()]", ), ))),,99^99)),,99^99), "♦")), " "), 
 "select Col2", 0)))

RANGE (B4)

=B7-B8

MEDIAN (B5)

=ARRAYFORMULA(MEDIAN(TRANSPOSE(SPLIT(QUERY(REPT(
 INDEX(IF(B14:B="",,SPLIT(IF(ISNUMBER(B14:B), B14:B&" 1", B14:B), " ()"))*1,,1)&"♦", 
 INDEX(IF(B14:B="",,SPLIT(IF(ISNUMBER(B14:B), B14:B&" 1", B14:B), " ()"))*1,,2)),,999^99), "♦"))))

MODE (B6)

=ARRAYFORMULA(MODE(TRANSPOSE(SPLIT(QUERY(REPT(
 INDEX(IF(B14:B="",,SPLIT(IF(ISNUMBER(B14:B), B14:B&" 1", B14:B), " ()"))*1,,1)&"♦", 
 INDEX(IF(B14:B="",,SPLIT(IF(ISNUMBER(B14:B), B14:B&" 1", B14:B), " ()"))*1,,2)),,999^99), "♦"))))

MAX (B7)

=ARRAYFORMULA(MAX(QUERY(TRANSPOSE(SPLIT(QUERY(TRANSPOSE(QUERY(TRANSPOSE(B14:B)
,,99^99)),,99^99), " ")), "where not Col1 contains '-'", 0)))

MIN (B8)

=ARRAYFORMULA(MIN(QUERY(TRANSPOSE(SPLIT(QUERY(TRANSPOSE(QUERY(TRANSPOSE(B14:B)
,,99^99)),,99^99), " ")), "where not Col1 contains '-'", 0)))

IQR - INTERQUARTILE RANGE (B9)

=ARRAYFORMULA(QUARTILE(TRANSPOSE(SPLIT(QUERY(REPT(
 INDEX(IF(B14:B="",,SPLIT(IF(ISNUMBER(B14:B), B14:B&" 1", B14:B), " ()"))*1,,1)&"♦", 
 INDEX(IF(B14:B="",,SPLIT(IF(ISNUMBER(B14:B), B14:B&" 1", B14:B), " ()"))*1,,2)),,999^99), "♦")), 3)-
 QUARTILE(TRANSPOSE(SPLIT(QUERY(REPT(
 INDEX(IF(B14:B="",,SPLIT(IF(ISNUMBER(B14:B), B14:B&" 1", B14:B), " ()"))*1,,1)&"♦", 
 INDEX(IF(B14:B="",,SPLIT(IF(ISNUMBER(B14:B), B14:B&" 1", B14:B), " ()"))*1,,2)),,999^99), "♦")), 1))

MEAN DEVIATION (B10)

=ARRAYFORMULA(AVEDEV(TRANSPOSE(SPLIT(QUERY(REPT(
 INDEX(IF(B14:B="",,SPLIT(IF(ISNUMBER(B14:B), B14:B&" 1", B14:B), " ()"))*1,,1)&"♦", 
 INDEX(IF(B14:B="",,SPLIT(IF(ISNUMBER(B14:B), B14:B&" 1", B14:B), " ()"))*1,,2)),,999^99), "♦"))))

STANDARD DEVIATION (B11)

=ARRAYFORMULA(STDEV(TRANSPOSE(SPLIT(QUERY(REPT(
 INDEX(IF(B14:B="",,SPLIT(IF(ISNUMBER(B14:B), B14:B&" 1", B14:B), " ()"))*1,,1)&"♦", 
 INDEX(IF(B14:B="",,SPLIT(IF(ISNUMBER(B14:B), B14:B&" 1", B14:B), " ()"))*1,,2)),,999^99), "♦"))))
Community
  • 1
  • 1
player0
  • 124,011
  • 12
  • 67
  • 124
  • Is works well, I have noticed for some data is will cause #value errors when data duplicates are over a certain amount, for instance 9094 causes an error and works if duplicated or reduced below 1000. – Kyle Monti Feb 28 '20 at 00:42
  • 1
    That's perfectly fine, thank you so much for looking into it. – Kyle Monti Feb 28 '20 at 17:10
  • I've used the formulas in the data, but I found if I use 1 data set with blank areas for future data sets. Median, Mod, IQR, Mean Dev, StrDev have errors. I believe the the problem is with the cell being blank or not. Can this answer be revised to work with larger areas that are blank? Also, numbers are not accurate when spread across a data sheet instead of 1 column. See the spreadsheet linked here: https://docs.google.com/spreadsheets/d/1byR6Aw3lTImxNHDynTm5S1MWjpxmCuMyNl5WT9PcC1E/edit?usp=sharing – Kyle Monti Mar 08 '20 at 23:26
  • Check comments in my question. I think I have it usable...but IQR still have issues – Kyle Monti Mar 09 '20 at 19:01