Weighted average of pair elements in a vector in R

Question

I have two vectors, x and w. Vector w is a numerical vector of weights with the same length as x. How can I get the weighted average of the first pair of elements in x whose difference is small (for example, tol = 1e-2), then in the next iteration do the same for the next pair, and so on until no pair has a difference less than tol? For example, the vectors are:

     x = c(0.0001560653, 0.0001591889, 0.0001599698, 0.0001607507, 0.0001623125,
           0.0001685597, 0.0002793819, 0.0006336307, 0.0092017241, 0.0092079042,
           0.0266525118, 0.0266889564, 0.0454923285, 0.0455676525, 0.0457005450)
     w = c(2.886814e+03, 1.565955e+04, 9.255762e-02, 7.353589e+02, 1.568933e+03,
           5.108046e+05, 6.942338e+05, 4.912165e+04, 9.257674e+00, 3.609918e+02,
           8.090436e-01, 1.072975e+00, 1.359145e+00, 9.828314e+00, 9.455688e+01)

I want to find which pair of elements of x has the minimum difference and, after finding this pair, get its weighted mean. I tried the code below, but it does not give me the result. How can I find the index of min(diff(x)) and check whether it is less than tol?

        min(diff(x))
        which(min(diff(x)) < 1e-2)
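In the attempt above, min(diff(x)) returns a single number, so min(diff(x)) < 1e-2 is a single TRUE or FALSE, and which() on it can only return 1 or integer(0), never a position in x. A minimal sketch, assuming x is sorted in increasing order as in the example (so the closest pair is always an adjacent one): which.min() gives the position of the smallest gap, which can then be compared with tol. The names d, i and pair are illustrative.

```r
x <- c(0.0001560653, 0.0001591889, 0.0001599698, 0.0001607507, 0.0001623125,
       0.0001685597, 0.0002793819, 0.0006336307, 0.0092017241, 0.0092079042,
       0.0266525118, 0.0266889564, 0.0454923285, 0.0455676525, 0.0457005450)
w <- c(2.886814e+03, 1.565955e+04, 9.255762e-02, 7.353589e+02, 1.568933e+03,
       5.108046e+05, 6.942338e+05, 4.912165e+04, 9.257674e+00, 3.609918e+02,
       8.090436e-01, 1.072975e+00, 1.359145e+00, 9.828314e+00, 9.455688e+01)
tol <- 1e-2

d <- diff(x)              # d[i] is x[i + 1] - x[i]
i <- which.min(d)         # position of the smallest gap (first one on ties)
pair <- c(i, i + 1)       # the two elements of x that form that gap
if (d[i] < tol) {
  wavg <- weighted.mean(x[pair], w[pair])   # sum(x * w) / sum(w) for the pair
  cat("pair", pair, "gap", d[i], "weighted mean", wavg, "\n")
} else {
  cat("no adjacent pair is closer than tol\n")
}
```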

The details of how you want a difference of a weighted average of a pair escape me. Can you provide an example of what the calculation for the first pair looks like? — Aaron left Stack Overflow, Sep 14 '12 at 01:42
At the moment this is mathematically incoherent when expressed in natural language. If there is a language barrier (and I admit that English is the least sensible choice for international communication), then the way to surmount the problem is to use appropriate combinations of mathematical notation. — IRTFM, Sep 14 '12 at 01:46
In each iteration, I am looking for the first pair of x whose difference is small (1e-2). If we can find this pair, then take the weighted mean of this pair. — Bensor Beny, Sep 14 '12 at 01:49
OK, but... what is supposed to happen between each pause to decide which neo-values are to be chosen? — IRTFM, Sep 14 '12 at 04:05
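One possible reading of the iteration described in the question, offered as an assumption since the asker did not confirm it: take the closest adjacent pair, replace it by a single point at its weighted mean carrying the sum of the two weights, and repeat until no adjacent gap is below tol. The function name merge_close_pairs is hypothetical, not from any package. It assumes x is sorted ascending; the weighted mean of two neighbours lies between them, so x stays sorted after each merge.

```r
x <- c(0.0001560653, 0.0001591889, 0.0001599698, 0.0001607507, 0.0001623125,
       0.0001685597, 0.0002793819, 0.0006336307, 0.0092017241, 0.0092079042,
       0.0266525118, 0.0266889564, 0.0454923285, 0.0455676525, 0.0457005450)
w <- c(2.886814e+03, 1.565955e+04, 9.255762e-02, 7.353589e+02, 1.568933e+03,
       5.108046e+05, 6.942338e+05, 4.912165e+04, 9.257674e+00, 3.609918e+02,
       8.090436e-01, 1.072975e+00, 1.359145e+00, 9.828314e+00, 9.455688e+01)

# Hypothetical helper: repeatedly merge the closest adjacent pair of a sorted x
# into one weighted-mean point until every adjacent gap is at least tol.
merge_close_pairs <- function(x, w, tol = 1e-2) {
  steps <- list()                       # one record per merge (one pair per iteration)
  while (length(x) > 1) {
    d <- diff(x)                        # gaps between neighbours
    i <- which.min(d)                   # closest adjacent pair is (i, i + 1)
    if (d[i] >= tol) break              # nothing left closer than tol
    pair  <- c(i, i + 1)
    x_new <- weighted.mean(x[pair], w[pair])
    w_new <- sum(w[pair])               # merged point carries both weights
    steps[[length(steps) + 1]] <- c(left = x[i], right = x[i + 1], wavg = x_new)
    # Put the merged point where the pair was
    x <- append(x[-pair], x_new, after = i - 1)
    w <- append(w[-pair], w_new, after = i - 1)
  }
  list(x = x, w = w, steps = steps)
}

res <- merge_close_pairs(x, w, tol = 1e-2)
res$x
```

Because the summed weight is carried forward, each merged point is the weighted mean of every original element folded into it. If instead each original element may be used in at most one pair, this reading does not apply.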

Matthew Plourde · Answer 1 · 2012-09-14T02:41:02.323


It would be mighty helpful if you described what calculating your result by hand would look like with the sample data you provided. I can't say I'm completely sure I know what you want, but here's a stab in the dimly lit:

tol = 1e-2
# For every i with x[i+1] - x[i] < tol, the weighted mean of x[i] and x[i+1]
sapply(which(diff(x) < tol), 
       function(i) x[i:(i+1)] %*% w[i:(i+1)] / sum(w[i:(i+1)]))
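The sapply() call above returns one weighted mean for every adjacent pair (i, i + 1) whose gap is below tol. These pairs can overlap, so an element such as x[2] contributes to both the (1, 2) and (2, 3) averages. A vectorized sketch of the same computation, using plain arithmetic instead of %*% (which produces a 1 x 1 matrix per pair); the names i, j, pair_means and sapply_means are illustrative.

```r
x <- c(0.0001560653, 0.0001591889, 0.0001599698, 0.0001607507, 0.0001623125,
       0.0001685597, 0.0002793819, 0.0006336307, 0.0092017241, 0.0092079042,
       0.0266525118, 0.0266889564, 0.0454923285, 0.0455676525, 0.0457005450)
w <- c(2.886814e+03, 1.565955e+04, 9.255762e-02, 7.353589e+02, 1.568933e+03,
       5.108046e+05, 6.942338e+05, 4.912165e+04, 9.257674e+00, 3.609918e+02,
       8.090436e-01, 1.072975e+00, 1.359145e+00, 9.828314e+00, 9.455688e+01)
tol <- 1e-2

i <- which(diff(x) < tol)     # left index of every adjacent pair closer than tol
j <- i + 1                    # right index of the same pairs
pair_means <- (x[i] * w[i] + x[j] * w[j]) / (w[i] + w[j])

# The same numbers computed one pair at a time, for comparison
sapply_means <- sapply(i, function(k) weighted.mean(x[k:(k + 1)], w[k:(k + 1)]))
all.equal(pair_means, sapply_means)
```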

edited Sep 14 '12 at 02:41

answered Sep 14 '12 at 02:20


flodel · Answer 2 · 2012-09-14T03:26:55.200

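First, you can cluster your data with hclust and cut the tree at height 1e-2. With hclust's default complete linkage, this groups elements so that no two elements in the same group are more than 1e-2 apart: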

hc <- hclust(dist(x))
ct <- cutree(hc, h = 1e-2)
ct
# [1] 1 1 1 1 1 1 1 1 1 1 2 2 3 3 3

Then, split your x and w according to the clustered groups:

x.groups <- split(x, ct)
x.groups
# $`1`
#  [1] 0.0001560653 0.0001591889 0.0001599698 0.0001607507 0.0001623125
#  [6] 0.0001685597 0.0002793819 0.0006336307 0.0092017241 0.0092079042
# 
# $`2`
# [1] 0.02665251 0.02668896
# 
# $`3`
# [1] 0.04549233 0.04556765 0.04570055

w.groups <- split(w, ct)
w.groups
# $`1`
#  [1] 2.886814e+03 1.565955e+04 9.255762e-02 7.353589e+02 1.568933e+03
#  [6] 5.108046e+05 6.942338e+05 4.912165e+04 9.257674e+00 3.609918e+02
# 
# $`2`
# [1] 0.8090436 1.0729750
# 
# $`3`
# [1]  1.359145  9.828314 94.556880

Finally, you can use mapply to compute the weighted averages across groups:

mapply(function(x, w) sum(x * w) / sum(w), x.groups, w.groups)
#           1           2           3 
# 0.000249265 0.026673290 0.045685517
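Base R's stats::weighted.mean() computes sum(x * w) / sum(w) directly, so it can stand in for the anonymous function. A self-contained sketch that rebuilds the groups the same way and checks that the two forms agree; by_group and manual are illustrative names.

```r
x <- c(0.0001560653, 0.0001591889, 0.0001599698, 0.0001607507, 0.0001623125,
       0.0001685597, 0.0002793819, 0.0006336307, 0.0092017241, 0.0092079042,
       0.0266525118, 0.0266889564, 0.0454923285, 0.0455676525, 0.0457005450)
w <- c(2.886814e+03, 1.565955e+04, 9.255762e-02, 7.353589e+02, 1.568933e+03,
       5.108046e+05, 6.942338e+05, 4.912165e+04, 9.257674e+00, 3.609918e+02,
       8.090436e-01, 1.072975e+00, 1.359145e+00, 9.828314e+00, 9.455688e+01)

hc <- hclust(dist(x))            # complete linkage by default
ct <- cutree(hc, h = 1e-2)       # group label for every element of x
x.groups <- split(x, ct)
w.groups <- split(w, ct)

# weighted.mean(x, w) replaces the hand-written sum(x * w) / sum(w)
by_group <- mapply(weighted.mean, x.groups, w.groups)
manual   <- mapply(function(x, w) sum(x * w) / sum(w), x.groups, w.groups)
all.equal(by_group, manual)
```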

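Edit: It is now clear that you want your clusters to have at most two elements. There may be clustering algorithms that meet that requirement, but you can easily do it yourself with a loop. Here is a rough version, which repeatedly takes the closest remaining pair within 1e-2, reports its weighted average, and then removes both elements from further consideration: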

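d <- as.matrix(dist(x))               # pairwise distances |x[i] - x[j]|
d[upper.tri(d, diag = TRUE)] <- Inf   # keep each pair once (lower triangle)
d[d > 1e-2] <- Inf                    # ignore pairs farther apart than 1e-2

while(any(is.finite(d))) {
   min.d <- which.min(d)                         # closest remaining pair
   idx   <- c(col(d)[min.d], row(d)[min.d])      # its two indices into x
   wavg  <- sum(x[idx] * w[idx]) / sum(w[idx])   # weighted average of the pair
   print(paste("idx", idx[1], "and", idx[2], "with wavg=", wavg))
   d[idx, ] <- Inf                               # use each element at most once
   d[, idx] <- Inf
}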
# [1] "idx 2 and 3 with wavg= 0.000159188904615574"
# [1] "idx 4 and 5 with wavg= 0.000161814089390641"
# [1] "idx 9 and 10 with wavg= 0.0092077496735115"
# [1] "idx 1 and 6 with wavg= 0.000168489484676445"
# [1] "idx 11 and 12 with wavg= 0.026673289567385"
# [1] "idx 13 and 14 with wavg= 0.0455585015178172"
# [1] "idx 7 and 8 with wavg= 0.00030279100471097"

(I'll leave it to you to modify it so you can store the outputs as you wish.)
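One way to keep the results instead of printing them, as a sketch: collect each pair's indices and weighted average in a data frame. The column names i, j and wavg are illustrative choices. Growing a data frame with rbind() inside the loop is fine at this size; for many pairs, collecting rows in a list and binding once with do.call(rbind, ...) scales better.

```r
x <- c(0.0001560653, 0.0001591889, 0.0001599698, 0.0001607507, 0.0001623125,
       0.0001685597, 0.0002793819, 0.0006336307, 0.0092017241, 0.0092079042,
       0.0266525118, 0.0266889564, 0.0454923285, 0.0455676525, 0.0457005450)
w <- c(2.886814e+03, 1.565955e+04, 9.255762e-02, 7.353589e+02, 1.568933e+03,
       5.108046e+05, 6.942338e+05, 4.912165e+04, 9.257674e+00, 3.609918e+02,
       8.090436e-01, 1.072975e+00, 1.359145e+00, 9.828314e+00, 9.455688e+01)
tol <- 1e-2

d <- as.matrix(dist(x))
d[upper.tri(d, diag = TRUE)] <- Inf   # each pair once
d[d > tol] <- Inf                     # only pairs within tol

res <- data.frame(i = integer(0), j = integer(0), wavg = numeric(0))
while (any(is.finite(d))) {
  min.d <- which.min(d)
  idx   <- c(col(d)[min.d], row(d)[min.d])
  wavg  <- weighted.mean(x[idx], w[idx])
  res   <- rbind(res, data.frame(i = idx[1], j = idx[2], wavg = wavg))  # store, don't print
  d[idx, ] <- Inf                     # each element used at most once
  d[, idx] <- Inf
}
res
```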

Thanks for your reply, but actually I want to get just one pair of elements whose difference is small in each iteration, not all similar values in each cluster. — Bensor Beny, Sep 14 '12 at 02:08

score 0 · Answer 3 · answered Sep 14 '12 at 02:56

I'm a bit confused about what you want as well, but the code below finds the values of x that have increased by at most a small amount (1e-2) from the previous value (see ?diff) and then returns the weighted values x * w for those elements only:

smallpair <- which(c(NA,diff(x)) < 1e-2)
x[smallpair]*w[smallpair]
