0

I am facing a big issue in computing the cumsum() of a vector. The vector has a length of ~ 10,000 elements and from element say 2000 the values go down to 1e-310. To give a feeling of the distribution I am dealing with, here is a plot.

sorted distribution

When I try to apply cumsum() I get lots of ones, which is impossible, and a minimum value around 10^-2. I am porting a code which we developed in Matlab and of course no problems there. For some reason, R seems to have troubles in working with such small numbers to the extent that applying standard functions returns unexpected, and wrong, results.

I searched over stack overflow and found these two posts:

  1. R: Number precision, how to prevent rounding?
  2. Controlling number of decimal digits in print output in R

Unfortunately, none of them helped me out.

I also tried to use Rcpp cumsum() function with no luck. I guess the problem comes directly from the precision of my matrix object.

I am not even sure how to reproduce this so I am happy to share my 9137 x 2 matrix. I am completely stuck with this.

Looking forward to hearing from you guys!
Thanks

Update

Here is a sample of 100 elements from my matrix:

y <- sample( BestPair, 100 )
dput( y )

c(7.74958917761984e-289, 4.19283869686715e-319, 1.52834266884531e-319, 
2.22089175309335e-297, 4.93980517192742e-298, 1.37861543592719e-301, 
1.47044459800611e-317, 6.49068860911021e-319, 1.83302927898675e-305, 
8.39514422452147e-312, 2.88487711616485e-300, 0.000544461085044608, 
0.000435738736513519, 1.35649914843994e-309, 4.30826678309556e-310, 
2.60728322623343e-319, 0.000544460617547516, 5.28815204888643e-299, 
0.00102710912090133, 0.00198425117943324, 1.99711912768841e-304, 
8.34594499227505e-306, 7.42055412763084e-300, 5.00039717762739e-311, 
1.8750204972032e-305, 1.06513324565406e-310, 5.00487443690634e-313, 
3.4890421843663e-319, 7.48945537292364e-310, 1.92948452007191e-310, 
1.19840058299897e-305, 0.000532438536688165, 6.53966533658245e-318, 
0.000499821676385928, 2.02305525482572e-305, 5.18981575493413e-306, 
8.82648276295387e-320, 7.30476057376283e-320, 1.23073492422415e-291, 
4.1801705284367e-311, 5.10863383734203e-318, 1.12106998189819e-298, 
9.34823978505262e-297, 0.00093615863896107, 5.3667092510628e-311, 
3.85094526994501e-319, 1.3693720559483e-313, 3.96230943126873e-311, 
2.03293191294298e-319, 2.38607510351427e-291, 0.000544460855322345, 
1.74738584846597e-310, 1.41874408662835e-318, 5.73056861298345e-319, 
3.28565325597139e-302, 3.5412805275117e-310, 1.19647007227024e-302, 
1.71539915106223e-305, 2.10738303243284e-311, 6.47783846432427e-313, 
5.0072402480979e-303, 7.7250380240544e-303, 9.75451890703679e-309, 
0.000533945755492525, 0.00211359631486468, 1.6612179399628e-312, 
0.000521804571338402, 4.12194185271951e-308, 1.12829365794294e-313, 
8.89772702908418e-319, 5.092756929242e-312, 7.45208240537024e-311, 
6.60385177095196e-298, 0.000544461017733648, 1.62108867188263e-318, 
3.95135528339003e-309, 1.8792966379072e-292, 5.98494480819088e-295, 
0.00051614492665081, 2.25198141886419e-300, 7.97467977809552e-305, 
1.78098757558338e-311, 1.66946525895122e-313, 0.000778442249425894, 
6.58100207570114e-312, 0.00120733768329515, 3.44432924341767e-319, 
6.38151190880713e-313, 7.1129669216109e-300, 4.11319531475755e-319, 
7.21747577033383e-304, 1.48709312807403e-318, 1.39519898470211e-303, 
4.58585270141592e-312, 2.16309869205282e-295, 7.55248601743105e-314, 
3.16365476892733e-310, 1.96961507010996e-305, 3.21125377499206e-318, 
3.66277772043574e-304)

Update 2

Apparently, imposing the following:

BestPair[ BestPair < .Machine$double.eps ] <- 0

does not solve the issue. Still finding weird results from cumsum(). Here is a plot to better explain what I am dealing with. The Cumulative Prob. has this shape because BestPair has been sorted by decreasing order. I want to have the 1 from cumsum() on top of my vector.

enter image description here

Here is a summary of the ob

> summary(CumProb)
## Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.0250  1.0000  1.0000  0.9685  1.0000  1.0000 

Update 3. Results from Matlab

Here is the result as computed with Matlab. As you can see, I can get a pretty decent distribution which I cannot achieve in R even if I truncated the original matrix.

enter image description here

Francesco Grossetti
  • 1,555
  • 9
  • 17
  • Can you sample 100 or so values from your vector as `y<-sample(x,100)` where x is your matrix, do `dput(y)` and add the output to your post? – Rohit Mar 20 '19 at 08:12
  • @Rohit there you go! Thanks – Francesco Grossetti Mar 20 '19 at 08:15
  • I would try [Rmpfr](https://cran.r-project.org/web/packages/Rmpfr/vignettes/Rmpfr-pkg.pdf) – Stéphane Laurent Mar 20 '19 at 08:47
  • I think there are two issues: Your first ~ 1000 values are in the range 0.0001~0.025 that may coincidentally add upto 1. The rest of your values are in the range 1e-300 and their total sum would not evn get close to 1e-250. Do you want to see the 250-300th decimal places changing? – Rohit Mar 20 '19 at 09:14
  • Actually yes. I am attaching the result I get from Matlab. If I truncated BestPair to avoid going down with such small number, the cumulative distribution will have very different shape from that one computed in Matlab. – Francesco Grossetti Mar 20 '19 at 09:33
  • If you sort your data in ascending order before applying `cumsum` you should be able to get the shape you want. – Rohit Mar 21 '19 at 08:40

0 Answers0