Questions tagged [reproducible-research]

Reproducible research is the idea that the result of scientific research should be published with data and code in order to make it possible for other researchers to verify the results.

Reproducible research may be especially important to you if your investigation involves large amounts of data or very complex calculations.

One possible set of tools for reproducible research is using with or .
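
As a rough illustration of the idea above (publishing code so that others can verify a result), here is a minimal Python sketch. It is not tied to any particular tool mentioned on this page; the function names `run_analysis` and `fingerprint`, the seed value, and the toy "analysis" are all hypothetical. It assumes the rerun happens on the same Python version and platform, since floating-point results can differ across environments.

```python
import hashlib
import json
import random
import statistics


def run_analysis(seed: int, n: int = 1000) -> dict:
    """Toy 'analysis': draw n seeded normal samples and summarize them."""
    rng = random.Random(seed)  # local generator, so no hidden global state
    data = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return {
        "seed": seed,
        "n": n,
        "mean": statistics.fmean(data),
        "stdev": statistics.stdev(data),
    }


def fingerprint(result: dict) -> str:
    """Hash a canonical JSON form of the result so reruns can be compared."""
    payload = json.dumps(result, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


if __name__ == "__main__":
    first = run_analysis(seed=42)
    second = run_analysis(seed=42)
    # Same code + same seed should give the same fingerprint in the same environment.
    print(fingerprint(first) == fingerprint(second))
```

Publishing the seed and the fingerprint alongside the code gives another researcher something concrete to check their rerun against.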

227 questions

4 votes, 1 answer

Limiting size of hierarchical data for reproducible example

I am trying to come up with a reproducible example (RE) for this question: Errors related to data frame columns during merging. To be qualified as having a RE, the question lacks only reproducible data. However, when I tried to use pretty much…
Aleksandr Blekh

4 votes, 2 answers

R: t.test output for LaTeX

I wonder if there is any function to redirect the output of t.test to LaTeX. Something like this: library(xtable) xtable(t.test(extra ~ group, data = sleep))
MYaseen208

3 votes, 1 answer

R: inconsistent random number generation in parallel simulation with mclapply

Problem: I'm trying to implement a reproducible multicore simulation but obtain inconsistent results. Please help me explain these results and advise me of a correct way of implementing this. Note that I'm working on WSL2 (I hope there is another…

3 votes, 1 answer

Is there a way to implement knitr chunk and knit options as a project-wide environment or profile for R Markdown?

I've grown tired of repeating the beginning of R Markdown documents over and over again to set up my preferences for knitting and chunk options. An example: ```{r, include=FALSE} library(tidyverse) knitr::opts_chunk$set(error = FALSE, message =…

3 votes, 1 answer

Training PyTorch models on different machines leads to different results

I am training the same model on two different machines, but the trained models are not identical. I have taken the following measures to ensure reproducibility: # set random number random.seed(0) torch.cuda.manual_seed(0) np.random.seed(0) # set…
feelfree

3 votes, 0 answers

Reproducible machine learning results on different CPUs with Intel MKL

I am working on an ML project with conda, python==3.6.8 and mkl==2019.1. I have set the seed, and when running the code multiple times on an Intel Pentium G4560, I get the exact same results. However, running the same code under an identical environment on…

3 votes, 1 answer

Why is RNG different for TensorFlow 2 and 1?

import numpy as np np.random.seed(1) import random random.seed(2) import tensorflow as tf tf.compat.v1.set_random_seed(3) # graph-level seed if tf.__version__[0] == '2': tf.random.set_seed(4) # global seed else: tf.set_random_seed(4) #…

3 votes, 0 answers

Recipe vs Formula vs X/Y Interface reproducibility for gbm with caret

I have trained the same model on the iris data set to investigate the reproducibility of each method. It seems that there is a discrepancy between models when using all.equal() for the models trained with the recipes interface, but not with the…
JFG123

3 votes, 0 answers

Unable to reproduce result obtained from hyperparameter tuning using hyperopt

I have built a PyTorch model and performed hyperparameter tuning using the Hyperopt library. The result obtained is not reproducible even though I already call the seeding function below at the beginning of each run: util.py def…

3 votes, 2 answers

colorBin() leaflet in R not working as expected

I have a data.frame with two (2) rows in it. I am trying to map this data with colors using the colorBin function from leaflet my_data <- data.frame("11772","8600000US11772","11772","41957005","1150010","Patchogue","Suffolk","195") my_data2 <-…
MCP_infiltrator

3 votes, 2 answers

Docker in R and/or Packrat for Reproducible Science

I am not completely sure if Docker is enough for R development or whether I should use it in conjunction with Packrat. I have read several posts that state that Docker is sufficient. The only place that supports this claim is this post. However I was not…

3 votes, 3 answers

Safety of xz archive format

While looking for a good option to store large amounts of data (coming mostly from numerical computations) long-term, I arrived at using the xz archive format (tar.xz). The default LZMA compression there provides significantly better archive sizes (for…
Anton Menshov

3 votes, 0 answers

How far can reproducibility be affected by a BLAS change?

Today, I changed my BLAS to vecLib following this gist (I have a Mac) and the running time for the given test dropped from 34.6 to 5.6 seconds! However, I wonder if this could affect the reproducibility of my results. Do you have any idea? On what…
abichat

3 votes, 1 answer

creating reproducible example using reprex package in r where a local file is being read

I often use reprex::reprex to create reproducible examples of R code to get help from others to get rid of errors in my code. Usually, I create minimal examples using datasets like iris or mtcars and it works well. But I always fail to use reprex…
Indrajeet Patil

3 votes, 1 answer

Loops in Rmarkdown: How to make an in-text figure reference? Figure captions?

{r setup, include=FALSE, message=FALSE, results="hide"} knitr::opts_chunk$set(echo = TRUE) library(knitr) library(kfigr) library(dplyr) library(png) library(grid) library(pander) library(ggplot2) Question Loops in rmarkdown: in-text figure…
sullij