Questions tagged [reproducible-research]

Reproducible research is the idea that the results of scientific research should be published together with the data and code, so that other researchers can verify them.

Reproducible research may be especially important to you if your investigation involves large amounts of data or very complex calculations.

One possible set of tools for reproducible research is using with or .

227 questions
11
votes
1 answer

automated text for reproducible research

I am using RStudio, R Markdown, Latex, and Pandoc to clean data, construct variables, run my analysis, and report the results. I'm new to the concept of reproducible research, but I'm hooked. Makes a lot of sense. Dynamic tables and figures are no…
Eric Green
  • 7,385
  • 11
  • 56
  • 102
10
votes
0 answers

Markdown for Reproducible Research in Python

I would like to know whether there is something equivalent to R-markdown in Python which can help me do reproducible research. Please note: I'm not interested in IPython Notebooks as an answer. I want to have the syntactic joy of r-markdown with…
Naimish Agarwal
  • 516
  • 5
  • 14
8
votes
2 answers

Parallel processing in R - setting seed with mclapply() vs. pbmclapply()

I'm parallelizing simulations in R (using mclapply() from the parallel package) and wanted to track my progress with each function call. So I instead decided to use pbmclapply() from the pbmcapply package in order to have a progress bar each time I…
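The question above is about R's mclapply(), but the underlying concern applies in any language: parallel workers that share one seeded generator can produce results that depend on scheduling. As a rough analogy only (not the R answer), the Python sketch below gives each task its own generator, seeded from a base seed and the task index. The names `simulate` and `run_all` are hypothetical and chosen for illustration.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def simulate(task_seed):
    """Hypothetical simulation: mean of 1000 draws from a task-local generator."""
    rng = random.Random(task_seed)  # private generator, no shared global state
    return sum(rng.random() for _ in range(1000)) / 1000


def run_all(base_seed, n_tasks, workers=4):
    """Run n_tasks simulations in parallel, one derived seed per task."""
    # Assumption: a string seed built from the base seed and the task index
    # is enough to keep the per-task streams distinct for this illustration.
    seeds = [f"{base_seed}:{i}" for i in range(n_tasks)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, whatever order tasks finish in
        return list(pool.map(simulate, seeds))


first = run_all(42, 8)
second = run_all(42, 8, workers=1)
print(first == second)
```

Because every task's result depends only on its own seed, the number of workers and any progress-reporting wrapper around the call should not change the output.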
8
votes
2 answers

How do I assign a random seed to the dplyr sample_n function?

This is the "sample_n" from dplyr in R. https://dplyr.tidyverse.org/reference/sample.html For reproducibility, I should place a seed so that someone else can get my exact results. Is there a built-in way to set the seed for "sample_n"? Is this…
EngrStudent
  • 1,924
  • 31
  • 46
8
votes
1 answer

Set random seed for matplotlib plotting backend

I am generating and saving SVG images using matplotlib and would like to make them as reproducible as possible. However, even after setting np.random.seed and random.seed, the various id and xlink:href values in the SVG images still change between…
saladi
  • 3,103
  • 6
  • 36
  • 61
8
votes
1 answer

Trouble with Pandoc installation on Ubuntu 14.04 LTS for use with R Markdown

This question is a corollary of my attempts to get some experience with creating reproducible reports from R Markdown documents via knitr and rmarkdown R packages. While it seems that .Rmd => HTML conversion is automated from within RStudio (Knit…
Aleksandr Blekh
  • 2,462
  • 4
  • 32
  • 64
7
votes
2 answers

Can I write identical xlsx files from the same data frame in R?

Can I make sure that two XLSX files (written with openxlsx::write.xlsx) are identical, when given the same data to write? I think there's a timestamp written to the spreadsheet which means the same data written more than one second apart creates a…
Spacedman
  • 92,590
  • 12
  • 140
  • 224
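An XLSX file is a ZIP container, and each archive member carries a modification timestamp, which is one way two writes of the same data can differ byte for byte. The Python sketch below illustrates that effect with the standard `zipfile` module; it is not openxlsx's behaviour or a fix for it. The helper `write_archive` and the member name are hypothetical, and the sketch covers only archive-level timestamps, not any dates stored inside the workbook's own XML.

```python
import io
import zipfile

FIXED_TIME = (1980, 1, 1, 0, 0, 0)  # earliest timestamp the ZIP format allows


def write_archive(members, date_time):
    """Return the bytes of a ZIP archive holding `members` (name -> bytes)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in sorted(members):  # fixed member order
            info = zipfile.ZipInfo(name, date_time=date_time)
            archive.writestr(info, members[name])
    return buffer.getvalue()


members = {"xl/worksheets/sheet1.xml": b"<sheetData/>"}
same_a = write_archive(members, FIXED_TIME)
same_b = write_archive(members, FIXED_TIME)
later = write_archive(members, (2024, 1, 1, 0, 0, 2))

print(same_a == same_b)  # pinned timestamp: identical bytes
print(same_a == later)   # different timestamp: bytes differ
```

Pinning the timestamp and the member order makes the archive bytes a function of the content alone, which is the property the question is after.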
7
votes
1 answer

What is the difference between 'torch.backends.cudnn.deterministic=True' and 'torch.set_deterministic(True)'?

My network includes 'torch.nn.MaxPool3d', which throws a RuntimeError when the cudnn deterministic flag is on, according to the PyTorch docs (version 1.7 - https://pytorch.org/docs/stable/generated/torch.set_deterministic.html#torch.set_deterministic),…
chungseok
  • 73
  • 1
  • 5
7
votes
1 answer

create references in each section in Rmarkdown

I want to use Rmarkdown but what I've read is that when creating a bibliography using pandoc, references go at the end of the document: pandoc/citeproc issues: multiple bibliographies, nocite, citeonly So even if I have a parent document named…
6
votes
1 answer

Python sklearn RandomForestClassifier non-reproducible results

I've been using sklearn's random forest, and I've tried to compare several models. Then I noticed that random-forest is giving different results even with the same seed. I tried it both ways: random.seed(1234) as well as use random forest built-in…
6
votes
1 answer

Package for formatting numeric values in reproducible research

Is there a standard way of converting numeric values to character with a particular type of formatting applied? I'm thinking of something like: formatR(32390,"dollars") # returns "$32,390" formatR(1.25,"percent") # returns "125%" Obviously, not so…
Ari B. Friedman
  • 71,271
  • 35
  • 175
  • 235
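The behaviour the question asks for can be sketched in a few lines. The Python version below is only an illustration of the two examples given in the excerpt, not a recommendation of a package. The function name `format_value` and the style names are hypothetical, and whole-number rounding is an assumption taken from the example outputs.

```python
def format_value(value, style):
    """Format a number as text according to a named style."""
    if style == "dollars":
        return f"${value:,.0f}"  # thousands separator, no decimal places
    if style == "percent":
        return f"{value:.0%}"    # multiplies by 100 and appends "%"
    raise ValueError(f"unknown style: {style!r}")


print(format_value(32390, "dollars"))
print(format_value(1.25, "percent"))
```

Keeping the formatting in one function means every number in a report is rendered by the same rule, so a change of style needs to be made in only one place.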
5
votes
2 answers

Does setting the seed in tf.random.set_seed also set the seed used by the glorot_uniform kernel_initializer when using a conv2D layer in keras?

I'm currently training a convolutional neural network using a conv2D layer defined like this: conv1 = tf.keras.layers.Conv2D(filters=64, kernel_size=(3,3), padding='SAME', activation='relu')(inputs) My understanding is that the default…
5
votes
4 answers

If Keras results are not reproducible, what's the best practice for comparing models and choosing hyper parameters?

UPDATE: This question was for Tensorflow 1.x. I upgraded to 2.0 and (at least on the simple code below) the reproducibility issue seems fixed on 2.0. So that solves my problem; but I'm still curious about what "best practices" were used for this…
user2543623
  • 1,452
  • 2
  • 15
  • 24
5
votes
1 answer

Problem reproducing results from parallelSVM in R

I am not able to set a seed value to get reproducible results from parallelSVM(). library(e1071) library(parallelSVM) data(iris) x <- subset(iris, select = -Species) y <- iris$Species set.seed(1) model <- parallelSVM(x,…
5
votes
3 answers

Tensorflow-Keras reproducibility problem on Google Colab

I have a simple code to run on Google Colab (I use CPU mode): import numpy as np import pandas as pd ## LOAD DATASET datatrain = pd.read_csv("gdrive/My Drive/iris_train.csv").values xtrain = datatrain[:,:-1] ytrain = datatrain[:,-1] datatest =…
malioboro
  • 3,097
  • 4
  • 35
  • 55