A dataset is a collection of data, generally represented in tabular form, with columns signifying different variables and rows signifying different members of the set. If you are looking for a freely available dataset for any purpose, please consider asking your question on https://opendata.stackexchange.com.
Questions tagged [dataset]
11414 questions
16
votes
4 answers
JMeter CSV Dataset Config: how to move through variables in the same thread?
I'm using a CSV dataset config element, which is reading from a file like this:
abd
sds
ase
sdd
ssd
cvv
Which is, basically, a list of random 3-letter strings.
I'm assigning them to a variable called ${random_3}.
Now, I want to use values from…

Ashkan Aryan
- 3,504
- 4
- 30
- 44
16
votes
3 answers
R How to read a file from google drive using R
I would like to read a dataset from Google Drive into R, as the
screenshot indicates.
Neither
url <- "https://drive.google.com/file/d/1AiZda_1-2nwrxI8fLD0Y6e5rTg7aocv0"
temp <- tempfile()
download.file(url, temp)
bank <- read.table(unz(temp,…

seven
- 173
- 1
- 2
- 8
16
votes
5 answers
Spark: get number of cluster cores programmatically
I run my Spark application on a YARN cluster. In my code I use the number of available cores in the queue to create partitions on my dataset:
Dataset ds = ...
ds.coalesce(config.getNumberOfCores());
My question: how can I get the number of available cores in the queue…

Rougher
- 834
- 5
- 19
- 46
16
votes
1 answer
Does the dataset size influence a machine learning algorithm?
So, imagine having access to sufficient data (millions of datapoints for training and testing) of sufficient quality. Please ignore concept drift for now and assume the data is static and does not change over time. Does it even make sense to use all of…

user3354890
- 367
- 1
- 3
- 10
16
votes
3 answers
Good dataset for sentiment analysis?
I am working on sentiment analysis and I am using the dataset given at this link: http://www.cs.jhu.edu/~mdredze/datasets/sentiment/index2.html. I have divided my dataset in a 50:50 ratio: 50% are used as test samples and 50% are used as train…

user3512562
- 233
- 2
- 3
- 7
16
votes
2 answers
How does glmnet's standardize argument handle dummy variables?
In my dataset I have a number of continuous and dummy variables. For analysis with glmnet, I want the continuous variables to be standardized but not the dummy variables.
I currently do this manually by first defining a dummy vector of columns that…

Dr. Beeblebrox
- 838
- 2
- 13
- 30
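The manual approach described in this question (scale the continuous columns, leave the dummy columns alone) can be sketched in plain Python. This is only an illustration of the idea, not glmnet's own behaviour: the function name `standardize_columns`, the toy rows, and the use of the population standard deviation are all assumptions made for the example.

```python
from statistics import mean, pstdev


def standardize_columns(rows, continuous):
    """Return a copy of rows with only the `continuous` column indices z-scored."""
    scaled = [list(row) for row in rows]
    for col in continuous:
        values = [row[col] for row in rows]
        mu, sigma = mean(values), pstdev(values)
        for row in scaled:
            # Guard against constant columns, where sigma is 0.
            row[col] = (row[col] - mu) / sigma if sigma else 0.0
    return scaled


# Hypothetical data: column 0 is continuous, column 1 is a 0/1 dummy.
rows = [[1.0, 0], [2.0, 1], [3.0, 1]]
scaled = standardize_columns(rows, continuous=[0])
```

Column 1 is passed through untouched, so the dummy variable keeps its 0/1 coding while column 0 ends up with mean 0 and unit (population) standard deviation.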
15
votes
4 answers
How to obtain filenames during prediction while using tf.keras.preprocessing.image_dataset_from_directory()?
Keras recently introduced the tf.keras.preprocessing.image_dataset_from_directory function, which is more efficient than the earlier ImageDataGenerator.flow_from_directory method in TensorFlow 2.x.
I am practising on the catsvsdogs problems and using…

J.Kim
- 151
- 1
- 3
15
votes
3 answers
Pytorch - Concatenating Datasets before using Dataloader
I am trying to load two datasets and use them both for training.
Package versions: python 3.7;
pytorch 1.3.1
It is possible to create data_loaders separately and train on them sequentially:
from torch.utils.data import DataLoader,…

chrispduck
- 333
- 1
- 3
- 9
15
votes
5 answers
Read and reverse data chunk by chunk from a csv file and copy to a new csv file
Assume I'm dealing with a very large CSV file, so I can only read the data into memory chunk by chunk. The expected flow of events should be as follows:
1) Read a chunk (e.g., 10 rows) of data from the CSV file using pandas.
2) Reverse the order of data
3)…

Suleka_28
- 2,761
- 4
- 27
- 43
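The flow described in this question (read a chunk, reverse it, copy it to a new file) can be sketched as follows. Note that this swaps in Python's standard `csv` module rather than pandas, and it makes several assumptions the excerpt does not confirm: rows are reversed only within each chunk, chunks are written in their original order, and every row (including any header) is treated as data. The function name `reverse_in_chunks` is hypothetical.

```python
import csv
from itertools import islice


def reverse_in_chunks(src_path, dst_path, chunk_size=10):
    """Copy src_path to dst_path, reversing row order inside each chunk."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        while True:
            # Pull at most chunk_size rows; only this chunk is held in memory.
            chunk = list(islice(reader, chunk_size))
            if not chunk:
                break
            chunk.reverse()
            writer.writerows(chunk)
```

If the whole file needs to be reversed rather than each chunk separately, this sketch is not enough on its own, since the last chunk would have to be written first.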
15
votes
1 answer
tensorflow Dataset API diff between make_initializable_iterator and make_one_shot_iterator
I want to know the difference between make_initializable_iterator and make_one_shot_iterator.
1. The TensorFlow documentation says that a "one-shot" iterator does not currently support re-initialization. What exactly does that mean?
2. Are the…

Lion Lai
- 1,862
- 2
- 20
- 41
15
votes
2 answers
What is StringIndexer, VectorIndexer, and how to use them?
Dataset dataFrame = ... ;
StringIndexerModel labelIndexer = new StringIndexer()
.setInputCol("label")
.setOutputCol("indexedLabel")
.fit(dataFrame);
VectorIndexerModel featureIndexer = new…

Manikandan Balasubramanian
- 1,079
- 4
- 14
- 27
15
votes
4 answers
Twitter (Social networking) Dataset
I am looking for a dataset from Twitter or another social networking site for my project. I currently have the CAW 2.0 Twitter dataset, but it only contains users' tweets. I want data that shows the number of friends, followers, and such.
It does not have…

denniss
- 17,229
- 26
- 92
- 141
15
votes
3 answers
How to perform under sampling in scikit learn?
We have a retinal dataset in which diseased-eye records make up 70 percent of the data and non-diseased-eye records make up the remaining 30 percent. We want a dataset wherein the diseased as well as the non diseased samples…

Gaurav Patil
- 483
- 2
- 5
- 10
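As a rough illustration of random under-sampling for a 70/30 imbalance like the one in this question, here is a minimal plain-Python sketch. It does not use scikit-learn, and it assumes the goal is an equal number of samples per class, which the truncated excerpt does not state. The function name `undersample`, the seed, and the toy data are hypothetical.

```python
import random


def undersample(samples, labels, seed=0):
    """Randomly drop samples so every class is as small as the rarest class."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    # Size of the smallest class sets the target for all classes.
    target = min(len(group) for group in by_class.values())
    balanced = []
    for label, group in by_class.items():
        for sample in rng.sample(group, target):
            balanced.append((sample, label))
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: 7 "diseased" and 3 "healthy" records (a 70/30 split).
samples = list(range(10))
labels = ["diseased"] * 7 + ["healthy"] * 3
balanced = undersample(samples, labels)
```

With this toy input the result holds three records per class. The trade-off is that the discarded majority-class records are simply lost.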
15
votes
4 answers
How to put datasets into an R package
I am creating my own R package and I was wondering what methods I can use to add (time-series) datasets to my package. Here are the specifics:
I have created a package subdirectory called data and I am aware that this is the…

Graeme Walsh
- 638
- 7
- 20
15
votes
4 answers
adding a datatable in a dataset
I'm adding a datatable to a dataset like this:
DataTable dtImage = new DataTable();
//some updates in the Datatable
ds.Tables.Add(dtImage);
But the next time the DataTable gets updated, will the change be reflected in the DataSet, or do we need to write…

Manikandan Sigamani
- 1,964
- 1
- 15
- 28