Questions tagged [dataset]

A dataset is a collection of data, generally represented in tabular form, with columns signifying different variables and rows signifying different members of the set. If you are looking for a freely available dataset for any purpose, please consider asking your question on https://opendata.stackexchange.com.

11414 questions
16
votes
4 answers

JMeter CSV Dataset Config: how to move through variables in the same thread?

I'm using a CSV dataset config element, which reads from a file like this: abd sds ase sdd ssd cvv. The file basically contains a number of random 3-letter strings. I'm assigning them to a variable called ${random_3}. Now, I want to use values from…
Ashkan Aryan
  • 3,504
  • 4
  • 30
  • 44
16
votes
3 answers

How to read a file from Google Drive using R

I would like to read a dataset from Google Drive into R, as the screenshot indicates. Neither url <- "https://drive.google.com/file/d/1AiZda_1-2nwrxI8fLD0Y6e5rTg7aocv0" temp <- tempfile() download.file(url, temp) bank <- read.table(unz(temp,…
seven
  • 173
  • 1
  • 2
  • 8
16
votes
5 answers

Spark: get number of cluster cores programmatically

I run my Spark application on a YARN cluster. In my code I use the number of available cores of the queue to create partitions on my dataset: Dataset ds = ... ds.coalesce(config.getNumberOfCores()); My question: how can I get the number of available cores of the queue…
Rougher
  • 834
  • 5
  • 19
  • 46
16
votes
1 answer

Does the dataset size influence a machine learning algorithm?

So, imagine having access to sufficient data (millions of datapoints for training and testing) of sufficient quality. Please ignore concept drift for now and assume the data is static and does not change over time. Does it even make sense to use all of…
user3354890
  • 367
  • 1
  • 3
  • 10
16
votes
3 answers

Good dataset for sentiment analysis?

I am working on sentiment analysis and I am using the dataset given at this link: http://www.cs.jhu.edu/~mdredze/datasets/sentiment/index2.html and I have divided my dataset in a 50:50 ratio. 50% are used as test samples and 50% are used as train…
user3512562
  • 233
  • 2
  • 3
  • 7
16
votes
2 answers

How does glmnet's standardize argument handle dummy variables?

In my dataset I have a number of continuous and dummy variables. For analysis with glmnet, I want the continuous variables to be standardized but not the dummy variables. I currently do this manually by first defining a dummy vector of columns that…
Dr. Beeblebrox
  • 838
  • 2
  • 13
  • 30
15
votes
4 answers

How to obtain filenames during prediction while using tf.keras.preprocessing.image_dataset_from_directory()?

Keras recently introduced the tf.keras.preprocessing.image_dataset_from_directory function, which is more efficient than the previous ImageDataGenerator.flow_from_directory method in TensorFlow 2.x. I am practising on the catsvsdogs problems and using…
J.Kim
  • 151
  • 1
  • 3
15
votes
3 answers

Pytorch - Concatenating Datasets before using Dataloader

I am trying to load two datasets and use them both for training. Package versions: python 3.7; pytorch 1.3.1 It is possible to create data_loaders separately and train on them sequentially: from torch.utils.data import DataLoader,…
chrispduck
  • 333
  • 1
  • 3
  • 9
15
votes
5 answers

Read and reverse data chunk by chunk from a csv file and copy to a new csv file

Assume I'm dealing with a very large csv file, so I can only read the data into memory chunk by chunk. The expected flow of events should be as follows: 1) Read a chunk (e.g. 10 rows) of data from the csv using pandas. 2) Reverse the order of the data 3)…
Suleka_28
  • 2,761
  • 4
  • 27
  • 43
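
The flow described in the question above (read a chunk, reverse it, write it to a new file) can be sketched as follows. This is only a minimal illustration, not the asker's method: it uses Python's standard `csv` module rather than pandas, it assumes the file has a header row that should stay at the top, and it assumes the reversal applies to the rows inside each chunk, since the excerpt is cut off before the final step. The name `reverse_chunks` and the sample data are hypothetical.

```python
import csv
import io
from itertools import islice


def reverse_chunks(src, dst, chunk_size=10):
    """Copy CSV rows from src to dst, reversing the rows inside each chunk.

    Assumption: the first row is a header and is copied through unchanged.
    Only one chunk of rows is held in memory at a time.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")

    header = next(reader, None)
    if header is not None:
        writer.writerow(header)

    while True:
        # Pull at most chunk_size rows from the reader.
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            break
        # Write the rows of this chunk in reverse order.
        writer.writerows(reversed(chunk))


# Hypothetical in-memory data standing in for a large file on disk.
source = io.StringIO("id,value\n1,a\n2,b\n3,c\n4,d\n5,e\n")
target = io.StringIO()
reverse_chunks(source, target, chunk_size=2)
print(target.getvalue())
```

With real files, `src` and `dst` would be handles opened with `open(path, newline="")`. Reversing the whole file rather than each chunk would need a different approach, such as writing chunks to temporary files and concatenating them in reverse.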
15
votes
1 answer

TensorFlow Dataset API: difference between make_initializable_iterator and make_one_shot_iterator

I want to know the difference between make_initializable_iterator and make_one_shot_iterator. 1. The TensorFlow documentation says that a "one-shot" iterator does not currently support re-initialization. What exactly does that mean? 2. Are the…
Lion Lai
  • 1,862
  • 2
  • 20
  • 41
15
votes
2 answers

What is StringIndexer, VectorIndexer, and how to use them?

Dataset dataFrame = ... ; StringIndexerModel labelIndexer = new StringIndexer() .setInputCol("label") .setOutputCol("indexedLabel") .fit(dataFrame); VectorIndexerModel featureIndexer = new…
15
votes
4 answers

Twitter (Social networking) Dataset

I am looking for a dataset from Twitter or other social networking sites for my project. I currently have the CAW 2.0 Twitter dataset, but it only contains users' tweets. I want data that shows the number of friends, followers, and such. It does not have…
denniss
  • 17,229
  • 26
  • 92
  • 141
15
votes
3 answers

How to perform under sampling in scikit learn?

We have a retinal dataset in which the diseased eye information constitutes 70 percent of the information, whereas the non-diseased eye constitutes the remaining 30 percent. We want a dataset wherein the diseased as well as the non-diseased samples…
Gaurav Patil
  • 483
  • 2
  • 5
  • 10
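
As a rough illustration of what under-sampling means for a 70/30 split like the one in the question above, here is a plain-Python sketch of random under-sampling. It is not a scikit-learn API and not necessarily what the asker needs: it assumes the goal is equal class counts (the excerpt is cut off before stating this), and the function name `undersample`, the labels, and the integer samples are all hypothetical.

```python
import random
from collections import defaultdict


def undersample(samples, labels, seed=0):
    """Randomly drop samples so every class has as many as the smallest class."""
    rng = random.Random(seed)

    # Group samples by their class label.
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)

    # The smallest class sets the per-class quota.
    target = min(len(group) for group in by_label.values())

    balanced = []
    for label, group in by_label.items():
        # Draw `target` samples without replacement from each class.
        for sample in rng.sample(group, target):
            balanced.append((sample, label))

    rng.shuffle(balanced)
    return balanced


# Hypothetical 70/30 dataset: integers stand in for retinal records.
labels = ["diseased"] * 70 + ["healthy"] * 30
samples = list(range(100))

balanced = undersample(samples, labels)
counts = {lab: sum(1 for _, l in balanced if l == lab) for lab in set(labels)}
print(len(balanced), counts)
```

Under-sampling discards majority-class data, so it is usually applied only to the training split, leaving the test split at its natural class ratio.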
15
votes
4 answers

How to put datasets into an R package

I am creating my own R package and I was wondering what are the possible methods that I can use to add (time-series) datasets to my package. Here are the specifics: I have created a package subdirectory called data and I am aware that this is the…
Graeme Walsh
  • 638
  • 7
  • 20
15
votes
4 answers

Adding a DataTable to a DataSet

I'm adding a DataTable to a DataSet like this: DataTable dtImage = new DataTable(); //some updates in the Datatable ds.Tables.Add(dtImage); But the next time the DataTable gets updated, will that be reflected in the DataSet, or do we need to write…
Manikandan Sigamani
  • 1,964
  • 1
  • 15
  • 28