Highest Voted 'petastorm' Questions

4

votes

0 answers

ValueError: Items of feature_columns must be a _FeatureColumn. (Tensorflow 1.13)

I'm running into a ValueError when running Tensorflow-1.13 + Horovod-0.16 + Spark-0.24 + Petastorm-0.17. It's a straightforward implementation of a model_fn and some indicator_columns, but it throws an error similar to Items of feature_columns…

asked May 16 '19 at 21:52

Gan

41
2

3

votes

0 answers

What is the best way to feed training data from parquet file to a Tensorflow/Keras model?

I have a training dataset stored on S3 in parquet format. I wish to load this data into a notebook (on a Databricks cluster) and train a Keras model on it. There are a few ways I can think of to train a Keras model on this dataset: read parquet file…

tensorflow amazon-s3 parquet tensorflow-datasets petastorm

asked Nov 30 '21 at 06:29

exAres

4,806
16
53
95

3

votes

0 answers

Should I create a PyTorch Dataset to train a model off a pyspark dataframe?

I want to train a PyTorch NLP model on training data in columnar format, and I thought of constructing a PyTorch Dataset that uses a pyspark dataframe as its raw data (not sure it's the right approach...). To preprocess text I'm using a tokenizer provided by…

python pyspark pytorch huggingface-tokenizers petastorm

asked Feb 10 '21 at 14:34

Davide Fiocco

5,350
5
35
72

2

votes

2 answers

How to print out data that goes to keras model.fit, specifically when using a petastorm dataset

Update: While I appreciated AloneTogether's answer, I didn't like that I was using take() and that it was separate from model.fit. I put another answer here if you want to look at it. It involves subclassing Model. It's not too bad. End of update. I have…

python tensorflow keras callback petastorm

asked Jan 18 '22 at 09:38

Craig

177
2
12

2

votes

1 answer

Storing ndarrays into Parquet via uber/petastorm?

Is it possible to store N-dimensional arrays in Parquet via uber/petastorm?

python arrays matrix parquet petastorm

asked Feb 14 '19 at 21:48

Leo Gallucci

16,355
12
77
110

1

vote

1 answer

Create train and valid dataset in petastorm

Versions: Python 3.7.13, Tensorflow-2.9.1, Petastorm-0.12.1. In petastorm, it seems the only way to train a model on a dataset created from petastorm is to fit the model within the Reader context manager, like below, as done in…

python tensorflow petastorm

asked May 16 '23 at 07:26

haneulkim

4,406
9
38
80

1

vote

0 answers

spark: exec: "executor": executable file not found in $PATH: unknown

I am trying to run some calculations using petastorm v0.11.4 in a Docker container with minikube v1.25.2. As long as I run the process locally, everything works as expected. As soon as I try to distribute the work across the minikube cluster, I receive the…

apache-spark kubernetes pyspark petastorm

asked Apr 29 '22 at 12:02

skynet1010

143
4
11

1

vote

0 answers

Tensorflow petastorm, training stuck

I have 2 very large (in TB) datasets and am using petastorm to train a TF model. I load the datasets using petastorm and then create a single features-and-labels dataset, as I can't pass two separate datasets train_X_mlp =…

tensorflow pyspark databricks tf.keras petastorm

asked Jan 17 '22 at 03:44

prajwal rao

87
1
9

1

vote

0 answers

Petastorm with Databricks Connect failing

Using Azure Databricks. I have petastorm==0.11.2 and databricks-connect==9.1.0. My databricks-connect session seems to be working: I'm able to read data into my remote workspace. But when I use petastorm to create a spark converter object it says…

databricks databricks-connect petastorm

asked Dec 25 '21 at 21:38

Jamalan

482
4
15

1

vote

1 answer

What is the best way to convert time series data (parquet format) into sequences using petastorm?

Pardon me if I use the terms in the wrong sense. I am still grappling with many Spark and distributed-computing concepts. Here is my use case, for which I am not able to get a complete picture of the implementation. I have time-series data of 40 columns and 100…

python pyspark databricks horovod petastorm

asked Feb 23 '21 at 15:05

Ashok Krishna

143
1
5

1

vote

1 answer

How to replace tf.train.batch, as it is deprecated

This is the code for training mnist data using Petastorm. def train_and_test(dataset_url, training_iterations, batch_size, evaluation_interval): with make_reader(os.path.join(dataset_url, 'train'), num_epochs=None) as train_reader: with…

python tensorflow petastorm

asked Nov 01 '20 at 19:36

Asha

67
5

1

vote

0 answers

Trying to create parquet Petastorm dataset

I'm currently trying to create a parquet petastorm dataset to store a video dataset. My code is: MotionSchema = Unischema('TeaserSchema', [ UnischemaField( 'video', np.uint8, (None, None, None, 3), NdarrayCodec(),…

python pyspark parquet petastorm

asked May 06 '20 at 22:15

Guilherme Marques

263
1
7

1

vote

0 answers

InvalidArgumentError when reading parquet files into Keras via Petastorm

I'm trying to read in data from parquet for a language model. The parquet contains two columns: target (int) and feature_vec (int array). I'm adapting the code from this post (which works for me). When I try the code below I get an InvalidArgumentError…

tensorflow keras pyspark databricks petastorm

asked Dec 09 '19 at 22:55

dspringate

1,805
2
13
20

1

vote

2 answers

Creating parquet Petastorm dataset through Spark fails with Overflow error (larger than 4GB)

I'm trying to implement Uber's Petastorm dataset creation, which uses Spark to create a parquet file, following the tutorial on their GitHub page. The code: spark = SparkSession.builder.config('spark.driver.memory',…

python pyspark petastorm

asked Nov 19 '18 at 08:51

bluesummers

11,365
8
72
108

0

votes

0 answers

How to integrate tf.data.Dataset with Ray Tune for distributed training

Using tensorflow-cpu==2.9.3 and petastorm==0.12.1 on Python 3.7, I've created tf.data.Dataset objects using petastorm for the train and validation datasets: ds_train (DatasetV1Adapter; I think this is an old version of tf.data.Dataset) and ds_valid (DatasetV1Adapter). First…

python tensorflow ray ray-tune petastorm

asked Jul 17 '23 at 07:10

haneulkim

4,406
9
38
80
