Using compressed files with Datafusion

Question

Is there a way to use compressed files with Cloud Data Fusion? I used Google Cloud Storage as the source and placed a gzip file in the preferred location.

In the Wrangler transform, I don't see a preview. When I try to select the file using Select Data, the gzip file is not highlighted. The same steps work fine with an uncompressed file.

Should I be using some transform before I wrangle? Is there a way to read a compressed file directly and preview the data? In Dataprep, the transform identifies files based on their extension, but in Data Fusion there seems to be no such option.

I was using the Basic edition of Data Fusion. Would the Enterprise edition help?

score 1 · Accepted Answer · answered Nov 26 '19 at 17:50

Wrangler expects files to be uncompressed and does not yet support reading compressed files. I have opened an enhancement request for this: https://issues.cask.co/browse/CDAP-16140

Thanks, Sree

Sree

Thanks Sree, I will wait for the update on the enhancement request. I am new to the CDAP environment, so I am not very familiar with the interface. Is there a way to use a compressed file as my source, apply a transform to unzip it, and then feed it to Wrangler? – Trishit Ghosh Nov 27 '19 at 08:44

score 0 · Answer 2 · answered Jan 10 '20 at 09:08

Although Wrangler doesn't let us select a compressed file and apply transforms to it interactively, we can enter the Wrangler directives manually. The pipeline works as expected when we supply a compressed file to the source at run time.
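If typing directives blind is awkward, one possible workaround is to build them against a small uncompressed sample and then point the pipeline at the gzip file at run time. The sketch below is not part of Data Fusion or CDAP. It uses only the Python standard library, and the function name, file names, and line limit are hypothetical choices for illustration. It assumes the gzip file contains UTF-8, line-oriented text such as CSV.

```python
import gzip
import itertools


def write_uncompressed_sample(gz_path, out_path, max_lines=1000):
    """Copy the first max_lines lines of a gzip text file to a plain text file.

    Returns the number of lines written. Assumes UTF-8, line-oriented data.
    """
    count = 0
    # newline="" keeps the original line endings untouched on read and write.
    with gzip.open(gz_path, "rt", encoding="utf-8", newline="") as src, \
            open(out_path, "w", encoding="utf-8", newline="") as dst:
        for line in itertools.islice(src, max_lines):
            dst.write(line)
            count += 1
    return count


if __name__ == "__main__":
    # Hypothetical file names; replace with your own paths.
    written = write_uncompressed_sample("input.csv.gz", "sample.csv", max_lines=500)
    print(f"wrote {written} lines to sample.csv")
```

The resulting sample file could be uploaded next to the compressed file, opened in Wrangler to work out the directives interactively, and the same directives then reused in the pipeline that reads the gzip file.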
