
This is my scenario: I have a bz2 file (a .tar.bz2 archive) in Amazon S3. Inside the archive there are files with .dat, .met and .sta extensions. I am only interested in the *.dat files. You can download this sample file to take a look at the archive.

create external table cdr (
   anum string,
   bnum string,
   numOfTimes int
)
row format delimited
    fields terminated by ','
    lines terminated by '\n'
location 's3://mybucket/dir';  -- the .tar.bz2 file is inside here

The problem is that when I execute the above command and then query the table, some of the records/rows are wrong:

1) All the data from the *.sta and *.met files is also included.
2) Metadata such as the file names is also included.

The only idea I had was to select the INPUT__FILE__NAME virtual column. But all the records/rows had the same INPUT__FILE__NAME, namely filename.tar.bz2.
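Both symptoms are most likely explained by how Hive reads the object: the bzip2 codec is chosen from the .bz2 extension and decompresses the stream, but nothing unpacks the tar container inside it. Every member's bytes therefore become rows, and so do the tar headers, which store each member's file name in plain text. Since S3 holds only one object, INPUT__FILE__NAME can only ever report filename.tar.bz2. One possible workaround (an assumption, not something tried in the question) is to unpack the archive before Hive sees it and keep only the *.dat members. The function name `extract_dat_members` and its parameters below are hypothetical:

```python
import tarfile
from pathlib import Path


def extract_dat_members(archive_path, out_dir, suffix=".dat"):
    """Extract only regular-file members ending in `suffix` from a .tar.bz2.

    Returns the list of written paths. Directory structure inside the
    archive is flattened, so every matching file lands directly in out_dir
    (members with the same base name would overwrite each other).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    # "r:bz2" applies bzip2 decompression AND parses the tar container,
    # which is the step Hive's text input path does not perform.
    with tarfile.open(archive_path, "r:bz2") as tar:
        for member in tar.getmembers():
            # Skip .met/.sta files, directories, links, etc.
            if not member.isfile() or not member.name.endswith(suffix):
                continue
            src = tar.extractfile(member)
            if src is None:
                continue
            # Keep only the base name so nothing is written outside out_dir.
            target = out / Path(member.name).name
            with src, open(target, "wb") as dst:
                dst.write(src.read())
            written.append(target)
    return written
```

The extracted .dat files could then be uploaded (for example with `aws s3 cp --recursive`) to a separate prefix, such as a hypothetical s3://mybucket/dat-only/, and the table's LOCATION pointed there, so that Hive only ever sees plain comma-delimited text.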

Any suggestions are welcome. I am currently completely lost.

prog_guy
