
I want to write a dynamic frame to S3 as a text file and use '|' as the delimiter.

How can I modify the code below so that Glue saves the frame as a .txt file and uses '|' as the delimiter?

glue_context.write_dynamic_frame.from_options(
    frame=frame,
    connection_type="s3",
    connection_options={"path": outpath},
    format="csv",
)
Robert Kossendey
Beginner

4 Answers


You can convert the DynamicFrame to a Spark DataFrame and use the Spark writer's sep option to save it with your delimiter.

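df = frame.toDF()  # DynamicFrame -> Spark DataFrame (the method is toDF, not toDf)
# Spark treats the path as an output directory and writes part files into it.
df.write.option("sep", "|").option("header", "true").csv(filename)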
Mariusz K

I'm not sure why you want to write your data with a .txt extension when your code specifies format="csv". If you mean a generic delimited text file, csv is the format you want.

Glue's DynamicFrameWriter supports custom format options. Here is what you need to add to your code (see also the Glue documentation on format options):

glue_context.write_dynamic_frame.from_options(
    frame=frame,
    connection_type='s3',
    connection_options={
        'path': outpath,
    },
    format='csv',
    format_options={
        'separator': "|"
        # ...other kwargs
    }
)

Note that DynamicFrameWriter does not let you specify a name for the output file, and it writes one output file per partition, so you may get several files depending on how the data was partitioned during execution.

If you want just a single output file, run:

frame = frame.repartition(1)

before writing to S3.
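
Since the writer picks the object name itself, getting a key that ends in .txt means renaming the object after the job has written it. Below is a minimal sketch of that step, assuming the frame was repartitioned to one partition so that exactly one data file lands under the output prefix. The helper name `rename_single_output` and its parameters are hypothetical; `s3_client` is assumed to be a boto3 S3 client created with `boto3.client("s3")`. S3 has no rename operation, so the sketch copies the object and then deletes the original.

```python
def rename_single_output(s3_client, bucket, prefix, target_key):
    """Copy the single object under `prefix` to `target_key`, then delete the original.

    Hypothetical helper: assumes `s3_client` behaves like boto3.client("s3")
    and that the Glue job wrote exactly one data file under `prefix`.
    """
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    # Skip "folder" marker keys and Spark's _SUCCESS file if they are present.
    keys = [
        obj["Key"]
        for obj in response.get("Contents", [])
        if not obj["Key"].endswith("/") and not obj["Key"].endswith("_SUCCESS")
    ]
    if len(keys) != 1:
        raise ValueError(f"expected exactly one output file, found {len(keys)}")

    source_key = keys[0]
    # S3 cannot rename in place: copy to the new key, then remove the old one.
    s3_client.copy_object(
        Bucket=bucket,
        Key=target_key,
        CopySource={"Bucket": bucket, "Key": source_key},
    )
    s3_client.delete_object(Bucket=bucket, Key=source_key)
    return source_key
```

It would be called after the write, for example `rename_single_output(boto3.client("s3"), "my-bucket", "output/", "output/result.txt")`, where the bucket and key names are placeholders.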

wtfzambo

Glue currently does not support .txt as an output format. The Glue documentation lists the supported file types.

Robert Kossendey

Currently in Glue you can convert a Spark DataFrame to a pandas DataFrame with:

pandasDF = sparkDF.toPandas()

and you can enjoy all the modern comforts of Pandas.
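
To tie this back to the question, a pandas DataFrame can be written pipe-delimited through `to_csv` and its `sep` argument. The sketch below uses a small hand-made frame as a stand-in for the result of `sparkDF.toPandas()` and writes to an in-memory buffer; the column names and values are made up for illustration. Keep in mind that `toPandas()` collects the whole dataset onto the driver, so this only suits data that fits in driver memory.

```python
import io

import pandas as pd

# Stand-in for `pandasDF = sparkDF.toPandas()`; the data is illustrative only.
pandasDF = pd.DataFrame({"id": [1, 2], "name": ["alice", "bob"]})

# Write with '|' as the delimiter and leave out the pandas index column.
buffer = io.StringIO()
pandasDF.to_csv(buffer, sep="|", index=False)
pipe_text = buffer.getvalue()
```

Passing a local path such as `result.txt` instead of the buffer writes a file with that name.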