Please note that I do not want double quotes around all fields, only around strings.

In AWS Glue Studio, if I select concat(ref_alpha, '!', ref_beta) and write the result to a CSV file, I get

"AB12!RT45"

but if I have concat(ref_alpha, 'T', ref_beta) I get

AB12TRT45

without the double quotes. I would like double quotes around every string field in the CSV file, and none around numeric fields. How can I achieve that?
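
To make the target format concrete, here is a minimal sketch outside Glue, using Python's standard csv module with csv.QUOTE_NONNUMERIC, which quotes every non-numeric field and leaves numbers bare. The helper name write_rows and the numeric sample values are made up for illustration; this only shows the output I am after and is not a Glue solution.

```python
import csv
import io


def write_rows(rows):
    """Serialize rows to CSV text, quoting strings but not numbers."""
    buf = io.StringIO()
    # QUOTE_NONNUMERIC wraps every non-numeric field in double quotes
    # and writes ints and floats without quotes.
    writer = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical sample row: the two strings from the question plus two numbers.
rows = [("AB12!RT45", "AB12TRT45", 42, 3.5)]
output = write_rows(rows)
print(output)  # "AB12!RT45","AB12TRT45",42,3.5
```

Both strings come out quoted whether or not they contain a special character, and the numbers stay unquoted. That is the behavior I would like from the Glue CSV sink.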

Complete code:

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.dynamicframe import DynamicFrame

def sparkSqlQuery(glueContext, query, mapping, transformation_ctx) -> DynamicFrame:
    for alias, frame in mapping.items():
        frame.toDF().createOrReplaceTempView(alias)
    result = spark.sql(query)
    return DynamicFrame.fromDF(result, glueContext, transformation_ctx)

SqlQuery0 = '''
select concat(meter_number, '!', meter_number), 
concat(meter_number, 'T', meter_number) 
from rawData
'''

## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
## @type: DataSource
## @args: [database = "abc-12345678-glue-db-one", table_name = "raw", transformation_ctx = "DataSource0"]
## @return: DataSource0
## @inputs: []
DataSource0 = glueContext.create_dynamic_frame.from_catalog(database = "abc-12345678-glue-db-one", table_name = "raw", transformation_ctx = "DataSource0")
## @type: SqlCode
## @args: [sqlAliases = {"rawData": DataSource0}, sqlName = SqlQuery0, transformation_ctx = "Transform0"]
## @return: Transform0
## @inputs: [dfc = DataSource0]
Transform0 = sparkSqlQuery(glueContext, query = SqlQuery0, mapping = {"rawData": DataSource0}, transformation_ctx = "Transform0")
## @type: DataSink
## @args: [connection_type = "s3", format = "csv", connection_options = {"path": "s3://dev-abc-12345678/out/", "partitionKeys": []}, transformation_ctx = "DataSink0"]
## @return: DataSink0
## @inputs: [frame = Transform0]
DataSink0 = glueContext.write_dynamic_frame.from_options(frame = Transform0, connection_type = "s3", format = "csv", connection_options = {"path": "s3://dev-abc-12345678/out/", "partitionKeys": []}, transformation_ctx = "DataSink0")
job.commit()
corisco
  • Can you post your code? – Srinivas Jun 01 '21 at 02:45
  • This question has not been answered. I am not looking to place double quotes around all fields, which is what the other answers offer. I am looking for double quotes around string fields only. Numeric fields should not have double quotes. – corisco Jun 01 '21 at 08:44

0 Answers