I have a file in S3, generated by a different system, that looks like this:
A1|~|B1|~|C1|~|D1|~|
A2|~|B2|~|C2|~|D2|~|
A3|~|B3|~|C3|~|D3|~|
A4|~|B4|~|C4|~|D4|~|
While reading this file in an AWS Glue PySpark script, I want to remove the trailing delimiter from each row. Could you please let me know how to do that?
The issue: when I convert this .TXT file to Parquet and specify the delimiter as '|~|', an extra column is added at the end. This happens because every row in the source file ends with an extra |~| delimiter.
That is why I want to remove the last |~| delimiter from each row and then convert the file to Parquet.
Code:
input = sc.textFile("filename.TXT").map(lambda x: x.split('|~|'))
df=spark.createDataFrame(input,list_of_colun_names)
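For reference, here is a rough sketch of the kind of fix I have in mind: strip one trailing '|~|' from each line before splitting, so the empty last field never appears. The names split_row, build_dataframe, and DELIMITER are made up for illustration, and I am assuming the Glue job already has a SparkSession available as spark; I have not confirmed this is the best way to do it in Glue.

```python
DELIMITER = "|~|"  # multi-character delimiter used in the source file


def split_row(line, delimiter=DELIMITER):
    """Split one text line into fields, dropping a single trailing delimiter."""
    # Remove the delimiter only when it terminates the line, so a genuinely
    # empty last field (two delimiters in a row at the end) is preserved.
    if line.endswith(delimiter):
        line = line[: -len(delimiter)]
    return line.split(delimiter)


def build_dataframe(spark, path, column_names):
    """Hypothetical helper: read the text file and build a DataFrame.

    Assumes `spark` is an existing SparkSession and that `column_names`
    has one entry per field in each row.
    """
    rows = spark.sparkContext.textFile(path).map(split_row)
    return spark.createDataFrame(rows, column_names)
```

With this, split_row("A1|~|B1|~|C1|~|D1|~|") gives four fields instead of five, and the DataFrame returned by build_dataframe could then be written out as Parquet in the usual way.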