Hope you are all doing well. I'm reading files from a directory using Spark Structured Streaming:
from pyspark.sql.types import StructType, StructField, StringType

schema = StructType([
    StructField("RowNo", StringType()),
    StructField("InvoiceNo", StringType()),
    StructField("StockCode", StringType()),
    StructField("Description", StringType()),
    StructField("Quantity", StringType()),
    StructField("InvoiceDate", StringType()),
    StructField("UnitPrice", StringType()),
    StructField("CustomerId", StringType()),
    StructField("Country", StringType()),
    StructField("InvoiceTimestamp", StringType())
])
data = (
    spark.readStream
    .format("orc")
    .schema(schema)
    .option("path", "<path_here>")
    .load()
)
(I dropped the `header` option from the reader; it is a CSV-specific option and is ignored by the ORC source.) After applying some transformations, I'd like to save the output files with a size of roughly 100 MB each. How can I control the output file size when writing the stream?
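For context, this is roughly what the write side of my stream looks like. The `transformed` DataFrame and both paths are placeholders, not my real values; the commented-out config line is the only size-related knob I'm aware of (it caps rows per file, not bytes), and I'm unsure whether it is the right approach:

```python
# Sketch of the streaming write; names and paths are placeholders.
# Spark has no direct "target file size" option that I know of;
# spark.sql.files.maxRecordsPerFile limits rows per output file,
# so an estimated row count might approximate a 100 MB target:
# spark.conf.set("spark.sql.files.maxRecordsPerFile", <estimated_rows>)

query = (
    transformed                      # result of my transformations on `data`
    .writeStream
    .format("orc")
    .option("path", "<output_path>")
    .option("checkpointLocation", "<checkpoint_path>")
    .start()
)
```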