Suppose I have two files.
file0.txt:

| field1 | field2 |
|--------|--------|
| 1      | 2      |
| 1      | 2      |
file1.txt:

| field2 | field1 |
|--------|--------|
| 2      | 1      |
| 2      | 1      |
Now, if I write:
spark.read.csv(["./file0.txt""./file1.txt"], sep=',', header=True, inferSchema=True).show()
Spark reads the following dataframe:
| field1 | field2 |
|--------|--------|
| 1      | 2      |
| 1      | 2      |
| 2      | 1      |
| 2      | 1      |
but it should have been:
| field1 | field2 |
|--------|--------|
| 1      | 2      |
| 1      | 2      |
| 1      | 2      |
| 1      | 2      |
I tried using inferSchema, but it did not help. Since there are many files in the folder, I cannot hardcode the ordering of the columns in the CSVs.
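For reference, reading each file on its own and then aligning the columns by name with `unionByName` does produce the expected output, though it feels clunky with a large number of files. A minimal sketch of that idea, assuming PySpark and using a hardcoded `paths` list that stands in for the real folder contents:

```python
from functools import reduce
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Stand-in for the real folder listing; in practice these paths would be discovered.
paths = ["./file0.txt", "./file1.txt"]

# Read each file separately so its own header is honoured,
# then combine the frames by column name instead of by position.
dfs = [spark.read.csv(p, sep=',', header=True, inferSchema=True) for p in paths]
combined = reduce(lambda left, right: left.unionByName(right), dfs)
combined.show()
```

Is there a way to get the same result directly from a single multi-path read?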