I created a CSV file and then read it back. The CSV contains:
1 to 100 in column 0
101 to 200 in column 1
201 to 300 in column 2
301 to 400 in column 3
401 to 500 in column 4
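A CSV like this can be generated with something along these lines (a sketch, not the exact creation code; the column names "0" to "4" are just for illustration, only the filename sparkrowsissue.csv appears in the read snippet further down):

```python
import pandas as pd

# Five columns, each holding a consecutive block of 100 integers:
# column 0 -> 1..100, column 1 -> 101..200, ..., column 4 -> 401..500.
data = {str(i): list(range(i * 100 + 1, (i + 1) * 100 + 1)) for i in range(5)}
pd.DataFrame(data).to_csv("sparkrowsissue.csv", index=False)
```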
Reading it back with `read_csv` returns the rows perfectly.
Later I converted the CSV to a Delta table and a Parquet file and saved them (a sketch of the conversion is shown below).
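One way this conversion can be done with Spark (a sketch, not necessarily the exact code used; the paths are placeholders standing in for the path_delta / path_parquet variables in the read snippet below):

```python
from pyspark.sql import SparkSession

# Assumes a Spark session with Delta Lake support configured
# (e.g., Databricks, or pyspark with the delta-spark package).
spark = SparkSession.builder.getOrCreate()

# Placeholder paths; they stand in for the path_delta / path_parquet
# variables used in the read snippet below.
path_delta = "/tmp/sparkrowsissue_delta"
path_parquet = "/tmp/sparkrowsissue_parquet"

df = spark.read.csv("sparkrowsissue.csv", header=True, inferSchema=True)
df.write.format("delta").mode("overwrite").save(path_delta)
df.write.mode("overwrite").parquet(path_parquet)
```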
Reading the Delta table with `DeltaTable` and the Parquet file with `pandas.read_parquet` returns all the rows in order. However, when reading with `pyspark.pandas.read_delta(path)`, a given row no longer contains the column values it has in the CSV.
```python
import pandas as pd
import pyspark.pandas as ps
from deltalake import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# path_delta / path_parquet point at the saved Delta table and Parquet file.
df_csv = pd.read_csv("sparkrowsissue.csv")
df_delta = DeltaTable(path_delta).to_pandas()
df_parquet = pd.read_parquet(path_parquet)
df_spark_delta1 = ps.read_delta(path_delta).to_pandas()
df_spark_delta2 = spark.read.format("delta").load(path_delta).toPandas()

firstrows = 3
print("=================csv file pandas=================")
print(df_csv.head(firstrows))
print("======================Delta table===============")
print(df_delta.head(firstrows))
print("======================Parquet file===============")
print(df_parquet.head(firstrows))
print("======================pySpark Delta table===============")
print(df_spark_delta1.head(firstrows))
print("======================Spark Delta table===============")
print(df_spark_delta2.head(firstrows))
```
In the output, the pySpark Delta table and Spark Delta table reads show the rows jumbled (out of order), while the CSV, Delta table, and Parquet reads are in order.