
I created a CSV file and read it back. The CSV has the following layout (a sketch of how such a file can be built follows the list):

- 1 to 100 in column 0
- 101 to 200 in column 1
- 201 to 300 in column 2
- 301 to 400 in column 3
- 401 to 500 in column 4
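
Roughly how the file was built (a sketch, not the exact code; the column names `0` to `4` are assumed here):

```python
import numpy as np
import pandas as pd

# Sketch of the CSV creation: column i holds 100*i + 1 .. 100*(i + 1),
# one value per row, 100 rows in total.
data = {str(i): np.arange(100 * i + 1, 100 * (i + 1) + 1) for i in range(5)}
pd.DataFrame(data).to_csv("sparkrowsissue.csv", index=False)
```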

Reading this file with `pandas.read_csv` returns the rows exactly as written. I then converted the CSV to a Delta table and a Parquet file and saved both. Reading the Delta table with `DeltaTable(...).to_pandas()` and the Parquet file with `pandas.read_parquet` also returns all rows in order. However, when I read the Delta table with `pyspark.pandas.read_delta(path)` (or with `spark.read.format("delta")`), a given row no longer contains the column values it had in the CSV; the rows come back jumbled.
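
The conversion itself was just: read the CSV into a dataframe and save it as a Delta table and as a Parquet file. One way to do that (a sketch only, assuming the `deltalake` writer and pandas; `path_delta` and `path_parquet` are the output paths):

```python
from deltalake import write_deltalake

# Rough sketch of the save step: write the same dataframe out as a Delta
# table and as a Parquet file.
write_deltalake(path_delta, df_csv, mode="overwrite")
df_csv.to_parquet(path_parquet, index=False)
```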

df_csv = pd.read_csv("sparkrowsissue.csv")```

```df_delta = DeltaTable(path_delta).to_pandas()```

```df_parquet = pd.read_parquet(path_parquet)```

```df_spark_delta1 = ps.read_delta(path_delta).to_pandas()```

```df_spark_delta2 = spark.read.format("delta").load(path_delta).toPandas()```



```python
firstrows = 3

print("=================csv file pandas=================")
print(df_csv.head(firstrows))
print("======================Delta table===============")
print(df_delta.head(firstrows))
print("======================Parquet file===============")
print(df_parquet.head(firstrows))
print("======================pySpark Delta table===============")
print(df_spark_delta1.head(firstrows))
print("======================Spark Delta table===============")
print(df_spark_delta2.head(firstrows))
```

In the output above, only the PySpark Delta table and the Spark Delta table reads come back with jumbled rows; the CSV, DeltaTable, and Parquet reads keep the original order.
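
To make the symptom concrete, here is a quick check (a sketch; it assumes the CSV header names the columns `0` to `4` and that only the row order changed): sorting both frames by the first column and comparing them should succeed if the values themselves are intact.

```python
import pandas as pd

# Sketch: if only the row order differs, sorting both frames by the first
# column (assumed to be named "0") should make them compare equal.
# check_dtype=False tolerates int32/int64 differences from the Spark round trip.
csv_sorted = df_csv.sort_values("0").reset_index(drop=True)
spark_sorted = df_spark_delta2.sort_values("0").reset_index(drop=True)
pd.testing.assert_frame_equal(csv_sorted, spark_sorted, check_dtype=False)
```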
