
When I use the read_parquet method to read a Parquet file, it fails with the error "Column 8 named hostIp expected length 548 but got length 549". hostIp is one of the columns in REQUIRED_COLUMNS.

import pandas as pd

REQUIRED_COLUMNS = [...]
path = ...
data = pd.read_parquet(path, columns=REQUIRED_COLUMNS)

When I iterate over REQUIRED_COLUMNS and call read_parquet for each column individually, it succeeds.

for col in REQUIRED_COLUMNS:
    columns = [col]
    data = pd.read_parquet(path, columns=columns)

I also checked that each column read in the loop above has 548 rows.

0x26res

1 Answer


The error means that, when the requested columns are read together, the hostIp column comes back with 549 values while the resulting table expects 548 rows (the row count of the other columns).

Your loop reads each column in REQUIRED_COLUMNS on its own, and each of those reads returns 548 rows, so it succeeds. When you pass the whole REQUIRED_COLUMNS list to read_parquet() in a single call, all the columns, including hostIp, are assembled into one table. hostIp then ends up one row longer than the others, and that length mismatch triggers the error.

To work around this, you can either:

  • Read the columns separately and keep only the first 548 rows of each (including hostIp) before combining them.
  • Remove the hostIp column from the REQUIRED_COLUMNS list.

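Here is an example of the first option. Slicing the result of a single read_parquet() call over all columns cannot work, because that call raises the error before returning anything, so the columns are read one at a time instead: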

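import pandas as pd

def read_parquet_with_limited_hostIp(path, columns, n_rows=548):
    # Read each column on its own (this succeeds, per the question), then keep
    # only the first n_rows of each so they can be combined into one DataFrame.
    parts = [pd.read_parquet(path, columns=[col]).iloc[:n_rows] for col in columns]
    return pd.concat(parts, axis=1)

data = read_parquet_with_limited_hostIp(path, REQUIRED_COLUMNS)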

Here is an example of how to remove the hostIp column from the REQUIRED_COLUMNS list:

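# Filter by name rather than slicing by position, so this works wherever hostIp sits in the list
REQUIRED_COLUMNS = [col for col in REQUIRED_COLUMNS if col != "hostIp"]
data = pd.read_parquet(path, columns=REQUIRED_COLUMNS)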
Dev Patel
  • I checked that the length of column `hostIp` is 548, and reading with columns `['uuId', 'hostIp']` as a test succeeds. – chunhuagod Jul 19 '23 at 01:36