
I want to save the dataset as a Parquet file called power.parquet, using dat.to_parquet(<filename>), but it gives me this error: "ValueError: Error converting column "Global_reactive_power" to bytes using encoding UTF8. Original error: bad argument type for built-in operation". I have the fastparquet package installed.

from fastparquet import ParquetFile

dat.to_parquet("power.parquet")  # dat is the DataFrame loaded earlier

df_parquet = ParquetFile("power.parquet").to_pandas()

df_parquet.head()  # inspect the round-tripped data

Traceback (most recent call last):

  File "/opt/anaconda3/lib/python3.9/site-packages/fastparquet/writer.py", line 259, in convert
    out = array_encode_utf8(data)

  File "fastparquet/speedups.pyx", line 50, in fastparquet.speedups.array_encode_utf8

TypeError: bad argument type for built-in operation


During handling of the above exception, another exception occurred:

Traceback (most recent call last):

  File "/var/folders/4f/bm2th1p56tz4rq_zffc8g3940000gn/T/ipykernel_85477/3080656655.py", line 1, in <module>
    dat.to_parquet("power.parquet", compression="GZIP")

  File "/opt/anaconda3/lib/python3.9/site-packages/dask/dataframe/core.py", line 4560, in to_parquet
    return to_parquet(self, path, *args, **kwargs)

  File "/opt/anaconda3/lib/python3.9/site-packages/dask/dataframe/io/parquet/core.py", line 732, in to_parquet
    return compute_as_if_collection(

  File "/opt/anaconda3/lib/python3.9/site-packages/dask/base.py", line 315, in compute_as_if_collection
    return schedule(dsk2, keys, **kwargs)

  File "/opt/anaconda3/lib/python3.9/site-packages/dask/threaded.py", line 79, in get
    results = get_async(

  File "/opt/anaconda3/lib/python3.9/site-packages/dask/local.py", line 507, in get_async
    raise_exception(exc, tb)

  File "/opt/anaconda3/lib/python3.9/site-packages/dask/local.py", line 315, in reraise
    raise exc

  File "/opt/anaconda3/lib/python3.9/site-packages/dask/local.py", line 220, in execute_task
    result = _execute_task(task, data)

  File "/opt/anaconda3/lib/python3.9/site-packages/dask/core.py", line 119, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))

  File "/opt/anaconda3/lib/python3.9/site-packages/dask/utils.py", line 35, in apply
    return func(*args, **kwargs)

  File "/opt/anaconda3/lib/python3.9/site-packages/dask/dataframe/io/parquet/fastparquet.py", line 1167, in write_partition
    rg = make_part_file(

  File "/opt/anaconda3/lib/python3.9/site-packages/fastparquet/writer.py", line 716, in make_part_file
    rg = make_row_group(f, data, schema, compression=compression,

  File "/opt/anaconda3/lib/python3.9/site-packages/fastparquet/writer.py", line 701, in make_row_group
    chunk = write_column(f, coldata, column,

  File "/opt/anaconda3/lib/python3.9/site-packages/fastparquet/writer.py", line 554, in write_column
    repetition_data, definition_data, encode[encoding](data, selement), 8 * b'\x00'

  File "/opt/anaconda3/lib/python3.9/site-packages/fastparquet/writer.py", line 354, in encode_plain
    out = convert(data, se)

  File "/opt/anaconda3/lib/python3.9/site-packages/fastparquet/writer.py", line 284, in convert
    raise ValueError('Error converting column "%s" to bytes using '

ValueError: Error converting column "Global_reactive_power" to bytes using encoding UTF8. Original error: bad argument type for built-in operation


I also tried adding object_encoding="bytes", but that did not help. How can I solve this problem?

lok6666
  • Are all values in the "Global_reactive_power" column strings? Run `dat['Global_reactive_power'].apply(type).unique()` to check. Also, can they all be encoded to UTF-8? Run `dat['Global_reactive_power'].apply(lambda x: x.encode('utf-8'))` to find out. – 0x26res Mar 31 '22 at 08:24
  • I generated a time series plot earlier, and I converted the "Global_reactive_power" column into a numerical variable. Does it have to be a string? – lok6666 Mar 31 '22 at 20:32
  • Here is my result after running `dat['Global_reactive_power'].apply(lambda x: x.encode('utf-8'))`: `AttributeError: 'float' object has no attribute 'encode'` – lok6666 Mar 31 '22 at 20:36
  • It can be a string or a float, but it can't be a mix of both. – 0x26res Apr 01 '22 at 08:31
  • Got it. But when I run `dat.dtypes`, it shows the type of Global_active_power is float64. And when I run `dat['Global_reactive_power'].apply(type).unique()`, it shows `array([<class 'str'>, <class 'float'>], dtype=object)`. I tried to convert the type of this column into a string, but I still can't generate a parquet file; the same error shows up. – lok6666 Apr 01 '22 at 22:26
  • You can only save to parquet if each column has a single, consistent type. Please edit your question and add the code that does the conversion to string; there must be a problem there. – 0x26res Apr 02 '22 at 13:57
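The check suggested in the first comment can be sketched like this (the mixed values are hypothetical):

```python
import pandas as pd

# Hypothetical object column mixing floats and strings, which is what the
# traceback implies: fastparquet chose UTF8 encoding (so it saw strings),
# then hit a float, which has no .encode() method.
col = pd.Series([0.418, "0.436", 0.498], dtype=object)

# The dtype alone hides the problem: it just reports 'object'.
print(col.dtype)

# Listing the concrete Python types of the values exposes the mix.
print(col.apply(type).unique())  # [<class 'float'> <class 'str'>]
```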

0 Answers