I'm encountering an error while trying to load a Hugging Face dataset that requires the rarfile library. I have already installed rarfile using pip install rarfile, but I'm still getting the same error.

Here are the details of my environment: Python 3.10.

The specific error message I'm encountering is:

Downloading and preparing dataset arabic_billion_words/Alqabas to /root/.cache/huggingface/datasets/arabic_billion_words/Alqabas/1.1.0/687a1f963284c8a766558661375ea8f7ab3fa3633f8cd9c9f42a53ebe83bfe17...
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-63-0200997cf3c2> in <cell line: 3>()
      1 from datasets import load_dataset
      2 
----> 3 dataset = load_dataset("arabic_billion_words",list_data[3])

11 frames
/usr/local/lib/python3.10/dist-packages/datasets/utils/extract.py in extract(input_path, output_path)
    208     def extract(input_path: Union[Path, str], output_path: Union[Path, str]) -> None:
    209         if not config.RARFILE_AVAILABLE:
--> 210             raise ImportError("Please pip install rarfile")
    211         import rarfile
    212 

ImportError: Please pip install rarfile

---------------------------------------------------------------------------
NOTE: If your import is failing due to a missing package, you can
manually install dependencies using either !pip or !apt.

To view examples of installing some common dependencies, click the
"Open Examples" button below.
---------------------------------------------------------------------------

I have already tried the following troubleshooting steps:

  • Installed rarfile using pip install rarfile.
  • Verified that the rarfile library is present in the list of installed packages.
  • Restarted my Python interpreter or IDE after installing rarfile.
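A common cause of this symptom in notebooks is that pip installs into a different interpreter than the one the kernel is running. A minimal check (standard library only) of which interpreter is active and whether rarfile is visible to it:

```python
import importlib.util
import sys

# The interpreter this notebook/kernel is actually running;
# `pip install rarfile` must target this same environment.
print(sys.executable)

# True only if `import rarfile` would succeed in this interpreter.
print("rarfile importable:", importlib.util.find_spec("rarfile") is not None)
```

If this prints False, installing from inside the notebook (`!pip install rarfile`) or with the interpreter printed above (`/path/to/python -m pip install rarfile`) should put the package where the kernel can see it.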

Despite these attempts, I'm still unable to load the Hugging Face dataset due to the rarfile import error. I'm unsure about the next steps to resolve this issue.

I would appreciate any insights or suggestions on how to overcome this problem. If there are alternative methods to load or work with Hugging Face datasets that involve RAR files, I'm open to exploring those as well.

Thank you for your assistance and expertise.

1 Answer


When you try:

from datasets import load_dataset
ds = load_dataset('arabic_billion_words', 'Alqabas')

You'll see this error:

Downloading and preparing dataset arabic_billion_words/Alqabas to /root/.cache/huggingface/datasets/arabic_billion_words/Alqabas/1.1.0/687a1f963284c8a766558661375ea8f7ab3fa3633f8cd9c9f42a53ebe83bfe17...
Downloading data: 100%
595M/595M [00:28<00:00, 22.3MB/s]
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-73-a7541d995840> in <cell line: 1>()
----> 1 ds = load_dataset('arabic_billion_words', 'Alqabas')

11 frames
/usr/local/lib/python3.10/dist-packages/datasets/utils/extract.py in extract(input_path, output_path)
    208     def extract(input_path: Union[Path, str], output_path: Union[Path, str]) -> None:
    209         if not config.RARFILE_AVAILABLE:
--> 210             raise ImportError("Please pip install rarfile")
    211         import rarfile
    212 

ImportError: Please pip install rarfile

---------------------------------------------------------------------------
NOTE: If your import is failing due to a missing package, you can
manually install dependencies using either !pip or !apt.

To view examples of installing some common dependencies, click the
"Open Examples" button below.
---------------------------------------------------------------------------

If you're using Jupyter, then do this:

! pip install -U rarfile
! pip install patool

(Otherwise, do the pip install in your Python environment through your IDE or CLI)

After the installation completes, restart the runtime (if you're using Jupyter).

Then rerun:

from datasets import load_dataset
ds = load_dataset('arabic_billion_words', 'Alqabas')

Now it should work.
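If the error persists even after restarting, a quick sanity check is to read the same flag the traceback tested (`config.RARFILE_AVAILABLE`). The `try/except` below is just a defensive sketch so the check also degrades gracefully when `datasets` itself is missing:

```python
# Confirm that the datasets library now detects rarfile. This is the
# flag checked at extract.py line 209 in the traceback above.
try:
    from datasets import config
    print("RARFILE_AVAILABLE:", config.RARFILE_AVAILABLE)
except ImportError:
    print("the datasets package itself is not installed in this environment")
```

If this prints `RARFILE_AVAILABLE: False` in a fresh runtime, rarfile was installed into a different environment than the kernel is using.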
