Loading datasets in offline mode in sklearn and skmultilearn

Question

I would like to use the emotions, scene, and yeast datasets in my project in Anaconda (Python 3.6.5). I have used the following code:

from skmultilearn.dataset import load_dataset
X_train, y_train, feature_names, label_names = load_dataset('emotions', 'train')

It works when I am connected to the internet, but when I am offline it doesn't work. I have downloaded all three of the datasets named above into a folder like this:

H:\Projects\Datasets

How can I use this folder as the source of my datasets while I am offline? (I'm using Windows 10.)

The datasets I downloaded have the .rar extension (emotions.rar, scene.rar, and yeast.rar), and I downloaded them from: http://mulan.sourceforge.net/datasets-mlc.html

When using `load_dataset()` you are attempting to download certain datasets from a server, which is not possible without an internet connection. If you already have the files in local storage, you might be able to use them in [offline mode](https://docs.anaconda.com/anaconda/navigator/overview/#online-and-offline-modes) with other file utilities (such as importing a `csv` file into a pandas DataFrame) – manesioz, Nov 18 '19 at 17:46
Is there any solution that lets me use my computer as a local server, so that I can load the dataset from my HDD with the `load_dataset()` function? – Alireza Ghanbari, Nov 18 '19 at 18:43
Your answer was useful for finding out what my problem is. But to solve it, I used @makis's solution. Thanks to both of you. – Alireza Ghanbari, Nov 18 '19 at 20:11

seralouk · Accepted Answer · 2019-11-18T20:19:23.503

You can, but you first need to know the path where the dataset is stored. To find it, load the dataset once and read the path from the result. The path stays the same as long as the installation is not moved, so you only need to do this once. After that, knowing the path, you can load whatever you want offline.

Example:

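import os

import pandas as pd
from sklearn.datasets import load_iris

# Get the path of the bundled CSV file (in some newer scikit-learn
# releases 'filename' may hold only the file name, not the full path)
path = load_iris()['filename']
print(path)

# Offline load (the first row of iris.csv holds counts and class names)
df = pd.read_csv(path)

# The folder that holds the datasets: THIS IS WHAT YOU NEED
main_path_with_datasets = os.path.dirname(path)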

Once you have main_path_with_datasets, you can use it to load any of the datasets stored in that folder. Listing the folder shows what is available:

os.listdir(main_path_with_datasets)

['digits.csv.gz',
 'wine_data.csv',
 'diabetes_target.csv.gz',
 'iris.csv',
 'breast_cancer.csv',
 'diabetes_data.csv.gz',
 'linnerud_physiological.csv',
 'linnerud_exercise.csv',
 'boston_house_prices.csv']
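
As a rough sketch of how the listing above could be narrowed to loadable files, the helper below keeps only names ending in .csv or .csv.gz. The function name list_dataset_files and its suffixes parameter are made up for illustration and are not part of sklearn.

```python
import os


def list_dataset_files(folder, suffixes=(".csv", ".csv.gz")):
    """Return the sorted names of the CSV files found in `folder`.

    Hypothetical helper: it only filters the os.listdir() output by
    file extension and does not open or validate any file.
    """
    return sorted(
        name for name in os.listdir(folder) if name.endswith(suffixes)
    )
```

Each returned name can be joined to the folder with os.path.join(main_path_with_datasets, name) and passed to pd.read_csv, which by default infers gzip compression from the .gz extension.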

EDIT for skmultilearn

from skmultilearn.dataset import load_dataset_dump

path = 'C:\\Users\\myname\\scikit_ml_learn_data\\'

X, y, feature_names, label_names = load_dataset_dump(path + 'emotions-train.scikitml.bz2')
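
The same idea could be wrapped in a small helper, sketched below under some assumptions. The names local_dump_path and load_local_dataset are hypothetical, and the file name pattern "<name>-<variant>.scikitml.bz2" is inferred from the single example emotions-train.scikitml.bz2 above, so check it against the files actually present in your folder.

```python
import os


def local_dump_path(data_dir, set_name, variant):
    """Build the path of a cached dump, e.g. emotions-train.scikitml.bz2.

    Assumption: cached files are named "<set_name>-<variant>.scikitml.bz2".
    """
    file_name = "{}-{}.scikitml.bz2".format(set_name, variant)
    return os.path.join(data_dir, file_name)


def load_local_dataset(data_dir, set_name, variant):
    """Load a cached dump from disk without touching the network."""
    path = local_dump_path(data_dir, set_name, variant)
    if not os.path.isfile(path):
        raise FileNotFoundError("No cached dump found at " + path)
    # Imported lazily so the path helper works without scikit-multilearn.
    from skmultilearn.dataset import load_dataset_dump
    return load_dataset_dump(path)
```

A call such as load_local_dataset(r'C:\Users\myname\scikit_ml_learn_data', 'emotions', 'train') would then return the same four values as the snippet above. The raw string avoids doubling the backslashes, and os.path.join removes the need for a trailing separator.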

edited Nov 18 '19 at 20:19

answered Nov 18 '19 at 19:07

seralouk


The dataset was downloaded to this folder automatically: `C:\Users\myname\scikit_ml_learn_data`, and its name is `emotions-train.scikitml.bz2`. But it seems that my program doesn't use it, because every time I run the program it checks the internet connection. If there is no connection, the program doesn't work! – Alireza Ghanbari Nov 18 '19 at 19:31
Then you just need `os.listdir(r"C:\Users\myname\scikit_ml_learn_data")` – seralouk Nov 18 '19 at 19:35
Your last answer seems to work with a small edit like this: `path = 'C:\\Users\\myname\\scikit_ml_learn_data\\'`. I'll check it and let you know. Thank you. – Alireza Ghanbari Nov 18 '19 at 20:01
Right. If you are on Windows, you need double backslashes in the path string – seralouk Nov 18 '19 at 20:19
nice. consider upvoting my answer – seralouk Nov 19 '19 at 06:38
