I'm a complete beginner with wandb, so this is a very basic question. I have a DataFrame that holds my x features and y values. I'm trying to follow this tutorial to train a model from my pandas DataFrame. However, when I try to create a wandb Table from the DataFrame, I get an error:


wandb.init(project='my-xgb', config={'lr': 0.01})

# The log call for 'loss' didn't work, so I haven't run it for now
#wandb.log({'loss': loss, ...})


# Create a W&B Table with your pandas dataframe
table = wandb.Table(df1)

AssertionError: columns argument expects a list object

I have no idea why this happens or why it expects a list. In the tutorial, the DataFrame doesn't appear to be a list.

My end goal is to be able to create a wandb Table.

Reut

1 Answer

Short answer: table = wandb.Table(dataframe=my_df).

The explanation of your specific case is at the bottom.


Minimal example of using wandb.Table with a DataFrame:

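import wandb
import pandas as pd

# wandb.log needs an active run, so start one first (project name taken from the question)
wandb.init(project='my-xgb')

iris_path = 'https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv'
iris = pd.read_csv(iris_path)
table = wandb.Table(dataframe=iris)  # pass the DataFrame by keyword
wandb.log({'dataframe_in_table': table})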

(The dataset here is the Iris dataset: sepal and petal measurements for three types of irises, Setosa, Versicolour, and Virginica. This CSV has 150 rows, with four measurement columns plus a species column.)

There are two ways of creating W&B Tables according to the official documentation:

  • List of Rows: Log named columns and rows of data. For example: wandb.Table(columns=["a", "b", "c"], data=[["1a", "1b", "1c"], ["2a", "2b", "2c"]]) generates a table with two rows and three columns.
  • Pandas DataFrame: Log a DataFrame using wandb.Table(dataframe=my_df). Column names will be extracted from the DataFrame.

Explanation: Why does table = wandb.Table(my_df) give the error "columns argument expects a list object"? Because wandb.Table's __init__ method looks like this:

def __init__(
        self,
        columns=None,
        data=None,
        rows=None,
        dataframe=None,
        dtype=None,
        optional=True,
        allow_mixed_types=False,
    ):

If you pass a DataFrame positionally, without the dataframe= keyword, wandb.Table binds it to the first parameter, columns, which must be a list. That is what triggers the assertion.
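
The binding rule can be checked without wandb at all. The sketch below uses a hypothetical stand-in function, table_init, that only copies the parameter order from the signature above (minus self); it is not the real wandb code. Python's inspect module then shows which parameter a positional argument lands in.

```python
import inspect


def table_init(columns=None, data=None, rows=None, dataframe=None,
               dtype=None, optional=True, allow_mixed_types=False):
    """Hypothetical stand-in with the same parameter order as wandb.Table.__init__."""


sig = inspect.signature(table_init)
my_df = object()  # placeholder for a real DataFrame

# Passed positionally, the object is bound to the first parameter.
positional = sig.bind(my_df)
# Passed by keyword, it is bound to the parameter that was named.
keyword = sig.bind(dataframe=my_df)

print(list(positional.arguments))  # ['columns']
print(list(keyword.arguments))     # ['dataframe']
```

So wandb.Table(my_df) is equivalent to wandb.Table(columns=my_df), which is why the error message talks about columns.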

Mark
  • Thank you for the quick answer! So, if I want to use my train and test sets, should I convert each one into a DataFrame? Or is there a wandb function for that, separate from scikit-learn's train/test split? – Reut Jun 23 '22 at 11:56
  • Hi @Reut, I'm sorry, I don't think I fully understood your comment. W&B is used only for visualisation and logging; I would never use it to manipulate the data for training. Here is a very detailed [Guide to W&B Tables](https://wandb.ai/stacey/mnist-viz/reports/Guide-to-W-B-Tables--Vmlldzo2NTAzOTk#1.-how-to-log-a-wandb.table). Hope it helps! – Mark Jun 23 '22 at 12:11