This is the output I get after appending the results of a zero-shot classification pipeline that uses the facebook/bart-large-mnli model:

[{'labels': ['recreation', 'entertainment', 'travel', 'dining'],
  'scores': [0.8873, 0.1528, 0.0002, 0.0001],
  'sequence': 'laundromat'},
 {'labels': ['recreation', 'travel', 'entertainment', 'dining'],
  'scores': [0.9932, 0.9753, 0.2099, 0.0001],
  'sequence': 'running trail'},
 {'labels': ['dining', 'entertainment', 'recreation', 'travel'],
  'scores': [0.8846, 0.1825, 0.1067, 0.0001],
  'sequence': 'affordable housing for young families,cafe or restaurant,none'},
 {'labels': ['travel', 'entertainment', 'recreation', 'dining'],
  'scores': [0.3595, 0.0716, 0.0187, 0.0039],
  'sequence': 'electric vehicle chargers,ev charger'},
]

Each result is appended like this:

classifiedText.append(classifier2(sequence_to_classify, candidate_labels, multi_label=True))

Is there a way to make the pipeline return the labels in the same order as candidate_labels (which is a list)?

I am trying to put the results of the zero-shot classification performed by transformers into a pandas DataFrame. Just using the .to_dict method is a problem, because the pipeline returns the labels in a different order for each phrase that is classified.

Is there a way to get the labels to be consistent? (Of course, still matching to the correct value under the scores key.)

Eric Jin
flighted

1 Answer

As far as I know, the pipeline has no parameter for this, but you can do it in Python by zipping the two lists into a dict for each result:

import pandas

# Output of the zeroshot classification pipeline
result = [{'labels': ['recreation', 'entertainment', 'travel', 'dining'],
  'scores': [0.8873, 0.1528, 0.0002, 0.0001],
  'sequence': 'laundromat'},
 {'labels': ['recreation', 'travel', 'entertainment', 'dining'],
  'scores': [0.9932, 0.9753, 0.2099, 0.0001],
  'sequence': 'running trail'},
 {'labels': ['dining', 'entertainment', 'recreation', 'travel'],
  'scores': [0.8846, 0.1825, 0.1067, 0.0001],
  'sequence': 'affordable housing for young families,cafe or restaurant,none'},
 {'labels': ['travel', 'entertainment', 'recreation', 'dining'],
  'scores': [0.3595, 0.0716, 0.0187, 0.0039],
  'sequence': 'electric vehicle chargers,ev charger'},
]

# Turning the two lists into a dict for each result
list_of_dicts = [dict(zip(x["labels"], x["scores"])) for x in result]

print(list_of_dicts)

df = pandas.DataFrame(list_of_dicts)
print(df)

Output:

[{'recreation': 0.8873, 'entertainment': 0.1528, 'travel': 0.0002, 'dining': 0.0001}, {'recreation': 0.9932, 'travel': 0.9753, 'entertainment': 0.2099, 'dining': 0.0001}, {'dining': 0.8846, 'entertainment': 0.1825, 'recreation': 0.1067, 'travel': 0.0001}, {'travel': 0.3595, 'entertainment': 0.0716, 'recreation': 0.0187, 'dining': 0.0039}]

   recreation  entertainment  travel  dining
0      0.8873         0.1528  0.0002  0.0001
1      0.9932         0.2099  0.9753  0.0001
2      0.1067         0.1825  0.0001  0.8846
3      0.0187         0.0716  0.3595  0.0039
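
Note that the column order above simply follows the key order of the first dict. If the columns should follow candidate_labels instead, one option is to pass that list as the columns argument of the DataFrame. A minimal sketch, assuming candidate_labels is the same list that was passed to the pipeline (the order shown here is only a guess) and using a shortened copy of the results:

```python
import pandas as pd

# Assumed order for illustration; use the list that was passed to the pipeline.
candidate_labels = ["travel", "dining", "entertainment", "recreation"]

# Two of the results from above, shortened for the example.
result = [
    {"labels": ["recreation", "entertainment", "travel", "dining"],
     "scores": [0.8873, 0.1528, 0.0002, 0.0001],
     "sequence": "laundromat"},
    {"labels": ["recreation", "travel", "entertainment", "dining"],
     "scores": [0.9932, 0.9753, 0.2099, 0.0001],
     "sequence": "running trail"},
]

# Map each label to its score, then fix the column order explicitly.
rows = [dict(zip(r["labels"], r["scores"])) for r in result]
df = pd.DataFrame(rows, columns=candidate_labels)

# Keep the classified text alongside its scores.
df.insert(0, "sequence", [r["sequence"] for r in result])
print(df)
```

Because each score is looked up by its label, the values stay matched to the right label regardless of the order the pipeline returned them in.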
cronoik