
I am wondering why my Mallet classification model gives the same output for every instance, even though the instances are completely different from one another.

I changed the code in Csv2Classify so that it prints only the top 10 labels and their confidence scores. I also made it print each instance's feature data so I can check that the pipeline is working correctly. However, I don't think my changes are the problem, because Mallet assigns almost every instance the same labeling. Nevertheless, here is the code I changed in Csv2Classify:

1. Getting the locations with the top confidences:

public static int[] getTopLocations(Labeling labeling, int numberOfCategories) {
    double[] values = new double[labeling.numLocations()];
    for (int location = 0; location < labeling.numLocations(); location++) {
        values[location] = labeling.valueAtLocation(location);          
    }
    int[] outputLocations = indexesOfTopElements(values, numberOfCategories);
    return outputLocations;
}

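private static int[] indexesOfTopElements(double[] orig, int nummax) {
    // Never request more elements than exist.
    int count = Math.min(nummax, orig.length);
    int[] result = new int[count];
    if (count == 0) return result;
    double[] copy = Arrays.copyOf(orig, orig.length);
    Arrays.sort(copy);
    // Smallest value that still belongs to the top 'count'.
    double threshold = copy[copy.length - count];
    // Values tied at the threshold may only fill the slots left over by
    // strictly larger values; otherwise ties would overflow the result array.
    int tiesLeft = count;
    for (double value : orig) {
        if (value > threshold) tiesLeft--;
    }
    int resultPos = 0;
    // Results stay in location order, not in order of confidence.
    for (int i = 0; i < orig.length; i++) {
        if (orig[i] > threshold) {
            result[resultPos++] = i;
        } else if (orig[i] == threshold && tiesLeft > 0) {
            result[resultPos++] = i;
            tiesLeft--;
        }
    }
    return result;
}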
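Since the helper above returns locations in location order, the printed labels are not ranked by confidence. A minimal standalone sketch of a ranked variant follows. It works on a plain double[] rather than a Mallet Labeling, and the class and method names (TopLabelsSketch, topIndicesByValue) are hypothetical and not part of Mallet or my Csv2Classify changes:

```java
import java.util.Arrays;

class TopLabelsSketch {
    /** Returns the indices of the k largest values, highest value first. */
    static int[] topIndicesByValue(double[] values, int k) {
        Integer[] indices = new Integer[values.length];
        for (int i = 0; i < indices.length; i++) {
            indices[i] = i;
        }
        // Sort indices by descending value. The sort is stable, so tied
        // values keep their original (ascending index) order.
        Arrays.sort(indices, (a, b) -> Double.compare(values[b], values[a]));
        int count = Math.min(k, values.length);
        int[] top = new int[count];
        for (int rank = 0; rank < count; rank++) {
            top[rank] = indices[rank];
        }
        return top;
    }

    public static void main(String[] args) {
        // Made-up scores, only to illustrate the ordering and tie handling.
        double[] scores = {0.10, 0.40, 0.40, 0.05};
        System.out.println(Arrays.toString(topIndicesByValue(scores, 3)));
    }
}
```

With this ordering, the first entry printed for each instance would be the most confident label, which makes it easier to compare instances at a glance.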

2. Classifying and printing each instance's data, to check that the statistical model is working correctly:

    public static void main (String[] args) throws FileNotFoundException, IOException {

        // Process the command-line options
        CommandOption.setSummary (Csv2Classify.class,
                                  "A tool for classifying a stream of unlabeled instances");
        CommandOption.process (Csv2Classify.class, args);

        // Print some helpful messages for error cases
        if (args.length == 0) {
            CommandOption.getList(Csv2Classify.class).printUsage(false);
            System.exit (-1);
        }
        if (inputFile == null) {
            throw new IllegalArgumentException ("You must include `--input FILE ...' in order to specify a "+
                                "file containing the instances, one per line.");
        }

        // Read classifier from file
        Classifier classifier = null;
        try {
            ObjectInputStream ois =
                new ObjectInputStream (new BufferedInputStream(new FileInputStream (classifierFile.value)));

            classifier = (Classifier) ois.readObject();
            ois.close();
        } catch (Exception e) {
            throw new IllegalArgumentException("Problem loading classifier from file " + classifierFile.value +
                               ": " + e.getMessage());
        }

        // Read instances from the file
        Reader fileReader;
        if (inputFile.value.toString().equals ("-")) {
            fileReader = new InputStreamReader (System.in);
        }
        else {
            fileReader = new InputStreamReader(new FileInputStream(inputFile.value), encoding.value);
        }
        Iterator<Instance> csvIterator =
            new CsvIterator (fileReader, Pattern.compile(lineRegex.value),
            dataOption.value, 0, nameOption.value);
        Iterator<Instance> iterator =
            classifier.getInstancePipe().newIteratorFrom(csvIterator);

        // Write classifications to the output file
        PrintStream out = null;

        if (outputFile.value.toString().equals ("-")) {
            out = System.out;
        }
        else {
            out = new PrintStream(outputFile.value, encoding.value);
        }

        // gdruck@cs.umass.edu
        // Stop growth on the alphabets. If this is not done and new
        // features are added, the feature and classifier parameter
        // indices will not match.
        classifier.getInstancePipe().getDataAlphabet().stopGrowth();
        classifier.getInstancePipe().getTargetAlphabet().stopGrowth();

        while (iterator.hasNext()) {
            Instance instance = iterator.next();

            Labeling labeling =
                classifier.classify(instance).getLabeling();

            StringBuilder output = new StringBuilder();
            output.append(instance.getName() + "\n");                       
            output.append("\t" + instance.getData() + "\n");
            int[] topLocations = Csv2Classify.getTopLocations(labeling, 10);
            for (int index = 0; index < topLocations.length; index++) {
                int location = topLocations[index];
                System.out.print("location printed:" + location + "\n");
                output.append("\t" + labeling.labelAtLocation(location));
                output.append("\t" + labeling.valueAtLocation(location));
            }
            output.append("\n");
            out.println(output);
        }

        if (! outputFile.value.toString().equals ("-")) {
            out.close();
        }
    }
}   

I trained a decision tree classifier with 50 trials. The command I then used to classify is:

bin/mallet classify-file --input data --output classification.output --classifier decision_tree.classifier

Source file:

10914642 sky room business office people young teamwork success professional monitor contemporary entrepreneur customer idea businesspeople window adult person pc indoor guy worker confident attractive operator male job career communication handsome book chair businessman manager work table meeting workplace cloud headset caucasian man lamp executive successful corporate occupation concept
13209539 performance industry panel results photovoltaic technology energy security_helmet nature running eco-friendly ecology sky android touchpad function environment businessman plant solar_panel architecture man analysis worker renewable_energy wireless business electronic_tablet solarium construction engineering cell operation electricity power light sensor electric collector durable setup senior alternative installation solarization checking touchscreen engineer
26375762 building hat occupation expert plan professional engineering confident architector business businessman designer helmet engineer man hardhat architect construction worker suit executive work builder
26780099 desk male sitting technology headphone arabian laptop flare office resting business startup morning job person work casual eastern men creative indoors businessman workplace relax play lifestyle worker phone beard sunset arab sunrise computer professional sun break effect handsome young leisure legs happy hipster people
26783548 lifestyle manager male elegance use one adult work executive cellphone intelligence notebook business laptop technology entrepreneur middle-aged modern smart busy communication job businessman contemporary occupation corporate urban smile man message senior mobile caucasian city computer hold expertise suit professional wireless restaurant internet look break worker phone cafe table smartphone sit
26783561 elegant intelligent caucasian read urban friendly coffee laptop table hold businessman folder sit adult cafeteria computer concentration senior informed lifestyle restaurant male smartphone confident drink look worker one entrepreneur break business middle-aged corporate job cup smart professional work cafe successful technology modern city executive document serious paper man contemporary beard
26958424 serious male formal connection vision city executive mature worker businessman professional phone glasses aspirations call confidence street confident communication corporate feminism successful outdoors solution device lifestyle guy calling standing smart adult caucasian business shirt technology success person suit outside entrepreneur tie urban building man gadget thoughtful smartphone
27207487 leisure window business resting mustache networking beard lonely rest hobby relaxation thinking break lifestyle pool_table mobile_phone sitting room style alone man businessman pool connection locker technology relax
27210236 information workplace development office plan organization research entrepreneur office_worker meeting discussion businessman business_people interaction corporate operations analysis busy strategic process strategy workspace cooperation communication white_collar_worker talking statistics enterpriser corporate_business motivation global_finance investment objective global_market working mission business collaboration thinking global_business planning solution vision tactics marketing
27344048 businessman brick alone management telecommunication planning plan connection working talking internet computer white_collar_worker workspace technology research startup mobile_phone office business window brick_wall laptop strategy place_of_work online on_the_phone man thinking wireless digital_device workplace business_person communication

Result file:

10914642
    person(62)=1.0
job(121)=1.0
occupation(128)=1.0
work(159)=1.0
man(195)=1.0
male(203)=1.0
attractive(204)=1.0
handsome(206)=1.0
confident(209)=1.0
worker(210)=1.0
adult(220)=1.0
caucasian(222)=1.0
young(238)=1.0
people(239)=1.0
professional(327)=1.0
guy(354)=1.0
business(369)=1.0
successful(370)=1.0
executive(371)=1.0
success(376)=1.0
office(379)=1.0
entrepreneur(382)=1.0
corporate(390)=1.0
businesspeople(392)=1.0
businessman(395)=1.0
table(443)=1.0
manager(506)=1.0
meeting(520)=1.0
communication(560)=1.0
teamwork(579)=1.0
indoor(615)=1.0
room(622)=1.0
idea(729)=1.0
workplace(737)=1.0
contemporary(740)=1.0
career(798)=1.0
lamp(829)=1.0
concept(850)=1.0
window(924)=1.0
book(1216)=1.0
cloud(1318)=1.0
sky(1333)=1.0
headset(1595)=1.0
chair(1808)=1.0
customer(2171)=1.0
operator(2345)=1.0
monitor(2741)=1.0

    9466    0.20320855614973263 9467    0.10160427807486631 9505    0.0427807486631016  9514    0.016042780748663103    9468    0.053475935828877004    9462    0.13903743315508021 9463    0.13368983957219252 9460    0.22994652406417113 9486    0.0374331550802139  9506    0.016042780748663103

13209539
    nature(55)=1.0
man(195)=1.0
worker(210)=1.0
industry(218)=1.0
construction(232)=1.0
business(369)=1.0
businessman(395)=1.0
environment(550)=1.0
light(552)=1.0
power(567)=1.0
engineering(617)=1.0
engineer(630)=1.0
senior(653)=1.0
performance(669)=1.0
energy(790)=1.0
electric(1001)=1.0
electricity(1007)=1.0
checking(1295)=1.0
sky(1333)=1.0
installation(1350)=1.0
technology(1395)=1.0
wireless(1461)=1.0
analysis(1797)=1.0
plant(2419)=1.0
alternative(2693)=1.0
results(2748)=1.0
architecture(3206)=1.0
android(3253)=1.0
touchpad(3256)=1.0
running(3294)=1.0
panel(3598)=1.0
photovoltaic(3599)=1.0
security_helmet(3600)=1.0
eco-friendly(3601)=1.0
ecology(3602)=1.0
function(3603)=1.0
solar_panel(3604)=1.0
renewable_energy(3605)=1.0
electronic_tablet(3606)=1.0
solarium(3607)=1.0
cell(3608)=1.0
operation(3609)=1.0
sensor(3610)=1.0
collector(3611)=1.0
durable(3612)=1.0
setup(3613)=1.0
solarization(3614)=1.0
touchscreen(3615)=1.0

    9466    0.20320855614973263 9467    0.10160427807486631 9505    0.0427807486631016  9514    0.016042780748663103    9468    0.053475935828877004    9462    0.13903743315508021 9463    0.13368983957219252 9460    0.22994652406417113 9486    0.0374331550802139  9506    0.016042780748663103

26375762
    occupation(128)=1.0
work(159)=1.0
man(195)=1.0
hat(208)=1.0
confident(209)=1.0
worker(210)=1.0
hardhat(223)=1.0
construction(232)=1.0
professional(327)=1.0
plan(346)=1.0
business(369)=1.0
executive(371)=1.0
suit(377)=1.0
businessman(395)=1.0
building(542)=1.0
engineering(617)=1.0
engineer(630)=1.0
helmet(634)=1.0
builder(1017)=1.0
designer(2675)=1.0
expert(3458)=1.0
architect(4485)=1.0
architector(7523)=1.0

    9466    0.20320855614973263 9467    0.10160427807486631 9505    0.0427807486631016  9514    0.016042780748663103    9468    0.053475935828877004    9462    0.13903743315508021 9463    0.13368983957219252 9460    0.22994652406417113 9486    0.0374331550802139  9506    0.016042780748663103

26780099
    person(62)=1.0
job(121)=1.0
work(159)=1.0
male(203)=1.0
handsome(206)=1.0
worker(210)=1.0
lifestyle(230)=1.0
young(238)=1.0
people(239)=1.0
men(241)=1.0
eastern(320)=1.0
happy(323)=1.0
professional(327)=1.0
play(364)=1.0
business(369)=1.0
office(379)=1.0
indoors(389)=1.0
businessman(395)=1.0
laptop(512)=1.0
computer(517)=1.0
sitting(588)=1.0
casual(631)=1.0
workplace(737)=1.0
leisure(783)=1.0
phone(804)=1.0
beard(871)=1.0
desk(928)=1.0
sun(1329)=1.0
technology(1395)=1.0
headphone(1639)=1.0
creative(2040)=1.0
relax(2166)=1.0
break(2247)=1.0
legs(2256)=1.0
morning(2415)=1.0
sunset(2538)=1.0
hipster(3302)=1.0
sunrise(3325)=1.0
arabian(3671)=1.0
startup(3699)=1.0
effect(4314)=1.0
resting(5592)=1.0
flare(7660)=1.0
arab(7661)=1.0

    9466    0.20320855614973263 9467    0.10160427807486631 9505    0.0427807486631016  9514    0.016042780748663103    9468    0.053475935828877004    9462    0.13903743315508021 9463    0.13368983957219252 9460    0.22994652406417113 9486    0.0374331550802139  9506    0.016042780748663103

26783548
    job(121)=1.0
occupation(128)=1.0
smile(150)=1.0
work(159)=1.0
man(195)=1.0
one(199)=1.0
male(203)=1.0
worker(210)=1.0
adult(220)=1.0
caucasian(222)=1.0
lifestyle(230)=1.0
professional(327)=1.0
business(369)=1.0
executive(371)=1.0
suit(377)=1.0
entrepreneur(382)=1.0
corporate(390)=1.0
businessman(395)=1.0
table(443)=1.0
restaurant(448)=1.0
manager(506)=1.0
laptop(512)=1.0
computer(517)=1.0
modern(540)=1.0
look(557)=1.0
communication(560)=1.0
message(574)=1.0
elegance(644)=1.0
senior(653)=1.0
smart(732)=1.0
contemporary(740)=1.0
busy(746)=1.0
expertise(775)=1.0
phone(804)=1.0
mobile(881)=1.0
cellphone(1223)=1.0
hold(1237)=1.0
sit(1364)=1.0
technology(1395)=1.0
wireless(1461)=1.0
use(1645)=1.0
internet(1805)=1.0
notebook(1809)=1.0
city(1865)=1.0
smartphone(2216)=1.0
break(2247)=1.0
urban(3021)=1.0
middle-aged(3492)=1.0
cafe(4856)=1.0
intelligence(4943)=1.0

    9466    0.20320855614973263 9467    0.10160427807486631 9505    0.0427807486631016  9514    0.016042780748663103    9468    0.053475935828877004    9462    0.13903743315508021 9463    0.13368983957219252 9460    0.22994652406417113 9486    0.0374331550802139  9506    0.016042780748663103

26783561
    drink(109)=1.0
job(121)=1.0
work(159)=1.0
man(195)=1.0
one(199)=1.0
male(203)=1.0
confident(209)=1.0
worker(210)=1.0
adult(220)=1.0
caucasian(222)=1.0
serious(224)=1.0
lifestyle(230)=1.0
professional(327)=1.0
friendly(329)=1.0
business(369)=1.0
successful(370)=1.0
executive(371)=1.0
entrepreneur(382)=1.0
corporate(390)=1.0
businessman(395)=1.0
table(443)=1.0
restaurant(448)=1.0
elegant(455)=1.0
cup(485)=1.0
laptop(512)=1.0
computer(517)=1.0
folder(523)=1.0
modern(540)=1.0
look(557)=1.0
senior(653)=1.0
smart(732)=1.0
contemporary(740)=1.0
document(744)=1.0
paper(745)=1.0
beard(871)=1.0
coffee(947)=1.0
hold(1237)=1.0
sit(1364)=1.0
read(1372)=1.0
technology(1395)=1.0
concentration(1428)=1.0
city(1865)=1.0
smartphone(2216)=1.0
break(2247)=1.0
urban(3021)=1.0
middle-aged(3492)=1.0
cafeteria(4134)=1.0
cafe(4856)=1.0
intelligent(5655)=1.0
informed(7666)=1.0

    9466    0.20320855614973263 9467    0.10160427807486631 9505    0.0427807486631016  9514    0.016042780748663103    9468    0.053475935828877004    9462    0.13903743315508021 9463    0.13368983957219252 9460    0.22994652406417113 9486    0.0374331550802139  9506    0.016042780748663103

26958424
    person(62)=1.0
confidence(194)=1.0
man(195)=1.0
male(203)=1.0
confident(209)=1.0
worker(210)=1.0
adult(220)=1.0
caucasian(222)=1.0
serious(224)=1.0
outdoors(227)=1.0
lifestyle(230)=1.0
standing(236)=1.0
professional(327)=1.0
guy(354)=1.0
tie(367)=1.0
business(369)=1.0
successful(370)=1.0
executive(371)=1.0
success(376)=1.0
suit(377)=1.0
entrepreneur(382)=1.0
corporate(390)=1.0
businessman(395)=1.0
glasses(456)=1.0
building(542)=1.0
mature(553)=1.0
communication(560)=1.0
smart(732)=1.0
formal(733)=1.0
outside(784)=1.0
phone(804)=1.0
shirt(876)=1.0
solution(1021)=1.0
technology(1395)=1.0
connection(1466)=1.0
city(1865)=1.0
smartphone(2216)=1.0
vision(2313)=1.0
call(2347)=1.0
calling(2491)=1.0
device(2574)=1.0
urban(3021)=1.0
thoughtful(3358)=1.0
street(3428)=1.0
gadget(4981)=1.0
feminism(7584)=1.0
aspirations(7741)=1.0

    9466    0.20320855614973263 9467    0.10160427807486631 9505    0.0427807486631016  9514    0.016042780748663103    9468    0.053475935828877004    9462    0.13903743315508021 9463    0.13368983957219252 9460    0.22994652406417113 9486    0.0374331550802139  9506    0.016042780748663103

27207487
    man(195)=1.0
lifestyle(230)=1.0
style(306)=1.0
business(369)=1.0
businessman(395)=1.0
sitting(588)=1.0
room(622)=1.0
leisure(783)=1.0
beard(871)=1.0
window(924)=1.0
technology(1395)=1.0
connection(1466)=1.0
thinking(1487)=1.0
alone(1978)=1.0
relax(2166)=1.0
break(2247)=1.0
mobile_phone(2284)=1.0
mustache(2332)=1.0
hobby(2524)=1.0
relaxation(2702)=1.0
networking(3698)=1.0
pool(5219)=1.0
resting(5592)=1.0
rest(5768)=1.0
lonely(7803)=1.0
pool_table(7804)=1.0
locker(7805)=1.0

    9466    0.20320855614973263 9467    0.10160427807486631 9505    0.0427807486631016  9514    0.016042780748663103    9468    0.053475935828877004    9462    0.13903743315508021 9463    0.13368983957219252 9460    0.22994652406417113 9486    0.0374331550802139  9506    0.016042780748663103

27210236
    information(41)=1.0
working(205)=1.0
plan(346)=1.0
business(369)=1.0
office(379)=1.0
entrepreneur(382)=1.0
corporate(390)=1.0
businessman(395)=1.0
meeting(520)=1.0
cooperation(524)=1.0
communication(560)=1.0
discussion(600)=1.0
collaboration(718)=1.0
interaction(734)=1.0
workplace(737)=1.0
busy(746)=1.0
planning(759)=1.0
strategy(760)=1.0
talking(781)=1.0
solution(1021)=1.0
research(1081)=1.0
business_people(1222)=1.0
thinking(1487)=1.0
development(1596)=1.0
analysis(1797)=1.0
mission(1887)=1.0
global_business(2246)=1.0
office_worker(2274)=1.0
vision(2313)=1.0
tactics(2586)=1.0
investment(3004)=1.0
marketing(3197)=1.0
organization(3801)=1.0
motivation(3802)=1.0
operations(4660)=1.0
white_collar_worker(4840)=1.0
process(4983)=1.0
corporate_business(5542)=1.0
workspace(5706)=1.0
enterpriser(6814)=1.0
strategic(7806)=1.0
statistics(7807)=1.0
global_finance(7808)=1.0
objective(7809)=1.0
global_market(7810)=1.0

    9466    0.20320855614973263 9467    0.10160427807486631 9505    0.0427807486631016  9514    0.016042780748663103    9468    0.053475935828877004    9462    0.13903743315508021 9463    0.13368983957219252 9460    0.22994652406417113 9486    0.0374331550802139  9506    0.016042780748663103

27344048
    man(195)=1.0
working(205)=1.0
plan(346)=1.0
business(369)=1.0
office(379)=1.0
businessman(395)=1.0
business_person(507)=1.0
place_of_work(510)=1.0
laptop(512)=1.0
computer(517)=1.0
communication(560)=1.0
workplace(737)=1.0
planning(759)=1.0
strategy(760)=1.0
talking(781)=1.0
window(924)=1.0
research(1081)=1.0
technology(1395)=1.0
wireless(1461)=1.0
connection(1466)=1.0
thinking(1487)=1.0
online(1804)=1.0
internet(1805)=1.0
alone(1978)=1.0
mobile_phone(2284)=1.0
management(2985)=1.0
startup(3699)=1.0
digital_device(4572)=1.0
white_collar_worker(4840)=1.0
brick(5553)=1.0
workspace(5706)=1.0
telecommunication(7883)=1.0
brick_wall(7884)=1.0
on_the_phone(7885)=1.0

    9466    0.20320855614973263 9467    0.10160427807486631 9505    0.0427807486631016  9514    0.016042780748663103    9468    0.053475935828877004    9462    0.13903743315508021 9463    0.13368983957219252 9460    0.22994652406417113 9486    0.0374331550802139  9506    0.016042780748663103



    9480    0.03642671292281006 9496    0.033824804856895055    9481    0.03469210754553339 9499    0.03555941023417172 9491    0.03295750216825672 9517    0.03816131830008673 9501    0.03469210754553339 9492    0.03469210754553339 9478    0.03469210754553339 9506    0.03469210754553339
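To confirm that the score lines are exactly identical rather than just similar-looking, a small standalone sketch like the one below could compare them. It assumes each score line is a whitespace-separated sequence of label/score pairs, as in the result file above. The class and method names (ScoreLineCheck, parseScores, allIdentical) are hypothetical and not part of Mallet:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ScoreLineCheck {
    /** Parses "label score label score ..." (whitespace-separated) into a map. */
    static Map<String, Double> parseScores(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length % 2 != 0) {
            throw new IllegalArgumentException("Expected label/score pairs: " + line);
        }
        Map<String, Double> scores = new LinkedHashMap<>();
        for (int i = 0; i + 1 < tokens.length; i += 2) {
            scores.put(tokens[i], Double.parseDouble(tokens[i + 1]));
        }
        return scores;
    }

    /** True if every line carries the same label-to-score mapping (pair order is ignored). */
    static boolean allIdentical(List<String> lines) {
        if (lines.isEmpty()) {
            return true;
        }
        Map<String, Double> first = parseScores(lines.get(0));
        for (String line : lines) {
            if (!first.equals(parseScores(line))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Shortened excerpts of two score lines from the result file.
        String repeated = "9466\t0.20320855614973263\t9467\t0.10160427807486631";
        String different = "9480\t0.03642671292281006\t9496\t0.033824804856895055";
        System.out.println(allIdentical(Arrays.asList(repeated, repeated)));
        System.out.println(allIdentical(Arrays.asList(repeated, different)));
    }
}
```

Scores are compared for exact equality here, which fits this case because the repeated lines match digit for digit.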

I will also include links to my training data and classifier file, in case I made a mistake when parsing the training data or during the training process:

Any help would be much appreciated.

Long Le Minh
  • Have you solved the problem of Mallet outputting the same result? I read your question and found it similar to mine at https://stackoverflow.com/questions/49649946/why-mallet-text-classification-output-the-same-value-1-0-for-all-test-files. Could you help me solve mine? Thank you very much. – Dylan Apr 04 '18 at 14:42
  • @Dylan No, I haven't sorry :( – Long Le Minh Apr 05 '18 at 03:40
