
I am trying to train a MALLET topic model that was created using import-file, but I get an error saying that MALLET was unable to restore the instance list. I get the same error when loading a completely different model built from another relatively large data set. However, train-topics works on a model built from a smaller data set. In this case, the text data is ~20 GB and the output model is 14 MB. The model is created using:

mallet import-file --input corpus.dat --output topics.mallet

Here is the error I receive when using train-topics on the model:

Mallet LDA: 10 topics, 4 topic bits, 1111 topic mask
java.io.EOFException
        at java.io.ObjectInputStream$PeekInputStream.readFully(Unknown Source)
        at java.io.ObjectInputStream$BlockDataInputStream.readUTFBody(Unknown Source)
        at java.io.ObjectInputStream$BlockDataInputStream.readUTF(Unknown Source)
        at java.io.ObjectInputStream.readString(Unknown Source)
        at java.io.ObjectInputStream.readObject0(Unknown Source)
        at java.io.ObjectInputStream.readObject(Unknown Source)
        at cc.mallet.types.Alphabet.readObject(Alphabet.java:345)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at java.io.ObjectStreamClass.invokeReadObject(Unknown Source)
        at java.io.ObjectInputStream.readSerialData(Unknown Source)
        at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
        at java.io.ObjectInputStream.readObject0(Unknown Source)
        at java.io.ObjectInputStream.readObject(Unknown Source)
        at cc.mallet.types.FeatureVector.readObject(FeatureVector.java:445)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at java.io.ObjectStreamClass.invokeReadObject(Unknown Source)
        at java.io.ObjectInputStream.readSerialData(Unknown Source)
        at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
        at java.io.ObjectInputStream.readObject0(Unknown Source)
        at java.io.ObjectInputStream.readObject(Unknown Source)
        at cc.mallet.types.Instance.readObject(Instance.java:228)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at java.io.ObjectStreamClass.invokeReadObject(Unknown Source)
        at java.io.ObjectInputStream.readSerialData(Unknown Source)
        at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
        at java.io.ObjectInputStream.readObject0(Unknown Source)
        at java.io.ObjectInputStream.readObject(Unknown Source)
        at java.util.ArrayList.readObject(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at java.io.ObjectStreamClass.invokeReadObject(Unknown Source)
        at java.io.ObjectInputStream.readSerialData(Unknown Source)
        at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
        at java.io.ObjectInputStream.readObject0(Unknown Source)
        at java.io.ObjectInputStream.readObject(Unknown Source)
        at cc.mallet.types.InstanceList.load(InstanceList.java:841)
        at cc.mallet.topics.tui.TopicTrainer.main(TopicTrainer.java:199)
Unable to restore instance list topics.mallet: java.lang.IllegalArgumentException: Couldn't read InstanceList from file topics.mallet
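The java.io.EOFException at the top of the trace is what ObjectInputStream throws when a file ends before the serialized object graph is complete, and the frames show InstanceList.load reading topics.mallet through standard Java serialization. A minimal sketch, independent of MALLET, reproduces the same exception: it serializes a plain ArrayList<String> (standing in for the instance list, an assumption for illustration), keeps only the first half of the bytes, and tries to read both files back. The class and method names are hypothetical, not part of MALLET.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;

public class TruncatedSerializationDemo {
    // Serialize a list of n placeholder "documents" to f with plain Java serialization.
    static void writeSample(File f, int n) throws IOException {
        ArrayList<String> docs = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            docs.add("document number " + i);
        }
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(f)))) {
            out.writeObject(docs);
        }
    }

    // Copy only the first keepBytes bytes of src to dst, mimicking a write that was cut short.
    static void copyPrefix(File src, File dst, long keepBytes) throws IOException {
        byte[] bytes = Files.readAllBytes(src.toPath());
        Files.write(dst.toPath(), Arrays.copyOf(bytes, (int) keepBytes));
    }

    // True if one complete object can be read back; false if the stream ends early.
    static boolean canRead(File f) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(f)))) {
            in.readObject();
            return true;
        } catch (EOFException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        File full = File.createTempFile("instances", ".ser");
        File cut = File.createTempFile("instances-cut", ".ser");
        full.deleteOnExit();
        cut.deleteOnExit();

        writeSample(full, 1000);
        copyPrefix(full, cut, full.length() / 2);

        System.out.println("complete file readable:  " + canRead(full));
        System.out.println("truncated file readable: " + canRead(cut));
    }
}
```

Running it prints true for the complete file and false for the truncated one. Pointing canRead at a real .mallet file would additionally need MALLET's classes on the classpath; otherwise it fails with ClassNotFoundException rather than reporting truncation.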
mootechs
  • Thanks for using Mallet! The `import-file` command converts human-readable text files into a more compact Mallet format. This "instance list" is not a topic model yet; it's just a more efficient version of your documents. It would help to see the `train-topics` command you're using. – David Mimno Jan 28 '19 at 14:00
  • I figured out the issue. It looks like the instance list wasn't completely written to disk due to user space limitations. Anyhow, thank you for your help and work on Mallet! – mootechs Jan 29 '19 at 17:19
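Since the cause turned out to be an instance list that was not completely written to disk, one hedged precaution is to check how much space the output directory reports before running import-file. The sketch below uses java.nio.file.FileStore.getUsableSpace(); the FreeSpaceCheck class and the required-size margin (double the 14 MB output mentioned in the question) are hypothetical. The figure is a hint from the operating system and may not reflect per-user disk quotas, which "user space limitations" suggests may have been the limit here, so treat it as a rough check only.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class FreeSpaceCheck {
    // True if the file store holding dir reports at least neededBytes of usable space.
    // getUsableSpace() is an OS-provided hint and may ignore per-user quotas.
    static boolean hasRoom(Path dir, long neededBytes) throws IOException {
        FileStore store = Files.getFileStore(dir);
        return store.getUsableSpace() >= neededBytes;
    }

    public static void main(String[] args) throws IOException {
        // Directory where topics.mallet will be written; defaults to the current directory.
        Path outDir = Paths.get(args.length > 0 ? args[0] : ".").toAbsolutePath();
        // Hypothetical margin: twice the 14 MB output size from the question.
        long needed = 2L * 14 * 1024 * 1024;

        FileStore store = Files.getFileStore(outDir);
        System.out.println("usable bytes on " + store + ": " + store.getUsableSpace());
        System.out.println(hasRoom(outDir, needed)
                ? "enough room for the output file"
                : "not enough room; free some space or choose another --output location");
    }
}
```

Even with enough reported space, confirming that the file loads (for example by running train-topics on it) is the more direct check that the write completed.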
