
I'm having a tough time figuring out a serialization issue in Hadoop. Here's my class:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import java.io.Serializable;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Writable;

    public class CrawlerTweet implements Serializable, Writable {

        private static final long serialVersionUID = 1L;
        private String keywords;
        // initialized here so readFields() can call tweets.clear() on an
        // instance made by the no-arg constructor without a NullPointerException
        private List<TweetStatus> tweets = new ArrayList<TweetStatus>();
        private long queryTime = 0;

        public CrawlerTweet() {}

        public CrawlerTweet(String keys, List<TweetStatus> tweets, long queryTime) {
            this.keywords = keys;
            this.tweets = tweets;
            this.queryTime = queryTime;
        }

        public static CrawlerTweet read(DataInput in) throws IOException {
            CrawlerTweet ts = new CrawlerTweet();
            ts.readFields(in);
            return ts;
        }

        @Override
        public void readFields(DataInput din) throws IOException {
            // fields must be read in exactly the order write() emits them
            keywords = din.readUTF();
            queryTime = din.readLong();

            tweets.clear();
            IntWritable size = new IntWritable();
            size.readFields(din);
            int n = size.get();
            while (n-- > 0) {
                TweetStatus ts = new TweetStatus();
                ts.readFields(din);
                tweets.add(ts);
            }
        }

        @Override
        public void write(DataOutput dout) throws IOException {
            // writeUTF is length-prefixed and pairs with readUTF;
            // writeChars has no matching read method
            dout.writeUTF(keywords);
            dout.writeLong(queryTime);
            IntWritable size = new IntWritable(tweets.size());
            size.write(dout);
            for (TweetStatus ts : tweets)
                ts.write(dout);
        }

        public String getKeywords() {
            return keywords;
        }

        public List<TweetStatus> getTweets() {
            return tweets;
        }

        public long getQueryTime() {
            return queryTime;
        }
    }
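One way to sanity-check a Writable before involving the cluster is a round trip through an in-memory byte array, reading fields back in exactly the order they were written. The sketch below is illustrative rather than the class above: it uses a plain `writeInt` in place of `IntWritable` and `String` in place of `TweetStatus` so it runs without Hadoop on the classpath, and the `RoundTrip` name is made up.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RoundTrip {

    // same field layout idea as CrawlerTweet: a string, a long, then a
    // size-prefixed list (plain int standing in for IntWritable)
    static void write(DataOutput out, String keywords, long queryTime,
                      List<String> tweets) throws IOException {
        out.writeUTF(keywords);        // length-prefixed, pairs with readUTF
        out.writeLong(queryTime);
        out.writeInt(tweets.size());
        for (String t : tweets)
            out.writeUTF(t);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        write(new DataOutputStream(buf), "hadoop", 42L, Arrays.asList("a", "b"));

        DataInput in = new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
        // read back in the SAME order the fields were written
        String keywords = in.readUTF();
        long queryTime = in.readLong();
        int n = in.readInt();
        List<String> tweets = new ArrayList<String>();
        while (n-- > 0)
            tweets.add(in.readUTF());

        System.out.println(keywords + " " + queryTime + " " + tweets);  // hadoop 42 [a, b]
    }
}
```

If the read-back values match what was written, the `write`/`readFields` pair is symmetric. Note that `writeUTF`/`readUTF` are a matched pair, while `DataInput.readLine` is deprecated and is not a safe inverse of any `DataOutput` write method.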

If I implement both the Serializable and Writable interfaces, I get the following exception:

    java.lang.ClassNotFoundException: mydat.twitter.dto.CrawlerTweet
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:249)
        at java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:601)
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1572)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1493)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1729)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1326)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
        at focusedCrawler.util.storage.socket.ServerConnectionHandler.buildRequestObject(ServerConnectionHandler.java:136)
        at focusedCrawler.util.storage.socket.ServerConnectionHandler.run(ServerConnectionHandler.java:340)

And if I implement Writable only, I get a NotSerializableException (the surrounding log messages are Portuguese: "Erro de comunicacao" means "communication error" and "Dormindo 5 mls" means "sleeping 5 ms"):

    Erro de comunicacao: bigdat.twitter.dto.CrawlerTweet
    Dormindo 5 mls
    [21/JUN/2013:11:23:39] [SocketAdapterFactory] [produce] [hadoop22:3190]
    Erro de comunicacao: bigdat.twitter.dto.CrawlerTweet
   java.io.NotSerializableException: bigdat.twitter.dto.CrawlerTweet
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1164)
    at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:330)
    at focusedCrawler.util.storage.socket.StorageRemoteAdapter.serializeParamObject(StorageRemoteAdapter.java:113)
    at focusedCrawler.util.storage.socket.StorageRemoteAdapter.defaultMethod(StorageRemoteAdapter.java:205)
    at focusedCrawler.util.storage.socket.StorageRemoteAdapter.insert(StorageRemoteAdapter.java:289)
    at focusedCrawler.util.storage.distribution.StorageRemoteAdapterReconnect.insert(StorageRemoteAdapterReconnect.java:213)
    at bigdat.twitter.crawler.CrawlTwitter.download(Unknown Source)
    at bigdat.twitter.crawler.CrawlTwitter.run(Unknown Source)

Further information extracted from comments:

CrawlerTweet is packaged in the BDAnalytics16.jar file.

    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/home/rgupta/bdAnalytics/lib/*
    hadoop jar $jarpath/BDAnalytics16.jar bigdat.twitter.crawler.CrawlTwitter \
        $crwlInputFile > $logsFldr/crawler_$1.log 2>&1 &

Help will be much appreciated. Thanks!

  • mydat.twitter.dto.CrawlerTweet & bigdat.twitter.crawler.CrawlTwitter - in both, packages are intentionally supposed to be different, right? – SSaikia_JtheRocker Jun 22 '13 at 19:16
