I may be too late to answer this question, but if anyone has the same question in the future, here is the answer.
First, use Tika to extract the content of the file, whatever its type:
File file = new File("file path");
//parse method parameters
Parser parser = new AutoDetectParser();
BodyContentHandler handler = new BodyContentHandler();
Metadata metadata = new Metadata();
FileInputStream inputstream = new FileInputStream(file);
ParseContext context = new ParseContext();
//parsing the file
parser.parse(inputstream, handler, metadata, context);
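If you want the extraction step as a reusable method, here is a minimal sketch of the same idea. The class and method names (`TikaTextExtractor`, `extractText`) are my own and not part of Tika. It assumes the Tika parser modules are on the classpath, not only tika-core. Compared with the snippet above it closes the stream with try-with-resources and passes -1 to BodyContentHandler, because the default handler stops after 100,000 characters.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

// Hypothetical helper class; the name is an assumption for illustration.
final class TikaTextExtractor {

    // Extracts the plain-text body from a stream of any type Tika can detect.
    static String extractText(InputStream in)
            throws IOException, SAXException, TikaException {
        Parser parser = new AutoDetectParser();
        // -1 disables the default 100,000-character write limit.
        BodyContentHandler handler = new BodyContentHandler(-1);
        parser.parse(in, handler, new Metadata(), new ParseContext());
        return handler.toString();
    }

    // Convenience overload that opens the file and always closes the stream.
    static String extractText(Path path)
            throws IOException, SAXException, TikaException {
        try (InputStream in = Files.newInputStream(path)) {
            return extractText(in);
        }
    }
}
```

You would then call `TikaTextExtractor.extractText(Paths.get("file path"))` and use the returned string in place of `handler.toString()`.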
Then, after initializing GATE with Gate.init(), put the extracted text into a corpus and run your pipeline on it:
Corpus corpus = Factory.newCorpus("SegmenterCorpus");
Document document = Factory.newDocument(handler.toString()); // handler is the Tika content handler that holds the extracted text
corpus.add(document);
pipeline.setCorpus(corpus);
pipeline.execute();
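The snippet above does not show where `pipeline` comes from. One possible setup, as a sketch only: here I assume the pipeline is a GATE application that was saved to a file from GATE Developer and is loaded with PersistenceManager. The class name `GatePipelineRunner` and the `applicationFile` parameter are placeholders of mine, and your pipeline may be built differently (for example, assembled in code). It also assumes a GATE version where Gate.init() works in embedded mode without extra configuration.

```java
import java.io.File;
import java.io.IOException;

import gate.Corpus;
import gate.CorpusController;
import gate.Document;
import gate.Factory;
import gate.Gate;
import gate.util.GateException;
import gate.util.persistence.PersistenceManager;

// Hypothetical helper class; the name is an assumption for illustration.
final class GatePipelineRunner {

    // Wraps the extracted text in a GATE document inside a one-document corpus.
    static Corpus buildCorpus(String text) throws GateException {
        if (!Gate.isInitialised()) {
            Gate.init(); // must happen once before any Factory call
        }
        Corpus corpus = Factory.newCorpus("SegmenterCorpus");
        Document document = Factory.newDocument(text);
        corpus.add(document);
        return corpus;
    }

    // Loads a saved GATE application (assumed to exist) and runs it over the text.
    static Corpus run(File applicationFile, String text)
            throws GateException, IOException {
        Corpus corpus = buildCorpus(text);
        CorpusController pipeline =
                (CorpusController) PersistenceManager.loadObjectFromFile(applicationFile);
        pipeline.setCorpus(corpus);
        pipeline.execute();
        // The annotations are now available on the documents in the corpus.
        return corpus;
    }
}
```

After `run` returns, you can read the results from each document with `getAnnotations()`.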
For more information about how to use Tika, see the TIKA Tutorial.
It is very useful and teaches you how to use Tika step by step.