I am writing a simple program that builds a full-text index in my database from a sample of PDF files. The idea is to read each PDF file, extract its words, and store them in a HashSet.
Then I loop over the words and insert each one into a MySQL table along with its file path, so every word ends up as one (file path, word) row. It works fine. However, for large PDF files that contain thousands of words, building the index table takes a long time. In other words, extracting the words is fast, but saving each word to the database is slow.
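The word-collection step described above can be sketched on a plain string, without PDFBox or MySQL. This is only a minimal illustration, not the program itself: `WordSetSketch` and `uniqueWords` are hypothetical names, and the sample text stands in for whatever `PDFTextStripper.getText` returns.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration only; it mirrors the
// split-by-line, then split-by-space logic on a plain String.
class WordSetSketch {

    // Collects the distinct space-separated tokens of a multi-line text.
    static Set<String> uniqueWords(String text) {
        Set<String> words = new HashSet<>();
        for (String line : text.split("\\r?\\n")) {
            for (String word : line.split(" ")) {
                words.add(word); // duplicates are dropped by the set
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // Sample text standing in for the extracted PDF text (an assumption).
        Set<String> words = uniqueWords("full text index\nindex text again");
        System.out.println(words.size() + " unique words: " + words);
    }
}
```

Because the set removes duplicates, a word that occurs many times in a file is inserted only once for that file, so the number of inserts per file equals the number of distinct words.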
Code:
import java.io.File;
import java.io.IOException;
import java.util.HashSet;

import org.apache.pdfbox.pdmodel.PDDocument;
// PDFBox 2.x package; in PDFBox 1.8.x the class is org.apache.pdfbox.util.PDFTextStripper
import org.apache.pdfbox.text.PDFTextStripper;

public class IndexTest {

    public static void main(String[] args) throws Exception {
        //String path ="D:\\Full Text Indexing\\testIndex\\bell2009a.pdf";
        /*StopWatch stopwatch = new StopWatch();
        stopwatch.start();*/
        File folder = new File("D:\\PDF1");
        File[] listOfFiles = folder.listFiles();
        for (File file : listOfFiles) {
            if (file.isFile()) {
                // Unique words of the current file
                HashSet<String> uniqueWords = new HashSet<>();
                String path = "D:\\PDF1\\" + file.getName();
                try (PDDocument document = PDDocument.load(new File(path))) {
                    if (!document.isEncrypted()) {
                        PDFTextStripper tStripper = new PDFTextStripper();
                        String pdfFileInText = tStripper.getText(document);
                        // Split the text into lines, then each line into words
                        String[] lines = pdfFileInText.split("\\r?\\n");
                        for (String line : lines) {
                            String[] words = line.split(" ");
                            for (String word : words) {
                                uniqueWords.add(word);
                            }
                        }
                        // System.out.println(uniqueWords);
                    }
                } catch (IOException e) {
                    System.err.println("Exception while trying to read pdf document - " + e);
                }
                Object[] words = uniqueWords.toArray();
                String unique = uniqueWords.toString();
                // System.out.println(words[1].toString());
                // One new connection and one INSERT per word
                for (int i = 0; i < words.length; i++) {
                    MysqlAccessIndex connection = new MysqlAccessIndex();
                    connection.readDataBase(path, words[i].toString());
                }
                System.out.println("Completed");
            }
        }
    }
}
SQL connection code:
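import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;

public class MysqlAccessIndex {
    private Connection connect = null;
    private Statement statement = null;
    private PreparedStatement preparedStatement = null;

    public MysqlAccessIndex() throws Exception {
        // Load the MySQL driver and open a new connection
        Class.forName("com.mysql.jdbc.Driver");
        connect = DriverManager
                .getConnection("jdbc:mysql://126.32.3.178/fulltext_ltat?"
                        + "user=root&password=root123");
        // statement = connect.createStatement();
        System.out.print("Connected");
    }

    // Despite its name, this method inserts one (path, word) row
    public void readDataBase(String path, String word) throws Exception {
        try {
            statement = connect.createStatement();
            System.out.print("Connected");
            preparedStatement = connect
                    .prepareStatement("insert IGNORE into fulltext_ltat.test_text values (?, ?) ");
            preparedStatement.setString(1, path);
            preparedStatement.setString(2, word);
            preparedStatement.executeUpdate();
            // resultSet = statement
            //.executeQuery("select * from fulltext_ltat.index_detail");
            // writeResultSet(resultSet);
        } catch (Exception e) {
            throw e;
        } finally {
            close();
        }
    }

    // close() was not included in the code above; assumed here to release the JDBC resources
    private void close() {
        try {
            if (preparedStatement != null) preparedStatement.close();
            if (statement != null) statement.close();
            if (connect != null) connect.close();
        } catch (SQLException e) {
            // ignored while closing
        }
    }
}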
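One way to confirm that the inserts, rather than the extraction, dominate the run time is to time the two phases separately. The following is a minimal sketch under stated assumptions: `PhaseTimer` and `elapsedMillis` are hypothetical names, and both lambdas are placeholders for the real extraction and insert loops.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration; not part of the code above.
class PhaseTimer {

    // Runs the task once and returns the elapsed wall-clock time in milliseconds.
    static long elapsedMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // Placeholder for the extraction phase (text stripping and splitting).
        long extractMs = elapsedMillis(() -> "some extracted text".split(" "));

        // Placeholder for the insert phase (one readDataBase call per word).
        long insertMs = elapsedMillis(() -> {
            StringBuilder sink = new StringBuilder();
            for (int i = 0; i < 1_000; i++) {
                sink.append(i);
            }
        });

        System.out.println("extract: " + extractMs + " ms, insert: " + insertMs + " ms");
    }
}
```

The real extraction and insert code throws checked exceptions, so it would need a try/catch inside each lambda before it could be passed to `elapsedMillis`.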
Are there any suggestions for improving the performance of the inserts?