So I have been trying to work this out for a while but have been unable to come up with a "rapid" solution. I do have a solution in place, but it takes literally 3 days to complete, and unfortunately that is far too long.
- What I am trying to do:
So I have a text file (call this 1.txt) that contains unique timestamps, and a second text file (call this 2.txt) that contains mixed data. The intention is to read the first timestamp from 1.txt, find its match in 2.txt, output the match to a new file, and repeat this for every timestamp. There are approximately 100,000 timestamps in 1.txt and over 11 million lines in 2.txt.
- What I have achieved:
So far, my code gets the first timestamp and uses a nested loop to run through the 11 million lines looking for a match. Once a match is found, it is stored in a variable until the code moves on to the next timestamp, at which point the stored data is written out. Solution below:
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.concurrent.TimeUnit;

public class fedOrganiser5 {

    private static String directory = "C:\\Users\\xxx\\Desktop\\Files\\";
    private static String file = "combined.txt";
    private static Integer fileNo = 1;

    public static void main(String[] args) throws IOException {
        String sCurrentLine;
        String mapperValue = "";
        String outputFirst = "";
        String outputSecond = "";
        String outputThird = "";
        int i = 1;
        int test = 0;
        long timer = System.currentTimeMillis();

        try {
            BufferedReader reader = new BufferedReader(new FileReader(directory + "newfile" + fileNo + ".txt"));
            BufferedWriter writer = new BufferedWriter(new FileWriter(directory + "final_" + fileNo + ".txt"));
            BufferedReader mapper = new BufferedReader(new FileReader(directory + file));

            for (sCurrentLine = reader.readLine(); sCurrentLine != null; sCurrentLine = reader.readLine()) {
                if (sCurrentLine.trim().length() > 2) {
                    // normalise the timestamp: strip spaces, commas and brackets
                    sCurrentLine = sCurrentLine.replace(" ", "").replace(",", "").replace("[", "");
                    try {
                        if (i > 1) {
                            // flush the previous timestamp's matches, then rewind
                            // 2.txt by reopening it; this must happen for every new
                            // timestamp (not only when the previous one matched),
                            // otherwise one unmatched timestamp leaves the mapper at
                            // end-of-file and every later timestamp finds nothing
                            writer.write(outputFirst + outputSecond + outputThird);
                            outputFirst = "";
                            outputSecond = "";
                            outputThird = "";
                            test = 0;
                            mapper.close();
                            mapper = new BufferedReader(new FileReader(directory + file));
                            System.out.println("Writing out details for " + sCurrentLine);
                        }
                        i++;
                        // full scan of the 11 million lines for this one timestamp
                        for (mapperValue = mapper.readLine(); mapperValue != null; mapperValue = mapper.readLine()) {
                            test++;
                            System.out.println("Find match " + i + " - " + test);
                            if (mapperValue.contains(sCurrentLine)) {
                                System.out.println("Match found - Mapping " + sCurrentLine + i);
                                // group matches by event type so each timestamp's
                                // lines are written out in a fixed order
                                if (mapperValue.contains("[EVENT=agentStateEvent]")) {
                                    outputFirst += mapperValue.trim() + "\r\n";
                                } else if (mapperValue.contains("[EVENT=TerminalConnectionCreated]")) {
                                    outputSecond += mapperValue.trim() + "\r\n";
                                } else {
                                    outputThird += mapperValue.trim() + "\r\n";
                                }
                            }
                        }
                    } catch (Exception e) {
                        System.err.println("Error: " + sCurrentLine + " " + mapperValue);
                    }
                }
            }

            System.out.println("writing final record out");
            writer.write(outputFirst + outputSecond + outputThird);
            writer.close();
            mapper.close();
            reader.close();
            System.out.println("complete!");
            System.out.print("Time taken: "
                    + TimeUnit.MILLISECONDS.toMinutes(System.currentTimeMillis() - timer)
                    + " minutes");
        } catch (Exception e) {
            System.err.println("Error: Target File Cannot Be Read");
        }
    }
}
- The problem?
I have tried looking through other solutions on Google and forums, but I have been unable to find a suitable or faster approach (or it is something beyond my depth of knowledge). Looping through 11 million lines for every timestamp takes approximately 10 minutes, and with roughly 100,000 timestamps that is on the order of a trillion line comparisons, so you can imagine how long the full process takes. Can someone give me some friendly advice on where to look, or any APIs that can speed this process up?
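For illustration, here is the kind of single-pass approach I have been wondering about. This is an untested sketch, not working code: it assumes each line of 2.txt contains the timestamp verbatim, and the extractTimestamp() helper is hypothetical (the real extraction depends on the exact line layout). The idea is to load all the 1.txt timestamps into a map up front, then stream 2.txt exactly once, so the cost is one pass over 11 million lines plus a hash lookup per line, instead of one full pass per timestamp:

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

public class SinglePassSketch {
    public static void main(String[] args) throws IOException {
        String directory = "C:\\Users\\xxx\\Desktop\\Files\\";

        // Pass 1: index every timestamp from 1.txt. Each entry keeps three
        // buffers to preserve the agentStateEvent / TerminalConnectionCreated /
        // other grouping, and LinkedHashMap preserves 1.txt order for output.
        Map<String, StringBuilder[]> index = new LinkedHashMap<>();
        try (BufferedReader reader = new BufferedReader(new FileReader(directory + "newfile1.txt"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String ts = line.replace(" ", "").replace(",", "").replace("[", "");
                if (ts.length() > 2) {
                    index.put(ts, new StringBuilder[] {
                            new StringBuilder(), new StringBuilder(), new StringBuilder() });
                }
            }
        }

        // Pass 2: stream 2.txt ONCE; each line costs one hash lookup instead
        // of a fresh scan of the whole file per timestamp.
        try (BufferedReader mapper = new BufferedReader(new FileReader(directory + "combined.txt"))) {
            String line;
            while ((line = mapper.readLine()) != null) {
                StringBuilder[] groups = index.get(extractTimestamp(line));
                if (groups == null) {
                    continue; // not one of the timestamps we care about
                }
                if (line.contains("[EVENT=agentStateEvent]")) {
                    groups[0].append(line.trim()).append("\r\n");
                } else if (line.contains("[EVENT=TerminalConnectionCreated]")) {
                    groups[1].append(line.trim()).append("\r\n");
                } else {
                    groups[2].append(line.trim()).append("\r\n");
                }
            }
        }

        // Write every group out in the original 1.txt order.
        try (BufferedWriter writer = new BufferedWriter(new FileWriter(directory + "final_1.txt"))) {
            for (StringBuilder[] groups : index.values()) {
                writer.write(groups[0].toString());
                writer.write(groups[1].toString());
                writer.write(groups[2].toString());
            }
        }
    }

    // HYPOTHETICAL helper: pulls the timestamp out of a 2.txt line. Shown
    // here as "everything up to the first space" purely as a placeholder;
    // the real logic depends on the actual line format.
    private static String extractTimestamp(String line) {
        int space = line.indexOf(' ');
        return space > 0 ? line.substring(0, space) : line;
    }
}

Would something along these lines be the right direction, or is buffering the matched lines for ~100,000 timestamps in memory at once going to be a problem?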