I am reading a 77 MB file inside a Servlet; in the future this will be 150 GB. The file is not written with anything from the NIO package, just with a plain BufferedWriter.
Now this is what I need to do:
1. Read the file line by line. Each line is a "hash code" of a text. Split it into pieces of 3 characters (3 characters represent 1 word). A line could be long or short; I don't know in advance.
2. After reading a line, convert it into real words. We have a Map of words and hashes, so we can find the words. This step is sketched below for illustration.
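For illustration only, here is a minimal sketch of step 2, assuming a hypothetical Map<String, String> from 3-character hash chunks to words (the class name and map are made up):

import java.util.HashMap;
import java.util.Map;

public class LineDecoder {

    // Hypothetical lookup table: 3-character hash chunk -> word
    private final Map<String, String> hashToWord = new HashMap<String, String>();

    public String decode(String line) {
        StringBuilder sentence = new StringBuilder();
        // Walk the line in steps of 3 characters and look each chunk up
        for (int i = 0; i + 3 <= line.length(); i += 3) {
            String chunk = line.substring(i, i + 3);
            String word = hashToWord.get(chunk);
            if (word != null) {
                if (sentence.length() > 0) sentence.append(' ');
                sentence.append(word);
            }
        }
        return sentence.toString();
    }
}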
Up to now I used a BufferedReader to read the file. It is slow and not good for huge files like 150 GB; it took hours to complete the entire process even for this 77 MB file. Because we can't keep the user waiting for hours, it should finish within seconds. So we decided to load the file into memory. First we thought about loading every single line into a LinkedList so memory could hold it, but memory cannot hold such a big amount. After a big search, I decided that mapping the file into memory would be the answer. Memory is much faster than disk, so we could read the file very fast too.
Code:
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.logging.Level;
import java.util.logging.Logger;

public class MapRead {

    public MapRead() {
        try {
            File file = new File("E:/Amazon HashFile/Hash.txt");
            FileChannel c = new RandomAccessFile(file, "r").getChannel();
            // Map the whole file into memory and pre-fault its pages with load()
            MappedByteBuffer buffer = c.map(FileChannel.MapMode.READ_ONLY, 0, c.size()).load();
            for (int i = 0; i < buffer.limit(); i++) {
                System.out.println((char) buffer.get());
            }
            System.out.println(buffer.isLoaded());
            System.out.println(buffer.capacity());
        } catch (IOException ex) {
            Logger.getLogger(MapRead.class.getName()).log(Level.SEVERE, null, ex);
        }
    }
}
But I could not see anything "super fast" here, and I need it line by line. I have a few questions to ask.
1. You read my description and you know what I need to do. I have done the first step for that, so is it correct?
2. Is the way I map correct? I mean, it is no different from reading it the normal way. So does this hold the "entire" file in memory first (let's say using a technique called mapping)? Then do we have to write more code to access that memory?
3. How do I read line by line, super "fast"? (If I have to load/map the entire file into memory first for hours, then access it at super speed within seconds, I am totally fine with that too.) See the sketch after this list.
4. Is reading files in Servlets good? (The file is accessed by a number of people, and only one I/O stream will be opened at a time. Yet this servlet will be accessed by thousands at once.)
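For question 3, a minimal sketch of one way to pull lines out of a mapped buffer, assuming single-byte (e.g. ASCII) text and a file small enough for a single mapping (the class name is made up):

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedLineScanner {

    public static void main(String[] args) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile("E:/Amazon HashFile/Hash.txt", "r");
             FileChannel channel = raf.getChannel()) {
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            StringBuilder line = new StringBuilder();
            // Scan the mapped bytes for newline characters to cut out lines
            while (buffer.hasRemaining()) {
                byte b = buffer.get();
                if (b == '\n') {
                    // A complete line: hand it to the hash-to-word step here
                    System.out.println(line);
                    line.setLength(0);
                } else if (b != '\r') {
                    line.append((char) b);
                }
            }
        }
    }
}

Note that a single FileChannel.map() call is limited to Integer.MAX_VALUE bytes (about 2 GB), so a 150 GB file would need a series of mappings over successive regions of the file.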
Update
This is how my code looks after I updated it with SO user Luiggi Mendoza's answer.
import java.util.concurrent.BlockingQueue;

public class BigFileProcessor implements Runnable {

    private final BlockingQueue<String> linesToProcess;

    public BigFileProcessor(BlockingQueue<String> linesToProcess) {
        this.linesToProcess = linesToProcess;
    }

    @Override
    public void run() {
        String line;
        try {
            // take() blocks until a line is available; it never returns null,
            // so this loop only ends if the thread is interrupted
            while ((line = linesToProcess.take()) != null) {
                System.out.println(line); // This is not happening
            }
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.concurrent.BlockingQueue;

public class BigFileReader implements Runnable {

    private final String fileName;
    int a = 0;
    private final BlockingQueue<String> linesRead;

    public BigFileReader(String fileName, BlockingQueue<String> linesRead) {
        this.fileName = fileName;
        this.linesRead = linesRead;
    }

    @Override
    public void run() {
        try {
            // Scanner did not work; I had to use BufferedReader.
            // Note: the hard-coded path is used instead of the fileName field,
            // and each line is only counted, never put on linesRead
            BufferedReader br = new BufferedReader(new FileReader(new File("E:/Amazon HashFile/Hash.txt")));
            String str;
            while ((str = br.readLine()) != null) {
                // System.out.println(a);
                a++;
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

public class BigFileWholeProcessor {

    private static final int NUMBER_OF_THREADS = 2;

    public void processFile(String fileName) {
        BlockingQueue<String> fileContent = new LinkedBlockingQueue<String>();
        BigFileReader bigFileReader = new BigFileReader(fileName, fileContent);
        BigFileProcessor bigFileProcessor = new BigFileProcessor(fileContent);

        ExecutorService es = Executors.newFixedThreadPool(NUMBER_OF_THREADS);
        es.execute(bigFileReader);
        es.execute(bigFileProcessor);
        es.shutdown();
    }
}
public class Main {

    public static void main(String[] args) {
        BigFileWholeProcessor b = new BigFileWholeProcessor();
        b.processFile("E:/Amazon HashFile/Hash.txt");
    }
}
I am trying to print the file in BigFileProcessor. What I understood is this:

1. The user enters the file name.
2. That file gets read by BigFileReader, line by line.
3. After each line, BigFileProcessor gets called. That is, assume BigFileReader reads the first line; now BigFileProcessor is called. Once BigFileProcessor completes the processing for that line, BigFileReader reads line 2. Then BigFileProcessor gets called for that line again, and so on.

Maybe my understanding of this code is incorrect. How should I process the line anyway?
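For reference, a minimal sketch of what the producer side could look like if it actually handed lines to the queue, assuming a hypothetical sentinel string as an end-of-file marker (the class and constant names are made up):

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.concurrent.BlockingQueue;

public class QueueingFileReader implements Runnable {

    // Hypothetical end-of-file marker ("poison pill") for the consumer
    public static final String EOF_MARKER = "<<EOF>>";

    private final String fileName;
    private final BlockingQueue<String> linesRead;

    public QueueingFileReader(String fileName, BlockingQueue<String> linesRead) {
        this.fileName = fileName;
        this.linesRead = linesRead;
    }

    @Override
    public void run() {
        try (BufferedReader br = new BufferedReader(new FileReader(fileName))) {
            String str;
            while ((str = br.readLine()) != null) {
                linesRead.put(str); // hand each line to the consumer
            }
            linesRead.put(EOF_MARKER); // tell the consumer we are done
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}

The consumer would then take() in a loop until it sees the marker, instead of comparing against null. Also note the two tasks run concurrently: the queue decouples them, so the reader does not pause after each line to wait for the processor.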