
I need to read a file that is in ASCII and convert it to hex before applying some functions (searching for a specific character).

To do this, I read the file, convert it to hex and write the result into a new file. Then I open the new hex file and apply my functions.

My issue is that reading and converting takes far too long (approx. 8 seconds for a 9 MB file).

My reading method is:

public static void convertToHex2(PrintStream out, File file) throws IOException {
    BufferedInputStream bis = new BufferedInputStream(new FileInputStream(file));
    int value = 0;

    StringBuilder sbHex = new StringBuilder();
    StringBuilder sbResult = new StringBuilder();

    while ((value = bis.read()) != -1) {
        sbHex.append(String.format("%02X ", value));
    }
    sbResult.append(sbHex);
    out.print(sbResult);
    bis.close();
}

Do you have any suggestions to make it faster?

tmylamoule
  • Out of interest, why on earth do you need to convert it to hexadecimal format? – Alex K. May 29 '15 at 15:14
  • Why do you have to write the converted data to a file? Even if your search functions insist on hex (which is bound to cause problems if you search for BC in a hex sequence of 89ABCD), it would not be necessary to have it in a file. I/O time is punishing. – laune May 29 '15 at 15:16
  • Your code will leak resources if there is an I/O exception. – Raedwald May 29 '15 at 16:08
  • @AlexK. I need to find some patterns in hex. In ASCII some characters are not shown and are displayed as a dot ".". – tmylamoule May 30 '15 at 18:06
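Along the lines of laune's comment, the hex file can be skipped entirely: load the bytes into memory and compare them against the byte pattern directly. Comparing raw bytes instead of hex text also rules out matches that straddle two bytes (the BC-in-89ABCD problem). This is a minimal sketch, assuming the file fits in memory; `BytePatternSearch` and `indexOf` are hypothetical names, and the pattern 48 45 4C 4F ("HELO") is only an example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class BytePatternSearch {

    // Returns the index of the first occurrence of pattern in data at or after 'from', or -1.
    // Naive O(n*m) scan; adequate for short patterns such as a 4-byte code.
    static int indexOf(byte[] data, byte[] pattern, int from) {
        outer:
        for (int i = Math.max(from, 0); i <= data.length - pattern.length; i++) {
            for (int j = 0; j < pattern.length; j++) {
                if (data[i + j] != pattern[j]) {
                    continue outer; // mismatch, try the next start position
                }
            }
            return i;
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        // Read the whole file into memory (a 9 MB file fits comfortably).
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        // Example pattern: the bytes 0x48 0x45 0x4C 0x4F spell "HELO" in ASCII.
        byte[] pattern = {0x48, 0x45, 0x4C, 0x4F};
        for (int pos = indexOf(data, pattern, 0); pos >= 0; pos = indexOf(data, pattern, pos + 1)) {
            System.out.println("Found at offset " + pos);
        }
    }
}
```

Run it as `java BytePatternSearch yourfile.bin`; no intermediate file is written.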

2 Answers

Did you measure where your actual bottleneck is? You seem to read a very small amount of data in each iteration of your loop and process it right away. You might as well read larger chunks of data and process those, e.g. using DataInputStream or similar. That way you would benefit more from the optimized reads of your OS, file system, their caches etc.

Additionally, you fill sbHex and then append it to sbResult only to print it. That looks like an unnecessary copy to me: sbResult is always empty before the append, and sbHex is already a StringBuilder you can pass to your PrintStream.
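A sketch of the chunked approach described above, assuming a 64 KB buffer (an arbitrary size, not one taken from the answer) and a hand-rolled hex-digit table in place of String.format. It uses a single StringBuilder per chunk and prints each chunk's hex text right away, so there is no intermediate copy and the full hex text is never held in memory at once. `ChunkedHex` and its methods are illustrative names, not from the question.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PrintStream;

public class ChunkedHex {

    // Hex digits indexed by nibble value 0..15.
    private static final char[] DIGITS = "0123456789ABCDEF".toCharArray();

    // Appends buf[0..len) as two uppercase hex digits plus a space per byte,
    // the same text that String.format("%02X ", value) produces.
    static void appendHex(StringBuilder sb, byte[] buf, int len) {
        for (int i = 0; i < len; i++) {
            int v = buf[i] & 0xFF; // Java bytes are signed; mask to 0..255
            sb.append(DIGITS[v >>> 4]).append(DIGITS[v & 0x0F]).append(' ');
        }
    }

    // Reads up to 64 KB per call and prints each chunk's hex text immediately.
    static void convertToHex(PrintStream out, InputStream in) throws IOException {
        byte[] buf = new byte[64 * 1024];
        StringBuilder sb = new StringBuilder(buf.length * 3);
        int n;
        while ((n = in.read(buf)) != -1) {
            sb.setLength(0); // reuse the same builder for every chunk
            appendHex(sb, buf, n);
            out.print(sb);
        }
    }

    public static void main(String[] args) throws IOException {
        // try-with-resources closes the file even if reading fails
        try (InputStream in = new FileInputStream(args[0])) {
            convertToHex(System.out, in);
        }
        System.out.flush();
    }
}
```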

Thorsten Schöning
  • In fact, my file follows a pattern if I convert it to hex: 4 bytes for length, 4 bytes of code (like HELO, GOOD...), n bytes of data (depending on length), 4 bytes of checksum. For instance, I have 0042 for length, so my data will be 66 bytes. In that pattern, I want, for instance, to get only the data if the 4-byte code is "HELO" (48 45 4c 4f in hex). – tmylamoule May 30 '15 at 18:13
  • That doesn't answer any of my questions... Just because you currently want to read only 4 bytes or a comparably small amount, you shouldn't ask the OS for so little data. You should always read large chunks of data and process them in memory; the only question is whether those chunks are e.g. 4 kB, MB or GB... At least 4 kB is often a good choice, because that is a common sector size for most file systems. No file system reads only exactly 4 bytes or so. – Thorsten Schöning May 30 '15 at 22:13
  • Are you saying I need a bigger buffer? I will find out how to do it with BufferedReader. My file is very similar to a .png. – tmylamoule Jun 01 '15 at 07:36
  • Do you recommend DataInputStream or BufferedReader? – tmylamoule Jun 01 '15 at 07:42
  • They are for different things: BufferedReader is for textual input, which you don't seem to have or want. Just look at the signature of DataInputStream and the methods it provides; it already has methods that return native Java types. https://docs.oracle.com/javase/7/docs/api/java/io/DataInputStream.html – Thorsten Schöning Jun 01 '15 at 12:04
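For the record layout described in these comments, a DataInputStream can parse the file directly, with no hex conversion at all. This is a sketch under assumptions not confirmed in the thread: the length field is a 4-byte big-endian integer (as in PNG, and as `readInt` reads it), it counts only the data bytes, and the checksum is skipped rather than verified. `RecordReader` and `dataForCode` are hypothetical names.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RecordReader {

    // Assumed layout per record: 4-byte big-endian length, 4-byte ASCII code,
    // <length> data bytes, 4-byte checksum. Returns the data of every record
    // whose code equals wantedCode. The checksum is read but NOT verified.
    static List<byte[]> dataForCode(InputStream raw, String wantedCode) throws IOException {
        byte[] wanted = wantedCode.getBytes(StandardCharsets.US_ASCII);
        List<byte[]> result = new ArrayList<>();
        DataInputStream in = new DataInputStream(new BufferedInputStream(raw));
        while (true) {
            int length;
            try {
                length = in.readInt(); // big-endian, e.g. 00 00 00 42 -> 66
            } catch (EOFException e) {
                break; // no further complete length field
            }
            if (length < 0) {
                throw new IOException("Invalid record length: " + length);
            }
            byte[] code = new byte[4];
            in.readFully(code);
            byte[] data = new byte[length];
            in.readFully(data); // throws EOFException if the file is truncated
            in.readInt();       // checksum, skipped in this sketch
            if (Arrays.equals(code, wanted)) {
                result.add(data);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream(args[0])) {
            for (byte[] data : dataForCode(in, "HELO")) {
                System.out.println("HELO record with " + data.length + " data bytes");
            }
        }
    }
}
```

If the real length field turns out to be little-endian or to include the code bytes, only the `readInt` interpretation and the array size need to change.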

Try this:

// Lookup table: xx[i] holds the "%02X " string for byte value i, built once
static String[] xx = new String[256];
static {
    for (int i = 0; i < 256; ++i) {
        xx[i] = String.format("%02X ", i);
    }
}

and use it:

sbHex.append(xx[value]);

Formatting is a heavy operation: it does not only do the conversion, it also has to parse the format string on every call.
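Putting the lookup table into the question's method might look like the sketch below. It also opens the stream with try-with-resources, which addresses the resource-leak comment on the question. `HexTable` is an illustrative name; the per-byte `read()` loop is kept from the original, so combining the table with larger chunked reads may help further.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PrintStream;

public class HexTable {

    // Precomputed "%02X " strings for every byte value, built once at class load.
    static final String[] HEX = new String[256];
    static {
        for (int i = 0; i < 256; ++i) {
            HEX[i] = String.format("%02X ", i);
        }
    }

    static void convertToHex(PrintStream out, InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int value;
        while ((value = in.read()) != -1) { // read() returns 0..255, or -1 at end
            sb.append(HEX[value]);          // table lookup instead of String.format
        }
        out.print(sb); // print the single builder directly, no extra copy
    }

    public static void main(String[] args) throws IOException {
        // try-with-resources closes the stream even if an IOException is thrown
        try (InputStream in = new BufferedInputStream(new FileInputStream(args[0]))) {
            convertToHex(System.out, in);
        }
        System.out.flush();
    }
}
```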

laune