
I use the following methods to convert any kind of file into a binary text file (by that I mean only 1's and 0's in human-readable form).

The file is converted to a byte array by 'convertFileToByteArray()', then written to a .txt file by 'convertByteArrayToBitTextFile()'.

public static byte[] convertFileToByteArray(String path) throws IOException
{
    File file = new File(path);
    byte[] fileData;
    fileData = new byte[(int)file.length()];
    FileInputStream in = new FileInputStream(file);
    in.read(fileData);
    in.close();
    return fileData;
}

public static boolean convertByteArrayToBitTextFile(String path, byte[] bytes)
{
    String content = convertByteArrayToBitString(bytes);
    try
    {
        PrintWriter out = new PrintWriter(path);
        out.println(content);
        out.close();
        return true;
    }
    catch (FileNotFoundException e)
    {
        return false;
    }
}

public static String convertByteArrayToBitString(byte[] bytes)
{
    String content = "";
    for (int i = 0; i < bytes.length; i++)
    {
        content += String.format("%8s", Integer.toBinaryString(bytes[i] & 0xFF)).replace(' ', '0');
    }
    return content;
}

Edit: Additional Code:

public static byte[] convertFileToByteArray(String path) throws IOException
{
    File file = new File(path);
    byte[] fileData;
    fileData = new byte[(int)file.length()];
    FileInputStream in = new FileInputStream(file);
    in.read(fileData);
    in.close();
    return fileData;
}

public static boolean convertByteArrayToBitTextFile(String path, byte[] bytes)
{
    try
    {
        PrintWriter out = new PrintWriter(path);
        for (int i = 0; i < bytes.length; i++)
        {
            out.print(String.format("%8s", Integer.toBinaryString(bytes[i] & 0xFF)).replace(' ', '0'));
        }
        out.close();
        return true;
    }
    catch (FileNotFoundException e)
    {
        return false;
    }
}

public static boolean convertByteArrayToByteTextFile(String path, byte[] bytes)
{
    try
    {
        PrintWriter out = new PrintWriter(path);
        for(int i = 0; i < bytes.length; i++)
        {
            out.print(bytes[i]);
        }
        out.close();
        return true;
    }
    catch (FileNotFoundException e)
    {
        return false;
    }
}

public static boolean convertByteArrayToRegularFile(String path, byte[] bytes)
{
    try
    {
        PrintWriter out = new PrintWriter(path);
        for(int i = 0; i < bytes.length; i++)
        {
            out.write(bytes[i]);
        }
        out.close();
        return true;
    }
    catch (FileNotFoundException e)
    {
        return false;
    }
}

public static boolean convertBitFileToByteTextFile(String path) 
{
    try
    {
        byte[] b = convertFileToByteArray(path);
        convertByteArrayToByteTextFile(path, b);
        return true;
    }
    catch (IOException e)
    {
        return false;
    }

}

I do this to try out compression methods on a very fundamental level, so please let's not discuss why I use a human-readable form.

This works quite well so far, but I have two problems.

1) It takes forever (>20 minutes to convert 230 KB into binary text). Is this just a by-product of the relatively complicated conversion, or are there faster ways to do this?

2) The main problem: I have no idea how to convert the files back to what they used to be. Renaming from .txt to .exe does not work (not too surprising, as the resulting file is two times larger than the original).

Is this still possible, or did I lose information about what the file is supposed to represent by converting it to a human-readable text file? If so, do you know any alternative that prevents this?

Any help is appreciated.

Jonas Bartkowski

2 Answers

3

The thing that'll cost you most time is the construction of an ever increasing String. A better approach would be to write the data as soon as you have it.
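
A minimal sketch of "write the data as soon as you have it", assuming Java 7 or later. The class name `BitTextWriter` and the methods `toBits` and `convertFileToBitTextFile` are hypothetical names chosen for illustration, not code from the question. Each byte is turned into eight characters and handed straight to a buffered writer, so no large String is ever built:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper class, not part of the original post.
class BitTextWriter {

    // Returns the eight '0'/'1' characters for one byte, most significant bit first.
    static char[] toBits(byte b) {
        char[] bits = new char[8];
        for (int i = 0; i < 8; i++) {
            bits[i] = ((b >> (7 - i)) & 1) == 1 ? '1' : '0';
        }
        return bits;
    }

    // Streams the bit text to the target file instead of concatenating Strings.
    static void convertFileToBitTextFile(String sourcePath, String targetPath) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(sourcePath));
        try (BufferedWriter out =
                 Files.newBufferedWriter(Paths.get(targetPath), StandardCharsets.US_ASCII)) {
            for (byte b : bytes) {
                out.write(toBits(b));
            }
        }
    }
}
```

The repeated `content += ...` in the question copies the whole String on every iteration, which is why the running time grows quadratically with file size; writing per byte keeps it linear.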

The other problem is very easy. You know that every sequence of eight characters ('0' or '1') was made from a byte. Hence, you know the values of each character in an 8-character block:

01001010
       ^----- 0*1
      ^------ 1*2
     ^------- 0*4
    ^-------- 1*8
   ^--------- 0*16
  ^---------- 0*32
 ^----------- 1*64
^------------ 0*128
              -----
              64+8+2 = 74

You only need to add the values where a '1' is present.

You can do it in Java like this, without even knowing the individual bit values:

String sbyte = "01001010";
int bytevalue = 0;
for (int i = 0; i < 8; i++) {
    bytevalue *= 2;           // shifts the bit pattern to the left 1 position
    if (sbyte.charAt(i) == '1') bytevalue += 1;
}
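
Extended to a whole file, the same loop could look like the sketch below. It assumes the bit text contains only '0' and '1' characters, apart from surrounding whitespace such as a trailing newline, and that its length is a multiple of 8. `BitTextReader`, `bitsToBytes` and `convertBitTextFileToFile` are hypothetical names. The result is written with `Files.write`, which outputs raw bytes, rather than with a `PrintWriter`, which would write text:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper class, not part of the original post.
class BitTextReader {

    // Packs every block of eight '0'/'1' characters into one byte.
    static byte[] bitsToBytes(String bits) {
        if (bits.length() % 8 != 0) {
            throw new IllegalArgumentException("length must be a multiple of 8");
        }
        byte[] result = new byte[bits.length() / 8];
        for (int i = 0; i < result.length; i++) {
            int value = 0;
            for (int j = 0; j < 8; j++) {
                char c = bits.charAt(i * 8 + j);
                if (c != '0' && c != '1') {
                    throw new IllegalArgumentException("unexpected character: " + c);
                }
                value = value * 2 + (c - '0'); // shift left, then add the new bit
            }
            result[i] = (byte) value; // values 128..255 wrap to negative bytes
        }
        return result;
    }

    // Reads the bit text, drops surrounding whitespace, writes the raw bytes.
    static void convertBitTextFileToFile(String bitTextPath, String targetPath) throws IOException {
        String bits = new String(Files.readAllBytes(Paths.get(bitTextPath)),
                                 StandardCharsets.US_ASCII).trim();
        Files.write(Paths.get(targetPath), bitsToBytes(bits));
    }
}
```

The output file should then have exactly one eighth of the bit text's length.
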
Ingo
  • Wow, that cut it down to 2 Seconds. Thank you so much! – Jonas Bartkowski Dec 08 '13 at 22:57
  • Regarding 2): Thanks for the elaborate explanation! However, if I change an executable into the bit-text and then reverse the process (bits to bytes) using your method and save it as oldname.exe, it still is not executable (basically just contains bytes instead of bits in cleantext). Any ideas? – Jonas Bartkowski Dec 08 '13 at 23:20
  • @JonasBartkowski Probably you use some print method? You should use the write(int) method of some FileOutputStream. – Ingo Dec 08 '13 at 23:25
  • Thanks! But still nothing :/ I currently use FileInputStream in.read((int) File.length) to read the bytes (byte[]) from a file and then just iterate through that Array and use FileOutputStream out.write(byte[i]) to write to the new File. But somehow this changes nothing. (Still clean text bytes and double size) – Jonas Bartkowski Dec 08 '13 at 23:45
  • The double size makes me think that there must be something else wrong - you should have *exactly* 8-fold size. Check again, and if all else fails post the code for the reconstruction of the original file, I'll look into it tomorrow. – Ingo Dec 08 '13 at 23:48
  • It's actually a long. If I don't cast it to int it says "Incompatible types. Possible lossy conversion from long to int." Maybe this is a clue? (: Also, a huge thank you for your help so far! – Jonas Bartkowski Dec 08 '13 at 23:50
  • @JonasBartkowski Makes no sense to me. Where does a `long` come into the game here? Please make sure your text binary file has the 8-fold length of the original and verify that it is correct. As long as this is not so, it makes no sense to care about the reverse conversion. – Ingo Dec 09 '13 at 09:39
  • The thing is, new File(path) 's length attribute is of double length. – Jonas Bartkowski Dec 09 '13 at 17:04
  • @JonasBartkowski Then it is wrong. I suggest you make a simple test file that contains "XYZ\n123\n" and convert it. Then post here the output. It should be 64 characters (0 or 1). – Ingo Dec 09 '13 at 17:13
  • okay. The output is:(01011000010110010101101001011100011011100011000100110010001100110101110001101110). 80 Characters. :/ Seems like you were right. Further ideas? You are my hope currently (: – Jonas Bartkowski Dec 09 '13 at 18:20
  • @JonasBartkowski Well, you literally typed \n, but I meant the newline character, of course. Anyway. Now, I need your program for reversing the conversion (please edit your original post, don't put it in comment) *and* the output when applied to that 80 characters above. – Ingo Dec 09 '13 at 22:56
  • @JonasBartkowski As far as I can see, you just print out the same byte sequence you have read from the file? This is not going to work. You need to condense 8 bytes into one, as shown above. – Ingo Dec 10 '13 at 21:35
  • You were right (: I used the wrong type 'PrintWriter' instead of the right one 'FileOutputStream'. Now it works perfectly fine with translating any text file into a binary one and back. Very nice! Thank you so much already (: Now when I try the same with another type of file, it returns a NumberFormatException for e.g. '10010000'. How is it possible that this produces "wrong" bytes? (The bytes of each file are produced via 'Files.readAllBytes(Paths.get(path))') – Jonas Bartkowski Dec 11 '13 at 20:45
2
  1. Use StringBuilder to avoid generating enormous numbers of unused String instances.
    Better yet, write directly to the PrintWriter instead of building it in-memory at all.

  2. Loop through every 8-character subsequence and call Byte.parseByte(text, 2) to parse it back to a byte.
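
One caveat with step 2: `Byte.parseByte(text, 2)` only accepts values in the signed byte range (-128 to 127), so a block such as "10010000" (144) throws a NumberFormatException. A possible workaround, shown here as a sketch with the hypothetical names `ParseBits` and `parseBits`, is to parse into an int and cast:

```java
// Hypothetical helper class, not part of the original answer.
class ParseBits {

    // Parses eight '0'/'1' characters as an unsigned value (0..255)
    // and narrows it to a byte; 128..255 become negative byte values.
    static byte parseBits(String eightBits) {
        return (byte) Integer.parseInt(eightBits, 2);
    }
}
```

The cast keeps the low eight bits unchanged, so the byte written to the file has the same bit pattern as the text block.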

SLaks