-1

When a Java file is compiled, the compiler generates a .class file. This .class file contains the bytecode that the JVM executes. When we open the .class file in a text editor, it is not human readable. To view the bytecode, a disassembler such as javap can be used.

My question is, why do we need to disassemble bytecode in order to view the bytecode itself?

What does the disassembler actually do, to convert the .class file into human readable format?

Seki
DesirePRG
  • 1
    human readable code takes more time to process so it will work slower than machine code/ byte code – user902383 Sep 25 '14 at 09:12
  • 6
    If you're suggesting that the `.class` file format should be a human-readable one, I'd urge you to consider the *vast* majority use-case: the `.class` file is loaded by a machine, not read by a human. Why would you want to sacrifice speed and efficiency for a niche use case? – Jon Skeet Sep 25 '14 at 09:13
  • It would have been great if I could have understood the Chinese language. As a workaround, I have hired a translator for this job, and it is working for me. In the same way, Java needs a translated file in order to understand and carry out commands on the machine. – Swaraj Sep 25 '14 at 09:21
  • After all, it is called "byte *code*"... – Seki Sep 25 '14 at 09:44

5 Answers

5

The Java virtual machine simulates a machine. This is why it is called a machine, despite being a virtual one that does not exist in hardware. Thus, when thinking about the difference between the javap output and the actual Java byte code, think about the difference between assembly and machine code:

Assembly code uses so-called mnemonics to make code human readable. A machine cannot relate to such mnemonic names, however, because it only knows how to read and manipulate binary data. We therefore have to assemble each mnemonic (and its potential arguments) with an assembler, which translates it into its binary equivalent. For example, to load a value from a specific address we would write something like load 0xFF in assembly instead of the actual binary encoding of this instruction, which might be something like 1001 1011 1111 1111. The same holds for Java byte code: the mnemonics are what javap prints, while the (virtual) machine needs binary data that it is able to process. Only when we want to read the byte code do we disassemble it into the assembly-like text that javap shows.
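To make the assembling step concrete, here is a minimal sketch of an assembler for a handful of operand-free JVM instructions. The class name TinyAssembler and its assemble method are made up for this illustration and are not part of the JDK; the opcode values are the ones listed in the JVM specification. A real assembler would also have to handle operands, the constant pool and the rest of the class file.

```java
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: maps a few mnemonics to their one-byte opcodes.
public class TinyAssembler {
    // A small subset of real JVM opcodes that take no operands.
    private static final Map<String, Integer> OPCODES = Map.of(
            "iconst_1", 0x04,
            "iconst_2", 0x05,
            "iadd", 0x60,
            "ireturn", 0xAC);

    // Translates each mnemonic into the single byte the JVM expects.
    static byte[] assemble(List<String> mnemonics) {
        byte[] out = new byte[mnemonics.size()];
        for (int i = 0; i < out.length; i++) {
            String mnemonic = mnemonics.get(i);
            Integer opcode = OPCODES.get(mnemonic);
            if (opcode == null) {
                throw new IllegalArgumentException("unknown mnemonic: " + mnemonic);
            }
            out[i] = opcode.byteValue();
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> source = List.of("iconst_1", "iconst_2", "iadd", "ireturn");
        byte[] code = assemble(source);
        // Print the assembled bytes in hexadecimal.
        for (byte b : code) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println();
        // Compare the size of the textual form with the binary form.
        System.out.println(String.join("\n", source).length() + " characters of text vs "
                + code.length + " bytes of byte code");
    }
}
```

The four mnemonics shrink to four bytes, which is the space argument in miniature: the text is for people, the bytes are for the machine.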

Keep in mind: the only reason that assembly language and the javap output exist is that humans such as you and me do not enjoy reading binary code. We are trained to distinguish what we see by shape, for example letters and names. In contrast, a machine interprets data sequentially by reading a stream of bits. As mentioned, these bits are hard for us to read, which is why we prefer to present them in hexadecimal format: instead of 1111 1111, we write 0xFF. But this is still rather difficult to read, as such a numeric value does not reveal its contextual meaning. 0xFF could still mean just about anything. This is why we prefer the mnemonics mentioned above, where this meaning is implicit.

You might argue that a virtual machine is only virtual, so it could just as well interpret mnemonics rather than binary Java byte code. However, such mnemonics would consume more space (strings are, after all, just represented as bytes by a machine) and would take more time to interpret than the simulated machine language that runs on the JVM. You can therefore also think of byte code as an unusual encoding: unlike standard encodings such as ASCII, its charset contains words instead of letters, and those words are exactly the ones that the Java virtual machine uses and understands. This Java byte code charset is more efficient than ASCII for describing the contents of a class file.

Rafael Winterhalter
  • Assemblers also have the advantage of automatically calculating all the offsets and other tedious encoding details. – Antimony Sep 25 '14 at 15:13
  • In an abstract way you could argue that a byte code assembler provides the same convenience by calculating addresses from constant pool offsets. But you are right, it's not pure assembly, as not every byte in a class file is executable. – Rafael Winterhalter Sep 25 '14 at 17:58
3

When it comes to saving data, available formats fall in two large categories:

  • Text formats (such as simple text files, source code files, XML, etc), which have the advantages of being human readable and editable with simple tools, but they can only be parsed by complicated programs (the more complicated the language, the more complicated the program must be to actually understand it).
  • Binary formats (such as most image formats, wave sounds, executables, bytecode files), which have the advantages of being smaller in size for the same amount of information and they don't need a complicated parser to be understood by the machine (often the data is stored in fixed-size chunks, which makes parsing them even easier).

A .class file is primarily intended to be fed to the JVM, so it should be in the smallest and easiest-to-read format possible for the machine. If the .class file were a text file (that is, if the bytecode were saved in its human-readable form), that text would have to be parsed every time the .class file is loaded. Human readability, however, is rarely needed, so paying for it with the application's loading time would be a waste.
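As a rough illustration of how little parsing a binary format needs, the sketch below reads the first eight bytes of a class file: the magic number 0xCAFEBABE followed by the minor and major version, each stored in a fixed-size field. The class name ClassHeader and the method readVersion are invented for this example. When no file is given on the command line, it falls back to a hand-written eight-byte sample rather than a real class file.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration only: reads the fixed-size header of a class file.
public class ClassHeader {
    // Returns {minor, major} after checking the magic number.
    static int[] readVersion(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int magic = data.readInt();              // 4 bytes, big-endian
        if (magic != 0xCAFEBABE) {
            throw new IOException("not a class file");
        }
        int minor = data.readUnsignedShort();    // 2 bytes
        int major = data.readUnsignedShort();    // 2 bytes
        return new int[] {minor, major};
    }

    public static void main(String[] args) throws IOException {
        // Sample header only (assumption for the demo): magic, minor 0, major 52.
        byte[] sample = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52};
        try (InputStream in = args.length > 0
                ? new FileInputStream(args[0])
                : new ByteArrayInputStream(sample)) {
            int[] version = readVersion(in);
            System.out.println("major=" + version[1] + " minor=" + version[0]);
        }
    }
}
```

There is no tokenizing and no grammar involved here: each value sits at a known position with a known width, so three read calls are enough.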

Theodoros Chatzigiannakis
1

.class is just the object code, which is machine readable. If you want to see the code, you can use a decompiler such as the Jad decompiler.

Deepu--Java
1

A class file contains a bunch of commands/opcodes/data intended to be read by the JVM which, when viewed by humans, is mostly just a huge bunch of numbers and embedded, seemingly senseless text.

You need to disassemble the file to read it because the disassembler organizes its contents in a way that makes sense to humans and replaces the numbers with their textual commands (e.g. textual versions of the opcodes, like aload instead of 0x19 or goto instead of 0xA7).
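The substitution described above can be sketched as a tiny disassembler that knows only three instructions: aload (0x19, followed by a one-byte local variable index), goto (0xA7, followed by a signed two-byte branch offset) and return (0xB1). The class name TinyDisassembler is hypothetical and the output format is only loosely modelled on javap. A real disassembler covers the full instruction set and the surrounding class file structure.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: turns a few raw opcodes back into mnemonics.
public class TinyDisassembler {
    static List<String> disassemble(byte[] code) {
        List<String> lines = new ArrayList<>();
        int pc = 0; // offset of the current instruction
        while (pc < code.length) {
            int opcode = code[pc] & 0xFF;
            switch (opcode) {
                case 0x19: { // aload: one unsigned byte, the local variable index
                    lines.add(pc + ": aload " + (code[pc + 1] & 0xFF));
                    pc += 2;
                    break;
                }
                case 0xA7: { // goto: signed 16-bit offset relative to this instruction
                    int offset = (short) (((code[pc + 1] & 0xFF) << 8) | (code[pc + 2] & 0xFF));
                    lines.add(pc + ": goto " + (pc + offset));
                    pc += 3;
                    break;
                }
                case 0xB1: { // return: no operands
                    lines.add(pc + ": return");
                    pc += 1;
                    break;
                }
                default:
                    throw new IllegalArgumentException(
                            "opcode not covered by this sketch: 0x" + Integer.toHexString(opcode));
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // aload 1; goto +3 (jumps to offset 5); return
        byte[] code = {0x19, 0x01, (byte) 0xA7, 0x00, 0x03, (byte) 0xB1};
        disassemble(code).forEach(System.out::println);
    }
}
```

Besides swapping numbers for names, the sketch also has to know how many operand bytes follow each opcode, which is part of the "organizing" a disassembler does.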

Antimony
Ross Drew
0

What the Java compiler does is translate your Java syntax into instructions that the virtual machine understands. The virtual machine itself is a native program (HotSpot, for example, is written largely in C++, with the class library mostly in Java). The virtual machine converts the bytecode instructions into native calls for your operating system, which is why the JVM for Windows differs from the one for Unix-based systems.

As already stated in a comment, interpreting human-readable code is slower than interpreting instructions that are already in a compact, machine-oriented form.

Kurt Du Bois