I have a Cp1252-encoded file that I want to read as binary.
ls -al
from the terminal shows its size is 10 bytes.
This Java snippet, however, reports 18 bytes:
Path path = Paths.get(lfile);
SeekableByteChannel sbc = Files.newByteChannel(path, StandardOpenOption.READ);
long size = sbc.size();
The file contains 6 ASCII characters and 4 non-ASCII Cp1252 characters. My understanding is that 10 bytes is the correct size of this file on the file system. One more detail: when I read the content of the file using:
byte[] fileContents = Files.readAllBytes(path);
I get 18 bytes, because each Cp1252 character is loaded as 3 bytes. The file contains four different Cp1252 characters, but the buffer shows them all as the same 3-byte sequence, which is certainly incorrect.
Two questions bother me:
How many bytes does this file actually take up on the file system?
Presuming that it is 10 bytes long, how can I read it "raw"?
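To make the second question concrete, here is a minimal sketch of what I would expect a "raw" read to look like. It does not use my real file: it writes the ten byte values listed in Update 2 below to a temporary file and reads them back with `Files.readAllBytes`, which returns bytes without applying any charset. The class name `RawReadSketch` and the helpers `roundTrip` and `toUnsigned` are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class RawReadSketch {
    // The ten bytes shown in the hexdump: "aöaäaüaßbb" encoded as Cp1252.
    static final byte[] CP1252_BYTES = {
        0x61, (byte) 0xF6, 0x61, (byte) 0xE4, 0x61,
        (byte) 0xFC, 0x61, (byte) 0xDF, 0x62, 0x62
    };

    // Writes the bytes to a temporary file and reads them back without decoding.
    static byte[] roundTrip() {
        try {
            Path tmp = Files.createTempFile("test", ".x10");
            try {
                Files.write(tmp, CP1252_BYTES);
                return Files.readAllBytes(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Formats bytes as unsigned decimal values, because Java's byte type is signed.
    static String toUnsigned(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(b & 0xFF);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] raw = roundTrip();
        System.out.println(raw.length + " bytes: " + toUnsigned(raw));
    }
}
```

Note the `b & 0xFF` mask: without it, a byte such as 0xF6 prints as -10 rather than 246, which can make a correct raw read look wrong.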
Update: I tried the same thing with a small C program and the results are as expected: 10 bytes are read from the file, and the 4 that are non-ASCII Cp1252 characters all have different values.
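#include <stdio.h>

int main(void) {
    const char *fileName = "test.x10";
    FILE *fp = fopen(fileName, "rb");  /* binary mode: read the bytes untranslated */
    if (fp == NULL) {
        perror(fileName);
        return 1;
    }
    int c;
    /* fgetc returns each byte as an unsigned char value, or EOF at end of file */
    while ((c = fgetc(fp)) != EOF) {
        printf("%i ", c);
    }
    fclose(fp);
    return 0;
}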
Update 2:
test.x10 contains Cp1252 characters: aöaäaüaßbb
The C code above prints: 97 246 97 228 97 252 97 223 98 98
Files.readAllBytes returns: 97 239 191 189 97 239 191 189 97 239 191 189 97 239 191 189 98 98
Here is the hexdump:
hexdump -C test.x10
00000000 61 f6 61 e4 61 fc 61 df 62 62 |a.a.a.a.bb|
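One observation about the 18-byte result: the repeated triple 239 191 189 (hex EF BF BD) is the UTF-8 encoding of U+FFFD, the Unicode replacement character. The sketch below shows that decoding the ten bytes from the hexdump as UTF-8 and encoding the result again yields exactly the 18 values above. This is only a hypothesis about where those bytes might come from, for example a copy of the file that was opened and saved as UTF-8 at some point. Nothing here proves that this is what happened to my file. The class name `ReplacementCharSketch` and its helpers are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class ReplacementCharSketch {
    // The ten bytes shown in the hexdump: "aöaäaüaßbb" encoded as Cp1252.
    static final byte[] CP1252_BYTES = {
        0x61, (byte) 0xF6, 0x61, (byte) 0xE4, 0x61,
        (byte) 0xFC, 0x61, (byte) 0xDF, 0x62, 0x62
    };

    // Decodes the input as UTF-8 (malformed bytes become U+FFFD),
    // then encodes the resulting string as UTF-8 again.
    static byte[] decodeAsUtf8AndReencode(byte[] input) {
        String decoded = new String(input, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    // Formats bytes as unsigned decimal values, because Java's byte type is signed.
    static String toUnsigned(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(b & 0xFF);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] mangled = decodeAsUtf8AndReencode(CP1252_BYTES);
        System.out.println(mangled.length + " bytes: " + toUnsigned(mangled));
    }
}
```

If that hypothesis holds, the original byte values cannot be recovered from the 18-byte data, since all four characters collapse to the same replacement sequence. Comparing the path the Java program opens with the path used for `ls` and `hexdump` would be one way to check it.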