
Here is the main code:

import java.io.*;
import java.util.*;

public class Main {
    public static void main(String[] args) throws IOException {

        RandomAccessFile raf = new RandomAccessFile("test.dat", "rwd");
        Map<String, String> dictionary = new LinkedHashMap<>();
        Map<String, IndexRecord> mapThatWritesToFile = new LinkedHashMap<>();
        Map<String, IndexRecord> mapThatReadsFromFile = new LinkedHashMap<>();
//        dictionary.put("Test", "test");
        dictionary.put("Java", "Best programming language");
        dictionary.put("T-SQL", "Database language");
        dictionary.put("None", "Doesn't matter");

        int pointerToDictionaryData = Integer.BYTES; // I add 4 bytes, because this number is going to be the first data written to file.
        int tempPointer = pointerToDictionaryData;
        for(String part : dictionary.keySet()){
            pointerToDictionaryData += part.getBytes().length + (2*Integer.BYTES); // adding bytes from String + 2x Integer.Bytes for startByte and length of data.
        }
        raf.writeInt(pointerToDictionaryData);

        int pointerToIndex = (int) raf.getFilePointer();
        raf.seek(pointerToDictionaryData);
        for(String key : dictionary.keySet()){
            StringBuilder sb = new StringBuilder(key);
            sb.append("=");
            sb.append(dictionary.get(key));
            raf.writeUTF(sb.toString());

            IndexRecord record = new IndexRecord(tempPointer, (int) (raf.getFilePointer() - tempPointer));
            mapThatWritesToFile.put(key, record);
            tempPointer = (int) raf.getFilePointer();
        }

        raf.seek(pointerToIndex);
        for(String key : mapThatWritesToFile.keySet()){
           raf.writeUTF(key);
           raf.writeInt(mapThatWritesToFile.get(key).getStart());
           raf.writeInt(mapThatWritesToFile.get(key).getLength());
        }

        raf.seek(0);
        pointerToDictionaryData = raf.readInt();

        while(raf.getFilePointer() < pointerToDictionaryData){
            String key = raf.readUTF();
            int start = raf.readInt();
            int length = raf.readInt();
            mapThatReadsFromFile.put(key, new IndexRecord(start, length));
        }

        for(String key : mapThatReadsFromFile.keySet()){
            System.out.println("Reading: " + key);
            System.out.println(key + " | starts at byte: " + mapThatReadsFromFile.get(key).getStart() +
                    " and is " + mapThatReadsFromFile.get(key).getLength() + " bytes long");
            raf.seek(mapThatReadsFromFile.get(key).getStart());
            System.out.println("Result: " + raf.readUTF() + "\n");
        }
    }
}

Here is the IndexRecord class, which is also needed:

public class IndexRecord {

    private int start;
    private int length;

    public IndexRecord(int start, int length) {
        this.start = start;
        this.length = length;
    }

    public int getStart() {
        return start;
    }

    public int getLength() {
        return length;
    }
}

So what's the issue exactly? Maybe I should start with the purpose of this program. I am trying to learn how to use the RandomAccessFile class, and I wrote code that writes three sections into a file.

The first section is a pointer to the part of the file where the dictionary elements are stored (in the format key=description). After that, I store the index: a LinkedHashMap whose keys are the same keys used in the dictionary and whose values are IndexRecord instances. Each IndexRecord holds two int values: a pointer to the byte at which the data for that key starts, and the length of that data (also in bytes). The last section of the file stores the dictionary entries themselves.

So the point of this program is that I don't want to load every single dictionary element into memory. I only want to load the index, and whenever I need the description for a dictionary key, I load the desired data from the file using the appropriate index record.
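The load-on-demand idea can be sketched on its own, independent of the three-section layout. This is only an illustration under assumptions: `LazyLookupSketch`, `writeEntries` and `lookup` are hypothetical names that do not appear in the program above, and the offsets are kept in memory here rather than written to the file.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo class; the name and helper methods are not from the original program.
final class LazyLookupSketch {

    // Writes each entry as "key=description" and remembers only where it starts.
    static Map<String, Long> writeEntries(RandomAccessFile raf, Map<String, String> dictionary)
            throws IOException {
        Map<String, Long> offsets = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : dictionary.entrySet()) {
            offsets.put(e.getKey(), raf.getFilePointer()); // offset captured before writing
            raf.writeUTF(e.getKey() + "=" + e.getValue());
        }
        return offsets;
    }

    // Reads a single entry on demand; no other entry is read from the file.
    static String lookup(RandomAccessFile raf, Map<String, Long> offsets, String key)
            throws IOException {
        Long offset = offsets.get(key);
        if (offset == null) {
            return null; // unknown key
        }
        raf.seek(offset);
        return raf.readUTF();
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("lazy-lookup", ".dat"); // throwaway file for the demo
        file.deleteOnExit();
        Map<String, String> dictionary = new LinkedHashMap<>();
        dictionary.put("Java", "Best programming language");
        dictionary.put("T-SQL", "Database language");
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            Map<String, Long> offsets = writeEntries(raf, dictionary);
            System.out.println(lookup(raf, offsets, "T-SQL")); // prints T-SQL=Database language
        }
    }
}
```

Only the key-to-offset map stays in memory; each description is fetched with one `seek` followed by one `readUTF`.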

Okay, so what's the problem? There is always something wrong with either reading or writing the first element of the dictionary. If you run the program, you will see that the entry for "Java" has the wrong length, and its value is not read either. If you uncomment the "Test" entry in the dictionary, you will see that the value for "Java" is now read correctly and its length is okay, but then the data for the "Test" key is bugged instead.

The result with the "Test" entry commented out:

Reading: Java
Java | starts at byte: 4 and is 69 bytes long
Result: Java

Reading: T-SQL
T-SQL | starts at byte: 73 and is 25 bytes long
Result: T-SQL=Database language

Reading: None
None | starts at byte: 98 and is 21 bytes long
Result: None=Doesn't matter

vs. the result with the "Test" entry uncommented:

Reading: Test
Test | starts at byte: 4 and is 60 bytes long
Result: Test

Reading: Java
Java | starts at byte: 64 and is 32 bytes long
Result: Java=Best programming language

Reading: T-SQL
T-SQL | starts at byte: 96 and is 25 bytes long
Result: T-SQL=Database language

Reading: None
None | starts at byte: 121 and is 21 bytes long
Result: None=Doesn't matter

So as you can see, there is always something wrong with the length of the first element of the dictionary, and it is not read properly (or, before that, not written properly).
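As a rough cross-check of these numbers, the offsets can be recomputed without touching the file. The sketch below is not part of the original program: `OffsetTrace` and its method names are made up for illustration, and it relies on `DataOutputStream.writeUTF` producing the same byte format as `RandomAccessFile.writeUTF` (a 2-byte length prefix followed by modified UTF-8).

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical helper class for tracing the offsets; not part of the original program.
final class OffsetTrace {

    // Bytes emitted by writeUTF for s: 2-byte length prefix plus the encoded characters.
    static int writeUtfSize(String s) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        }
        return buffer.size();
    }

    // Start of the data section as the posted code computes it (getBytes().length per key).
    static int estimatedDataStart(String[] keys) {
        int pointer = Integer.BYTES;
        for (String key : keys) {
            pointer += key.getBytes().length + 2 * Integer.BYTES;
        }
        return pointer;
    }

    // Where the index really ends once every key has been written with writeUTF.
    static int actualIndexEnd(String[] keys) throws IOException {
        int pointer = Integer.BYTES;
        for (String key : keys) {
            pointer += writeUtfSize(key) + 2 * Integer.BYTES;
        }
        return pointer;
    }

    public static void main(String[] args) throws IOException {
        String[] keys = {"Java", "T-SQL", "None"};
        int dataStart = estimatedDataStart(keys);
        int firstEntryEnd = dataStart + writeUtfSize("Java=Best programming language");
        System.out.println("estimated data start: " + dataStart);            // 41
        System.out.println("actual index end: " + actualIndexEnd(keys));     // 47
        // tempPointer in the posted code is still 4 when the first record is created.
        System.out.println("first record length: " + (firstEntryEnd - Integer.BYTES)); // 69
    }
}
```

Under these assumptions the trace reproduces the reported "starts at byte: 4 and is 69 bytes long": the first record takes its start from `tempPointer`, which is still 4 rather than the data-section start, and the index needs 47 bytes where only 41 were reserved.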

I am sorry if the question is too long, but I wanted to include all the information that matters so you can get a good idea of what the program is, what it does, and what the problem is.

PS: I know one "solution" could be adding a dummy first element to the dictionary that is never read, but I don't want to do that. I would like to understand what's wrong with the code so I can understand the class better.

marc_s
MrJ_
  • One issue is that you assume that `part.getBytes().length` gives you the number of bytes written by [`writeUTF`](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/io/RandomAccessFile.html#writeUTF(java.lang.String)), but that's not true for two reasons: 1. `writeUTF` prefixes the actual data with the number of bytes written, stored as a short (i.e. 2 bytes), and 2. `part.getBytes()` uses the platform default charset, which is not necessarily the (modified) UTF-8 encoding that `writeUTF` uses, and thus could encode to different lengths (this second one is unlikely to affect your current code, since it only uses ASCII characters). – Joachim Sauer Jul 27 '21 at 09:52
  • Also: just for clarity and so that you don't accidentally use knowledge from writing when reading, I'd suggest you split the code into two methods: one which writes the file and another that reads the file. This way you can ensure that you only pass information between the two that you *want* to pass. – Joachim Sauer Jul 27 '21 at 09:53
  • Hey @JoachimSauer, yeah, I would normally separate reading from writing, but in this case I wanted to create a minimal reproducible example so it is as clear as it can be. Also, thank you for your explanation: I changed the keys from String to Integer, got rid of the String.getBytes().length call, and it is now working properly. I wonder if there is a way to make it "dynamic", as I tried before, for objects that do not always take the same number of bytes. – MrJ_ Jul 27 '21 at 10:22
  • A minimal reproducible example can still contain multiple methods. In fact, it's usually more readable that way. The "minimal" doesn't necessarily mean "the minimum number of characters possible", as that would lead to silly things like replacing variable names with 1-character names. It just means not doing stuff that's unrelated to the issue at hand. And yes: you obviously could make your code work as intended, you just have to keep in mind intricacies of `writeUTF` like the ones I mentioned. In other words: pay attention to the details. – Joachim Sauer Jul 27 '21 at 10:34
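
Putting the comments together, one possible way to keep String keys is to measure what `writeUTF` will emit instead of estimating it, and to split writing and reading into separate methods. This is a minimal sketch under assumptions, not a definitive fix: `DictionaryFileSketch` and its method names are hypothetical, index records are held as `int[] {start, length}` instead of `IndexRecord` to keep the block self-contained, and the file is opened in "rw" mode rather than "rwd".

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch class; names are illustrative and not from the original program.
final class DictionaryFileSketch {

    // Bytes writeUTF will emit for s (2-byte length prefix included), measured, not guessed.
    static int writeUtfSize(String s) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        }
        return buffer.size();
    }

    static void write(File file, Map<String, String> dictionary) throws IOException {
        // Header int, then per key: writeUTF(key) + start int + length int.
        int dataStart = Integer.BYTES;
        for (String key : dictionary.keySet()) {
            dataStart += writeUtfSize(key) + 2 * Integer.BYTES;
        }
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.setLength(0);
            raf.writeInt(dataStart);
            int entryStart = dataStart; // the first entry begins where the data section begins
            for (Map.Entry<String, String> e : dictionary.entrySet()) {
                String entry = e.getKey() + "=" + e.getValue();
                int entryLength = writeUtfSize(entry);
                raf.writeUTF(e.getKey());          // index record: key, start, length
                raf.writeInt(entryStart);
                raf.writeInt(entryLength);
                long nextIndexPosition = raf.getFilePointer();
                raf.seek(entryStart);              // data record
                raf.writeUTF(entry);
                raf.seek(nextIndexPosition);
                entryStart += entryLength;
            }
        }
    }

    // Loads only the index: key -> {start, length}.
    static Map<String, int[]> readIndex(File file) throws IOException {
        Map<String, int[]> index = new LinkedHashMap<>();
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            int dataStart = raf.readInt();
            while (raf.getFilePointer() < dataStart) {
                String key = raf.readUTF();
                index.put(key, new int[] {raf.readInt(), raf.readInt()});
            }
        }
        return index;
    }

    // Reads one "key=description" entry at the given start offset.
    static String readEntry(File file, int start) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            raf.seek(start);
            return raf.readUTF();
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("dictionary", ".dat"); // throwaway file for the demo
        file.deleteOnExit();
        Map<String, String> dictionary = new LinkedHashMap<>();
        dictionary.put("Java", "Best programming language");
        dictionary.put("T-SQL", "Database language");
        dictionary.put("None", "Doesn't matter");
        write(file, dictionary);
        for (Map.Entry<String, int[]> e : readIndex(file).entrySet()) {
            int[] record = e.getValue();
            System.out.println(e.getKey() + " | start " + record[0] + ", length " + record[1]
                    + " | " + readEntry(file, record[0]));
        }
    }
}
```

With the three sample entries this places the data section at byte 47, so "Java" starts at 47 with length 32. Measuring through `writeUTF` itself avoids depending on the platform charset; note that `writeUTF` throws `UTFDataFormatException` for strings whose encoded form exceeds 65535 bytes, so longer values would need a different encoding.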

0 Answers