A friend of mine and I are thinking about writing our own programming language, using just-in-time compilation. We have agreed on the assembly we are going to use, but one thing we aren't quite sure about is how to store variables. What we did agree on is how they are structured.

Variables will be replaced by keys during compilation. Every key is a 2-byte integer ranging from 1 to 65535. When you have, for example, a variable inside a namespace, the key will consist of first a 2-byte integer containing the key of the namespace, and then a 2-byte integer containing the key of the actual variable.

So, for example, if I have a namespace foo containing a variable test, namespace foo will be assigned key 1, and variable test inside it will be assigned key 1->1 (the first variable in the first namespace). In the assembly itself, we terminate these keys with a NULL byte. (Keep in mind that this is the compiled assembly rather than the source code before compilation.)
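One detail worth checking in this encoding: if key parts are 2-byte integers, a single NULL *byte* is not a safe terminator, because a legal part such as 256 (0x0100) contains a zero byte in either byte order. The sketch below assumes instead that the bytecode is a stream of 16-bit words and that each key ends with a whole zero word, which works because parts range from 1 to 65535. `KeyPath`, `read_key` and `MAX_KEY_DEPTH` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limit on nesting depth (namespace -> class -> member -> ...). */
#define MAX_KEY_DEPTH 8

/* A decoded key path, e.g. {1, 2, 1} for foo::testClass::test. */
typedef struct {
    uint16_t parts[MAX_KEY_DEPTH];
    size_t depth;
} KeyPath;

/* Decodes one key starting at code[*pc]. A key is a run of non-zero 16-bit
   words ended by a 0 word. On success advances *pc past the terminator and
   returns 0; returns -1 for an empty, too deep, or unterminated key. */
int read_key(const uint16_t *code, size_t len, size_t *pc, KeyPath *out) {
    assert(code != NULL && pc != NULL && out != NULL);
    out->depth = 0;
    while (*pc < len) {
        uint16_t word = code[(*pc)++];
        if (word == 0)
            return out->depth > 0 ? 0 : -1;
        if (out->depth == MAX_KEY_DEPTH)
            return -1;
        out->parts[out->depth++] = word;
    }
    return -1; /* reached the end of the code without a terminator */
}
```

Decoding the words `1 2 1 0 1 1 0` with two calls yields the paths 1 2 1 and 1 1, leaving `pc` just past the second terminator.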

GETV 1 1 0 SET 5 RET

This assembly gets variable test from namespace foo and sets it to 5. It then returns that variable.
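To make the GETV/SET/RET semantics concrete, here is a minimal interpreter sketch. It rests on assumptions the question does not state: opcode values 1 to 3, every variable is an `int32_t`, SET takes one 16-bit immediate, and variables live in a small table searched by key path. `Var`, `find_var` and `run` are hypothetical names. It also exposes the cost of this design: every GETV searches for its key at run time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical opcode numbers. They may share values with key parts because
   the position in the stream says whether a word is an opcode or a key. */
enum { OP_GETV = 1, OP_SET = 2, OP_RET = 3 };

#define MAX_DEPTH 4

typedef struct {
    uint16_t path[MAX_DEPTH]; /* e.g. {1, 1} for foo::test */
    size_t depth;
    int32_t value;            /* assumption: every variable is an int32 */
} Var;

/* Linear search for the variable whose path equals key[0..depth). */
static Var *find_var(Var *vars, size_t nvars, const uint16_t *key, size_t depth) {
    for (size_t i = 0; i < nvars; i++)
        if (vars[i].depth == depth &&
            memcmp(vars[i].path, key, depth * sizeof *key) == 0)
            return &vars[i];
    return NULL;
}

/* Interprets well-formed bytecode until RET. Stores the selected variable's
   value in *result and returns 0, or returns -1 on an error. */
int run(const uint16_t *code, Var *vars, size_t nvars, int32_t *result) {
    assert(code != NULL && result != NULL);
    size_t pc = 0;
    Var *cur = NULL; /* variable selected by the last GETV */
    for (;;) {
        switch (code[pc++]) {
        case OP_GETV: {
            const uint16_t *key = &code[pc];
            size_t depth = 0;
            while (code[pc] != 0) { pc++; depth++; }
            pc++; /* skip the 0 terminator */
            cur = find_var(vars, nvars, key, depth);
            if (cur == NULL) return -1; /* unknown key */
            break;
        }
        case OP_SET:
            if (cur == NULL) return -1;
            cur->value = (int32_t)code[pc++]; /* 16-bit immediate operand */
            break;
        case OP_RET:
            if (cur == NULL) return -1;
            *result = cur->value;
            return 0;
        default:
            return -1; /* unknown opcode */
        }
    }
}
```

With this encoding, `GETV 1 1 0 SET 5 RET` becomes the words `{OP_GETV, 1, 1, 0, OP_SET, 5, OP_RET}`.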

GETV 1 2 1 0 SETV 1 1 0 RET

This assembly could match the following (fictional) code:

foo::testClass::test = foo::test;
return foo::test;

This assumes the following structure:

namespace foo { // 1 - First Global Variable
    byte test = 1; // 1 1 - First Variable Inside First Global Variable
    class testClass { // 1 2 - Second Variable Inside First Global Variable
        static byte test = 0; // 1 2 1 - First Variable Inside Second Variable Inside First Global Variable
    }
}

How would I go about accessing these variables? My current plan was to store them in a hashmap, using the key (as a string) as the hash key. However, I have no idea how I would then know what type of variable is stored under a given key, how long it is, and how to do calculations with it. I do understand that the compiler can prevent mad calculations like adding unsigned integers to signed ones, but that still leaves the problem of how long a variable is and how to handle it. (Adding 2 floats would be handled differently than adding 2 integers, right?)
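One common way to answer "what type is stored here and how long is it" at run time is a tagged value: keep a small type tag next to each value and dispatch on it. A minimal sketch, assuming only two types (`int32_t` and `double`); `Type`, `Value`, `type_size` and `value_add` are hypothetical names. It also confirms the last point: integer and floating-point addition are different operations, so the code must pick one per pair of operands.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Runtime type tag stored next to each value. */
typedef enum { T_I32, T_F64 } Type;

typedef struct {
    Type type;
    union {
        int32_t i32;
        double f64;
    } as;
} Value;

/* Size of the payload for each type: this is "how long" the variable is. */
size_t type_size(Type t) {
    switch (t) {
    case T_I32: return sizeof(int32_t);
    case T_F64: return sizeof(double);
    }
    assert(0 && "unknown type");
    return 0;
}

/* Adds two values by dispatching on their tags. Mixed operands are promoted
   to double here; a compiler could instead reject them or insert a cast. */
Value value_add(Value a, Value b) {
    Value r;
    if (a.type == T_I32 && b.type == T_I32) {
        r.type = T_I32;
        /* integer add, done in uint32_t to avoid signed-overflow UB */
        r.as.i32 = (int32_t)((uint32_t)a.as.i32 + (uint32_t)b.as.i32);
    } else {
        double x = a.type == T_F64 ? a.as.f64 : (double)a.as.i32;
        double y = b.type == T_F64 ? b.as.f64 : (double)b.as.i32;
        r.type = T_F64;
        r.as.f64 = x + y; /* floating-point add */
    }
    return r;
}
```

The promotion rule for mixed operands is a design choice of this sketch, not something the question specifies.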

Seki
Jeroen
  • Why in the world would you want to do this? – Yakk - Adam Nevraumont Aug 23 '13 at 19:04
  • @Yakk Because it's interesting. – Jeroen Aug 23 '13 at 19:06
  • Writing a language is interesting. Why would you use such a strange method of naming variables? *How* is that naming convention (indexes of some kind) interesting? – Yakk - Adam Nevraumont Aug 23 '13 at 19:12
  • @Yakk I think you misunderstood me. The variables in the actual code will still use regular names, but after compilation they will be minimized to index keys. – Jeroen Aug 23 '13 at 19:18
  • Wow, such an unintelligible language. Have you studied compiler theory and language theory? Many languages are designed for ease of compilation (LLR) or readability (BASIC). Yours doesn't seem to be either. – Thomas Matthews Aug 23 '13 at 19:27
  • @ThomasMatthews The code snippets are actually compiled snippets aka the Assembly, I didn't think the real programming syntax would matter in this topic. – Jeroen Aug 23 '13 at 19:32
  • @Binero Yes, you have said you are replacing the names with indexes. I repeat my question "why in the world would you want to do that?" I am trying to think of a serious advantage of replacing your variables with indexes, and I am not seeing it. It doesn't seem interesting either. So again, "why in the world would you want to do that?" – Yakk - Adam Nevraumont Aug 23 '13 at 19:42
  • @Yakk Obfuscation, easier interpretation by the Virtual Machine, fewer meaningless bytes. There basically is no reason to keep the names, so why not remove them and get these advantages? – Jeroen Aug 23 '13 at 19:43
  • Those are reasons to replace your variable names with something else. Those are not reasons to replace your variable names with your chosen strange indexing convention, which obfuscates less, has more meaningless bytes, and is harder to interpret by the Virtual Machine than a plain old uid (be it numeric or not). My advice would be to pick up the purple or red dragon compilers book, and read it, and soak up its advice. – Yakk - Adam Nevraumont Aug 23 '13 at 19:55
  • The naming shouldn't matter for the question, though. The question wasn't how to name the variables, but how to access and modify them. – Jeroen Aug 23 '13 at 19:59
  • There are so many things wrong with your approach I don't even know where to start. O_o Don't try to reinvent a square wheel, just use a round one like everyone else and map the variable names to their memory address... (or at the very least to a unique, global identifier) Don't forget that namespaces, classes etc are just syntactic sugar for us humans with a limited brain, there's absolutely no need (in fact it's a bad thing) to have a 1-1 mapping between the structure of the code and its implementation once compiled. – syam Aug 23 '13 at 20:48
  • @syam I could use a simple integer, but wouldn't that limit the number of functions, classes, namespaces, variables, etc. the program could have? – Jeroen Aug 24 '13 at 11:39
  • @Binero If you use a `size_t` or a `void*` (which are typically the same size) for your uids, you simply can't hit the limit: you'll always run out of memory first. Now, your current limit of 64k objects per namespace is a whole other matter, it's muuuuch easier to saturate it. – syam Aug 24 '13 at 13:40
  • @syam That does fix the possible naming problem, but do you know any resources on managing the memory? (Putting it in a register, reading it from a register, handling it, etc.) – Jeroen Aug 24 '13 at 13:41
  • @Binero Unfortunately I don't know any such resource. Perhaps in the dragon books that Yakk already mentioned, but I never read them. Anyway, those books *are* a good (and almost mandatory) read if you want to implement a compiler so you still should read them, it will help you a lot. – syam Aug 24 '13 at 14:29
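What syam suggests, one global identifier per variable instead of a nested key path, can be sketched as a compile-time table that hands out `size_t` uids for fully qualified names. The bytecode would then carry a single uid, and the runtime could keep values in an array indexed by it, so there is no 64k-per-namespace limit and no key search at run time. `SymbolTable`, `intern`, `symtab_free` and `NO_UID` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define NO_UID ((size_t)-1)

/* Compile-time table: the index of a name is its uid. */
typedef struct {
    char **names;
    size_t count, cap;
} SymbolTable;

/* Returns the uid already assigned to name, or assigns the next free one.
   Returns NO_UID if memory runs out. Linear search keeps the sketch short;
   a real compiler would use a hash map here. */
size_t intern(SymbolTable *st, const char *name) {
    assert(st != NULL && name != NULL);
    for (size_t i = 0; i < st->count; i++)
        if (strcmp(st->names[i], name) == 0)
            return i;
    if (st->count == st->cap) {
        size_t cap = st->cap ? st->cap * 2 : 8;
        char **grown = realloc(st->names, cap * sizeof *grown);
        if (grown == NULL) return NO_UID;
        st->names = grown;
        st->cap = cap;
    }
    size_t len = strlen(name) + 1;
    char *copy = malloc(len);
    if (copy == NULL) return NO_UID;
    memcpy(copy, name, len);
    st->names[st->count] = copy;
    return st->count++;
}

/* Frees every stored name and resets the table to empty. */
void symtab_free(SymbolTable *st) {
    for (size_t i = 0; i < st->count; i++)
        free(st->names[i]);
    free(st->names);
    st->names = NULL;
    st->count = st->cap = 0;
}
```

Interning `foo::test` and then `foo::testClass::test` gives uids 0 and 1; interning `foo::test` again returns 0.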

1 Answer

The best approach here is not to keep some strange identifiers for your variables but to use direct pointers. Once the program is compiled, you no longer need human-centric names.
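In a VM, a "direct pointer" does not have to be a raw machine address; it can be a fixed byte offset into the VM's data segment, computed once by the compiler. A minimal sketch for global variables, assuming each variable's alignment equals its size; `GlobalDecl`, `layout_globals`, `load_i32` and `store_i32` are hypothetical names. The names exist only while compiling; the bytecode would carry just the offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    const char *name; /* needed only while compiling */
    size_t size;      /* e.g. 1 for byte, 4 for int32, 8 for double */
    size_t offset;    /* filled in by layout_globals */
} GlobalDecl;

/* Assigns each global an aligned offset in declaration order and returns the
   total size of the data segment. Assumes alignment == size. */
size_t layout_globals(GlobalDecl *g, size_t n) {
    size_t off = 0;
    for (size_t i = 0; i < n; i++) {
        assert(g[i].size > 0);
        size_t align = g[i].size;
        off = (off + align - 1) / align * align; /* round up to alignment */
        g[i].offset = off;
        off += g[i].size;
    }
    return off;
}

/* At run time an access is just base + offset; memcpy avoids alignment and
   aliasing problems when reading through a byte buffer. */
int32_t load_i32(const uint8_t *data, size_t offset) {
    int32_t v;
    memcpy(&v, data + offset, sizeof v);
    return v;
}

void store_i32(uint8_t *data, size_t offset, int32_t v) {
    memcpy(data + offset, &v, sizeof v);
}
```

For two 1-byte globals followed by a 4-byte one, the offsets come out as 0, 1 and 4, and the segment is 8 bytes.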

More importantly, you need to think about the structure of your variables. Depending on the semantics of your language, besides the memory that holds the value of each variable, you may also need to store some metadata, for example the type of the variable. This information is needed only if you want to support automatic type casting at run time. If your language is statically typed, you will be able to resolve all type conflicts at compile time, and then you will not need type information at run time.
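When types are known at compile time, the compiler can choose a type-specific instruction for each operation, so values can sit in untyped slots with no tag at all. A sketch with two hypothetical opcodes, `ADD_I32` and `ADD_F64`; the 8-byte `Slot` type and the helper names are also assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical opcodes chosen by the compiler from the static types. */
typedef enum { ADD_I32, ADD_F64 } Op;

/* An untyped 8-byte slot; only the opcode says how to read it. */
typedef uint64_t Slot;
_Static_assert(sizeof(double) == sizeof(Slot), "sketch assumes 8-byte double");

static Slot from_i32(int32_t v) { Slot s = 0; memcpy(&s, &v, sizeof v); return s; }
static int32_t get_i32(Slot s)  { int32_t v; memcpy(&v, &s, sizeof v); return v; }
static Slot from_f64(double v)  { Slot s; memcpy(&s, &v, sizeof v); return s; }
static double get_f64(Slot s)   { double v; memcpy(&v, &s, sizeof v); return v; }

/* No type check at run time: the compiler already guaranteed the types. */
Slot exec_add(Op op, Slot a, Slot b) {
    switch (op) {
    case ADD_I32: /* integer add, done in uint32_t to avoid signed-overflow UB */
        return from_i32((int32_t)((uint32_t)get_i32(a) + (uint32_t)get_i32(b)));
    case ADD_F64: /* floating-point add */
        return from_f64(get_f64(a) + get_f64(b));
    }
    assert(0 && "unknown opcode");
    return 0;
}
```

Compared with a tagged value, this saves the tag and the branch on every operation, at the price of rejecting ill-typed programs before they run.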

Also, depending on the language, you may need to keep an index that maps the human-readable names of the variables to their actual addresses. This index is needed only if your language has functions similar to:

var_by_name(s:string):pointer
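If such a reflective lookup is supported, the name index can be as simple as a table of (qualified name, address) pairs sorted by name and searched with the standard `bsearch`. `var_by_name` follows the pseudocode signature; `NameEntry` and `cmp_entry` are hypothetical names, and the table layout is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* One entry of the optional name index: qualified name -> address. */
typedef struct {
    const char *name;
    void *addr;
} NameEntry;

/* bsearch passes the key first and the array element second. */
static int cmp_entry(const void *key, const void *elem) {
    return strcmp((const char *)key, ((const NameEntry *)elem)->name);
}

/* Looks name up in an index sorted by strcmp order; NULL if absent. */
void *var_by_name(const NameEntry *index, size_t n, const char *name) {
    assert(name != NULL);
    if (n == 0)
        return NULL;
    assert(index != NULL);
    const NameEntry *e = bsearch(name, index, n, sizeof *index, cmp_entry);
    return e != NULL ? e->addr : NULL;
}
```

Programs that never call `var_by_name` do not need this table at all, which matches the point above.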
johnfound
  • Not sure what you mean by pointers. I understand the concept of a pointer from C and C++, but I don't think you are talking about the same kind here. – Jeroen Aug 25 '13 at 10:08
  • Yeah, but how could I use those while compiling? What if a function gets called twice concurrently? I'd need to recompile to update the addresses. – Jeroen Aug 25 '13 at 10:45
  • @Binero - there are several ways to allocate memory for the variables. Local variables are usually allocated on the stack. You definitely have to clean up your ideas about the memory management of your virtual machine. – johnfound Aug 25 '13 at 10:59
  • So you are saying I should allocate memory for the functions before they are called? :o – Jeroen Aug 25 '13 at 12:06
  • I still don't quite get this. If I were to use pointers to memory, wouldn't that mean I would have to set aside a section of RAM equal in size to all the variables combined? – Jeroen Aug 28 '13 at 14:31
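The usual answer to the recursion and concurrency worry is that local variables are addressed relative to a frame pointer, not absolutely. Each call pushes a fresh frame, so two calls of the same function get separate storage, the compiled operand (a slot number) never changes, and memory is needed only for frames that are currently live, not for every variable in the program. A minimal sketch with a hypothetical `VmStack` of 8-byte slots; with real threads, each thread would get its own stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_SLOTS 1024

/* A VM stack of 8-byte slots. Each frame is laid out as
   [saved fp][local 0][local 1]...; fp points at local 0. */
typedef struct {
    int64_t slots[STACK_SLOTS];
    size_t sp; /* first free slot */
    size_t fp; /* local 0 of the current frame */
} VmStack;

/* Called on function entry: reserves nlocals zeroed slots.
   Returns 0, or -1 on stack overflow. */
int push_frame(VmStack *s, size_t nlocals) {
    if (nlocals >= STACK_SLOTS - s->sp) /* need nlocals + 1 free slots */
        return -1;
    s->slots[s->sp] = (int64_t)s->fp;   /* save the caller's frame */
    s->fp = s->sp + 1;
    s->sp = s->fp + nlocals;
    for (size_t i = s->fp; i < s->sp; i++)
        s->slots[i] = 0;
    return 0;
}

/* Called on return: discards the frame and restores the caller's. */
void pop_frame(VmStack *s) {
    assert(s->fp > 0);
    s->sp = s->fp - 1;
    s->fp = (size_t)s->slots[s->sp];
}

/* Local n of the current frame; the compiled operand is just n. */
int64_t *local(VmStack *s, size_t n) {
    assert(s->fp + n < s->sp);
    return &s->slots[s->fp + n];
}
```

If a function with two locals calls itself, its local 0 lives in slot 1 for the outer call and slot 4 for the inner one, yet both calls run the same bytecode operand `0`.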