I am trying to build a compiler in JavaScript, and so far I have built a lexer that creates tokens from the input:
= Test Input (with optional semicolon):
data myVariable = 4
data myVariable2 = "myName";
task Eat {
receives (whatToEat : String, howMuchTime : Float)
print whatToEat
returns (Nothing : Void)
}
= Actual Lexer Result (From Console - The Array Of Tokens):
[0: {content: "data", denominal: "keyword"}
1: {content: "myVariable", denominal: "identifier"}
2: {content: "=", denominal: "operator"}
3: {content: "4", denominal: "number"}
4: {content: "data", denominal: "keyword"}
5: {content: "myVariable2", denominal: "identifier"}
6: {content: "=", denominal: "operator"}
7: {content: ""myName"", denominal: "string"}
8: {content: ";", denominal: "punctuator"}
9: {content: "task", denominal: "keyword"}
10: {content: "Eat", denominal: "identifier"}
11: {content: "{", denominal: "punctuator"}
12: {content: "receives", denominal: "identifier"}
13: {content: "(", denominal: "punctuator"}
14: {content: "whatToEat", denominal: "identifier"}
15: {content: ":", denominal: "punctuator"}
16: {content: "String", denominal: "identifier"}
17: {content: ",", denominal: "punctuator"}
18: {content: "howMuchTime", denominal: "identifier"}
19: {content: ":", denominal: "punctuator"}
20: {content: "Float", denominal: "identifier"}
21: {content: ")", denominal: "punctuator"}
22: {content: "print", denominal: "keyword"}
23: {content: "whatToEat", denominal: "identifier"}
24: {content: "returns", denominal: "keyword"}
25: {content: "(", denominal: "punctuator"}
26: {content: "Nothing", denominal: "identifier"}
27: {content: ":", denominal: "punctuator"}
28: {content: "Void", denominal: "identifier"}
29: {content: ")", denominal: "punctuator"}
30: {content: "}", denominal: "punctuator"}]
The lexer works fine (data and task being the keywords for variable and function declarations), but I want to create something regex-like that captures a function declaration, a variable declaration, etc., using only this array of token objects as input. If the input were plain text, I would capture a function declaration with the following regex:
task\s+[a-zA-Z][a-zA-Z0-9]*\s*\{\s*(1)*\s*\}
where (1) stands for the regex for an instruction block, including keyword instructions such as receives.
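To make the idea concrete, here is a minimal sketch of the token-level equivalent I have in mind. The helper name matchSequence and the pattern format are my own assumptions, not an existing API, and it only handles fixed-length sequences (nothing like the (1)* repetition in the regex):

```javascript
// Hypothetical helper: try to match a fixed sequence of token patterns
// starting at index `start`. A pattern may specify `content`, `denominal`,
// or both; an omitted field matches anything.
function matchSequence(tokens, start, patterns) {
  for (let i = 0; i < patterns.length; i++) {
    const token = tokens[start + i];
    const pattern = patterns[i];
    if (!token) return null; // ran out of tokens
    if (pattern.content !== undefined && token.content !== pattern.content) return null;
    if (pattern.denominal !== undefined && token.denominal !== pattern.denominal) return null;
  }
  // On success, report the matched tokens and the index to continue from.
  return {
    matched: tokens.slice(start, start + patterns.length),
    next: start + patterns.length,
  };
}

// First four tokens of the lexer output shown above.
const sampleTokens = [
  { content: "data", denominal: "keyword" },
  { content: "myVariable", denominal: "identifier" },
  { content: "=", denominal: "operator" },
  { content: "4", denominal: "number" },
];

// Assumed pattern for "data <identifier> = <number>".
const numberDeclaration = [
  { content: "data", denominal: "keyword" },
  { denominal: "identifier" },
  { content: "=", denominal: "operator" },
  { denominal: "number" },
];

const declarationMatch = matchSequence(sampleTokens, 0, numberDeclaration);
console.log(declarationMatch.matched[1].content, declarationMatch.next);
```

This is enough for flat constructs like a variable declaration, but it cannot express nested or repeated blocks, which is exactly where I get stuck with functions.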
Is there a way to match variable and function declarations against the token array, starting from an index that changes during a for loop?
For example:
= I iterate over my token list with a for loop, and at index 9 I find this object for the first time:
9: {content: "task", denominal: "keyword"}
= Now I want to check for a function declaration starting at this object. This means determining:
1) - whether the function declaration is well formed (keywords, parentheses, etc.)
2) - how many objects the function spans; here the objects from index 9 to index 30 together form a function called 'Eat', which has 3 instruction blocks:
- 1 special receives instruction block, mandatory at the start of the function (even if empty), containing arguments in the format [variableName : variableType]
- 1 special print instruction block with its parameters given correctly
- 1 special returns instruction block, mandatory at the end of the function (if empty, returning Nothing : Void), containing arguments in the format [variableName : variableType]
3) - where to stop, so that I know the function definition is over and can continue searching from the final index + 1 (31 in this case) for other things (e.g. variable declarations, EOF, etc.)
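The three points above could be sketched roughly as follows: each parse function either throws on malformed input (point 1) or returns the parsed node together with the index of the first token after it (points 2 and 3). All function names here (expect, expectIdentifier, parseTypedList, parseFunction) are hypothetical, and the sketch assumes that print takes a single identifier and is the only instruction allowed between receives and returns:

```javascript
// Check that the token at `index` has the given content; return the next index.
function expect(tokens, index, content) {
  const token = tokens[index];
  if (!token || token.content !== content) {
    throw new SyntaxError(`Expected "${content}" at token ${index}`);
  }
  return index + 1;
}

// Return the content of the identifier token at `index`, or throw.
function expectIdentifier(tokens, index) {
  const token = tokens[index];
  if (!token || token.denominal !== "identifier") {
    throw new SyntaxError(`Expected identifier at token ${index}`);
  }
  return token.content;
}

// Parse "( name : Type , name : Type ... )" starting at "(".
function parseTypedList(tokens, index) {
  index = expect(tokens, index, "(");
  const args = [];
  while (tokens[index] && tokens[index].content !== ")") {
    if (args.length > 0) index = expect(tokens, index, ",");
    const argument_name = expectIdentifier(tokens, index);
    index = expect(tokens, index + 1, ":");
    const argument_type = expectIdentifier(tokens, index);
    args.push({ argument_name, argument_type });
    index += 1;
  }
  return { args, next: expect(tokens, index, ")") };
}

// Parse "task Name { receives (...) print x ... returns (...) }".
function parseFunction(tokens, start) {
  let index = expect(tokens, start, "task");
  const function_name = expectIdentifier(tokens, index);
  index = expect(tokens, index + 1, "{");
  const body_instructions = [];

  // Matched by content, since my lexer tags "receives" as an identifier.
  index = expect(tokens, index, "receives");
  const receives = parseTypedList(tokens, index);
  body_instructions.push({ instruction: "receives_instruction", arguments: receives.args });
  index = receives.next;

  while (tokens[index] && tokens[index].content === "print") {
    const argument_name = expectIdentifier(tokens, index + 1);
    body_instructions.push({ instruction: "print_instruction", arguments: [{ argument_name }] });
    index += 2;
  }

  index = expect(tokens, index, "returns");
  const returns = parseTypedList(tokens, index);
  body_instructions.push({ instruction: "returns_instruction", arguments: returns.args });
  index = expect(tokens, returns.next, "}");

  return { node: { instruction: "function_declaration", function_name, body_instructions }, next: index };
}

// A shortened version of the 'Eat' function from the lexer output.
const t = (content, denominal) => ({ content, denominal });
const functionTokens = [
  t("task", "keyword"), t("Eat", "identifier"), t("{", "punctuator"),
  t("receives", "identifier"), t("(", "punctuator"), t("whatToEat", "identifier"),
  t(":", "punctuator"), t("String", "identifier"), t(")", "punctuator"),
  t("print", "keyword"), t("whatToEat", "identifier"),
  t("returns", "keyword"), t("(", "punctuator"), t("Nothing", "identifier"),
  t(":", "punctuator"), t("Void", "identifier"), t(")", "punctuator"),
  t("}", "punctuator"),
];

const parsedFunction = parseFunction(functionTokens, 0);
console.log(JSON.stringify(parsedFunction.node, null, 2), parsedFunction.next);
```

The returned next value is what the outer for loop would use as its new index once the function declaration has been consumed.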
If you could tell me which method(s) I can use to detect a specific instruction block and build a function description like the one above, it would be awesome!
The ideal result (for this problem) would be an array like this:
Instruction Object:
[0: {
    "instruction": "variable_declaration",
    "variable_name": "myVariable",
    "variable_value": "4",
    "variable_type": "Integer"
}
1: {
    "instruction": "variable_declaration",
    "variable_name": "myVariable2",
    "variable_value": ""myName"",
    "variable_type": "String"
}
2: {
    "instruction": "function_declaration",
    "function_name": "Eat",
    "body_instructions": [0: {
        "instruction": "receives_instruction",
        "arguments": [0: {
            "argument_name": "whatToEat",
            "argument_type": "String"
        }
        1: {
            "argument_name": "howMuchTime",
            "argument_type": "Float"
        }]
    }
    1: {
        "instruction": "print_instruction",
        "arguments": [0: {
            "argument_name": "myVariable",
            "argument_value": "4",
            "argument_type": "Integer"
        }]
    }
    2: {
        "instruction": "returns_instruction",
        "arguments": [0: {
            "argument_name": "Nothing",
            "argument_value": "",
            "argument_type": "Void"
        }]
    }]
}] // EOF object optional
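A rough sketch of how a top-level loop might build such an array, shown for variable declarations only. The names parseProgram and parseVariableDeclaration are hypothetical, and mapping every number token to Integer and every string token to String is an assumption on my part:

```javascript
// Assumed mapping from the lexer's token kinds to the language's type names.
const TYPE_BY_DENOMINAL = new Map([
  ["number", "Integer"],
  ["string", "String"],
]);

// Parse "data <identifier> = <value>" with an optional trailing ";".
function parseVariableDeclaration(tokens, start) {
  const [keyword, name, equals, value] = tokens.slice(start, start + 4);
  const wellFormed =
    keyword && keyword.content === "data" &&
    name && name.denominal === "identifier" &&
    equals && equals.content === "=" &&
    value && TYPE_BY_DENOMINAL.has(value.denominal);
  if (!wellFormed) {
    throw new SyntaxError(`Malformed variable declaration at token ${start}`);
  }
  let next = start + 4;
  if (tokens[next] && tokens[next].content === ";") next += 1; // optional semicolon
  return {
    node: {
      instruction: "variable_declaration",
      variable_name: name.content,
      variable_value: value.content,
      variable_type: TYPE_BY_DENOMINAL.get(value.denominal),
    },
    next,
  };
}

// Walk the token array, dispatching on the keyword found at the current index.
function parseProgram(tokens) {
  const instructions = [];
  let index = 0;
  while (index < tokens.length) {
    const token = tokens[index];
    if (token.denominal === "keyword" && token.content === "data") {
      const { node, next } = parseVariableDeclaration(tokens, index);
      instructions.push(node);
      index = next; // jump past everything the declaration consumed
    } else {
      // A fuller version would dispatch "task" to a function parser here.
      throw new SyntaxError(`Unexpected token "${token.content}" at ${index}`);
    }
  }
  return instructions;
}

// Tokens 0 to 8 of the lexer output shown earlier.
const programTokens = [
  { content: "data", denominal: "keyword" },
  { content: "myVariable", denominal: "identifier" },
  { content: "=", denominal: "operator" },
  { content: "4", denominal: "number" },
  { content: "data", denominal: "keyword" },
  { content: "myVariable2", denominal: "identifier" },
  { content: "=", denominal: "operator" },
  { content: "\"myName\"", denominal: "string" },
  { content: ";", denominal: "punctuator" },
];

const programInstructions = parseProgram(programTokens);
console.log(JSON.stringify(programInstructions, null, 2));
```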
I appreciate all your help!
Thanks a lot in advance!