I am making my own programming language, based on howCode's programming language written in Python. I took an hour or so to convert it to C#, and it went well. However, when I tell the parser to parse the tokens we have collected, it only parses once: it finds the first PRINT STRING in our tokens and then just stops.

This is the code for my parser, my lexer, my script for the language, and the console:

Parser:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace BL
{
    public static class Parser
    {
        public static void Parse(string toks)
        {
            if (toks.Substring(0).Split(':')[0] == "PRINT STRING")
            {
                Console.WriteLine(toks.Substring(toks.IndexOf('\"') + 1).Split('\"')[0]);
            }
        }
    }
}

Lexer:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace BL
{
    public static class Lexer
    {
        public static string tok = "";
        public static string str;
        public static int state = 0;
        public static string tokens = "";

        public static void Lex(string data)
        {
            foreach (char c in data)
            {
                tok += c;

                if (tok == " ")
                {
                    if (state == 0)
                    {
                        tok = "";
                        tokens += " ";
                    }
                    else if (state == 1)
                    {
                        tok = " ";
                    }
                }
                else if (tok == Environment.NewLine)
                {
                    tok = "";
                }
                else if (tok == "PRINT")
                {
                    tokens += "PRINT";
                    tok = "";
                }
                else if (tok == "\"")
                {
                    if (state == 0)
                    {
                        state = 1;
                    }
                    else if (state == 1)
                    {
                        tokens += "STRING:" + str + "\" ";
                        str = "";
                        state = 0;
                        tok = "";
                    }
                }
                else if (state == 1)
                {
                    str += tok;
                    tok = "";
                }
            }

            Parser.Parse(tokens);
        }
    }
}

My script:

PRINT "HELLO WORLD1" PRINT "HELLO WORLD2"

The console:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;

namespace BL
{
    class Program
    {
        static string data;

        static void Main(string[] args)
        {
            Console.Title = "Compiler";
            string input = Console.ReadLine();
            Open(input);

            Lexer.Lex(data);

            Console.ReadLine();
        }

        public static void Open(string file)
        {
            data = File.ReadAllText(file);
        }
    }
}

When I print the contents of tokens (in Lexer), I get this:

PRINT STRING:"HELLO WORLD1" PRINT STRING:"HELLO WORLD2"

However, when I parse it, it only prints HELLO WORLD1, not HELLO WORLD1 with HELLO WORLD2 underneath it. I'm not sure what I should do to get the other PRINT STRING, and since this is a project only I have created, there is no answer online. Thank you in advance.

nhouser9
One Ace
  • I don't want to rain on your parade or enthusiasm, but this is an incredibly weak approach for building language parsers, let alone language translators. You really should go look at a compiler book to see how to do this. That is a big task in its own right. But then, so is learning to be a surgeon. You can't just get out a knife and start cutting. – Ira Baxter Jul 12 '16 at 02:29
  • I know, I was just trying my best to see if I could convert something written in Python to C#. The only problem is the parser. – One Ace Jul 12 '16 at 02:33
  • Why are you parsing a bunch of commands from a string, then concatenating them all back into a string? I think it would be easier if your Lex method returned a list of strings. Then you could enumerate through that list in your Parse method. – Kinetic Jul 12 '16 at 02:40
  • Yes, you can probably make this work for a tiny example. The point is that it won't work for anything else, so what have you learned? – Ira Baxter Jul 12 '16 at 02:40
  • @KiNeTiC I was thinking about that too – One Ace Jul 12 '16 at 02:43
  • Ignore the critique re your approach. Everyone has to start somewhere and sophistication is relative. The last thing you want to do is leap straight into say the _[Managed Babel System](https://msdn.microsoft.com/en-us/library/bb165037.aspx); [Managed Package Framework](https://msdn.microsoft.com/en-us/library/bb166360.aspx); GPLex; GPPG;_ or _YACC_. Understand the basics before taking the next step –  Jul 12 '16 at 03:28

1 Answer

You're attempting to parse the language, which is good, but you're generating a second programming language as a result. This means your Lex() function will end up needing its own parse logic to handle the resulting text.
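To make that point concrete, here is a rough sketch of the extra parse logic the text output would need. It is not a recommendation, only an illustration: `TokenStringParser` and `ExtractPrints` are hypothetical names, and the pattern assumes the lexer output looks like the `PRINT STRING:"..."` text shown in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TokenStringParser
{
    // Hypothetical helper: re-parses the lexer's text output, e.g.
    // PRINT STRING:"HELLO WORLD1" PRINT STRING:"HELLO WORLD2"
    public static List<string> ExtractPrints(string toks)
    {
        List<string> printed = new List<string>();

        // Each match is one PRINT STRING:"..." pair;
        // group 1 is the text between the quotes.
        foreach (Match m in Regex.Matches(toks, "PRINT\\s+STRING:\"([^\"]*)\""))
        {
            printed.Add(m.Groups[1].Value);
        }

        return printed;
    }
}
```

Writing each returned entry with `Console.WriteLine` would print both strings, but this is effectively a second parser for a format the lexer invented.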

This is why, most of the time, this sort of problem is solved by having the Lex() function create a list of tokens for someone else to consume. Generally these tokens are more than just strings, but many little languages like this one can get away with a simple list of strings as tokens.
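As a rough idea of what "more than just strings" could mean, a token might carry its kind alongside its text, so a parser can tell the keyword `PRINT` apart from a string literal that happens to contain the same letters. The `TokenKind`, `Token`, and `TokenDemo` names below are hypothetical and only serve this sketch.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical token model; none of these types appear in the original code.
public enum TokenKind { Keyword, StringLiteral }

public struct Token
{
    public TokenKind Kind;
    public string Text;

    public Token(TokenKind kind, string text)
    {
        Kind = kind;
        Text = text;
    }

    public override string ToString()
    {
        return Kind + ":" + Text;
    }
}

public static class TokenDemo
{
    // Hand-built token list for the script: PRINT "HELLO WORLD1"
    public static List<Token> Sample()
    {
        return new List<Token>
        {
            new Token(TokenKind.Keyword, "PRINT"),
            new Token(TokenKind.StringLiteral, "HELLO WORLD1")
        };
    }
}
```

A parser working on such a list would switch on `Kind` rather than compare raw text. The rest of this answer sticks with plain strings to keep things small.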

Since I have a soft spot for toy languages, I've modified your example to follow this flow. It loads the file from user input, then breaks it into individual tokens and uses those tokens to 'run' the program:

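// These methods belong inside a class (for example, Program) and need:
using System;
using System.Collections.Generic;
using System.IO;

// Parse a list of tokens from Lex()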
static void Parse(List<string> tokens)
{
    // Run through each token in the list of tokens
    for (int i = 0; i < tokens.Count; i++)
    {
        // And act on the token
        switch (tokens[i])
        {
            case "PRINT":
                // PRINT prints the next token
                // Move to the next token first
                i++;
                // And dump it out
                Console.WriteLine(tokens[i]);
                break;

            default:
                // Anything else is an error, so emit an error
                Console.WriteLine("ERROR: Unknown token " + tokens[i]);
                break;
        }
    }
}

// Lex a source code file, returning a list of tokens
static List<string> Lex(string data)
{
    // The current token we're building up
    string current = "";
    // Are we inside of a quoted string?
    bool inQuote = false;
    // The list of tokens to return
    List<string> tokens = new List<string>();

    foreach (char c in data)
    {
        if (inQuote)
        {
            switch (c)
            {
                case '"':
                    // The string literal has ended, go ahead and note 
                    // we're no longer in quote
                    inQuote = false;
                    break;
                default:
                    // Anything else gets added to the current token
                    current += c;
                    break;
            }
        }
        else
        {
            switch (c)
            {
                case '"':
                    // This is the start of a string literal, note that
                    // we're in it and move on
                    inQuote = true;
                    break;
                case ' ':
                case '\n':
                case '\r':
                case '\t':
                    // Tokens are separated by whitespace, so any whitespace
                    // causes the current token to be added to the list of tokens
                    if (current.Length > 0)
                    {
                        // Only add non-empty tokens
                        tokens.Add(current);
                        current = "";
                    }
                    break;
                default:
                    // Anything else is part of a token, just add it
                    current += c;
                    break;
            }
        }
    }

    return tokens;
}

// Quick demo
static void Main(string[] args)
{
    string input = Console.ReadLine();
    string data = File.ReadAllText(input);

    List<string> tokens = Lex(data);
    Parse(tokens);

    Console.ReadLine();
}
Anon Coward
  • However, after the parser runs, it tells me this: `Index was out of range. Must be non-negative and less than the size of the collection. Parameter name: index`. I don't know how that was generated, but it was – One Ace Jul 12 '16 at 05:09
  • Ah, the end of file. I figured it out: it only outputs PRINT HELLO WORLD PRINT, then the error pops up. I guess the '\n' isn't working, and I'm not sure what to put there instead – One Ace Jul 12 '16 at 05:12
  • OK, all you have to do is add an extra space at the end of the file – One Ace Jul 12 '16 at 05:19
  • Yep, you need to end the file with a newline or space. And you can feel free to fix the parser so it doesn't have that limitation; consider it an exercise for the reader :) – Anon Coward Jul 12 '16 at 05:34
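
One possible way to lift the trailing-whitespace limitation from the comments is sketched below. It assumes the same whitespace and quote rules as Lex() in the answer, adds a final flush after the loop, and checks bounds before PRINT reads its argument. `ToyLang`, `LexAll`, and `Run` are hypothetical names, and `Run` returns the output lines instead of writing to the console so the result is easy to inspect.

```csharp
using System;
using System.Collections.Generic;

public static class ToyLang
{
    // Same splitting rules as Lex() in the answer, plus a final flush.
    public static List<string> LexAll(string data)
    {
        string current = "";
        bool inQuote = false;
        List<string> tokens = new List<string>();

        foreach (char c in data)
        {
            if (inQuote)
            {
                // A closing quote ends the literal; anything else is kept.
                if (c == '"') inQuote = false;
                else current += c;
            }
            else if (c == '"')
            {
                inQuote = true;
            }
            else if (c == ' ' || c == '\n' || c == '\r' || c == '\t')
            {
                if (current.Length > 0)
                {
                    tokens.Add(current);
                    current = "";
                }
            }
            else
            {
                current += c;
            }
        }

        // Flush whatever is left when the input ends without trailing whitespace.
        if (current.Length > 0)
        {
            tokens.Add(current);
        }

        return tokens;
    }

    // Returns the lines the program would print.
    public static List<string> Run(List<string> tokens)
    {
        List<string> output = new List<string>();

        for (int i = 0; i < tokens.Count; i++)
        {
            if (tokens[i] == "PRINT")
            {
                // Guard against PRINT being the last token.
                if (i + 1 >= tokens.Count)
                {
                    output.Add("ERROR: PRINT expects a value");
                    break;
                }
                i++;
                output.Add(tokens[i]);
            }
            else
            {
                output.Add("ERROR: Unknown token " + tokens[i]);
            }
        }

        return output;
    }
}
```

With this version, `ToyLang.Run(ToyLang.LexAll(data))` handles the two-line script from the question even when the file has no trailing space or newline. Note that an unterminated string literal is also flushed as a token, which a stricter lexer might prefer to report as an error.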