
I am working on converting boolean algebra expressions, something like this:

  NOT(a AND b) = (NOT a) OR (NOT b)
  NOT (a OR b) = (NOT a) AND (NOT b)
  a AND NOT (b AND c) = (EQ a) AND ( ( NOT b) OR ( NOT c) )

But somehow it does not work for certain inputs, such as:

  NOT a AND b = (NOT a) AND (EQ b)
  a AND NOT b AND NOT c = (EQ a) AND (NOT b) AND (NOT c)

The following is the logic for the conversion. Am I doing anything wrong?

    public string ExpressionConversion(string expression)

    {
        string finalVal = null;
        string currentToken = "";
        bool isNotFound = false;
        List<string> strList = new List<string>();
        StringHelper stringHelper = new StringHelper();
        var values = stringHelper.CleverSplit(expression, ' ', false, true); //function which splits all the elements in the expression
        for (int j = 0; j < values.Length; j++)
        {
            currentToken = values[j].Trim();
            if (string.IsNullOrEmpty(currentToken.Trim()))
                continue;

            if ((j > 0) && currentToken.StartsWith("AND") && values[j - 1].StartsWith("AND"))
                continue;
            if (currentToken.Contains("NOT"))
            {
                isNotFound = true;
                continue;
            }
            if (currentToken.StartsWith("("))
                strList.Add(currentToken);
            else if (currentToken.StartsWith(")"))
            {
                strList.Add(currentToken);
                isNotFound = false;
            }
            else if (currentToken.StartsWith("AND"))
            {
                if (isNotFound)
                    strList.Add(" OR ");
                else
                    strList.Add(" AND ");
            }
            else if (currentToken.StartsWith("OR"))
            {
                if (isNotFound)
                    strList.Add(" AND ");
                else
                    strList.Add(" OR ");
            }
            else
            {
                if (isNotFound)
                {
                    strList.Add("( NOT " + currentToken + " )");
                    if (!expression.Contains("("))
                        isNotFound = false;
                }
                else
                    strList.Add("( EQ " + currentToken + " )");
            }
        }
        if (strList.Count > 0)
            finalVal = string.Join(" ", strList);
        return finalVal;
    }
Charlieface
mesha
  • 3
    You are trying to implement a parser and lexer, and then transform the result using Hoare logic. A lexer is recursive: you would need to split each value based on parenthesis, then recursively split that again. I don't see any recursiveness here so it would never work. The transform would also need Hoare logic to distribute the boolean operators across each value: I see none of that either. You are going to need way more code than this, and probably far more than a [so] answer could provide – Charlieface Apr 13 '22 at 10:47
  • 1
    @Charlieface You don't need recursion to parse an expression like this: the Shunting Yard Algorithm will do it just fine, and that gives you the input in postfix with the brackets removed, or an AST – canton7 Apr 13 '22 at 10:57
  • 2
    @canton7 True, forgot about that, but I don't see any of that either. – Charlieface Apr 13 '22 at 10:59
  • Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackoverflow.com/rooms/243913/discussion-on-question-by-mesha-boolean-algebra-expression-conversion-in-c). – Ryan M Apr 14 '22 at 19:53

1 Answer

Your life's probably going to be easiest if you go for a proper parser.

Note: the error handling in here is pretty dreadful, and there are no unit tests. You'll want to improve these!


We'll first tokenise the input, producing tokens for the (, ), variables, operators, etc.

First, let's define our set of tokens. We'll use a set of classes which all implement the same marker interface, and we'll use a singleton instance where the token doesn't need to hold any further information. I'm also going to cheat and put precedence and associativity information on the operator tokens, which we'll use in a minute:

public interface IToken
{
}

public class LeftParenToken : IToken
{
    public static LeftParenToken Instance { get; } = new LeftParenToken();
    private LeftParenToken() { }
    public override string ToString() => "(";
}

public class RightParenToken : IToken
{
    public static RightParenToken Instance { get; } = new RightParenToken();
    private RightParenToken() { }
    public override string ToString() => ")";
}

public abstract class OperatorToken : IToken
{
    public int Precedence { get; }
    public bool IsLeftAssociative { get; }
    protected OperatorToken(int precedence, bool isLeftAssociative) => (Precedence, IsLeftAssociative) = (precedence, isLeftAssociative);
}

public class AndToken : OperatorToken
{
    public static AndToken Instance { get; } = new AndToken();
    private AndToken() : base(1, true) { }
    public override string ToString() => "AND";
}

public class OrToken : OperatorToken
{
    public static OrToken Instance { get; } = new OrToken();
    private OrToken() : base(1, true) { }
    public override string ToString() => "OR";
}

public class EqToken : OperatorToken
{
    public static EqToken Instance { get; } = new EqToken();
    private EqToken() : base(2, false) { }
    public override string ToString() => "EQ";
}

public class NotToken : OperatorToken
{
    public static NotToken Instance { get; } = new NotToken();
    private NotToken() : base(2, false) { }
    public override string ToString() => "NOT";
}

public class VariableToken : IToken
{
    public string Name { get; }
    public VariableToken(string name) => Name = name;
    public override string ToString() => Name;
}

With those defined, our tokenizer is pretty simple:

public class Tokeniser
{
    private static readonly Regex tokenRegex = new Regex(@"(\(|\)|\w+)\s*");

    private readonly string input;
    private int position;

    public Tokeniser(string input)
    {
        this.input = input;
    }

    public IToken? Next()
    {
        if (position == input.Length)
            return null;

        var match = tokenRegex.Match(input, position);
        if (!match.Success || match.Index != position)
        {
            throw new Exception($"Unexpected token at start of '{input.Substring(position)}'");
        }

        string token = match.Groups[1].Value;
        position += match.Length;

        return token switch
        {
            "(" => LeftParenToken.Instance,
            ")" => RightParenToken.Instance,
            "AND" => AndToken.Instance,
            "OR" => OrToken.Instance,
            "NOT" => NotToken.Instance,
            "EQ" => EqToken.Instance,
            _ => new VariableToken(token),
        };
    }
}

We just walk through the input, matching a regex at each step. The regex swallows the whitespace after each token (so we don't need to trim that ourselves). Any run of word characters (\w+: letters, digits and underscores) which isn't one of our known keywords is a variable, so a name such as foo_1 is accepted; anything else, e.g. +, is rejected with an exception.


Our tokeniser can parse a string such as a AND (b AND NOT c) into the tokens a, AND, (, b, AND, NOT, c, ). Now we want to parse that into an AST.

One fairly simple way to do this is to first transform it into postfix using Dijkstra's Shunting-yard Algorithm.

In a postfix expression, the operators appear after their operands. So a + b becomes a b +. To process this, you make a little stack. When you see an operand, you push it onto the stack; when you see an operator, you pop off however many arguments it takes, apply the operator, and push the result onto the stack.

So the expression a AND NOT (NOT b AND c) becomes a b NOT c AND NOT AND, that is:

Operation   Stack
Push a      [a]
Push b      [a, b]
NOT         [a, NOT b]
Push c      [a, NOT b, c]
AND         [a, (NOT b) AND c]
NOT         [a, NOT (NOT b AND c)]
AND         [a AND NOT (NOT b AND c)]

The lovely thing about this is that you don't need to worry about parentheses: you can just walk the postfix expression token by token and evaluate it as you go.
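To make the stack walk concrete, here is a minimal sketch of evaluating a postfix boolean expression. It is not part of the parser in this answer: PostfixDemo, Evaluate and the values dictionary are hypothetical names for illustration, and it works on plain space-separated strings rather than the token classes above.

```csharp
using System;
using System.Collections.Generic;

public static class PostfixDemo
{
    // Evaluates a space-separated postfix expression such as "a b NOT c AND NOT AND".
    // 'values' supplies a truth value for each variable name (an assumption of this sketch).
    public static bool Evaluate(string postfix, IReadOnlyDictionary<string, bool> values)
    {
        var stack = new Stack<bool>();
        foreach (var token in postfix.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            switch (token)
            {
                case "NOT":
                    // Unary operator: pop one operand, push its negation
                    stack.Push(!stack.Pop());
                    break;
                case "AND":
                case "OR":
                    // Binary operator: the right operand is on top of the stack
                    bool right = stack.Pop();
                    bool left = stack.Pop();
                    stack.Push(token == "AND" ? left && right : left || right);
                    break;
                default:
                    // Anything else is treated as a variable name
                    stack.Push(values[token]);
                    break;
            }
        }
        return stack.Pop();
    }
}
```

With a = true, b = false and c = true, evaluating "a b NOT c AND NOT AND" follows the same steps as the table above and ends with false on the stack. There is no error handling here: a malformed expression will throw from Stack.Pop or the dictionary lookup.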

public class Parser
{
    public IEnumerable<IToken> Shunt(Tokeniser tokeniser)
    {
        var operators = new Stack<IToken>();
        
        bool lastTokenWasVariable = false;

        while (tokeniser.Next() is { } token)
        {
            if (lastTokenWasVariable && token is VariableToken or NotToken)
            {
                // A variable after a variable, or a NOT after a variable, has an implicit AND between them
                foreach (var t in ProcessOperator(AndToken.Instance))
                {
                    yield return t;
                }
            }
            
            switch (token)
            {
                case VariableToken variable:
                    yield return variable;
                    break;
                case OperatorToken op:
                    foreach (var t in ProcessOperator(op))
                    {
                        yield return t;
                    }
                    break;
                case LeftParenToken:
                    operators.Push(token);
                    break;
                case RightParenToken:
                    // Pop operators to the output until we reach the matching '('
                    while (operators.TryPeek(out var peek) && peek is not LeftParenToken)
                    {
                        operators.Pop();
                        yield return peek;
                    }
                    // If the stack ran out first, the ')' had no matching '('
                    if (!operators.TryPop(out var pop) || pop is not LeftParenToken)
                    {
                        throw new Exception("Could not find matching '(' for ')'");
                    }
                    break;
            }
            
            lastTokenWasVariable = token is VariableToken;
        }

        while (operators.TryPop(out var op))
        {
            if (op is LeftParenToken)
            {
                throw new Exception("Unexpected '('");
            }
            yield return op;
        }
        
        IEnumerable<IToken> ProcessOperator(OperatorToken op1)
        {
            while (operators.TryPeek(out var peek) && peek is OperatorToken op2 && (op1.IsLeftAssociative ? op2.Precedence >= op1.Precedence : op2.Precedence > op1.Precedence))
            {
                operators.Pop();
                yield return op2;
            }
            operators.Push(op1);
        }
    }
}

This is pretty much a direct port of the algorithm on Wikipedia.

The only complexity is that you need to support inputs of e.g. a b or a NOT b to mean a AND b and a AND NOT b respectively. The shunting-yard algorithm can't cope with this natively: it's expecting an operator between operands. So we hack this in: if we see a variable or a NOT which directly follows another variable, we'll pretend that we saw an AND first.
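The implicit-AND rule can be seen in isolation with a small sketch. This is only an illustration under simplifying assumptions: ImplicitAndDemo and InsertImplicitAnds are hypothetical names, and the sketch works on plain string tokens rather than the IToken classes, inserting the AND into a list instead of feeding it to the operator stack.

```csharp
using System;
using System.Collections.Generic;

public static class ImplicitAndDemo
{
    // Returns the token list with an explicit "AND" added wherever a variable
    // or a NOT directly follows a variable.
    public static List<string> InsertImplicitAnds(IEnumerable<string> tokens)
    {
        var result = new List<string>();
        bool lastWasVariable = false;
        foreach (var token in tokens)
        {
            // Anything that is not a keyword or a parenthesis counts as a variable
            bool isVariable = token != "AND" && token != "OR" && token != "NOT"
                && token != "EQ" && token != "(" && token != ")";
            if (lastWasVariable && (isVariable || token == "NOT"))
            {
                result.Add("AND");
            }
            result.Add(token);
            lastWasVariable = isVariable;
        }
        return result;
    }
}
```

For the tokens b, NOT, a, c this yields b, AND, NOT, a, AND, c. Like the Shunt method, it does not add an AND next to parentheses, so an input such as a (b) is left unchanged.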


The next step involves turning this into an AST. We'll use a tree, where operators and variables are all represented by nodes. A variable node doesn't have any children: it just knows the variable name. A unary operator (NOT or EQ) has a single child. A binary operator (AND or OR) has two children, which are the two things being anded or or'd together.

We'll define these in the same way as the tokens, with a bunch of classes implementing a common interface.

We'll also add an Invert method: when implemented on a node, this will return an inverted version of the node, applying De Morgan's laws. So inverting an EQ node results in a NOT node (and vice versa) around the same variable, since inverting a variable node just returns the variable itself; inverting an AND or OR results in an OR / AND respectively, with both of its children inverted.

This means that inverting EQ A results in NOT A, inverting EQ A AND EQ B results in NOT A OR NOT B, etc.
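If you want to convince yourself that swapping AND/OR and inverting both children is a valid transformation, a brute-force check over all inputs is enough. The sketch below is independent of the node classes; DeMorganDemo and HoldsForAllInputs are hypothetical names used only for this check.

```csharp
public static class DeMorganDemo
{
    // Checks both of De Morgan's laws for every combination of two boolean inputs.
    public static bool HoldsForAllInputs()
    {
        foreach (bool a in new[] { false, true })
        {
            foreach (bool b in new[] { false, true })
            {
                // NOT (a AND b) must equal (NOT a) OR (NOT b)
                if (!(a && b) != (!a || !b)) return false;
                // NOT (a OR b) must equal (NOT a) AND (NOT b)
                if (!(a || b) != (!a && !b)) return false;
            }
        }
        return true;
    }
}
```

HoldsForAllInputs() returns true, which is why Invert can push a NOT down through an AND or OR without changing the meaning of the expression.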

public interface INode
{
    INode Invert();
}

public class BinaryOperatorNode : INode
{
    public OperatorToken Operator { get; }
    public INode Left { get; }
    public INode Right { get; }

    public BinaryOperatorNode(OperatorToken op, INode right, INode left) => (Operator, Right, Left) = (op, right, left);

    public INode Invert()
    {
        return Operator switch
        {
            AndToken => new BinaryOperatorNode(OrToken.Instance, Right.Invert(), Left.Invert()),
            OrToken => new BinaryOperatorNode(AndToken.Instance, Right.Invert(), Left.Invert()),
            _ => throw new Exception($"Unexpected binary operator type {Operator}"),
        };
    }
}

public class UnaryOperatorNode : INode
{
    public OperatorToken Operator { get; }
    public INode Child { get; }
    public UnaryOperatorNode(OperatorToken op, INode child) => (Operator, Child) = (op, child);
    
    public INode Invert()
    {
        return Operator switch
        {
            NotToken => new UnaryOperatorNode(EqToken.Instance, Child.Invert()),
            EqToken => new UnaryOperatorNode(NotToken.Instance, Child.Invert()),
            _ => throw new Exception($"Unexpected unary operator type {Operator}"),
        };
    }
}

public class VariableNode : INode
{
    public VariableToken Variable { get; }
    public VariableNode(VariableToken variable) => Variable = variable;

    public INode Invert()
    {
        return this;
    }
}

With that in place, we can write our little method to generate the AST. This does pretty much exactly what I described above, with a stack. We however sneak in an EQ node every time we see a variable token -- so a variable will always be created with an EQ node pointing to it. When we see a NOT token, we'll just invert whatever's currently on the stack: if it's an EQ node, it'll turn itself into a NOT node.

// This is the same Parser class as above: BuildAst sits alongside Shunt
public class Parser
{
    public INode BuildAst(IEnumerable<IToken> tokens)
    {
        var stack = new Stack<INode>();

        foreach (var token in tokens)
        {
            switch (token)
            {
                case VariableToken variable:
                    stack.Push(new UnaryOperatorNode(EqToken.Instance, new VariableNode(variable)));
                    break;

                case AndToken:
                case OrToken:
                    if (stack.Count < 2)
                    {
                        throw new Exception($"Expected 2 parameters for operator {token}, got fewer");
                    }
                    stack.Push(new BinaryOperatorNode((OperatorToken)token, stack.Pop(), stack.Pop()));
                    break;

                case EqToken:
                    // We treat EQ as a no-op: we'll add it when we add its variable
                    break;

                case NotToken:
                    if (stack.Count < 1)
                    {
                        throw new Exception($"Expected 1 parameter for operator {token}, got fewer");
                    }
                    // If we encounter a 'not', we invert the current tree
                    stack.Push(stack.Pop().Invert());
                    break;
            }
        }
        
        if (stack.Count != 1)
        {
            throw new Exception("Unexpected leftover tokens");
        }

        return stack.Pop();
    }
}

Finally we need to render our AST to a string. We'll do this simply by recursively visiting each node in turn.

  • If it's a variable, add the name to the result.
  • If it's a unary operator, add it to the result inside parentheses, and visit whatever it's operating on, e.g. (NOT <visit child>).
  • If it's a binary operator, we'll add its left child to the result, then the operator, then the right child. If either child is itself a binary operator, we'll wrap it in parentheses.

This means that:

   AND
  /  \
 a    b

gets rendered as a AND b, but:

   AND
  /   \
 a     AND
      /   \
     b     c

gets rendered as a AND ( b AND c ).
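The three rendering rules can be sketched compactly with records and a switch expression. This is a simplified stand-in, not the Renderer from this answer: the Node, Var, Unary and Binary records and RenderDemo are hypothetical types that store operators as plain strings, and they assume C# 9 or later.

```csharp
using System;

public abstract record Node;
public record Var(string Name) : Node;
public record Unary(string Op, Node Child) : Node;
public record Binary(string Op, Node Left, Node Right) : Node;

public static class RenderDemo
{
    public static string Render(Node node) => node switch
    {
        // A variable renders as its name
        Var v => v.Name,
        // A unary operator renders inside parentheses, followed by its child
        Unary u => $"({u.Op} {Render(u.Child)})",
        // A binary operator renders left child, operator, right child
        Binary b => $"{Wrap(b.Left)} {b.Op} {Wrap(b.Right)}",
        _ => throw new ArgumentException("Unknown node type"),
    };

    // Only a child that is itself a binary operator gets extra parentheses
    private static string Wrap(Node node) =>
        node is Binary ? $"( {Render(node)} )" : Render(node);
}
```

Rendering the second tree above, new Binary("AND", new Var("a"), new Binary("AND", new Var("b"), new Var("c"))), gives a AND ( b AND c ).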

public class Renderer
{
    public string Render(INode rootNode)
    {
        var sb = new StringBuilder();

        Visit(rootNode);

        void Visit(INode node)
        {
            switch (node)
            {
                case BinaryOperatorNode op:
                    VisitWithParens(op.Left);
                    sb.Append($" {op.Operator} ");
                    VisitWithParens(op.Right);
                    break;

                case UnaryOperatorNode op:
                    sb.Append($"({op.Operator} ");
                    Visit(op.Child);
                    sb.Append(")");
                    break;

                case VariableNode variable:
                    sb.Append(variable.Variable);
                    break;
            }
        }

        void VisitWithParens(INode node)
        {
            if (node is BinaryOperatorNode)
            {
                sb.Append("( ");
            }
            Visit(node);
            if (node is BinaryOperatorNode)
            {
                sb.Append(" )");
            }
        }

        return sb.ToString();
    }
}

And that's pretty much it! Stitch it all together:

var tokeniser = new Tokeniser("b NOT a c");
var parser = new Parser();
var tokens = parser.Shunt(tokeniser);
var ast = parser.BuildAst(tokens);
var renderer = new Renderer();
Console.WriteLine(renderer.Render(ast));

See it on dotnetfiddle.net.

canton7