Python for creating a set operations Calculator

Question

I'm trying to create a calculator, not for numbers, but for set operations. To illustrate the concept lets say you have a file with two columns.

keyword, userid
hello  , john
hello  , alice
world  , alice
world  , john
mars   , john
pluto  , dave

The goal is to read in expressions like

[hello]

and return the set of users who have that keyword. For example

[hello]           -> ['john','alice']
[world] - [mars]  -> ['alice'] // the - here is for the difference operation
[world] * [mars]  -> ['john','alice'] // the * here is for the intersection operation
[world] + [pluto] -> ['john','alice','dave'] // the + here is for union operation

I used the plyplus module in python to generate the following grammar to parse this requirement. The grammar is shown below

 Grammar("""
 start: tprog ;
 @tprog: atom | expr u_symbol expr | expr i_symbol expr | expr d_symbol | expr | '\[' tprog '\]';
 expr:   atom | '\[' tprog '\]';
 @atom: '\[' queryterm '\]' ;
 u_symbol: '\+' ;
 i_symbol: '\*' ;
 d_symbol: '\-' ;
 queryterm: '[\w ]+' ;

 WS: '[ \t]+' (%ignore);
 """)

However, I'm not able to find any good links on the web to take the parsed output to the next level where I can evaluate the parsed output step by step. I understand that I need to parse it into syntax tree of some sort and define functions to apply to each node & its children recursively. Any help appreciated.

The following is for creating a recursive descent parser for arithmetic expressions in python. Presumably, you could simply define your own grammar/tokens? http://blog.erezsh.com/how-to-write-a-calculator-in-70-python-lines-by-writing-a-recursive-descent-parser/ — Luke, Jul 17 '15 at 02:46
I had already been through that site, but having a tough time mapping the arithmetic operations to set operations — broccoli, Jul 17 '15 at 03:25

MaX Ali · Answer 1 · 2015-07-17T05:17:06.817

I am beginner but I tried to solve your question so excuse me if my answer has some mistakes. I suggest using pandas and I think it works best in this case.

First save the data in a csv file

Then

from pandas import *

The next line is to read the file and turn it into a dataframe

x=read_csv('data.csv')
print(x)

The result

 keyword  userid
0  hello      john
1  hello     alice
2  world     alice
3  world      john
4  mars       john
5  pluto      dave

In the next line, we will filter the data frame and assign it to a new variable

y= x[x['keyword'].str.contains("hello")]

where keyword is the column of interest and hello is what we are searching for The result

  keyword  userid
0  hello      john
1  hello     alice

We only interested in the second column so we will use indexing to take it and then save it in a new variable

z=y.iloc[:,1]
print(z)

The result

0      john
1     alice

Now the last step is turning the dataframe into a list by using

my_list = z.tolist()

print(my_list)

The result

[' john', ' alice']

I think you achieve the functionality you require by just manipulating the resulting lists

Update: I tried to solve for the case where you have "or" the code becomes like this

from pandas import *

x=read_csv('data.csv')
print(x)
y= x[x['keyword'].str.contains("hello|pluto")]
print(y)

z=y.iloc[:,1]
print(z)
my_list = z.tolist()

print(my_list)

The result

[' john', ' alice', ' dave']

Update2: I found a solution for the "-" and "*" cases First, we use the same code for both words

from pandas import *

x=read_csv('data.csv')
print(x)
y= x[x['keyword'].str.contains("world")]
print(y)

z=y.iloc[:,1]
print(z)
my_list = z.tolist()

print(my_list)
s= x[x['keyword'].str.contains("mars")]
print(s)

g=s.iloc[:,1]
print(g)
my_list2 = g.tolist()

print(my_list2)

then we add a loop two subtract the two lists

for i in my_list:
    if i in my_list2:
        my_list.remove(i)
print(my_list)

The result

[' alice']

And for the intersection just change the last bit

for i in my_list:
    if i not in  my_list2:
        my_list.remove(i)
print(my_list)

The result

[' john']

Thanks for the input. But pandas wouldn't work here, though it would for the cases I mention. I need to scale it like a language, so its bound to have lots of nested conditions. See comments under the question — broccoli, Jul 17 '15 at 05:49

Luke · Answer 2 · 2015-07-17T14:36:05.567

Okay, so this is a Java implementation I put together, just for fun. It essentially tokenizes the input, and then uses the shunting yard algorithm to evaluate the tokens.

Note that I'm honestly not 100% sure if it's working correctly for complex expressions re: operator precedence, order of ops, etc.

import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map.Entry;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.Scanner;

@SuppressWarnings("unchecked")
class Setop {

    public static void main(String[] args) {
        (new Setop()).run();
    }

    private HashMap<String, HashSet<String>> sets;

    private void run() {
        // Read file to initialise sets
        Scanner fin;
        try {
            fin = new Scanner(new File("data"));
        } catch (FileNotFoundException ex) {
            System.out.println("Cannot find data file");
            return;
        }
        sets = new HashMap<String, HashSet<String>>();
        fin.nextLine();
        while (fin.hasNextLine()) {
            String[] l = fin.nextLine().split(",");
            if (l.length == 2) {
                String k = "[" + l[0].trim() + "]";
                String v = l[1].trim();
                if (sets.get(k) == null) {
                    sets.put(k, new HashSet<String>());
                }
                sets.get(k).add(v);
            }
        }
        for (Entry<String, HashSet<String>> e : sets.entrySet()) {
            System.out.print(e.getKey() + ": ");
            for (String s : e.getValue())
                System.out.print(s + ", ");
            System.out.println();
        }
        // Main input/evaluation loop
        Scanner in = new Scanner(System.in);
        while (true) {
            System.out.print("> ");
            String sin = in.nextLine();
            if (sin.trim().equals("")) {
                return;
            }
            evaluate(sin);
        }
    }

    private void evaluate(String sin)
    {
        // Tokenize
        ArrayDeque<String> tokens = new ArrayDeque<String>();
        Matcher m = Pattern.compile(" *((\\[[a-z]+\\])|\\+|\\*|\\-|\\(|\\)) *").matcher(sin);
        int i = 0;
        while (m.find()) {
            if (m.start() != i) {
                System.out.println("Cannot tokenise at pos " + i);
                return;
            }
            i = m.end();
            tokens.add(m.group().trim());
        }
        if (i != sin.length()) {
            System.out.println("Cannot tokenise at pos " + i);
            return;
        }
        // Shunting yard algorithm to evaluate
        ArrayDeque<HashSet<String>> output = new ArrayDeque<HashSet<String>>();
        ArrayDeque<String> operators = new ArrayDeque<String>();
        for (String token : tokens) {
            if (Pattern.matches("\\[[a-z]+\\]", token)) {
                HashSet<String> s = sets.get(token);
                if (s == null) {
                    System.out.println("Cannot find set " + token);
                    return;
                }
                output.push((HashSet<String>)s.clone());
            } else if (Pattern.matches("\\+|\\*|\\-", token)
                || token.equals("(")) {
                operators.push(token);
            } else if (token.equals(")")) {
                while (operators.size() > 0
                    && Pattern.matches("\\+|\\*|\\-", operators.peek())) {
                    if (output.size() < 2) {
                        System.out.println("Cannot evaluate, excess operations");
                        return;
                    }
                    output.push(operate(operators.pop(), output.pop(), output.pop()));
                }
                if (operators.size() == 0
                    || !operators.pop().equals("(")) {
                        System.out.println("Cannot evaluate, unmatched parenthesis");
                        return;
                }
            } else {
                System.out.println("Cannot evaluate, unknown token " + token);
                return;
            }
        }
        while (operators.size() > 0) {
            if (operators.peek().equals("(")) {
                System.out.println("Cannot evaluate, unmatched parenthesis");
                return;
            }
            if (output.size() < 2) {
                System.out.println("Cannot evaluate, excess operations");
                return;
            }
            output.push(operate(operators.pop(), output.pop(), output.pop()));
        }
        if (output.size() != 1) {
            System.out.println("Cannot evaluate, excess operands");
            return;
        }
        for (String s : output.pop()) {
            System.out.print(s + ", ");
        }
        System.out.println();
    }

    private HashSet<String> operate(String op, HashSet<String> b, HashSet<String> a) {
        if (op.equals("+")) {
            // Union
            a.addAll(b);
        } else if (op.equals("*")) {
            // Intersection
            a.retainAll(b);
        } else {
            // Difference
            a.removeAll(b);
        }
        return a;
    }

}

Python for creating a set operations Calculator

2 Answers2