
Background

I have a struct:

struct event {
    uint16_t id;
    uint8_t type;
    std::string name;
    // many more fields (either string or integer types)
};

And a boolean expression (stored in a string):

name = "xyz" and (id = 10 or type = 5)

I have a vector of event objects (thousands of them), std::vector<event>, and want to check whether each one satisfies the boolean expression.

Things I have implemented so far

  • I implemented a tokenizer and a recursive descent parser to convert the boolean expressions to an AST.
  • I store each terminal as a std::pair<std::string, std::string>, where the first value is the struct field name and the second is the value to compare against. For example, std::pair<"name", "acbd">
  • I walk the tree and, whenever I reach a terminal node, call:
bool match(const event& ev, const ast::node& node) {
    if (node.first == "name") {
        return (ev.name == node.second);
    } else if (node.first == "id") {
        return (ev.id == std::stoi(node.second));
    } else if (node.first == "type") {
        return (ev.type == std::stoi(node.second));
    } // more else if blocks for each member of event struct
    ...
}

Questions

The struct contains 10 members. I want to avoid the unnecessary comparisons. In the worst case, an AST terminal node (for example, pair<"type", "5000">) might result in 10 comparisons.

I tried constructing this lookup map:

std::map<std::string, std::size_t> fields;

fields["name"] = offsetof(event, name);
fields["id"] = offsetof(event, id);
fields["type"] = offsetof(event, type);

Using this map, I can simplify match() to:

bool match(const event& ev, const ast::node& node) {
    const auto offset = fields[node.first];
    return ((ev + offset) == node.second); // DOESN'T WORK
}
  1. I can access the struct member at an offset using (eventObj + offset). How do I convert it to the correct data type for the comparison to work?

  2. For now, all the field values in AST terminal nodes are std::string. How do I convert them to the correct type during the tokenizing or parsing step? I can store the AST node as std::pair<std::string, std::any>, but I still need the type information for std::any_cast.

Both of these can be solved if I can somehow store the type information in the fields map. I am not sure how.

psy

1 Answer


Build a map from field name to std::function<std::function<bool(event const&)>(std::string const&)>, like:

lookup["name"]=[](auto str){
  return [val=std::move(str)](auto& e){
    return e.name==val;
  };
};

Now you can convert each (field name, value) pair into a test function.
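For the integer fields, the same pattern can do the string-to-number conversion in the outer lambda, so it runs once per expression rather than once per event. A minimal sketch, assuming the three fields shown in the question; the `predicate` and `factory` aliases and the `make_lookup` helper are names made up for illustration, and `std::stoi` will throw if the text is not a valid integer:

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <string>

struct event {
    std::uint16_t id;
    std::uint8_t type;
    std::string name;
};

// Hypothetical aliases, introduced only to keep the signatures readable.
using predicate = std::function<bool(event const&)>;
using factory = std::function<predicate(std::string const&)>;

// Hypothetical helper: builds the field-name -> factory table once.
std::map<std::string, factory> make_lookup() {
    std::map<std::string, factory> lookup;
    lookup["name"] = [](std::string const& str) -> predicate {
        return [val = str](event const& e) { return e.name == val; };
    };
    lookup["id"] = [](std::string const& str) -> predicate {
        // Converted here, when the expression is parsed, not per event.
        const int val = std::stoi(str);
        return [val](event const& e) { return e.id == val; };
    };
    lookup["type"] = [](std::string const& str) -> predicate {
        const int val = std::stoi(str);
        return [val](event const& e) { return e.type == val; };
    };
    return lookup;
}
```

Calling `lookup["id"]("10")` returns a predicate that already holds the integer 10, so evaluating it against an event is a plain integer comparison.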

Now your

name = "xyz" and (id = 10 or type = 5)

becomes

tree(
  lookup["name"]("xyz"), 
  std::logical_and<>{},
  tree(
    lookup["id"]("10"),
    std::logical_or<>{},
    lookup["type"]("5")
  )
);

though getting there takes a fair amount of work.
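One possible shape for `tree` is a combinator that takes two predicates and a binary operation and returns a new predicate. This is only a sketch of the idea: the answer does not define `tree`, so the signature below, the `predicate` alias, and the `build_example` function are assumptions. Note that calling `std::logical_and<>` on two already-computed bools evaluates both sides, so this version does not short-circuit.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <utility>

struct event {
    std::uint16_t id;
    std::uint8_t type;
    std::string name;
};

using predicate = std::function<bool(event const&)>;

// Hypothetical combinator: applies op to the results of lhs and rhs.
// Both children are always evaluated (no short-circuiting).
template <class Op>
predicate tree(predicate lhs, Op op, predicate rhs) {
    return [lhs = std::move(lhs), op, rhs = std::move(rhs)](event const& e) {
        return op(lhs(e), rhs(e));
    };
}

// Hand-built equivalent of: name = "xyz" and (id = 10 or type = 5)
predicate build_example() {
    predicate name_is_xyz = [](event const& e) { return e.name == "xyz"; };
    predicate id_is_10 = [](event const& e) { return e.id == 10; };
    predicate type_is_5 = [](event const& e) { return e.type == 5; };
    return tree(name_is_xyz, std::logical_and<>{},
                tree(id_is_10, std::logical_or<>{}, type_is_5));
}
```

A parser would produce the leaf predicates from the lookup map instead of writing them by hand; the result is a single callable that can be stored per expression and applied to every incoming event.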

Yakk - Adam Nevraumont
  • Thanks! That's exactly what I want. `event.id` and other fields are integral types. Instead of doing `std::to_string(e.id) == val`, is there a way to convert the pair value to the appropriate type? – psy Jun 05 '21 at 20:37
  • This might sound too convoluted, but maybe modify the map from field name to a `pair`? First parameter is your comparison function. Second parameter is a function that can convert from/to `std::any`? – psy Jun 05 '21 at 20:41
  • Not going all out with `lookup["name"] = matches(&event::name);` :-)? – Barry Jun 05 '21 at 20:48
  • @barry The OP can write matches, yes. That was intentional. But I have chores, so just sketched out infrastructure. 10 manual ones isn't too bad. – Yakk - Adam Nevraumont Jun 05 '21 at 20:52
  • @psy The outer lambda converts the string to the appropriate type. any is a bad plan here, unless the number and type of field members are also runtime determined. Finally, past 1000s a DB or hash or quick search could be worth it; but that is a second pass imho – Yakk - Adam Nevraumont Jun 05 '21 at 20:52
  • @Yakk-AdamNevraumont, The number and type of field members are known at compile time. I am trying to avoid doing the conversion while evaluating the event object. My application processes thousands (if not tens of thousands) of events every hour. Also, there are multiple (tens) of such boolean expressions. Whenever I see a new event, I need to check it against a vector of AST trees (one tree for each expression). – psy Jun 05 '21 at 21:37
  • Could be premature optimization :) – psy Jun 05 '21 at 21:42
  • @Barry TIL. So, `matches` is a template function that takes in `T event::* ptr` as an argument? – psy Jun 05 '21 at 21:47
  • @psy The outer lambda converts, and is called at parsing time. You store the inner lambda, which takes an event and returns bool. – Yakk - Adam Nevraumont Jun 05 '21 at 23:33
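The `matches(&event::name)` idea from the comments could be sketched along these lines. Nobody in the thread spells `matches` out, so the signature and body below are assumptions: a C++17 function template taking a pointer to member, which parses integral fields with `std::stoll` (throwing on invalid input) and compares everything else as a string.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <type_traits>

struct event {
    std::uint16_t id;
    std::uint8_t type;
    std::string name;
};

using predicate = std::function<bool(event const&)>;
using factory = std::function<predicate(std::string const&)>;

// Hypothetical helper: one factory per member, chosen by the member's type.
template <class T>
factory matches(T event::* member) {
    return [member](std::string const& str) -> predicate {
        if constexpr (std::is_integral_v<T>) {
            // Parsed once, at expression-parsing time.
            const long long val = std::stoll(str);
            return [member, val](event const& e) {
                return static_cast<long long>(e.*member) == val;
            };
        } else {
            // Assumes any non-integral field is comparable to std::string.
            return [member, val = str](event const& e) {
                return e.*member == val;
            };
        }
    };
}
```

With this, each map entry shrinks to one line such as `lookup["id"] = matches(&event::id);`, and the per-event check never touches the original string for integer fields.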