
Here is a function that takes a vector of strings (customer names) and needs to find the names that occur with at least a given frequency (here, at least 5% of all entries). How can I make it run faster (under 2 seconds), especially on larger data sets? This function is the entire functionality of the program.

Thanks in advance.

vector<string> most_active(vector<string> &customers) {
    vector<string> result;
    double p = (double)customers.size()*5/100;
    
    for(string &customer: customers) {
        if(find(result.begin(), result.end(), customer) != result.end())
            continue;
        if(count(customers.begin(), customers.end(), customer) >= p)
            result.push_back(customer);
    }

    sort(result.begin(), result.end());

    return result;
}

I tried passing the data by reference instead of by value, but it didn't help.

Aganju
  • 2 seconds is not a long compile time. My work project takes 3 hours to compile on a laptop. I’ll trade you compile times, straight up. – Taekahn Mar 26 '22 at 19:58
  • I'm a bit puzzled, why do you care about compile time? Run time is the relevant part. If you want us to help with compile time, you should give us the compilation flags you used. – kvantour Mar 26 '22 at 20:00
  • If you want some solid advice on how to reduce your compile time, you will have to include your entire program. There is no way around that. One function isn’t going to cut it. – Taekahn Mar 26 '22 at 20:01
  • `time clang++ myprogram.cpp` – Eljay Mar 26 '22 at 20:03
  • As an aside, `(double)customers.size()*5/100;` can be written `0.05 * customers.size()`, without a cast. – BoP Mar 26 '22 at 20:03
  • Do you mean compile time or run time? – Alan Birtles Mar 26 '22 at 20:19
  • Sorry, my bad. It's about execution time. The problem is from an online test where I had to complete a prewritten program by writing this function. It normally works, but it failed on the larger data cases. P.S.: I'm afraid I'm not allowed to post all of the code. – Вадим Башкарев Mar 26 '22 at 21:24
  • Well, one way you can speed it up is by pre-sizing your container with `reserve()`, but I agree with the answer below. Use a better container. – Taekahn Mar 26 '22 at 22:42

1 Answer


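It looks like result should be a std::set, since you want to keep the data ordered and you don't want any duplicates. The real cost, though, is calling count over the whole vector for every customer, which is what makes the original O(n²). Tallying each name once in a std::map and then keeping the names that reach the threshold makes it O(n log(n)).

set<string> most_active(vector<string> const& customers) {
  // Tally every name in a single pass: O(n log(n)) in total.
  map<string, size_t> counts;
  for (string const& customer : customers)
    ++counts[customer];

  // Threshold: 5% of all entries, as in the question.
  double p = (double)customers.size() * 5 / 100;

  // Keep only the names that occur at least p times; the set stays sorted.
  set<string> result;
  for (auto const& entry : counts) {
    if (entry.second >= p)
      result.insert(entry.first);
  }
  return result;
}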
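
If the prewritten test program expects the question's original vector<string> return type (an assumption, since that code was not posted), a minimal sketch along the following lines avoids the repeated count() calls by tallying each name once in a std::unordered_map and sorting only the names that qualify. The name most_active_fast is hypothetical, and the 5% threshold is taken from the question; it is compared in integer arithmetic here to sidestep floating-point rounding.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical variant of most_active: returns, in sorted order, every name
// that makes up at least 5% of the entries in `customers`.
std::vector<std::string> most_active_fast(const std::vector<std::string>& customers) {
    // One pass to tally each name: O(n) on average with a hash map.
    std::unordered_map<std::string, std::size_t> counts;
    counts.reserve(customers.size());
    for (const std::string& customer : customers)
        ++counts[customer];

    // count >= 5% of n is equivalent to count * 100 >= n * 5.
    const std::size_t total = customers.size();
    std::vector<std::string> result;
    for (const auto& entry : counts) {
        if (entry.second * 100 >= total * 5)
            result.push_back(entry.first);
    }

    // At most 20 distinct names can each reach 5%, so this sort is cheap.
    std::sort(result.begin(), result.end());
    return result;
}
```

The hash map does not keep its keys ordered, which is why the final std::sort is still needed; because so few names can pass the threshold, the counting pass dominates the running time.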
David G