More efficient way to write this algorithm?

Question

Currently working on a Library simulator assignment. Everything is working fine, but I would like to know something just for the sake of knowing it.

In this program there are 3 classes: Book, Patron, and Library. The library class contains 3 private data members: a vector of pointers to books, a vector to pointers of patron's, and a currentDate int.

The function in question is below:

void Library::incrementCurrentDate()
{
  currentDate++;

  for (int i = 0; i < members.size(); i++)
  {
    vector<Book*> ptr = members.at(i)->getCheckedOutBooks();

     for (int j = 0; j < ptr.size(); j++)
      {
        if (currentDate > ptr.at(j)->getCheckOutLength())
            members.at(i)->amendFine(.10);
      }
  }
}

the requirements for the function are this:

increment current date; increase each Patron's fines by 10 cents for each overdue Book they have checked out (using amendFine)

The way I have written it above works fine now. As I am just in my first semester of my computer science program, we cannot use anything that we have not covered, which I know is alot. With that being said, would there be a more efficient way to implement this function using more advanced c++ programming methods?

You should make `ptr` a reference. Right now, it copies a vector every cycle, but it doesn't look like you ever need to modify it. — Weak to Enuma Elish, Dec 02 '15 at 04:29
@JamesRoot thanks! didn't even catch that. Would I just make it &ptr then? — , Dec 02 '15 at 04:34
keep the books sorted by due date, so you don't check every book — Glenn Teitelbaum, Dec 02 '15 at 04:34
@GlennTeitelbaum Looks like you and I had the same idea at the same time. — sparkyShorts, Dec 02 '15 at 04:35
use `[]` instead of `at` since you are already range checking — Glenn Teitelbaum, Dec 02 '15 at 04:37
Counting the number of overdue books and adjusting the fine once at the end might be faster than calling the function every time. — Weak to Enuma Elish, Dec 02 '15 at 04:46
@jamesroot if he only incremented fines once per member then he wouldn't thrash memory as much, nice idea — Glenn Teitelbaum, Dec 02 '15 at 05:00
@mikeTheLiar Why is there no automatic way to migrate to Code Review (via close->off topic)? — Walter, Dec 02 '15 at 14:24
@Walter I think migration options need to be manually added and as CR only just made it out of beta maybe it's still being discussed? [According to this](http://meta.codereview.stackexchange.com/q/6141/28725) it hasn't been made a target for migration yet. — Mike G, Dec 02 '15 at 14:25
@Walter Mostly because the CR community doesn't think it would be a good idea. — Kaz, Dec 02 '15 at 14:27
See [What is the latest on adding Code Review to Off-Topic Migration Options?](http://meta.stackoverflow.com/questions/311348/what-is-the-latest-on-adding-code-review-to-off-topic-migration-options/311350#311350). Also, comment thread on this post are not a good place to discuss this. Please use Meta instead, — Phrancis, Dec 02 '15 at 14:32
_but I would like to know something just for the sake of knowing it._ +1 — Boris the Spider, Dec 02 '15 at 14:53
Why are you calculating their fine every day? You only need to know the fine when you go to look at it - you simply take the current date minus the due date times the finePerDay. — corsiKa, Dec 02 '15 at 15:52

CinchBlue · Accepted Answer · 2015-12-02T04:58:12.613

Use std::vector if the size is not extremely large.

Pointers always have a cost associated with them because of the indirection involved. Looking up an address and accessing it in memory may not be able to be optimized by the compiler out and will thus involve costs with accessing memory. Memory access is often the bottleneck for performance in systems, so it's best to try to put things near to each other in memory and try to structure your programs so that you access memory the least.

Use a database system, like SQL, if the data gets extremely large.

On the other hand, we can forego all of the dirty work and use an established database library or program. Something like MySQL can easily manage a lot of data with a great programming language to access and manage it as well. Certain databases, like PostgreSQL can scale to large sets of data. Getting familiar with it can also be quite helpful. Even some mobile apps might use MySQL for Android, for example.

Use the modern C++11 or greater for loop iteration syntax.

The current for loop syntax is quite opaque and might have a lot of cruft. C++11 introduced a cleaner for loop syntax to iterate across standard library containers like map or vector. Use: for(auto it : vector_name). If you need to modify each one, use a reference qualifier for the it--the iterator.

Use pre-increment syntax for possibly minimal speedup.

++i and i++ are slightly different. ++i just directly modifies the object where it appears in an expression before it continues evaluating it. i++ creates a copy of the object, returns it, and increments the original. Creating a copy of a value or object has a cost in C++, so avoiding this can be helpful in certain cases and it is a good convention to do this anyways.

Pass by const &. Not by just regular reference.

Function arguments are passed by value by default in C++. This means that C++ just makes a copy of the object. However, when there are mutations applied repeatedly to an object, like, say, using a function to change the value of an integer over time, you may want to pass by reference. References basically pass the "real" object, meaning that any changes you make to the reference are done on the "real" object.

Now, why pass a non-modifiable object? Because it can lead to better optimizations. Passing by constant reference allows the compiler to make stronger assumptions about your code (e.g. because the reference cannot change within the course of the program, referring to the same reference multiple times in the function doesn't require the value of the argument to be reloaded over again because it shouldn't change while inside of a function).

Use a std::unique_ptr or std::shared_ptr.

Smart pointers are also a nice feature that was introduced with C++11, and involves pointers that automatically deallocate themselves by attaching their lifetime to scope. In other words, no need to use new or delete--just create the pointers and you shouldn't have to keep track of when to release memory. This can get complicated in certain situations but in general, using smart pointers leads to better safety and less change of having memory management problems, which is why they were inducted into the standard library in the first place.

thank you so much, very helpful information, gives me a lot of things to read up on later — , Dec 02 '15 at 04:44
4 is a moot point. Compilers will [optimize](http://stackoverflow.com/questions/1116735/i-less-efficient-than-i-how-to-show-this) pre- and post-increment and decrement in a `for`-loop. — erip, Dec 02 '15 at 04:52
item 7: yes, whitespace makes programs slower, but not as slow as those pesky comments — Glenn Teitelbaum, Dec 02 '15 at 04:54
@GlennTeitelbaum Yeah I just removed that. It's more pointless than pointless cruft. — CinchBlue, Dec 02 '15 at 04:58
I was wondering why no one mentioned point 6., and then I read you answer ;) — CinCout, Dec 02 '15 at 04:59
smart pointers do not make code run faster, just safer and easier to maintain — Glenn Teitelbaum, Dec 02 '15 at 05:02
@erip I put it because it's a good rule-of-thumb to follow. Having to deal with an expression that has an `i++` that creates strange behavior can be hard to find because you think that it's just `i+1` when it's really just `i` and then `i+1`. I know I've tripped over it before so I put it there. — CinchBlue, Dec 02 '15 at 05:18
It is probably worth using a local database such as SQLite (it's light weight as opposed to MySQL or others) to store your data. A side note, @GlennTeitelbaum, white-space should not be making compiled programs slower. Please do not spread propagate this kind of thing. — user2882597, Dec 02 '15 at 13:16
@user3730788 it was sarcasm and answer was modified because that was understood. To be clear, **whitespace and comments are removed during compilation and have no effect on the resulting executable** As a side note, we generally do not optimize a loop by using a database, we use a database for scale, not speed — Glenn Teitelbaum, Dec 02 '15 at 14:09
@erip: this highly depends, on an integer? Yes, obviously. On an iterator object? Roll a dice, but the more complicated the iterator (notably if instrumented with sanity checks) the less likely that the compiler will optimize... — Matthieu M., Dec 02 '15 at 15:56
@VermillionAzure: As nice as this advice is, you are only pointing out implementation details without commenting on the algorithm at all. 3/4/5/6 have little impacts on performance; 2 requires to first learn how to design a database; 1 is already used. Nice advice, but little contribution... — Matthieu M., Dec 02 '15 at 15:59

score 5 · Answer 2 · answered Dec 02 '15 at 04:43

There are a couple of questions to answer here I think. The first being: can this algorithm be more efficient? And the other being: can my implementation of the algorithm in c++ be more efficient?

To the first question, I would answer no. Based on the problem, it sounds to me like you have no further information that would allow you to do any better than O(n^2).

As mentioned in the comments, you could iterate over every person and sort their books by due date. In practice this could save some time, but in theory book lookup would still be linear time, O(n). Plus you add the overhead of sorting making your algorithm now O(mnlog(n)) where m is the number of patrons and n is the number of books. If you know you have few patrons with many books each, then sorting could be beneficial. If you have many patrons with few books, it would be much less beneficial.

As for the second question: there are a few small tweaks (and a few large tweaks) that could make your code more efficient, although I would argue that a vast majority of the time they would not be necessary. One major thing I notice is that you recreate a vector object on every iteration of you first for loop. By doing this you are creating unnecessary overhead. Try instead this pseudo-code:

currentDate++;
vector<Book*> ptr = members.at(i)->getCheckedOutBooks();
for(....)

Another optimization that could be a large overhaul would be to drop the Vector library. A vector in c++ has the ability to be resized on the fly as well as other unnecessary overhead (for your task). Simply using an array would be more memory efficient, although equivalently time efficient.

You mentioned being in your first semester, so you probably have not been introduced to Big O notation yet.

correct, we have not been introduced to Big O notation yet, but thank you for the link! I just want to start absorbing everything that I can whenever I have some downtime — , Dec 02 '15 at 04:53
@GlennTeitelbaum Relatively sure (though open for discussion). I suppose it could be more correct to say the original algorithm is O(mn) where m is number of patrons and n is number of books. Lookup in a sorted list is same as lookup in unsorted (linear) because worst case scenario every book is overdue, so we iterate over all of them. Sorting is well known to be O(nlog(n)) at best. So yes, I stand by my Big O usage. — Andnp, Dec 02 '15 at 05:01
Ah I see your point, I should have been more clear. It is O(mn) where m is the number of patrons, and n is the number of books per patron. — Andnp, Dec 02 '15 at 14:52

score 5 · Answer 3 · answered Dec 02 '15 at 04:48

5

if that is the only operation you are optimizing then keep a vector of tuple <int, Book *, Patron * > sorted by the int representing due date of evey checked out Book Then just iterate until the due date is greater than the current date appying fines the the associated Patron.

answered Dec 02 '15 at 04:48

Glenn Teitelbaum

10,108
3
36
80

interesting, we haven't covered that yet but i'll definitely play around with it, thanks! – Dec 02 '15 at 04:57
Have you considered that you probably also want to remove books from that list when the members check them in? Removing from such a list would be expensive since you don't remove from the end. – HelloGoodbye Dec 02 '15 at 13:18
@hellogoodbye optimization is often a tradeoff between operations. if applying fines (generally a batch operation with a large N) is the most critical, then as posted by OP this would be my solution. adding and removing books is limited by most libraries to a small N, so performance may not be as critical. when balancing operations, a linked list with side indexes might be optimal overall, but that wasn't the question – Glenn Teitelbaum Dec 02 '15 at 13:43
@HelloGoodbye yes actually, there is a function in the library class that will remove a book from a patrons checkedoutbooks list when they return it – Dec 03 '15 at 04:34
@GlennTeitelbaum What do you mean by "balancing operations"? – HelloGoodbye Dec 03 '15 at 14:40
@NickD In that case, if you choose to add any auxiliary structure, you of course need to update that function to also remove the book from that structure. – HelloGoodbye Dec 03 '15 at 14:45

score 4 · Answer 4 · answered Dec 02 '15 at 10:17

4

If you have n checked out books, m of which are overdue, your algorithm takes O(n) time to add the fines. This is because your data structure stores information like this

member -> list(checked out books)
book -> check-out length // presumably the due date for returning the book

If in addition to your members collection you also store the following information:

check-out length -> list(checked out books with that due date)
book -> member who checked it out

then you can use a sorted tree that stores all checked-out books by their due date to look up all overdue books in O(log n). Thus the total asymptotic run-time of your algorithm would reduce from O(n) to O(log n + m).

answered Dec 02 '15 at 10:17

Thomas

17,016
4
46
70

I was on the same track as you. What container type would you use to store all lists of checked out books with a certain due date? Because such a list would depend on the due date so you would need to have many different lists of that type. – HelloGoodbye Dec 02 '15 at 13:14
That's right, but the number of lists would be `<= n`. I would suggest using a `std::set` to store the a struct consisting of the due date and the list of all books that are due on that date. Then you need to provide a `<` operator for these structs; here you would simply compare the due dates. Finally, you can use `std::set::upper_bound` to retrieve all books that are due before a certain date. – Thomas Dec 02 '15 at 13:49
Note that `std::set` is typically implemented using a binary search tree, according to http://www.cplusplus.com/reference/set/set/ . – Thomas Dec 02 '15 at 13:58

score 3 · Answer 5 · answered Dec 02 '15 at 04:57

3

You might consider replacing the vector with an std::map container. Maps are stored as sorted trees. If you define a comparator function comparing checkout length (or more likely "expired date") you do not need to scan the entire list every time.

A more complicated solution would store all books in a single tree of pointers sorted by their expiration time. Then you wouldn't have to iterate over members at all. Instead iterate over books until you find a book that has not expired yet.

This is more complicated because now adding/removing books for each member (or even iterating over all books a member owns) is more difficult and may require maintaining a vector of pointers for each user as your current approach (in addition to the global book map).

answered Dec 02 '15 at 04:57

nimrodm

23,081
7
58
59

a sorted `vector` is faster to walk – Glenn Teitelbaum Dec 02 '15 at 14:20
@GlennTeitelbaum: but adding entries is more expensive. So assuming users borrow and return books frequency, a tree holding all books might be better. Depends of course on the number of books involved. For small numbers vectors are more efficient due to memory efficiency and avoiding pointer chasing. – nimrodm Dec 02 '15 at 14:26
@nimrodm Aren't maps a form of linked lists in implementation right now? If I remember, there was a CPPCon presentation about this a little while ago. – CinchBlue Dec 02 '15 at 19:08
1

@VermillionAzure `std::map` and related containers are typically implemented as balanced binary trees. The C++ standard specifies complexity requirements for some operations that couldn't be satisfied with a linked list. – Blastfurnace Dec 02 '15 at 19:21
@Blastfurnace I thought binary trees are often structured as linked list-like anyways. The point is that they use pointers. Is that right? – CinchBlue Dec 02 '15 at 20:13
@VermillionAzure: Yes, they are node-based similar, in that sense, to a linked list but they are not organized in a linear order. They branch and allow O(log n) time access unlike the O(n) access of a linked list. – Blastfurnace Dec 02 '15 at 20:16
@VermillionAzure: This article should make it clearer: [Self-balancing binary search tree](https://en.wikipedia.org/wiki/Self-balancing_binary_search_tree). – Blastfurnace Dec 02 '15 at 20:19
@Blastfurnace But that example has more than one access required for a read, right? A vector has only one access required and is constant time? – CinchBlue Dec 02 '15 at 20:26
@VermillionAzure: Yes, a vector has random access. That assumes you know exactly which index you want to read. I think this answer is suggesting a __sorted__ vector or `std::map` for the ability to quickly find just the overdue books. – Blastfurnace Dec 02 '15 at 20:35

sparkyShorts · Answer 6 · 2015-12-02T17:00:24.290

2

It's been a while since I've used C++, but almost always the standard libraries will be faster than your own implementation. For your reference, check out the standard functions that are associated with std::vector (this site is extremely useful).

You might be able to slim down the ptr.size() through some other filtering logic so you don't have to iterate over people who don't have late books (maybe some sorting on books and their due dates?)

edited Dec 02 '15 at 17:00

answered Dec 02 '15 at 04:34

sparkyShorts

630
9
28

1

that does make a lot of sense, thanks for that input – Dec 02 '15 at 04:44

score 1 · Answer 7 · answered Dec 02 '15 at 14:09

Right now you're amend fines in O(n) (n being getCheckOutLength().size()) but you can do it in O(log(n)) since you need just number of late books and not their objects for fining, if you have that number then you multiply it by .01 and use one amend fine function to do it all.

So here is the way I suggest:
If you keep getCheckedOutBooks() sorted by their getCheckOutLength() in a vector then you can find which dates are more than curDate by finding std::upper_bound in the vector that gives you the first element that is greater than the currentDate, so from that element index to the end of vector is number of book who should be fined, here is the code:

int checkedDateComparator(Patron & leftHand, Patron & rightHand){
    return leftHand.getCheckedOutLength() < rightHand.getCheckOutLength();  
}
bool operator==(Patron & a, Patron & b){
    return a.getCheckedOutLength() < b.getCheckOutLength();
}
void Library::incrementCurrentDate()
{
    currentDate++;

    for (int i = 0; i < members.size(); i++)
    {
        vector<Book*> ptr = members.at(i)->getCheckedOutBooks();
        Book dummy; //dummy used for find the fines 
        dummy.setCheckedOutLength(currentDate);
        int overdue = ptr.end() - upper_bound(ptr.begin(), ptr.end(), dummmy, checkedDateComparator);
        members.at(i)->amendFine(overdue* .01);
   }
}

score 1 · Answer 8 · answered Dec 02 '15 at 14:37

Let's take a step back and look at the requirements. When you go to the library and have some possibly-late checked-out books, you probably ask the librarian what you owe. The librarian looks up your account and tells you. It is at that point that you should be calculating the fees. What you're doing now is recalculating the fees every midnight (I'm assuming). That's the part that's inefficient.

Let's instead have this use-case:

Librarian attempts to check out a patron's books
System calculates fees
Patron pays any outstanding fees
Librarian checks out the books

The relevant part for your question would be step #2. Here's pseudocode:

float CalculateFees(Patron patron)
{
    float result = 0;
    foreach(checkedOutBook in patron.GetCheckedOutBooks())
    {
        result += CalculateFee(checkedOutBook.CheckOutDate(), today);
    }
    return result;
}

float CalculateFee(Date checkOutDate, Date today)
{
    return (today.Day() - checkOutDate.Day()) * 0.10;
}

The whole use-case could be as simple as:

void AttemptCheckout(Patron patron, BookList books)
{
    float fees = CalculateFees(patron);
    if(fees == 0 || (fees > 0 && PatronPaysFees(patron, fees)))
    {
        Checkout(patron, books);
    }
    else
    {
        RejectCheckout(patron);
    }
}

I've written this in a way that makes it easy to change the fee formula. Some types of materials accrue fines differently than other types of materials. Fines may be capped at a certain amount.

More efficient way to write this algorithm?

8 Answers8