I have read (on Wikipedia) that loop unrolling can cause instruction cache misses, but I don't understand how. As I understand it, whether or not the loop is unrolled, it still executes the same instructions; the only difference is that the unrolled loop has less loop overhead (fewer counter updates and branches). So how does unrolling affect the instruction cache?
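To make the question concrete, here is a minimal sketch of what I mean by a rolled loop versus an unrolled one. The function names and the unroll factor of 4 are my own choices for illustration only; a compiler may well generate something different from this hand-written version.

```c
#include <stddef.h>

/* Rolled: a small loop body that is executed once per element. */
long sum_rolled(const int *a, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Unrolled by 4 (by hand, for illustration): the compare/branch runs
   once per four elements, but the source contains more statements. */
long sum_unrolled4(const int *a, size_t n) {
    long s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
        s += a[i + 3];
    }
    for (; i < n; i++)   /* leftover elements when n is not a multiple of 4 */
        s += a[i];
    return s;
}
```

Both versions compute the same sum. What I don't see is why the second form would behave any worse with respect to the instruction cache.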
I could not find a clear answer. An answer to another Stack Overflow question touches on this, but it is not complete: How can a program's size increase the rate of cache misses?