Timing issues
Optimizing code should make it run faster, and link-time optimization usually will. That can break code that, without anyone noticing, relied on being a bit slower because the hardware needed time to respond.
An example:
    void GPIO_Test(void) {
        GPIO_InitTypeDef GPIO_InitStruct;
        RCC->AHBENR |= RCC_AHBENR_GPIOBEN;   // enable the GPIOB clock
        // the first GPIOB access follows immediately after the clock enable
        HAL_GPIO_WritePin(GPIOB, GPIO_PIN_6 | GPIO_PIN_7, GPIO_PIN_SET);
        GPIO_InitStruct.Pin = GPIO_PIN_6 | GPIO_PIN_7;
        GPIO_InitStruct.Mode = GPIO_MODE_OUTPUT_PP;
        GPIO_InitStruct.Pull = GPIO_NOPULL;
        GPIO_InitStruct.Speed = GPIO_SPEED_FREQ_LOW;
        HAL_GPIO_Init(GPIOB, &GPIO_InitStruct);
    }
This works without -flto, but fails to set the outputs high when -flto is enabled. Why? On most STM32 models, a short delay is needed between enabling a peripheral's clock in RCC and accessing the peripheral (this is documented in the errata). Without LTO, the call to HAL_GPIO_WritePin(), which lives in another module, happens to provide that delay. With -flto, the compiler can inline functions from other modules, so the write to the GPIO register may follow the RCC write almost immediately and is lost.
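One way to keep the required delay regardless of inlining is to read the enable register back right after setting the bit; the __HAL_RCC_GPIOx_CLK_ENABLE() macros in recent STM32Cube HAL versions do exactly this. The sketch below is made runnable on a host by assumption: FakeRCC and FAKE_RCC_AHBENR_GPIOBEN are hypothetical stand-ins for RCC and RCC_AHBENR_GPIOBEN from the CMSIS device header, and clock_enable_with_readback() is an illustrative helper, not a HAL function.

```c
#include <stdint.h>

/* Hypothetical host-side stand-in for the RCC register block.
 * On target, use RCC from the CMSIS device header instead. */
typedef struct {
    volatile uint32_t AHBENR;
} FakeRCC;

/* Hypothetical bit value; on target use RCC_AHBENR_GPIOBEN. */
#define FAKE_RCC_AHBENR_GPIOBEN (1u << 18)

/* Illustrative helper: set the clock-enable bit, then perform a dummy
 * read of the same register. Because the read is volatile, the compiler
 * must emit it after the write, even when this function is inlined by
 * LTO, so some time passes before the peripheral is first accessed.
 * On target: clock_enable_with_readback(&RCC->AHBENR, RCC_AHBENR_GPIOBEN); */
static inline void clock_enable_with_readback(volatile uint32_t *enr,
                                              uint32_t mask) {
    *enr |= mask;                  /* enable the peripheral clock */
    uint32_t tmp = *enr & mask;    /* dummy read-back for the delay */
    (void)tmp;                     /* value itself is not needed */
}
```

In GPIO_Test(), replacing the plain |= with this helper, or simply calling __HAL_RCC_GPIOB_CLK_ENABLE(), keeps the delay no matter how aggressively the calls around it are inlined.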
Missing volatile
A common source of problems with -flto is that it can optimize away accesses to variables that should have been declared volatile but aren't, even when the access is encapsulated in a function call into another module.
Let's look at a simple example.
mainloop.c:
    while (1) {
        if (button_pressed()) {
            do_stuff();
        }
    }
button.c:
    int button_flag;   /* written by the interrupt handler, but not volatile */
    void button_interrupt_handler(void) {
        button_flag = GPIOx->IDR & SOME_BIT;
    }
    int button_pressed(void) {
        return button_flag;
    }
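Without -flto, a call to a function in another module is treated as a black box that may have side effects: the call is always generated, and its result is always evaluated. In other words, each call into another module acts as an implicit compiler memory barrier. With -flto that barrier is gone, and the compiler can inline or otherwise optimize functions from other modules. Here it can inline button_pressed() into the loop, see that nothing in the loop writes button_flag (it has no way of knowing that an interrupt handler does), and load the flag only once, so the loop may never notice a button press. Declaring button_flag as volatile forces a fresh read on every call.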
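A minimal sketch of button.c with the fix applied: button_flag is declared volatile, so every call to button_pressed() performs a real load even after LTO inlines it into the main loop. To make it runnable on a host, fake_gpio_idr and the SOME_BIT value are hypothetical stand-ins for GPIOx->IDR and the button's pin mask, and the handler is called directly instead of being triggered by an interrupt.

```c
#include <stdint.h>

/* Hypothetical stand-ins so the sketch runs on a host.
 * On target these are GPIOx->IDR and the button's pin mask. */
volatile uint32_t fake_gpio_idr;
#define SOME_BIT (1u << 3)

/* Written asynchronously by the interrupt handler, hence volatile. */
volatile int button_flag;

/* On target this runs as the button's interrupt handler.
 * The masked value is normalized to 0 or 1. */
void button_interrupt_handler(void) {
    button_flag = (fake_gpio_idr & SOME_BIT) != 0;
}

/* Every call loads button_flag from memory, even when inlined. */
int button_pressed(void) {
    return button_flag;
}
```

volatile only guarantees that each access is actually performed. That is enough here because a single aligned int is read and written in one access on Cortex-M, but it would not make a read-modify-write of the flag atomic.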