What gets discarded by gcc's -flto?

Question

I'm building our firmware for stm32 with arm-none-eabi-gcc 6.3.1.

If I enable link-time optimization, it still compiles and boots and is ~10 KiB smaller than without -flto, but there is some subtle breakage.

How can I debug this?

Is there a way to get gcc to tell me what it (wrongly) discards during link-time optimization?

LTO is pretty complex, as it can involve cross-module inlining and other fancy things. It's not just function discarding. You might be better off just troubleshooting the breakage. — Jonathon Reinhart, May 04 '18 at 12:42
I stopped using LTO as it was giving me too much of a headache. Those errors are very difficult to debug and it takes too much time. Maybe version 7.x works better. As far as I know there are issues in version 8.x as well, but I have stopped using LTO. Maybe I will come back to it when it is a bit more mature and well tested. — 0___________, May 05 '18 at 00:32
PeterJ_01: I have used LTO almost everywhere since 5.x, also on more complex projects, and without any issues. Yes, if I need to debug I turn it off. — vlk, May 05 '18 at 11:10
user1273684: try turning on all warnings (-Wall -pedantic -Wextra); the compiler may show something. — vlk, May 05 '18 at 11:12

score 3 · Answer 1 · answered May 09 '18 at 05:08

Timing issues

Optimized code should and will run faster, which can cause issues with hardware that expects it to be a bit slower.

An example:

void GPIO_Test() {
    GPIO_InitTypeDef GPIO_InitStruct;
    RCC->AHBENR |= RCC_AHBENR_GPIOBEN;
    HAL_GPIO_WritePin(GPIOB, GPIO_PIN_6 | GPIO_PIN_7, GPIO_PIN_SET);
    GPIO_InitStruct.Pin = GPIO_PIN_6 | GPIO_PIN_7;
    GPIO_InitStruct.Mode = GPIO_MODE_OUTPUT_PP;
    GPIO_InitStruct.Pull = GPIO_NOPULL;
    GPIO_InitStruct.Speed = GPIO_SPEED_FREQ_LOW;
    HAL_GPIO_Init(GPIOB, &GPIO_InitStruct);
}

This works without -flto, but fails to set the outputs high when -flto is enabled. Why? Because on most STM32 models, a small delay is needed between enabling the clock in RCC and using the peripheral (this is mentioned in the errata). Calling a function would provide the required delay, but with -flto the compiler can inline functions from other modules, which reduces the delay.
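One common way to get the delay explicitly is to read the RCC register back right after setting the enable bit, instead of relying on call overhead. The following is only a minimal sketch that compiles on a host: `rcc_sketch`, `RCC_SKETCH` and `RCC_AHBENR_GPIOBEN_SKETCH` are hypothetical stand-ins for what the vendor device header provides, and the bit position is illustrative. Whether a single read-back is a long enough delay depends on the errata for the specific part.

```c
#include <stdint.h>

/* Hypothetical stand-in for the vendor device header, so the sketch builds
   on a host. On real hardware, use RCC and RCC_AHBENR_GPIOBEN instead. */
typedef struct {
    volatile uint32_t AHBENR;
} RCC_Sketch_TypeDef;

static RCC_Sketch_TypeDef rcc_sketch;
#define RCC_SKETCH (&rcc_sketch)
#define RCC_AHBENR_GPIOBEN_SKETCH (1u << 18) /* illustrative bit position */

/* Enable the GPIOB clock, then read the enable bit back.
   Both accesses are volatile, so the compiler must emit them in this order
   and cannot drop the read, no matter how much gets inlined. */
uint32_t gpiob_clock_enable(void)
{
    RCC_SKETCH->AHBENR |= RCC_AHBENR_GPIOBEN_SKETCH;
    uint32_t readback = RCC_SKETCH->AHBENR & RCC_AHBENR_GPIOBEN_SKETCH;
    return readback; /* non-zero once the enable bit is set */
}
```

Calling such a helper before the first HAL_GPIO_WritePin() keeps the delay in place regardless of inlining decisions.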

Missing volatile

A common source of problems with -flto is that it can optimize away accesses to variables that should have been declared volatile but aren't, even if the access is encapsulated in a function call in another module.

Let's look at a simple example.

mainloop.c:

while(1) {
  if(button_pressed()) {
    do_stuff();
  }
}

button.c:

int button_flag;
void button_interrupt_handler() {
    button_flag = GPIOx->IDR & SOME_BIT;
}

int button_pressed() {
  return button_flag;
}

Without -flto, a call to a function in another module is treated like a black box with possible side effects: the call is always generated, and the result is always evaluated. In other words, each function call to another module acts as an implicit memory barrier. With -flto the barrier is no longer there; the compiler can inline or otherwise optimize functions in other modules. In this example, it may read button_flag once before the loop and never notice the interrupt handler changing it.
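A possible fix for button.c is to declare the shared flag volatile, so every call really reloads it from memory even after inlining. This is a host-compilable sketch, not the original firmware: `gpio_idr_sketch` and `SOME_BIT_SKETCH` are hypothetical stand-ins for `GPIOx->IDR` and `SOME_BIT`.

```c
#include <stdint.h>

/* Hypothetical stand-ins for GPIOx->IDR and SOME_BIT from the example. */
static volatile uint32_t gpio_idr_sketch;
#define SOME_BIT_SKETCH (1u << 3) /* illustrative bit position */

/* volatile: written in interrupt context, read from the main loop.
   The compiler must reload it on every access instead of caching it. */
static volatile int button_flag;

void button_interrupt_handler(void)
{
    /* Normalize to 0/1 so the result does not depend on the bit position. */
    button_flag = (gpio_idr_sketch & SOME_BIT_SKETCH) != 0u;
}

int button_pressed(void)
{
    return button_flag;
}
```

With the flag volatile, the main loop keeps polling the real value whether or not button_pressed() is inlined across modules.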

All the examples you have shown are signs of bad programming: not using a barrier or explicit delay when the uC documentation requires it, volatile abuse, etc. This is not related to LTO but to a lack of programming knowledge. But the current LTO implementation is still not very good anyway. — 0___________, May 09 '18 at 09:00
@PeterJ_01 exactly my point. The majority of suspected compiler (library, hardware etc) flaws are actually problems between the keyboard and the chair. BTW if you have reproducible examples of problems with LTO, why don't you post them as an answer? It'd make a better answer than mine. — followed Monica to Codidact, May 10 '18 at 10:27
