
I'm implementing a prefetch algorithm in gem5, and I want to warm up the system for a while before measuring performance. I tried the -W WARMUP_INSTS parameter to warm up the system with 1M instructions, but the results were no different from a run without warmup. The help for se.py describes this option as "Warmup period in total instructions (requires --standard-switch)", so I added --standard-switch to my command line, and the results did change. The command is shown below.

build/${arch}/gem5.debug configs/example/se.py --cpu-type DerivO3CPU --cmd=${cmd} -o "${options}" --l1d_size=64kB --l1i_size=64kB --l2_size=512kB --caches --l2cache -n 1 --l1d-hwp-type=${PF} -I 4000000 -W 1000000 -s 100000

This is how I understand it: the system runs the first 100k instructions with TimingSimpleCPU for warmup, then runs the next (400k - 100k) instructions for evaluation, and the warmup only takes effect when combined with --standard-switch.

I also tested the command with only --standard-switch, and it gave different results again, so now I'm confused.

Can anyone tell me how to configure the command line to warm up the system in gem5's SE mode?

I ran different configurations; the CPI results are listed here:

# (1) without -W and without --standard-switch
system.cpu.cpi                               8.603255                       # CPI: Cycles Per Instruction ((Cycle/Count))
# (2) without -W and with --standard-switch
system.cpu.cpi                             181.000000                       # CPI: Cycles Per Instruction ((Cycle/Count))
system.switch_cpus_1.cpi                     8.586554                       # CPI: Cycles Per Instruction ((Cycle/Count))
# (3) with -W and without --standard-switch
system.cpu.cpi                               8.603255                       # CPI: Cycles Per Instruction ((Cycle/Count))
# (4) with -W and with --standard-switch
system.cpu.cpi                             181.000000                       # CPI: Cycles Per Instruction ((Cycle/Count))
system.switch_cpus_1.cpi                     8.939483                       # CPI: Cycles Per Instruction ((Cycle/Count))
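
For comparing runs like these, a small script can pull the CPI values out of the stats output. This is only a minimal sketch: it assumes the line format shown above, and `parse_cpi` is a helper name made up for illustration, not part of gem5.

```python
import re

# Matches stats lines of the form "<name>.cpi   <value>   # comment".
# The format is assumed from the output shown above.
CPI_LINE = re.compile(r"^(\S+)\.cpi\s+([0-9.]+)")


def parse_cpi(stats_text):
    """Return {cpu_name: cpi} for every '<name>.cpi' line in the text."""
    result = {}
    for line in stats_text.splitlines():
        match = CPI_LINE.match(line.strip())
        if match:
            result[match.group(1)] = float(match.group(2))
    return result


# Sample taken from configuration (4) above.
sample = """\
system.cpu.cpi                             181.000000                       # CPI: Cycles Per Instruction ((Cycle/Count))
system.switch_cpus_1.cpi                     8.939483                       # CPI: Cycles Per Instruction ((Cycle/Count))
"""

cpis = parse_cpi(sample)
for name, cpi in cpis.items():
    print(name, cpi)
```

Running this over the stats of each configuration makes it easier to see which CPU object a given CPI belongs to.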

1 Answer


As far as I know, --standard-switch is used to configure a generic memory layout (three cache levels), and this has nothing to do with warming up your prefetcher. With --standard-switch, your overall performance improves because of the L3 cache. If you are getting the same CPI with and without warmup, it possibly means that your prefetching algorithm works just fine without first filling up the structures or tables it uses to predict prefetches. For example, a next-line prefetcher will give you the same results with or without warmup instructions. Warmup is needed when your prefetcher has to build up confidence to a threshold at which it can generate prefetches that have a good chance of being requested by the CPU. If you could provide more details about the algorithm you are trying to implement, I would be more than happy to help.
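
To illustrate the difference between a prefetcher that needs no training and one that does, here is a toy sketch in plain Python. It is not gem5 code: the class names, the 64-byte block size, and the confidence threshold of 2 are all assumptions chosen for the example.

```python
class NextLinePrefetcher:
    """Stateless: always prefetches the block after the one accessed."""

    def __init__(self, block_size=64):  # block size is an assumption
        self.block_size = block_size

    def access(self, addr):
        return [addr + self.block_size]


class StridePrefetcher:
    """Toy stride prefetcher that stays silent until a confidence
    counter reaches a threshold (hypothetical parameter)."""

    def __init__(self, threshold=2):
        self.threshold = threshold
        self.last_addr = None
        self.stride = None
        self.confidence = 0

    def access(self, addr):
        if self.last_addr is not None:
            stride = addr - self.last_addr
            if stride == self.stride:
                self.confidence += 1   # same stride seen again
            else:
                self.stride = stride   # new stride, restart training
                self.confidence = 0
        self.last_addr = addr
        if self.stride and self.confidence >= self.threshold:
            return [addr + self.stride]
        return []


def count_prefetches(prefetcher, addrs):
    """Feed the addresses in order and count the prefetches issued."""
    return sum(len(prefetcher.access(a)) for a in addrs)


# Measured region: ten accesses with a constant stride of 128 bytes.
stream = [i * 128 for i in range(10)]

next_line_count = count_prefetches(NextLinePrefetcher(), stream)
cold_stride_count = count_prefetches(StridePrefetcher(), stream)

# Same stride prefetcher, but trained on three earlier accesses first.
warmed = StridePrefetcher()
count_prefetches(warmed, [i * 128 for i in range(-3, 0)])
warm_stride_count = count_prefetches(warmed, stream)

print(next_line_count, cold_stride_count, warm_stride_count)
```

In this sketch the next-line prefetcher issues a prefetch on every access regardless of history, while the stride prefetcher issues fewer prefetches in the measured region when it starts cold than when it has been trained beforehand. That gap is the kind of effect warmup is meant to remove.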