The quote below, from here, clears things up:
The method for generating configuration cycles is host dependent. In
IA machines, special I/O ports are used. On other platforms, the PCI
configuration space can be memory-mapped to certain address locations
corresponding to the PCI host bridge in the host address domain.
And:
I/O space can be accessed differently on different platforms.
Processors with special I/O instructions, like the Intel processor
family, access the I/O space with in and out instructions. Machines
without special I/O instructions will map to the address locations
corresponding to the PCI host bridge in the host address domain. When
the processor accesses the memory-mapped addresses, an I/O request
will be sent to the PCI host bridge, which then translates the
addresses into I/O cycles and puts them on the PCI bus.
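For example, on x86 the legacy configuration mechanism uses two I/O ports, CONFIG_ADDRESS (0xCF8) and CONFIG_DATA (0xCFC): software writes the bus/device/function/register address into the first port and reads the data through the second. Below is a minimal user-space sketch, assuming Linux on x86 with <sys/io.h> and root privileges (iopl); real software would normally go through the kernel's PCI accessors instead.

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/io.h>   /* iopl(), inl(), outl() -- Linux/x86 specific */

#define PCI_CONFIG_ADDRESS 0xCF8
#define PCI_CONFIG_DATA    0xCFC

/* Read one 32-bit config register via the legacy I/O-port mechanism. */
static uint32_t pci_cfg_read32(uint8_t bus, uint8_t dev, uint8_t fn, uint8_t offset)
{
    uint32_t addr = 0x80000000u                  /* enable bit */
                  | ((uint32_t)bus << 16)
                  | ((uint32_t)(dev & 0x1F) << 11)
                  | ((uint32_t)(fn  & 0x07) << 8)
                  | (offset & 0xFC);             /* dword-aligned register offset */
    outl(addr, PCI_CONFIG_ADDRESS);              /* tell the host bridge which register */
    return inl(PCI_CONFIG_DATA);                 /* bridge turns this into a config cycle */
}

int main(void)
{
    if (iopl(3) != 0) {                          /* raw port access needs root */
        perror("iopl");
        return 1;
    }
    /* Vendor/device ID of bus 0, device 0, function 0 (typically the host bridge). */
    printf("0000:00:00.0 ID = 0x%08x\n", pci_cfg_read32(0, 0, 0, 0));
    return 0;
}
```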
So for non-IA platforms, MMIO can simply be used instead. And the platform spec should document the memory-mapped address of the PCI host bridge as a priori knowledge for SW/FW writers.
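For the memory-mapped case, PCIe defines ECAM (Enhanced Configuration Access Mechanism): every function gets a 4 KiB config window at a fixed offset from a platform-provided base address, which firmware reports via the ACPI MCFG table, a device-tree node, or simply the platform manual. A minimal bare-metal sketch, where ECAM_BASE is a hypothetical placeholder for that platform-documented address:

```c
#include <stdint.h>

/* Hypothetical ECAM base; on real hardware this comes from the platform
 * spec, the ACPI MCFG table, or the device tree -- not a hard-coded constant. */
#define ECAM_BASE 0x40000000UL

/* ECAM layout: base + (bus << 20) + (device << 15) + (function << 12) + offset. */
static inline volatile uint32_t *
ecam_cfg_addr(uint8_t bus, uint8_t dev, uint8_t fn, uint16_t offset)
{
    uintptr_t a = ECAM_BASE
                + ((uintptr_t)bus << 20)
                + ((uintptr_t)(dev & 0x1F) << 15)
                + ((uintptr_t)(fn  & 0x07) << 12)
                + (offset & 0xFFC);              /* dword-aligned */
    return (volatile uint32_t *)a;
}

static inline uint32_t
ecam_cfg_read32(uint8_t bus, uint8_t dev, uint8_t fn, uint16_t offset)
{
    /* An ordinary memory load; the host bridge translates it into a
     * configuration request on the PCIe fabric. */
    return *ecam_cfg_addr(bus, dev, fn, offset);
}
```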
ADD 1 - 14:36 2023/2/5
From a digital design perspective, the host CPU and the PCIe subsystem are just two separate IP blocks, and the communication between them is carried by a set of digital signals in the form of address/data/control lines. As long as those signals can be conveyed, the communication can be made.
For x86 CPUs, the memory address space and the I/O address space are, down at the hardware level, just different uses of the address lines. I don't think there's any strong reason why memory addresses cannot be used to communicate with the PCIe subsystem. Using I/O addresses for PCI/PCIe was probably just the more logical choice back then, because PCIe was deemed I/O.
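To make that concrete: the two spaces differ only in which instruction, and therefore which kind of bus cycle, the CPU issues. A sketch for x86 with GCC inline assembly, where the MMIO address passed in is a hypothetical device register:

```c
#include <stdint.h>

/* Port I/O: reachable only through the dedicated in/out instructions,
 * which signal an I/O cycle rather than a memory cycle to the fabric. */
static inline uint32_t port_read32(uint16_t port)
{
    uint32_t v;
    __asm__ volatile ("inl %w1, %0" : "=a"(v) : "Nd"(port));
    return v;
}

/* MMIO: an ordinary load from a physical address that the system has
 * routed to the device instead of to DRAM. Same address/data lines,
 * different decode. */
static inline uint32_t mmio_read32(uintptr_t addr)
{
    return *(volatile uint32_t *)addr;
}
```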
So the really critical thing, I think, is to convey the digital signals in the proper format between IPs. PCIe is independent of the CPU architecture and cares nothing about which lines are used. For ARM, there's nothing unnatural about using memory addresses, i.e., MMIO. After all, they are just digital signals and are perfectly capable of carrying the necessary information.