It can't be the other way around, because the standard fixes the order: the CAP always comes before the CFP. Obviously, you are free to implement your own use of the radio, but then it wouldn't be 802.15.4!
The designers of the standard probably had good reason to place the CAP before the CFP (and if you are really interested, I imagine it is documented somewhere in the IEEE meeting minutes). My guess is that the ordering has the following benefits:
- devices have to wake up their receiver to listen for the beacon frame anyway, so if they have any ad-hoc comms to perform (like collecting a pending message or negotiating a connection) they can do it straight away and then go to sleep for the rest of the superframe
- having the CAP first allows any devices that do not have a GTS to power down their radio for as long as possible
- having the CAP first gives devices time to negotiate a GTS before the CFP starts, thus reducing the latency to their first GTS (i.e. it would be possible to hear a beacon, associate, and obtain a GTS prior to the very next CFP)
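
To put rough numbers on the power-saving argument, here is a minimal sketch that works out where the CAP ends inside a superframe and how long a device without a GTS could keep its radio off afterwards. The constants (60 symbols per base slot, 16 slots per superframe) and the beacon order / superframe order / final CAP slot fields come from the standard; the 16 µs symbol time assumes the 2.4 GHz O-QPSK PHY, and the function name, dictionary keys, and example values are made up purely for illustration.

```python
# MAC constants from 802.15.4
A_BASE_SLOT_DURATION = 60      # symbols per slot when SO = 0
A_NUM_SUPERFRAME_SLOTS = 16    # slots in the active portion of a superframe
SYMBOL_US = 16                 # assumption: 2.4 GHz O-QPSK PHY (62.5 ksymbol/s)


def superframe_layout(beacon_order, superframe_order, final_cap_slot):
    """Hypothetical helper: timing of one superframe, in microseconds.

    All offsets are measured from the start of the beacon (slot 0).
    """
    if not 0 <= superframe_order <= beacon_order <= 14:
        raise ValueError("need 0 <= SO <= BO <= 14 for a beacon-enabled PAN")
    if not 0 <= final_cap_slot < A_NUM_SUPERFRAME_SLOTS:
        raise ValueError("final CAP slot must be in 0..15")

    # Each slot doubles in length for every increment of SO.
    slot_us = A_BASE_SLOT_DURATION * 2 ** superframe_order * SYMBOL_US
    # The beacon interval doubles for every increment of BO.
    beacon_interval_us = (A_BASE_SLOT_DURATION * A_NUM_SUPERFRAME_SLOTS
                          * 2 ** beacon_order * SYMBOL_US)

    cap_end_us = (final_cap_slot + 1) * slot_us        # CAP runs up to here
    active_end_us = A_NUM_SUPERFRAME_SLOTS * slot_us   # CFP ends here

    return {
        "slot_us": slot_us,
        "cap_end_us": cap_end_us,
        "cfp_us": active_end_us - cap_end_us,
        "inactive_us": beacon_interval_us - active_end_us,
        # A device with no GTS can sleep from the end of the CAP
        # until the next beacon.
        "max_sleep_without_gts_us": beacon_interval_us - cap_end_us,
    }


# Example values only: BO = 6, SO = 4, CAP occupies slots 0..8.
layout = superframe_layout(beacon_order=6, superframe_order=4, final_cap_slot=8)
print(layout)
```

With those example values the CAP ends about 138 ms into a 983 ms beacon interval, so a device with no GTS can sleep through the CFP and the inactive period in one uninterrupted stretch. If the CFP came first, the same device would have to wake for the beacon, sleep through the CFP, and then wake again for the CAP.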