Interesting question!
One idea would be to choose an arbitrary but consistent order for the four corners in every label you provide. Then, just train a normal keypoint detector with four keypoints. I personally like the model zoo in MMPose for this.
Here's one example of what I mean by a consistent but arbitrary order. Let's say point 1 is always a corner on a shorter edge of the card, chosen so that if you label in clockwise order, the next corner is on that same shorter edge. Two corners have this property (one per short edge), so to break the ambiguity, let's say point 1 is the higher of the two. Then we label the remaining corners in clockwise order. Here's a picture of what I mean:

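The same rule can be written as a small function, which is handy for checking labels automatically. This is only a sketch under a few assumptions of mine: the function name `canonical_order` is made up, the corners are given as (x, y) pixel coordinates with y growing downward, the card is clearly non-square, and ties in height are broken by the smaller x (the rule above does not say what to do in that case).

```python
import numpy as np

def canonical_order(corners):
    """Return the four card corners in the order described above.

    corners: four (x, y) points in image coordinates (y grows downward),
    given in any order. Assumes a convex quadrilateral.
    """
    pts = np.asarray(corners, dtype=float)
    center = pts.mean(axis=0)
    # With y pointing down, increasing atan2 angle is clockwise on screen.
    angles = np.arctan2(pts[:, 1] - center[1], pts[:, 0] - center[0])
    pts = pts[np.argsort(angles)]

    # lengths[i] is the length of the edge from pts[i] to the next corner.
    lengths = np.linalg.norm(np.roll(pts, -1, axis=0) - pts, axis=1)
    # Opposite edges (0, 2) and (1, 3) form two pairs; pick the shorter pair.
    if lengths[0] + lengths[2] < lengths[1] + lengths[3]:
        candidates = [0, 2]
    else:
        candidates = [1, 3]

    # Higher on screen means smaller y; ties fall back to smaller x
    # (the tie-break is an assumption, not part of the rule above).
    start = min(candidates, key=lambda i: (pts[i, 1], pts[i, 0]))
    return np.roll(pts, -start, axis=0)

# A 4x2 card lying on its long side, corners given in scrambled order.
print(canonical_order([(0, 2), (4, 0), (0, 0), (4, 2)]))
```

For that landscape card, point 1 comes out as the top-right corner (4, 0), and point 2 is (4, 2), which is on the same short edge.
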
I think this should work okay because the majority of keypoint detection networks are not rotation equivariant (to get approximate rotational equivariance, you usually have to rely on data augmentation), so the network will view differently rotated cards as very different from each other. If you choose to do rotational data augmentation, you'll need to reorder your keypoints after each rotation so that the ordering still abides by the rules I described.
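
To make the augmentation caveat concrete, here is a rough sketch of rotating the labels and then re-applying the ordering rule. The names `rotate_points`, `augment_keypoints`, and `reorder` are hypothetical; `reorder` stands for whatever function implements your labeling rule. I'm assuming image coordinates with y pointing down, so a positive angle here turns clockwise on screen. The image itself has to be rotated by the same transform, and sign conventions differ between libraries, so check that against whatever you use.

```python
import numpy as np

def rotate_points(points, angle_deg, center):
    """Rotate (x, y) points about `center`.

    With y pointing down, a positive angle is clockwise on screen.
    """
    theta = np.deg2rad(angle_deg)
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    center = np.asarray(center, dtype=float)
    return (np.asarray(points, dtype=float) - center) @ rot.T + center

def augment_keypoints(points, angle_deg, center, reorder):
    """Rotate the keypoints, then re-apply the labeling rule via `reorder`."""
    return reorder(rotate_points(points, angle_deg, center))

# Landscape 4x2 card, labeled by the rule: point 1 is the top-right corner.
card = np.array([[4, 0], [4, 2], [0, 2], [0, 0]], dtype=float)
# Passing an identity `reorder` shows what happens without the fix.
stale = augment_keypoints(card, 90, center=(2, 1), reorder=lambda p: p)
print(stale)
```

After the 90 degree turn the card is in portrait orientation and the old point 1 has moved to the bottom-right corner (3, 3), whereas the rule would now pick the top-left corner (1, -1). That mismatch is why the reordering step matters.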