Background
I have a neural network that outputs pose key points (feet, ankles, knees, arms, head, etc.) and the connections between them, so I effectively have a skeleton. I'd like to use these key points/skeleton as input to another neural network, a relation network (https://arxiv.org/pdf/1706.01427.pdf). The goal is to learn relationships between pose and different objects.
Question
Since I'm working with key points, I'm not sure how best to structure them as input. I've considered converting the key points to a binary image in which every X/Y location is 0 unless it's covered by the skeleton, in which case it is set to 1. But that seems inefficient, since the image would be almost entirely zeros. Is there a way to retain the structural benefits of images (which let me use convolutional nets) without the performance hit?
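For concreteness, here is a minimal sketch of the rasterization described above, in plain Python. Everything in it is an assumption for illustration: the five-joint skeleton, the 64x64 grid, and the helper names `draw_line` and `rasterize_skeleton` are hypothetical, and the segments are drawn with Bresenham's line algorithm, which is just one way to mark the pixels covered by the skeleton.

```python
def draw_line(grid, p0, p1):
    """Set to 1 every cell on the segment p0 -> p1 (Bresenham's line algorithm)."""
    (x0, y0), (x1, y1) = p0, p1
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        grid[y0][x0] = 1  # row index is y, column index is x
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy


def rasterize_skeleton(keypoints, edges, width, height):
    """Return a height x width grid: 1 where the skeleton passes, 0 elsewhere."""
    grid = [[0] * width for _ in range(height)]
    for a, b in edges:
        draw_line(grid, keypoints[a], keypoints[b])
    return grid


# Hypothetical skeleton in pixel coordinates (x, y):
# head, neck, hip, left foot, right foot.
keypoints = [(32, 8), (32, 24), (32, 40), (24, 56), (40, 56)]
# Each edge is a pair of indices into `keypoints`.
edges = [(0, 1), (1, 2), (2, 3), (2, 4)]

mask = rasterize_skeleton(keypoints, edges, width=64, height=64)

total_cells = 64 * 64                                # values the image input carries
nonzero = sum(map(sum, mask))                        # cells actually on the skeleton
raw_values = 2 * len(keypoints) + 2 * len(edges)     # coordinates plus edge indices
print(total_cells, nonzero, raw_values)
```

Comparing `total_cells` with `raw_values` shows the inefficiency in question: the image carries thousands of values, nearly all of them zero, to encode what the key points and connections express in a few dozen numbers.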