I want to build a neural network in Python to classify splice junctions in DNA sequences. Right now, my data is just stored as strings (for example, "GTAACTGC").
I am wondering how best to encode these strings so that a neural network can process them. My first thought was to map each letter to an integer value, but that feels pretty naive. Another thought was to use four binary indicators at each position, one each for A, T, G, and C. I'm not sure how well that would work, either.
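To make the two options concrete, here is a rough sketch of what I have in mind. The alphabet order ("ACGT") and the function names `integer_encode` and `one_hot_encode` are just my own placeholders, not anything from a library, and it assumes the sequences contain only the four standard bases.

```python
# Assumed alphabet order; any fixed order works if it is used consistently.
ALPHABET = "ACGT"
BASE_TO_INDEX = {base: i for i, base in enumerate(ALPHABET)}


def integer_encode(seq):
    """Map each base to an integer index (A=0, C=1, G=2, T=3).

    Raises KeyError for anything outside A/C/G/T (e.g. an ambiguous 'N').
    """
    return [BASE_TO_INDEX[base] for base in seq.upper()]


def one_hot_encode(seq):
    """Return one row of four binary indicators per position."""
    rows = []
    for index in integer_encode(seq):
        row = [0] * len(ALPHABET)
        row[index] = 1  # exactly one indicator is set per base
        rows.append(row)
    return rows


seq = "GTAACTGC"
print(integer_encode(seq))     # [2, 3, 0, 0, 1, 3, 2, 1]
print(one_hot_encode(seq)[0])  # G -> [0, 0, 1, 0]
```

As far as I understand it, the integer version imposes an ordering on the bases (A < C < G < T) that has no biological meaning, while the indicator version treats all four bases as equally different from one another, at the cost of four times as many inputs.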
What is the best way to approach this problem? Until now I have only worked with numerical values and have never worked with string data like DNA sequences.
Edit: For now, I will be looking at just mapping. For anyone reading this, try looking at this paper; it definitely helped give me some pointers.