So I'm trying to translate Morse code signals into their string representation. Some forms of preprocessing yield one-dimensional arrays of normalized floats in [0, 1] that serve as input to a CNN/RNN. Example:
This image is stretched along the y-axis for better visibility, but the inputs to the NN are 1-d. I'm looking for a smart way to translate the contents of the image; in this example the correct translation would be "WPM = TEXT I". My current model uses Keras' CTC loss as in this tutorial. However, it detects the letter "E" for every single timestep ("E" is the Morse equivalent of ".", i.e. a small bar in the image), so I figure that the "step size" is too small. This is reinforced by the fact that it behaves exactly like a trivial classifier that labels every timestep above some threshold as "E" and everything else as [UNK]/blank.
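One fix I've been considering is to downsample the time axis with strided convolutions before the recurrent layers, so that each step the CTC head sees spans many raw samples instead of one. A minimal sketch of that idea (not my actual model; the layer sizes, strides, and `num_classes` below are placeholder values):

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

num_classes = 28  # placeholder: alphabet size + punctuation + 1 CTC blank

inputs = keras.Input(shape=(None, 1))  # variable-length 1-d signal
# Two strided Conv1D layers give 8x temporal downsampling in total,
# so one output step covers 8 input samples instead of 1.
x = layers.Conv1D(32, kernel_size=9, strides=4, padding="same",
                  activation="relu")(inputs)
x = layers.Conv1D(64, kernel_size=9, strides=2, padding="same",
                  activation="relu")(x)
x = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(x)
# CTC expects per-timestep class probabilities
outputs = layers.Dense(num_classes, activation="softmax")(x)

model = keras.Model(inputs, outputs)
```

The hope would be that with coarser steps, a single dot no longer dominates dozens of consecutive predictions.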
I think the main problem is the vast difference in duration between, for example, an "E" (one thin line) and other characters such as "=", which is represented by small lines framed by two thick ones, as seen in the middle (-...-). This should be less of a problem in voice recognition, because there you can make phonetic sense of time segments as short as a few milliseconds (like hearing an "i" sound in "thin" and "gym"), which is not possible in this context.
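One idea I had to even out these duration differences is a different input representation: run-length encode the binarized signal, so that each on/off segment becomes a single (state, duration) timestep and an "E" and a long dash end up comparable in size. A rough sketch (the 0.5 threshold and the per-sequence duration normalization are just guesses, not something I've validated):

```python
import numpy as np

def run_length_encode(signal, threshold=0.5):
    # Binarize the normalized signal into on/off states
    bits = (np.asarray(signal) > threshold).astype(np.int8)
    # Indices where the on/off state changes
    change = np.flatnonzero(np.diff(bits)) + 1
    starts = np.concatenate(([0], change))
    ends = np.concatenate((change, [len(bits)]))
    durations = ends - starts
    values = bits[starts]
    # Each timestep becomes (on/off state, relative duration)
    return np.stack([values, durations / durations.max()], axis=1)

example = [0, 0, 1, 1, 1, 0, 1, 0, 0, 0]
print(run_length_encode(example))
# Collapses the ten samples into five (state, duration) steps,
# roughly [[0, .67], [1, 1.], [0, .33], [1, .33], [0, 1.]]
```

This would shrink sequences drastically and make the "=" pattern five steps long regardless of how slowly it was keyed.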
Perhaps someone can come up with a smart solution, either for this implementation, or through a different representation of the inputs, or something along those lines.