I found two libraries for audio signal processing, librosa and TarsosDSP. Both provide a method to extract MFCCs.
After running a simple example on the same .wav file, they give quite different results:
Blue comes from librosa, orange from TarsosDSP. After applying y = x * -3/5, the orange line is almost identical to the blue one.
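To put a number on that scale factor instead of eyeballing the plot, a least-squares fit is one option. This is only a sketch: `fit_scale` is a helper name made up for illustration, and the two arrays are synthetic placeholders rather than the actual MFCC values from either library.

```python
import numpy as np

def fit_scale(reference, other):
    """Return the scalar k minimizing ||reference - k * other||,
    plus the relative error left after scaling."""
    reference = np.asarray(reference, dtype=float)
    other = np.asarray(other, dtype=float)
    # Closed-form least-squares solution for a single scale factor.
    k = np.dot(other, reference) / np.dot(other, other)
    rel_err = np.linalg.norm(reference - k * other) / np.linalg.norm(reference)
    return k, rel_err

# Synthetic placeholder data (NOT real library output): 13 coefficients,
# with the second vector constructed as an exact -3/5 multiple of the first.
rng = np.random.default_rng(0)
tarsos_like = rng.normal(size=13)
librosa_like = -0.6 * tarsos_like

k, rel_err = fit_scale(librosa_like, tarsos_like)
print(f"scale factor: {k:.3f}, relative error: {rel_err:.2e}")
```

With real data, the two MFCC vectors for the same frame would replace the placeholders. A small relative error would suggest the outputs differ mainly by a constant factor (for example a normalization or sign convention), while a large one would point to a deeper difference in the pipeline.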
What is the reason they are so different? I used the example code exactly as given, so I think the difference comes not from calling them with different inputs, but from how they compute the results internally.