You're only displaying the real parts of the output bins and ignoring the imaginary parts. The real parts happen to match, but the imaginary parts differ in sign: the two outputs are complex conjugates of each other. Printing both parts makes this visible:
#include <cmath>
#include <iostream>
#include "fftw3.h"

using namespace std;

int main()
{
    const int N = 16;  // compile-time constant, so the arrays below are not VLAs
    fftwf_complex in[N], out[N];
    fftwf_plan p1, q;

    // Real input: a cosine with 3 cycles over N samples
    for (int i = 0; i < N; i++) {
        in[i][0] = cos(3 * 2*M_PI*i/N);
        in[i][1] = 0;
    }

    // Forward transform of the input
    p1 = fftwf_plan_dft_1d(N, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
    fftwf_execute(p1);
    for (int i = 0; i < N; i++)
        cout << out[i][0] << " + j" << out[i][1] << endl; // <<<
    fftwf_destroy_plan(p1);

    // Backward transform of the same input, for comparison
    cout << "\nInverse transform:\n";
    q = fftwf_plan_dft_1d(N, in, out, FFTW_BACKWARD, FFTW_ESTIMATE);
    fftwf_execute(q);
    for (int i = 0; i < N; i++)
        cout << out[i][0] << " + j" << out[i][1] << endl; // <<<
    fftwf_destroy_plan(q);

    return 0;
}
Compile and run:
$ g++ -Wall fftwf.cpp -lfftw3f && ./a.out
3.67394e-16 + j0
1.19209e-07 + j7.34788e-16
-3.67394e-16 + j0
8 + j-7.34788e-16
3.67394e-16 + j0
2.38419e-07 + j7.34788e-16
-3.67394e-16 + j0
1.19209e-07 + j-7.34788e-16
3.67394e-16 + j0
1.19209e-07 + j7.34788e-16
-3.67394e-16 + j0
2.38419e-07 + j-7.34788e-16
3.67394e-16 + j0
8 + j7.34788e-16
-3.67394e-16 + j0
1.19209e-07 + j-7.34788e-16
Inverse transform:
3.67394e-16 + j0
1.19209e-07 + j-7.34788e-16
-3.67394e-16 + j0
8 + j7.34788e-16
3.67394e-16 + j0
2.38419e-07 + j-7.34788e-16
-3.67394e-16 + j0
1.19209e-07 + j7.34788e-16
3.67394e-16 + j0
1.19209e-07 + j-7.34788e-16
-3.67394e-16 + j0
2.38419e-07 + j7.34788e-16
3.67394e-16 + j0
8 + j-7.34788e-16
-3.67394e-16 + j0
1.19209e-07 + j7.34788e-16
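It's interesting to note that the FFT and IFFT are mathematically almost identical: they differ only in the sign of the exponent and, by convention, a 1/N scale factor (which FFTW leaves out, since its transforms are unnormalized). They are often both implemented as a single routine, with a flag indicating direction (forward or inverse). Typically this flag just affects the sign of the imaginary part of the twiddle factors.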
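To make the role of the direction flag concrete, here is a minimal sketch of a naive O(N^2) DFT (not an FFT, and not how FFTW is implemented) in which a `sign` argument is the only difference between the two directions. The names `dft` and `conjugate_mismatch` are hypothetical helpers made up for this illustration. The sign convention (-1 forward, +1 backward, no 1/N scaling) is chosen to match FFTW's.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Naive O(N^2) DFT (illustrative helper, not part of FFTW).
// sign = -1: forward transform; sign = +1: unnormalized backward transform.
std::vector<cd> dft(const std::vector<cd>& x, int sign)
{
    assert(sign == -1 || sign == +1);
    const double pi = std::acos(-1.0);
    const std::size_t n = x.size();
    std::vector<cd> X(n);
    for (std::size_t k = 0; k < n; ++k) {
        cd sum(0.0, 0.0);
        for (std::size_t j = 0; j < n; ++j) {
            // Twiddle factor exp(sign * 2*pi*i*j*k/n): the direction flag
            // only flips the sign of its imaginary part.
            const double angle =
                sign * 2.0 * pi * static_cast<double>((j * k) % n) / static_cast<double>(n);
            sum += x[j] * std::polar(1.0, angle);
        }
        X[k] = sum;
    }
    return X;
}

// For a real test signal cos(2*pi*cycles*i/n + phase), returns the largest
// |backward[k] - conj(forward[k])| over all bins (illustrative helper).
double conjugate_mismatch(std::size_t n, int cycles, double phase)
{
    const double pi = std::acos(-1.0);
    std::vector<cd> x(n);
    for (std::size_t i = 0; i < n; ++i)
        x[i] = cd(std::cos(2.0 * pi * cycles * static_cast<double>(i) / static_cast<double>(n) + phase), 0.0);

    const std::vector<cd> fwd = dft(x, -1);
    const std::vector<cd> bwd = dft(x, +1);

    double worst = 0.0;
    for (std::size_t k = 0; k < n; ++k)
        worst = std::max(worst, std::abs(bwd[k] - std::conj(fwd[k])));
    return worst;
}
```

Calling `conjugate_mismatch(16, 3, 0.5)` should return a value at rounding-error level: for real input, the backward transform is the complex conjugate of the forward transform. With a nonzero `phase` the imaginary parts are no longer negligible, so comparing only the real parts would hide a real difference between the two outputs.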