Rather than using FFTW directly, you might want to consider OpenCV, which is a higher-level API for computer vision and image processing in general, and is much easier to use than building your own routines from low-level building blocks such as FFTs. OpenCV already has, for example, a template matching function, cvMatchTemplate (cv::matchTemplate in the C++ API), and it can use efficient FFT implementations "under the hood" where needed, so performance should not be a problem.
If you really do have to use FFTW then be prepared for some extensive reading of the documentation and an initially steep learning curve. The steps for cross-correlation (which I am assuming is what you want to use for your template matching) are typically:
- create forward/reverse FFT plans for the larger image size
- do FFT of target image using forward FFT plan
- pad template image to size of target image with zeroes
- do FFT of padded template image using forward FFT plan
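- take complex conjugate of padded template image FFT output
- multiply target image FFT output, element by element, by complex conjugate of padded template image FFT output
- take IFFT of product using reverse FFT plan (FFTW transforms are unnormalized, so the result is scaled by the number of pixels)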
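The steps above can be sketched in a few lines. This is only an illustration of the arithmetic, not FFTW code: a naive O(N²) DFT written with Python's `cmath` stands in for the FFTW plans so that the example has no dependencies, and the names `dft2` and `cross_correlate` are made up for this sketch. With FFTW, each `dft2` call would correspond to executing a plan created with something like `fftw_plan_dft_2d`.

```python
import cmath

def dft2(a, sign):
    """Naive, unnormalized 2-D DFT of a list of rows.
    sign=-1 is the forward transform, sign=+1 the inverse (stand-in for FFTW plans)."""
    h, w = len(a), len(a[0])

    def dft1(v):
        n = len(v)
        return [sum(v[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n) for j in range(n))
                for k in range(n)]

    rows = [dft1(r) for r in a]                                    # transform each row
    cols = [dft1([rows[y][x] for y in range(h)]) for x in range(w)]  # then each column
    return [[cols[x][y] for x in range(w)] for y in range(h)]

def cross_correlate(image, template):
    """Circular cross-correlation of image with a zero-padded template."""
    h, w = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    # Pad the template with zeroes to the size of the target image.
    padded = [[template[y][x] if y < th and x < tw else 0.0 for x in range(w)]
              for y in range(h)]
    f_img = dft2(image, -1)    # FFT of target image
    f_tpl = dft2(padded, -1)   # FFT of padded template
    # Multiply the image spectrum by the conjugate of the template spectrum.
    prod = [[f_img[y][x] * f_tpl[y][x].conjugate() for x in range(w)] for y in range(h)]
    out = dft2(prod, +1)       # inverse transform (unnormalized, like FFTW)
    n = h * w
    return [[out[y][x].real / n for x in range(w)] for y in range(h)]

# Toy data: an 8x8 image of zeroes with the template pasted at row 2, column 3.
template = [[1.0, 2.0], [3.0, 4.0]]
image = [[0.0] * 8 for _ in range(8)]
for dy in range(2):
    for dx in range(2):
        image[2 + dy][3 + dx] = template[dy][dx]

corr = cross_correlate(image, template)
value, row, col = max((corr[y][x], y, x) for y in range(8) for x in range(8))
print((row, col))  # (2, 3)
```

The division by `n` at the end undoes the scaling that an unnormalized forward/inverse pair introduces; for locating a peak it can be skipped.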
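You can then examine the result for one or more peak values, which should correspond to the location(s) of your template image within the target image. With the padding described above, the position of a peak is the offset of the template's top-left corner in the target image. Because the FFT-based correlation is circular, offsets at which the template would extend past the right or bottom edge wrap around and should be ignored.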
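As a rough illustration of the peak search, the sketch below scans a correlation surface for local maxima that come close to the global maximum. The function name `find_peaks` and the `rel_threshold` parameter are assumptions made for this example, and it assumes the global maximum is positive; a real application would probably want a more careful threshold and some non-maximum suppression.

```python
def find_peaks(corr, rel_threshold=0.9):
    """Return (row, col, value) for local maxima at or above
    rel_threshold * global maximum, strongest first.
    Assumes the global maximum is positive."""
    h, w = len(corr), len(corr[0])
    best = max(max(row) for row in corr)
    peaks = []
    for y in range(h):
        for x in range(w):
            v = corr[y][x]
            if v < rel_threshold * best:
                continue
            # Compare against the (up to) 8 surrounding values.
            neighbours = [corr[yy][xx]
                          for yy in range(max(0, y - 1), min(h, y + 2))
                          for xx in range(max(0, x - 1), min(w, x + 2))
                          if (yy, xx) != (y, x)]
            if all(v >= n for n in neighbours):
                peaks.append((y, x, v))
    return sorted(peaks, key=lambda p: -p[2])

# Hypothetical correlation surface with two strong responses.
surface = [
    [0, 1, 0, 0],
    [1, 9, 1, 0],
    [0, 1, 0, 8],
    [0, 0, 1, 0],
]
print(find_peaks(surface, rel_threshold=0.8))  # [(1, 1, 9), (2, 3, 8)]
```

Note that a flat plateau of equal values is reported once per cell, since ties count as maxima here.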
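Note that plain cross-correlation is biased towards bright regions of the target image and is sensitive to changes in brightness and contrast, so for better results you should consider using normalized cross-correlation, but this is rather more complex to implement in the frequency domain.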
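To make clear what normalized cross-correlation computes, here is a direct spatial-domain sketch of the zero-mean variant. It is deliberately the slow, brute-force definition rather than a frequency-domain implementation, and `ncc_map` is a name invented for this example. Each score lies between -1 and 1, and a window that matches the template up to a change in brightness and contrast scores 1.

```python
import math

def ncc_map(image, template):
    """Zero-mean normalized cross-correlation, computed directly for every
    offset at which the template fits entirely inside the image."""
    h, w = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    t_vals = [v for row in template for v in row]
    t_mean = sum(t_vals) / len(t_vals)
    t_dev = [v - t_mean for v in t_vals]
    t_norm = math.sqrt(sum(d * d for d in t_dev))
    out = []
    for y in range(h - th + 1):
        row = []
        for x in range(w - tw + 1):
            win = [image[y + dy][x + dx] for dy in range(th) for dx in range(tw)]
            w_mean = sum(win) / len(win)
            w_dev = [v - w_mean for v in win]
            w_norm = math.sqrt(sum(d * d for d in w_dev))
            denom = w_norm * t_norm
            # Flat windows have zero variance; score them as 0.
            row.append(sum(a * b for a, b in zip(w_dev, t_dev)) / denom if denom > 0 else 0.0)
        out.append(row)
    return out

# The patch at row 1, column 2 is 2 * template + 10, surrounded by a bright
# flat background that would dominate plain cross-correlation.
template = [[1, 2], [3, 4]]
image = [
    [50, 50, 50, 50, 50],
    [50, 50, 12, 14, 50],
    [50, 50, 16, 18, 50],
    [50, 50, 50, 50, 50],
]
scores = ncc_map(image, template)
score, row, col = max((scores[y][x], y, x)
                      for y in range(len(scores)) for x in range(len(scores[0])))
print((row, col))  # (1, 2)
```

The per-window mean and norm are what make a frequency-domain version harder: the numerator can still be computed with FFTs, but the denominator needs local sums over every window.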