The exact phrasing suggests you are being asked to synthesize, ie create a new signal, not filter, or modify an existing signal. Moreover it asks about a fundamental frequency of 150 Hz (It uses the word pitch and not frequency. I'm assuming that fundamental frequency is good enough and/or what they meant :).
So, let me try rewording the question for you:
Do the following for each vowel sound (A, E, I, O, U, etc):
Create a 5 second sound with a fundamental frequency of 150 Hz.
I can think of two ways to solve this problem: 1. sum up some sine waves (all of which will be a multiple of 150 Hz) at different intensities. Knowing the intensities is the trick here. or 2. Start with a pulse of 150 Hz and filter it. Knowing the exact filter to use is the trick here, although using the right pulse will probably have some impact as well. Either way, you don't need or want an FFT in the generation stage. If you can't or don't want to look up the unknowns above, you could use an FFT to analyze a real person saying those sounds and use the results of the analysis to fill in the gaps. It wouldn't be too hard to do that, but it's probably covered in an advanced textbook on phonetics and/or acoustics.
If you need a more detailed answer, perhaps you should create a new question and link it here for help answering that. I suggest the following tags, if they exist:
- Speech synthesis
- Filtering
- audio
- phonetics