The answer to your question involves a time-frequency trade-off that you will have to decide on. The shorter the slice of time you analyze (to get a smaller time uncertainty), the coarser the frequency resolution, and vice versa. If you want an exact frequency, then the required time window, and thus the time uncertainty, could become infinitely large.
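To put rough numbers on that trade-off: for a plain FFT of N samples, the window lasts N/fs seconds and the bins are spaced fs/N Hz apart, so the two always multiply to 1. Here is a minimal sketch; the sample rate and window lengths are arbitrary values picked for illustration, and `resolution` is just a hypothetical helper name.

```python
# Assumed sample rate in Hz (hypothetical value, not from the question).
fs = 8000.0

def resolution(n_samples, fs):
    """Return (window duration in seconds, FFT bin spacing in Hz)."""
    return n_samples / fs, fs / n_samples

# Longer windows: finer frequency bins, but a wider time uncertainty.
for n in (64, 512, 4096):
    dt, df = resolution(n, fs)
    print(f"N={n:5d}  window={dt * 1e3:7.1f} ms  bin spacing={df:8.2f} Hz")
```

Applying a window function (Hann, etc.) typically widens the effective frequency resolution a bit further, so treat fs/N as a best case.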
If you know the frequency band and bandwidth you are interested in, you could try band-pass filtering to isolate that band and then looking at the amplitude envelope, which might show a rise at the start of the sound and a decay at the end. If you know the exact shape of the envelope of the sound of interest, then correlating the signal against a matched filter (that is, convolving with the time-reversed template) might give you a correlation peak at the point in time where the sound occurs.
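Both ideas can be sketched with NumPy alone. Everything below is an assumption made for illustration: the synthetic test signal (a decaying 1 kHz burst in noise), the sample rate, the 800-1200 Hz band, and the crude FFT "brick-wall" band-pass, which stands in for a properly designed filter. Zeroing the negative frequencies along with the out-of-band bins yields the analytic signal of the band, whose magnitude is the amplitude envelope.

```python
import numpy as np

fs = 8000                       # assumed sample rate (Hz)
f0 = 1000.0                     # assumed frequency of the sound of interest (Hz)
rng = np.random.default_rng(0)

# Synthetic test signal: a 50 ms decaying burst starting at 0.5 s, in noise.
n_total = fs                    # 1 second of samples
n_burst = int(0.05 * fs)
tb = np.arange(n_burst) / fs
template = np.exp(-tb / 0.015) * np.sin(2 * np.pi * f0 * tb)
onset = int(0.5 * fs)
x = 0.05 * rng.standard_normal(n_total)
x[onset:onset + n_burst] += template

# 1) Isolate the band and take the amplitude envelope.
#    Keep only positive-frequency bins inside the band and double them;
#    the inverse FFT is then the analytic signal of the band-passed data.
f_lo, f_hi = 800.0, 1200.0      # assumed band of interest (Hz)
freqs = np.fft.fftfreq(n_total, d=1.0 / fs)
mask = (freqs >= f_lo) & (freqs <= f_hi)
analytic = np.fft.ifft(2.0 * np.fft.fft(x) * mask)
envelope = np.abs(analytic)
env_peak = int(np.argmax(envelope))      # sample index of the envelope maximum

# 2) Matched filter: cross-correlate the raw signal with the known template.
#    In "valid" mode, index k lines the template up with x[k : k + n_burst].
corr = np.correlate(x, template, mode="valid")
mf_onset = int(np.argmax(corr))          # estimated start of the sound

print(f"true onset      : {onset / fs:.4f} s")
print(f"envelope peak   : {env_peak / fs:.4f} s")
print(f"matched filter  : {mf_onset / fs:.4f} s")
```

The envelope only tells you roughly when energy shows up in the band, with a smearing on the order of one over the bandwidth. The matched filter can be sharper, but only to the extent the template really matches the sound you are looking for.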