Decoding can be performed with the following code:
    MFT_OUTPUT_DATA_BUFFER loutputDataBuffer;
    initOutputDataBuffer(
        lTransform,
        loutputDataBuffer);
    DWORD lprocessOutputStatus = 0;
    lresult = lTransform->ProcessOutput(
        0,
        1,
        &loutputDataBuffer,
        &lprocessOutputStatus);
    if ((HRESULT)lresult == E_FAIL)
    {
        break;
    }
The function initOutputDataBuffer allocates the required memory. An example of that function is shown here:
    Result initOutputDataBuffer(IMFTransform* aPtrTransform,
        MFT_OUTPUT_DATA_BUFFER& aRefOutputBuffer)
    {
        Result lresult;
        MFT_OUTPUT_STREAM_INFO loutputStreamInfo;
        DWORD loutputStreamId = 0;
        CComPtrCustom<IMFSample> lOutputSample;
        CComPtrCustom<IMFMediaBuffer> lMediaBuffer;
        do
        {
            if (aPtrTransform == nullptr)
            {
                lresult = E_POINTER;
                break;
            }
            ZeroMemory(&loutputStreamInfo, sizeof(loutputStreamInfo));
            ZeroMemory(&aRefOutputBuffer, sizeof(aRefOutputBuffer));
            // Ask the transform how its output stream delivers samples.
            lresult = aPtrTransform->GetOutputStreamInfo(loutputStreamId, &loutputStreamInfo);
            if (lresult)
            {
                break;
            }
            // The transform does not allocate output samples itself,
            // so the caller has to supply a sample with a buffer.
            if ((loutputStreamInfo.dwFlags & MFT_OUTPUT_STREAM_PROVIDES_SAMPLES) == 0 &&
                (loutputStreamInfo.dwFlags & MFT_OUTPUT_STREAM_CAN_PROVIDE_SAMPLES) == 0)
            {
                lresult = MFCreateSample(&lOutputSample);
                if (lresult)
                {
                    break;
                }
                // cbSize is the buffer size the transform requires for one output sample.
                lresult = MFCreateMemoryBuffer(loutputStreamInfo.cbSize, &lMediaBuffer);
                if (lresult)
                {
                    break;
                }
                lresult = lOutputSample->AddBuffer(lMediaBuffer);
                if (lresult)
                {
                    break;
                }
                // Ownership of the sample passes to the output buffer structure.
                aRefOutputBuffer.pSample = lOutputSample.Detach();
            }
            else
            {
                lresult = S_OK;
            }
            aRefOutputBuffer.dwStreamID = loutputStreamId;
        } while (false);
        return lresult;
    }
The function first gets information about the output samples via the GetOutputStreamInfo method of IMFTransform. MFT_OUTPUT_STREAM_INFO reports the memory size required for an output media sample in cbSize. The function allocates a buffer of that size, adds it to the media sample, and attaches the sample to the MFT_OUTPUT_DATA_BUFFER.
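The ProcessOutput call shown earlier only tests for E_FAIL, but a decoder usually reports other results that a decode loop has to react to. As a rough sketch, the helper below maps a result code to the next step. The names classifyProcessOutput, DecodeStep, and HResult are hypothetical and not part of Media Foundation, and the numeric values are copied from winerror.h and mferror.h only so the snippet compiles without the Windows SDK; real code should include those headers and compare against the named constants.

```cpp
#include <cstdint>

// Local stand-in for HRESULT (a 32-bit signed value on Windows).
using HResult = std::int32_t;

// Values as declared in winerror.h / mferror.h, repeated here for portability only.
const HResult kOk            = 0;                                  // S_OK
const HResult kNeedMoreInput = static_cast<HResult>(0xC00D6D72);   // MF_E_TRANSFORM_NEED_MORE_INPUT
const HResult kStreamChange  = static_cast<HResult>(0xC00D6D61);   // MF_E_TRANSFORM_STREAM_CHANGE

// Hypothetical enum: what the decode loop should do next.
enum class DecodeStep
{
    DeliverFrame,  // output sample holds a decoded frame
    FeedInput,     // pass the next compressed sample to ProcessInput, then retry
    Renegotiate,   // output format changed: set a new output type, then retry
    Abort          // any other result is treated as a failure in this sketch
};

inline DecodeStep classifyProcessOutput(HResult hr)
{
    if (hr == kOk)            return DecodeStep::DeliverFrame;
    if (hr == kNeedMoreInput) return DecodeStep::FeedInput;
    if (hr == kStreamChange)  return DecodeStep::Renegotiate;
    return DecodeStep::Abort;
}
```

In a loop built around the code above, only DecodeStep::Abort would lead to the break; the other cases continue decoding.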
As you can see, writing encoding and decoding code that calls Media Foundation functions directly can be difficult and requires significant knowledge of the API. From the description of your task, I see that you only need to decode video and present it. I would advise you to try the Media Foundation Session functionality. It was developed by Microsoft engineers, is optimized, and already includes the algorithms for using the needed encoders. In the videoInput project, a Media Foundation Session is used to find a suitable decoder for the Media Source created for a web camera and to grab the frames in uncompressed format. It already does the needed processing; you only need to replace the web camera Media Source with a Media Source created from a video file. That could be easier than writing code that calls IMFTransform directly for decoding, and it takes care of many problems for you, for example stabilizing the frame rate. If the code renders each image immediately after decoding and then decodes the next frame, it can play a 1-minute video clip in a couple of seconds. Conversely, if rendering the video and other content takes longer than one frame duration, the video is presented in "slow motion" style, and a 1-minute clip can take 2, 3, or 5 minutes to play.
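To give an idea of what the frame-rate problem involves when you decode by hand, here is a minimal pacing sketch. It assumes each decoded frame carries a timestamp in 100-nanosecond units (the unit Media Foundation uses for sample times) and that you track the time elapsed since playback started. The names HundredNs, delayBeforePresent, and shouldDropFrame are hypothetical, and the "late by more than one frame" drop rule is just one possible policy.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <ratio>

// Duration type counting 100-nanosecond ticks.
using HundredNs = std::chrono::duration<std::int64_t, std::ratio<1, 10000000>>;

// How long to wait before presenting a frame.
//   sampleTime: presentation timestamp of the decoded frame
//   elapsed:    time since playback started
// Returns zero when the frame is already due or late.
inline HundredNs delayBeforePresent(HundredNs sampleTime, HundredNs elapsed)
{
    return std::max(HundredNs::zero(), sampleTime - elapsed);
}

// True when the frame is late by more than one frame duration,
// in which case skipping it avoids the "slow motion" effect.
inline bool shouldDropFrame(HundredNs sampleTime,
                            HundredNs frameDuration,
                            HundredNs elapsed)
{
    return elapsed - sampleTime > frameDuration;
}
```

A render loop could sleep for the returned delay (for example with std::this_thread::sleep_for) before drawing, and skip any frame for which shouldDropFrame returns true. The Media Session does this kind of scheduling for you.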
I do not know what project you need video decoding for, but you should have serious reasons for writing code that calls the Media Foundation functions and interfaces directly.
Regards.