Question 1:
What are the essential technical skills required?
Answer:
H.264 is a widely used video coding standard. The complete standard is a family of specifications covering a variety of encoding/decoding features, resolutions, and frame rates, so you first need to understand the fundamentals of the standard and how an H.264 encoder and decoder work. This reference may help you get started with H.264 fundamentals from an FPGA point of view. You can also go through "H.264 codec explained".
Question 2:
What is the best way to implement this? As far as I know, we can write RTL-only code or implement it with HW/SW co-design.
Answer:
Preferably, the main design, called the RTL, is written in a hardware description language (HDL), namely Verilog or VHDL, because most FPGA compilers expect to be given a design description in RTL form. RTL is an acronym for register transfer level. This means that your Verilog or VHDL code describes how data is transformed as it is passed from register to register.
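To make the register-to-register idea concrete, here is a minimal software model in C. It is only a sketch under stated assumptions: the two registers `reg_a` and `reg_b`, the `regs_t` type, the `clock_tick` function, and the `* 2 + 1` transformation are all hypothetical and chosen purely for illustration. Each call models one clock edge, where every register's next value is computed from the current values and then all registers update together.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-stage datapath, roughly what RTL would express as:
 *   reg_a <= in;
 *   reg_b <= reg_a * 2 + 1;
 */
typedef struct {
    uint8_t  reg_a; /* first pipeline register */
    uint16_t reg_b; /* second pipeline register */
} regs_t;

/* Models one rising clock edge. All next-state values are computed from
 * the CURRENT register contents, then committed at once, so reg_b sees
 * the value reg_a held before this edge. */
void clock_tick(regs_t *r, uint8_t in) {
    regs_t next;

    assert(r != NULL);
    next.reg_a = in;
    next.reg_b = (uint16_t)(r->reg_a * 2u + 1u);
    *r = next; /* all registers update together */
}
```

Starting from both registers at zero, feeding 3 and then 5 leaves `reg_b` at 1 after the first tick and 7 after the second, because each value needs one extra clock cycle to reach the second register.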
However, it's plain wrong to say that you can only implement an H.264 design (your H.264 RTL) in VHDL or Verilog. You can also write your H.264 design in C/C++ and use a high-level synthesis compiler to generate the RTL in Verilog or VHDL. Below is a code snippet of a simple H.264 decoder written in plain C that can be synthesized for an FPGA (the pragmas shown are in Xilinx Vivado HLS syntax).
    void decode_main(NALU_t* nalu,
                     StorablePicture pic[MAX_REFERENCE_PICTURES],
                     StorablePictureInfo pic_info[MAX_REFERENCE_PICTURES]) {
    #pragma HLS INTERFACE ap_none register port=nalu->startcodeprefix_len
    #pragma HLS RESOURCE core=AXI4LiteS variable=nalu->startcodeprefix_len
    #pragma HLS INTERFACE ap_none register port=nalu->len
    #pragma HLS RESOURCE core=AXI4LiteS variable=nalu->len
    #pragma HLS INTERFACE ap_none register port=nalu->nal_unit_type
    #pragma HLS RESOURCE core=AXI4LiteS variable=nalu->nal_unit_type
    #pragma HLS INTERFACE ap_none register port=nalu->nal_reference_idc
        // ... optimization pragmas continue ...
        extern seq_parameter_set_rbsp_t SPS_GLOBAL;
        extern pic_parameter_set_rbsp_t PPS_GLOBAL;
        extern ImageParameters img_inst;
        extern slice_header_rbsp_t sliceHeader_inst;
        extern char intra_pred_mode[PicWidthInMBs*4][FrameHeightInMbs*4];
        // ... rest of the code continues ...
    }
As you can see, it contains explicit compiler-specific optimization directives in the form of HLS pragmas; HLS stands for High-Level Synthesis. On Stack Overflow (SO), seeking recommendations for books, tools, software libraries, and more is rightly discouraged. My aim here is only to help you understand that you can implement an H.264 design without HDLs like Verilog or VHDL, and since I have given you a brief explanation of my own, you can go through the complete design here for further understanding.
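To give a feel for what small, synthesizable-style plain C can look like, here is a minimal sketch that splits the one-byte H.264 NAL unit header into its three fields. It is not taken from the design mentioned above: the `nal_header_t` struct and the `parse_nal_header` function are hypothetical names introduced for illustration, and the `#pragma HLS INLINE` line is just an example directive that an ordinary C compiler ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the one-byte NAL unit header fields. */
typedef struct {
    uint8_t forbidden_zero_bit; /* 1 bit, 0 in a conforming stream */
    uint8_t nal_ref_idc;        /* 2 bits, reference importance */
    uint8_t nal_unit_type;      /* 5 bits, e.g. 5 = IDR slice, 7 = SPS, 8 = PPS */
} nal_header_t;

/* Splits the first byte of a NAL unit into its header fields.
 * Only shifts and masks are used, which map directly to wiring in hardware. */
void parse_nal_header(uint8_t first_byte, nal_header_t *hdr) {
#pragma HLS INLINE
    assert(hdr != NULL);
    hdr->forbidden_zero_bit = (uint8_t)((first_byte >> 7) & 0x01u);
    hdr->nal_ref_idc        = (uint8_t)((first_byte >> 5) & 0x03u);
    hdr->nal_unit_type      = (uint8_t)(first_byte & 0x1Fu);
}
```

For example, a first byte of 0x67 yields `nal_ref_idc` 3 and `nal_unit_type` 7, which identifies a sequence parameter set. Because the same C source compiles with a normal compiler, you can check the logic on a PC before handing it to an HLS tool.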