I don't have a quick answer, but I've spent the last two weeks solving this exact problem, with success. I used Apache PDFBox, which extracts PDF text as TextPosition objects. Each TextPosition carries information about one character of the text (position, font, font size, etc.), and bold and italic can be inferred from it. I used this information to set up bounding boxes for all of the table elements, work out things like text alignment and column membership, and then recreate the PDF page and its tables in Excel, in just under 1000 lines of code.
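The bounding-box step can be illustrated without PDFBox. The following is a minimal, hypothetical sketch (the `Extent`, `CellAlignment` and `TableLayout` names are made up for illustration and are not part of PDFBox). It merges overlapping horizontal extents of text boxes into column spans, assigns a box to a column by its horizontal midpoint, and guesses a text run's alignment inside a cell by comparing its left and right margins. It assumes each box has already been reduced to its left and right x-coordinates, for example from `RectangleF.Left`/`Right`, and the tolerance defaults are arbitrary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper type: the horizontal extent of a text box.
public struct Extent
{
    public float Left, Right;
    public Extent(float left, float right) { Left = left; Right = right; }
}

public enum CellAlignment { Left, Center, Right }

public static class TableLayout
{
    // Boxes whose x-ranges overlap (or nearly touch, within tolerance) are merged into one column.
    public static List<Extent> FindColumns(IEnumerable<Extent> boxes, float tolerance = 1f)
    {
        var columns = new List<Extent>();
        foreach (Extent b in boxes.OrderBy(e => e.Left))
        {
            int last = columns.Count - 1;
            if (last >= 0 && b.Left <= columns[last].Right + tolerance)
            {
                // Overlaps the current column: widen it if needed.
                columns[last] = new Extent(columns[last].Left, Math.Max(columns[last].Right, b.Right));
            }
            else
            {
                columns.Add(b); // gap found: start a new column
            }
        }
        return columns;
    }

    // Index of the column containing the box's horizontal midpoint, or -1 if none does.
    public static int ColumnOf(Extent box, List<Extent> columns)
    {
        float mid = (box.Left + box.Right) / 2;
        for (int i = 0; i < columns.Count; i++)
        {
            if (mid >= columns[i].Left && mid <= columns[i].Right) return i;
        }
        return -1;
    }

    // Equal margins (within tolerance) mean centered; otherwise the text hugs the smaller margin.
    public static CellAlignment AlignmentOf(Extent text, Extent cell, float tolerance = 2f)
    {
        float leftGap = text.Left - cell.Left;
        float rightGap = cell.Right - text.Right;
        if (Math.Abs(leftGap - rightGap) <= tolerance) return CellAlignment.Center;
        return leftGap < rightGap ? CellAlignment.Left : CellAlignment.Right;
    }
}
```

Merging extents this way breaks down when a header cell spans several columns, because it bridges them into one; computing columns from body rows only is one way around that.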
I did not have to extract graphic elements like checkboxes, but Apache PDFBox does extract the page content to COSStreams, and graphic and form elements can likely be parsed from those streams; I'm not there yet. My code would be able to rebuild the table you showed, missing only the checkboxes and background colors.
I've searched for a simpler solution than mine and came up short; there seems to be no easy way to do this.
EDIT: If this hasn't dissuaded you, I can show you how to begin. First, extend either PDFTextStripper or PDFTextStripperByArea. This gives you access to each TextPosition through the processTextPosition override. The following code shows how I transformed TextPositions into my own custom class, TextChar. I then use the relative positions of these characters to work out rudimentary contextual information:
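```csharp
using System;
using System.Collections.Generic;
using System.Drawing;                      // RectangleF
using org.apache.pdfbox.pdmodel.graphics;  // PDGraphicsState (PDFBox 1.8.x)
using org.apache.pdfbox.util;              // PDFTextStripper, TextPosition (PDFBox 1.8.x)

public class PDFStripper : PDFTextStripper
{
    // One list of characters per page. PDFBox page numbers start at 1, so index 0 stays unused.
    private List<TextChar>[] tcPages;

    public List<TextChar>[] Pages { get { return tcPages; } }

    public PDFStripper(java.util.List pages)
    {
        int pagecount = pages.size();
        tcPages = new List<TextChar>[pagecount + 1];
        base.processPages(pages); // triggers processTextPosition for every extracted glyph
    }

    // Called by PDFBox for each glyph it extracts from the page content.
    protected override void processTextPosition(TextPosition tp)
    {
        PDGraphicsState gs = getGraphicsState();
        TextChar tc = BuildTextChar(tp, gs);
        int currentPageNo = getCurrentPageNo();
        if (tcPages[currentPageNo] == null)
        {
            tcPages[currentPageNo] = new List<TextChar>();
        }
        tcPages[currentPageNo].Add(tc);
    }

    private static TextChar BuildTextChar(TextPosition tp, PDGraphicsState gstate)
    {
        TextChar tc = new TextChar();
        tc.Char = tp.getCharacter()[0]; // if getCharacter() returns several chars (e.g. a ligature), only the first is kept
        float h = (float)Math.Floor(tp.getHeightDir());
        tc.Box = new RectangleF
        (
            tp.getXDirAdj(),
            (float)Math.Round(tp.getYDirAdj(), 0, MidpointRounding.ToEven) - h, // shift Y from the baseline up to the top
            tp.getWidthDirAdj(),
            h
        );
        tc.Direction = tp.getDir();
        tc.SpaceWidth = tp.getWidthOfSpace();
        tc.Font = tp.getFont().getBaseFont();
        tc.FontSize = tp.getFontSizeInPt();
        try
        {
            // GetBits, findBold and findItalics are custom helpers, not shown in this answer.
            int[] flags = GetBits(tp.getFont().getFontDescriptor().getFlags());
            tc.IsBold = findBold(tp, flags, gstate);
            tc.IsItalic = findItalics(tp, flags);
        }
        catch
        {
            // Some fonts have no usable font descriptor; IsBold/IsItalic keep their defaults.
        }
        return tc;
    }

    // No output writer is set, so skip PDFTextStripper's text output; this prevents an exception.
    protected override void writePage() { return; }
}
```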