According to the comments to the question, the OP refers to PDFBox 2.0.x versions, in particular 2.0.6.
getKids()
The method getKids() is undefined for the type PDField
PDFBox 2.0.6 has two immediate subclasses of PDField, and each implements its own variant of the former (1.8.x) getKids() method:
- PDNonTerminalField: the method retrieving the kids in this class is getChildren(); it returns a List<PDField>, a list of form fields.
- PDTerminalField: the method retrieving the kids in this class is getWidgets(); it returns a List<PDAnnotationWidget>, a list of widget annotations.
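As a rough sketch of how the two accessors fit together in PDFBox 2.0.x, the following walks a form recursively: non-terminal fields are descended via getChildren(), terminal fields report their widget count via getWidgets(). The class name FieldWalker and the output format are made up for illustration and are not part of PDFBox.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationWidget;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;
import org.apache.pdfbox.pdmodel.interactive.form.PDNonTerminalField;
import org.apache.pdfbox.pdmodel.interactive.form.PDTerminalField;

// Hypothetical helper class, for illustration only.
final class FieldWalker {

    /** Returns one descriptive line per field, descending through non-terminal fields. */
    static List<String> describe(PDAcroForm acroForm) {
        List<String> lines = new ArrayList<>();
        for (PDField root : acroForm.getFields()) {
            describe(root, lines);
        }
        return lines;
    }

    private static void describe(PDField field, List<String> lines) {
        if (field instanceof PDNonTerminalField) {
            // Kids of a non-terminal field are form fields again.
            List<PDField> children = ((PDNonTerminalField) field).getChildren();
            lines.add(field.getFullyQualifiedName()
                    + ": non-terminal field with " + children.size() + " child field(s)");
            for (PDField child : children) {
                describe(child, lines);
            }
        } else if (field instanceof PDTerminalField) {
            // Kids of a terminal field are widget annotations, which carry no names.
            List<PDAnnotationWidget> widgets = ((PDTerminalField) field).getWidgets();
            lines.add(field.getFullyQualifiedName()
                    + ": terminal field with " + widgets.size() + " widget(s)");
        }
    }
}
```

Called as FieldWalker.describe(document.getDocumentCatalog().getAcroForm()), this should list each field once, however many widgets it has.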
name of the parent, followed by .null
When there are multiple copies with the same field name, the getFullyQualifiedName for each kid in the list of PDField objects returns the name of the parent, followed by .null
This is not the case in PDFBox 2.0.x.
In the sample document attached to the PDFBox issue PDFBOX-2148, PDFBox now correctly finds only a single field, which is appropriately named "Button2". This field is a PDTerminalField and has 4 widget annotations. The class of the latter, PDAnnotationWidget, has no getFullyQualifiedName method, so there are no ".null" names.
Thus, this problem is gone.
FQN of duplicate fields
(from the OP's comment responding to "What exactly is your question?")
how to get Fully Qualified Name of duplicate fields in pdfbox
There are no duplicate fields in (valid) PDFs: for a given name there is at most a single field, which may have multiple widgets. Widgets do not have individual FQNs.
Thus, what you call "duplicate fields" in your example document is actually a single field with multiple widgets; the name of that field is "Button2" and can be retrieved using getFullyQualifiedName().
which page which form field
(from the OP's comments to this answer)
but how to get current page no in pdfbox.. for example there are 3 page and in page 2 there is a form field so how can i get which page which form field ?
All PDAnnotation classes, among them PDAnnotationWidget, have a getPage() method returning a PDPage instance.
BUT: As specified in ISO 32000-1, annotations (in particular form field widgets) are not required to have a link to the page on which they are drawn (except for screen annotations associated with rendition actions).
Thus, the above-mentioned method getPage() may return null (probably more often than not).
So to determine the respective pages of your widgets, you have to approach the problem the other way around: Iterate over all pages and look for the annotation widgets in the respective annotation array.
For PDFBox 1.8.x you can find example code in this Stack Overflow answer. With the information given in the previous parts of this answer, it should be easy to port the code to PDFBox 2.0.x.
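A port to PDFBox 2.0.x might look roughly like the following sketch. It is not the code from the linked answer; the class WidgetPages and the method pagesByField are names made up here. It assumes that a widget reached through a page's annotation array and the same widget reached through its field are backed by the same COSDictionary instance, which is why an identity-based map is used.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotation;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationWidget;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;
import org.apache.pdfbox.pdmodel.interactive.form.PDTerminalField;

// Hypothetical helper class, for illustration only.
final class WidgetPages {

    /** Maps each terminal field's fully qualified name to the 1-based pages showing its widgets. */
    static Map<String, List<Integer>> pagesByField(PDDocument document) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        PDAcroForm acroForm = document.getDocumentCatalog().getAcroForm();
        if (acroForm == null) {
            return result; // no form, nothing to report
        }

        // Pass 1: iterate over all pages and remember which page lists which widget.
        Map<COSDictionary, Integer> pageOfWidget = new IdentityHashMap<>();
        int pageNumber = 0;
        for (PDPage page : document.getPages()) {
            pageNumber++;
            try {
                for (PDAnnotation annotation : page.getAnnotations()) {
                    if (annotation instanceof PDAnnotationWidget) {
                        pageOfWidget.put(annotation.getCOSObject(), pageNumber);
                    }
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        // Pass 2: look up the page of every widget of every terminal field.
        for (PDField field : acroForm.getFieldTree()) {
            if (!(field instanceof PDTerminalField)) {
                continue;
            }
            List<Integer> pages = new ArrayList<>();
            for (PDAnnotationWidget widget : ((PDTerminalField) field).getWidgets()) {
                Integer page = pageOfWidget.get(widget.getCOSObject());
                if (page != null) { // widgets not listed on any page are skipped
                    pages.add(page);
                }
            }
            result.put(field.getFullyQualifiedName(), pages);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        try (PDDocument document = PDDocument.load(new File(args[0]))) {
            pagesByField(document).forEach(
                    (name, pages) -> System.out.println(name + " -> page(s) " + pages));
        }
    }
}
```

A field with several widgets can legitimately end up with several page numbers, which matches the single-field, multiple-widgets model described above.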
checkbox and radio button
(also from the OP's comments to this answer)
one more issue if i am using checkbox and radio button both then field.getFieldType() output is Btn for both. how to identify it?
You can identify them by inspecting the field flags, which you retrieve via field.getFieldFlags():
- If the Pushbutton flag is set (PDButton.FLAG_PUSHBUTTON), the field is a regular push button.
- Otherwise, if the Radio flag is set (PDButton.FLAG_RADIO), the field is a radio button.
- Otherwise, the field is a check box.
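The decision order above can be sketched in plain Java. To stay independent of whether the PDButton constants are accessible from your code in your PDFBox version, this sketch defines the two masks itself, using the bit positions that ISO 32000-1 gives for button fields (Radio is bit 16, Pushbutton is bit 17). ButtonKind and classify are illustrative names, not PDFBox API.

```java
// Hypothetical helper class, for illustration only.
final class ButtonKind {

    // Button field flag masks per ISO 32000-1 (bit positions there are 1-based).
    static final int FLAG_RADIO = 1 << 15;      // bit 16
    static final int FLAG_PUSHBUTTON = 1 << 16; // bit 17

    /** Classifies a Btn field from the value of its Ff entry. */
    static String classify(int fieldFlags) {
        if ((fieldFlags & FLAG_PUSHBUTTON) != 0) {
            return "push button";
        }
        if ((fieldFlags & FLAG_RADIO) != 0) {
            return "radio button";
        }
        return "check box";
    }

    public static void main(String[] args) {
        System.out.println(classify(0));               // no flag set
        System.out.println(classify(FLAG_RADIO));      // Radio flag set
        System.out.println(classify(FLAG_PUSHBUTTON)); // Pushbutton flag set
    }
}
```

With a PDFBox field at hand, the argument would be the result of field.getFieldFlags().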
Alternatively, you can check the class of the field object, which for Btn may be PDPushButton, PDRadioButton, or PDCheckBox.
Beware: If a check box field has multiple widgets with differently named on states, this check box field and its widgets act like a radio button group! And not only in theory, I've seen PDFs with such check box fields in the wild.
To be really sure about the behavior of the fields, you should therefore also compare the names of the on states of all the widgets of a given check box.
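The comparison itself is simple once the on state names are known. The sketch below assumes you have already collected one on state name per widget (in the PDF, that is the key other than Off in each widget's normal appearance dictionary); how you collect them is left out here. OnStateCheck and behavesLikeRadioGroup are illustrative names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class, for illustration only.
final class OnStateCheck {

    /**
     * @param onStateNames one on state name per widget of a single check box field
     * @return true if the widgets use more than one distinct on state name,
     *         in which case the check box acts like a radio button group
     */
    static boolean behavesLikeRadioGroup(List<String> onStateNames) {
        Set<String> distinct = new HashSet<>(onStateNames);
        return distinct.size() > 1;
    }

    public static void main(String[] args) {
        System.out.println(behavesLikeRadioGroup(Arrays.asList("Yes", "Yes")));
        System.out.println(behavesLikeRadioGroup(Arrays.asList("A", "B", "C")));
    }
}
```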