I am creating an accessible PDF from a tagged PDF. It shows a "path object is not tagged" error. The PDF contains lines and underlined text, so I am trying to add an "ARTIFACT" tag to the untagged line items. I am able to get the lines using PDFGraphicsStreamEngine. Could anyone help me with this?
-
Can you share a representative example PDF? – mkl Aug 26 '21 at 06:40
-
Hi, I have added the sample PDF here, https://drive.google.com/file/d/1Z1R-SIalxPzAHH57_Qs0zGPDV3rjoqtN/view – Dilli Aug 26 '21 at 07:55
1 Answer
You can use the `PdfContentStreamEditor` class from this answer to edit the page content streams as desired by customizing and calling it like this:
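    // Imports for PDFBox 2.x; PdfContentStreamEditor is the helper class from the linked answer.
    import java.io.IOException;
    import java.util.Arrays;
    import java.util.Collections;
    import java.util.List;
    import org.apache.pdfbox.contentstream.operator.Operator;
    import org.apache.pdfbox.cos.COSBase;
    import org.apache.pdfbox.cos.COSDictionary;
    import org.apache.pdfbox.cos.COSName;
    import org.apache.pdfbox.pdfwriter.ContentStreamWriter;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDPage;

    PDDocument document = ...;
    for (PDPage page : document.getDocumentCatalog().getPages()) {
        PdfContentStreamEditor markEditor = new PdfContentStreamEditor(document, page) {
            // Nesting depth of marked content sequences already present in the stream
            int markedContentDepth = 0;

            @Override
            public void beginMarkedContentSequence(COSName tag, COSDictionary properties) {
                if (inArtifact) {
                    System.err.println("Structural error in content stream: Path not properly closed by path painting instruction.");
                }
                markedContentDepth++;
                super.beginMarkedContentSequence(tag, properties);
            }

            @Override
            public void endMarkedContentSequence() {
                markedContentDepth--;
                super.endMarkedContentSequence();
            }

            // True while an /Artifact sequence added by this editor is open
            boolean inArtifact = false;

            @Override
            protected void write(ContentStreamWriter contentStreamWriter, Operator operator, List<COSBase> operands) throws IOException {
                String operatorString = operator.getName();
                boolean unmarked = markedContentDepth == 0;
                boolean inArtifactBefore = inArtifact;

                // Open an artifact before the first path construction operator of an unmarked path
                if (unmarked && (!inArtifactBefore) && PATH_CONSTRUCTION.contains(operatorString)) {
                    super.write(contentStreamWriter, Operator.getOperator("BMC"), Collections.singletonList(COSName.ARTIFACT));
                    inArtifact = true;
                }

                super.write(contentStreamWriter, operator, operands);

                // Close the artifact right after the path painting operator
                if (unmarked && inArtifactBefore && PATH_PAINTING.contains(operatorString)) {
                    super.write(contentStreamWriter, Operator.getOperator("EMC"), Collections.emptyList());
                    inArtifact = false;
                }
            }

            final List<String> PATH_CONSTRUCTION = Arrays.asList("m", "l", "c", "v", "y", "h", "re");
            final List<String> PATH_PAINTING = Arrays.asList("s", "S", "f", "F", "f*", "B", "B*", "b", "b*", "n");
        };
        markEditor.processPage(page);
    }
    document.save(...);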
(EditMarkedContent test `testMarkUnmarkedPathsAsArtifactsTradeSimple1`)
The `beginMarkedContentSequence` and `endMarkedContentSequence` overrides track the current marked content nesting depth, in particular whether or not the current content is marked at all.
For instructions that are not yet marked, the `write` override then encloses each sequence of path construction and painting instructions in `/Artifact BMC ... EMC`.
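The wrapping logic can be tried out without PDFBox. The following is only a sketch of the same state machine over plain operator names; the class `ArtifactWrapSketch` and the method `wrapUnmarkedPaths` are hypothetical names made up for this illustration, and operands are ignored entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ArtifactWrapSketch {
    static final List<String> PATH_CONSTRUCTION = Arrays.asList("m", "l", "c", "v", "y", "h", "re");
    static final List<String> PATH_PAINTING = Arrays.asList("s", "S", "f", "F", "f*", "B", "B*", "b", "b*", "n");

    // Hypothetical helper: returns the operator sequence with unmarked paths
    // enclosed in "/Artifact BMC" ... "EMC".
    static List<String> wrapUnmarkedPaths(List<String> operators) {
        List<String> out = new ArrayList<>();
        int markedContentDepth = 0;
        boolean inArtifact = false;
        for (String op : operators) {
            // Depth is updated before the operator is written, as in the editor above,
            // where the marked content callbacks run before write().
            if (op.equals("BMC") || op.equals("BDC")) {
                markedContentDepth++;
            } else if (op.equals("EMC")) {
                markedContentDepth--;
            }
            boolean unmarked = markedContentDepth == 0;
            boolean inArtifactBefore = inArtifact;
            if (unmarked && !inArtifactBefore && PATH_CONSTRUCTION.contains(op)) {
                out.add("/Artifact BMC");
                inArtifact = true;
            }
            out.add(op);
            if (unmarked && inArtifactBefore && PATH_PAINTING.contains(op)) {
                out.add("EMC");
                inArtifact = false;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // One path that is already marked, followed by one unmarked rectangle fill.
        List<String> input = Arrays.asList("BDC", "m", "l", "S", "EMC", "re", "f");
        System.out.println(wrapUnmarkedPaths(input));
        // prints [BDC, m, l, S, EMC, /Artifact BMC, re, f, EMC]
    }
}
```

The already marked path is left alone, and only the trailing `re f` sequence gets enclosed.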
Beware, this code only considers content in page content streams; it does not descend into form XObjects, patterns, etc.
Furthermore, for content streams that contain errors (e.g. path construction without a path painting instruction), this code may add further errors (e.g. unbalanced marked content starts and ends).
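To get a feeling for which inputs are risky, one could scan the operator names first. This is a rough sketch under the same simplification as before (names only, no operands); `DanglingPathCheckSketch` and `hasDanglingPath` are hypothetical names, and the check only covers the two error cases mentioned here: a path that is interrupted by a marked content operator, and a path that is still open at the end of the stream.

```java
import java.util.Arrays;
import java.util.List;

class DanglingPathCheckSketch {
    static final List<String> PATH_CONSTRUCTION = Arrays.asList("m", "l", "c", "v", "y", "h", "re");
    static final List<String> PATH_PAINTING = Arrays.asList("s", "S", "f", "F", "f*", "B", "B*", "b", "b*", "n");
    static final List<String> MARKED_CONTENT = Arrays.asList("BMC", "BDC", "EMC");

    // Hypothetical pre-check: true if a path is constructed but not painted
    // before a marked content operator or the end of the stream.
    static boolean hasDanglingPath(List<String> operators) {
        boolean pathOpen = false;
        for (String op : operators) {
            if (PATH_CONSTRUCTION.contains(op)) {
                pathOpen = true;
            } else if (PATH_PAINTING.contains(op)) {
                pathOpen = false;
            } else if (pathOpen && MARKED_CONTENT.contains(op)) {
                return true;
            }
        }
        return pathOpen;
    }

    public static void main(String[] args) {
        System.out.println(hasDanglingPath(Arrays.asList("m", "l", "S")));          // false
        System.out.println(hasDanglingPath(Arrays.asList("m", "l")));               // true
        System.out.println(hasDanglingPath(Arrays.asList("re", "BMC", "f", "EMC"))); // true
    }
}
```

A page for which such a check returns `true` would be a candidate for manual inspection before running the editor on it.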

-
It shows an error in `Collections.singletonList(COSName.ARTIFACT)` and `Collections.emptyList()`. PdfContentStreamEditor has a `write` function with a `List<COSBase>` argument. – Dilli Aug 26 '21 at 15:35
-
It doesn't here. I'll add a link to the full source code, maybe you imported a different `Collections` class. It's [EditMarkedContent.java](https://github.com/mkl-public/testarea-pdfbox2/blob/master/src/test/java/mkl/testarea/pdfbox2/content/EditMarkedContent.java#L46). – mkl Aug 26 '21 at 15:56
-
The error still shows: The method write(ContentStreamWriter, Operator, List<COSBase>) in the type PdfContentStreamEditor is not applicable for the arguments (ContentStreamWriter, Operator, List<...>) – Dilli Aug 26 '21 at 16:49
-
Which Java version do you use? Your version appears to have less flexible generics resolution. You can try using `Collections.<COSBase>singletonList(COSName.ARTIFACT)` and `Collections.<COSBase>emptyList()` respectively. – mkl Aug 27 '21 at 10:34
-
It is working fine now after using the lines you suggested. I am using Java 11.0.11. Thanks a lot. May I know which version you are using that works without any issue? – Dilli Aug 28 '21 at 04:19
-
Hmm. My pdfbox playground project was still on Java 1.8. But even after switching to Java 11, the implicitly inferred type arguments still worked. Admittedly, though, the installed Java 11 is 11.0.9. But should there be such differences? – mkl Aug 28 '21 at 07:42
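
For readers hitting the same compile error, the explicit type argument workaround from the comments can be reproduced in isolation. This is only a sketch: `Base` and `Name` are hypothetical stand-ins for `COSBase` and `COSName`, and `count` stands in for a method that, like `write`, expects a `List` of the base type.

```java
import java.util.Collections;
import java.util.List;

class TypeWitnessSketch {
    // Hypothetical stand-ins for COSBase and its subclass COSName
    static class Base {}
    static class Name extends Base {}

    // Stand-in for a method that requires a List of the base type
    static int count(List<Base> operands) {
        return operands.size();
    }

    static int singletonCount() {
        // The explicit <Base> fixes the element type instead of relying on inference,
        // which would otherwise be free to pick List<Name> from the argument alone.
        return count(Collections.<Base>singletonList(new Name()));
    }

    static int emptyCount() {
        return count(Collections.<Base>emptyList());
    }

    public static void main(String[] args) {
        System.out.println(singletonCount() + " " + emptyCount()); // prints 1 0
    }
}
```

With the explicit type argument, the call no longer depends on how the compiler or IDE infers the target type of the argument expression.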