I am using an EventHandler to create the page header for my PDF. The header content is added to a Table before the Table is added to a Canvas. As part of 508 compliance, I need to exclude the header content from being read out loud. How do I accomplish this?

public class TEirHeaderEventHandler : IEventHandler 
{
    public void HandleEvent(Event e)
    {
        PdfDocumentEvent docEvent = (PdfDocumentEvent)e;
        PdfDocument pdf = docEvent.GetDocument();
        PdfPage page = docEvent.GetPage();

        PdfCanvas headerPdfCanvas = new PdfCanvas(page.NewContentStreamBefore(), page.GetResources(), pdf);
        Rectangle headerRect = new Rectangle(60, 725, 495, 96);
        Canvas headerCanvas = new Canvas(headerPdfCanvas, pdf, headerRect);

        //creating content for header
        CreateHeaderContent(headerCanvas);
        headerCanvas.Close();
    }

    private void CreateHeaderContent(Canvas canvas)
    {
        //Create header content
        Table table = new Table(UnitValue.CreatePercentArray(new float[] { 60, 25, 15 } ));
        table.SetWidth(UnitValue.CreatePercentValue(100));

        Cell cell1 = new Cell().Add(new Paragraph("Establishment Inspection Report").SetBold().SetTextAlignment(TextAlignment.LEFT));
        cell1.SetBorder(Border.NO_BORDER);
        table.AddCell(cell1);

        Cell cell2 = new Cell().Add(new Paragraph("FEI Number:").SetBold().SetTextAlignment(TextAlignment.RIGHT));
        cell2.SetBorder(Border.NO_BORDER);
        table.AddCell(cell2);

        Cell cell3 = new Cell().Add(new Paragraph(_feiNum).SetBold().SetTextAlignment(TextAlignment.RIGHT));
        cell3.SetBorder(Border.NO_BORDER);
        table.AddCell(cell3);

        canvas.Add(table);
    }
}

public static void CreatePdf()
{
    using (MemoryStream writeStream = new MemoryStream())
    using (FileStream inputHtmlStream = File.OpenRead(inputHtmlFile))
    {
        PdfDocument pdf = new PdfDocument(new PdfWriter(writeStream));
        pdf.SetTagged();

        iTextDocument document = new iTextDocument(pdf);           

        TEirHeaderEventHandler teirEvent = new TEirHeaderEventHandler();
        pdf.AddEventHandler(PdfDocumentEvent.START_PAGE, teirEvent);


        //Convert html to pdf
        HtmlConverter.ConvertToDocument(inputHtmlStream, pdf, properties);

        document.Close();

        byte[] bytes = TEirReorderingPages(writeStream, numOfPages);

        File.WriteAllBytes(outputPdfFile, bytes);
    }
}

Note that I have set the document to be tagged, but I still get the "Reading Untagged Document" screen when I open the file. All of the content, including the header, is read when I activate the Read Out Loud feature. Any input or suggestion would be appreciated. Thank you in advance for your help.

slacker

2 Answers

General

The approach suggested by Alexey Subach is generally correct. You mark the content as artifact to differentiate it from real content.

element.getAccessibilityProperties().setRole(StandardRoles.ARTIFACT);

This marks the content in the content stream and it excludes the element from the structure tree.

Your case

However, your specific case is more nuanced.

For a well tagged PDF document, the proper way to read it out loud is to process the structure tree, which is a data structure that represents the logical reading order of the (semantic) elements of the document, such as paragraphs, tables and lists.

Because of the way you are creating the header content, it is not automatically tagged: a Canvas instance that is created from a PdfCanvas instance has autotagging disabled by default. So the table in the header is not marked in the content stream and it is not included in the structure tree. Marking it explicitly as an artifact, with the approach described above in General, should not make a significant difference because it was not in the structure tree to begin with.

If you enable autotagging by adding headerCanvas.enableAutoTagging(page), you will notice that the table does appear in the structure tree.

If you then add table.getAccessibilityProperties().setRole(StandardRoles.ARTIFACT), the table is excluded from the structure tree again.

Summary: looking at the structure tree, there's no difference between your original code and the approach described under General. In both cases the header table is absent from the tree.

Adobe reading order / accessibility settings

From your description, I think you are using Adobe Acrobat or Reader for the read out loud functionality. Under Preferences > Reading > Reading Order Options, you can configure how the content should be processed for the read out loud feature:

[Screenshot: Adobe Reader reading order options]

From https://helpx.adobe.com/reader/using/accessibility-features.html:

  • Infer Reading Order From Document (Recommended): Interprets the reading order of untagged documents by using an advanced method of structure inference layout analysis.
  • Left-To-Right, Top-To-Bottom Reading Order: Delivers the text according to its placement on the page, reading from left to right and then top to bottom. This method is faster than Infer Reading Order From Document. This method analyzes text only; form fields are ignored and tables aren’t recognized as such.
  • Override The Reading Order In Tagged Documents: Uses the reading order specified in the Reading preferences instead of what the tag structure of the document specifies. Use this preference only when you encounter problems in poorly tagged PDFs.

In my tests, the only way I could make Adobe Reader read out loud the header content created with your original code was to select Left-To-Right, Top-To-Bottom Reading Order and enable Override The Reading Order In Tagged Documents. In that case, it basically ignores the tagging and processes the content according to its location on the page.

With Override The Reading Order In Tagged Documents disabled, the header content is not read, either with your original code or with explicit artifacts.
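
The difference between the two reading modes can be illustrated with a toy model. This is only a sketch in plain Java: the Chunk class, the inStructureTree flag and both read methods are hypothetical names invented for illustration, not iText or Acrobat APIs, and the analysis Adobe really performs is more involved. It assumes the header is either untagged or marked as an artifact, so in both cases it is absent from the structure tree.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model only: none of these types are iText or Acrobat APIs.
class ReadingOrderSketch {

    /** One piece of text drawn on a page, `top` units below the top edge. */
    static final class Chunk {
        final String text;
        final int top;
        final boolean inStructureTree; // false for artifacts and untagged content

        Chunk(String text, int top, boolean inStructureTree) {
            this.text = text;
            this.top = top;
            this.inStructureTree = inStructureTree;
        }
    }

    /** Tagged reading: only content referenced by the structure tree is read. */
    static List<String> readByStructureTree(List<Chunk> page) {
        List<String> out = new ArrayList<>();
        for (Chunk c : page) {
            if (c.inStructureTree) {
                out.add(c.text);
            }
        }
        return out;
    }

    /** Override mode: tagging is ignored, everything is read top to bottom. */
    static List<String> readByPosition(List<Chunk> page) {
        List<Chunk> sorted = new ArrayList<>(page);
        sorted.sort(Comparator.comparingInt((Chunk c) -> c.top));
        List<String> out = new ArrayList<>();
        for (Chunk c : sorted) {
            out.add(c.text);
        }
        return out;
    }

    /** A page with two tagged body paragraphs and a header outside the tree. */
    static List<Chunk> samplePage() {
        List<Chunk> page = new ArrayList<>();
        page.add(new Chunk("Body paragraph 1", 100, true));
        page.add(new Chunk("Body paragraph 2", 200, true));
        page.add(new Chunk("Establishment Inspection Report", 20, false));
        return page;
    }

    public static void main(String[] args) {
        List<Chunk> page = samplePage();
        System.out.println(readByStructureTree(page));
        System.out.println(readByPosition(page));
    }
}
```

In this model the tagged reading returns only the two body paragraphs, while the position-based reading returns the header first and then the body. That matches what I observed with the override option disabled and enabled.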

Conclusion

Although it's a good idea to always tag artifacts as such, so they can be properly differentiated from real content, in this case I believe the behaviour you're experiencing is more related to application configuration than to file structure.

rhens

Headers and footers are typically pagination artifacts and should be marked as such in the following way:

table.getAccessibilityProperties().setRole(StandardRoles.ARTIFACT);

This will exclude the table from being read. Please note that you can mark any element that implements the IAccessibleElement interface as an artifact.

Alexey Subach