by Michael Brauer (firstname.lastname@example.org)
last modification on
This paper contains a proposal how StarOffice/OpenOffice.org Writer (and Writer/Web) documents can be made accessible by using the UNO Accessibility API (UAA).
As stated in the guidelines of document representation, the accessible objects tree for Writer documents represents the current view of the document as it does for any other application, too.
It is obvious that the most important accessible objects are the ones that contain the document's text, or to be more precise, support the AccessibleEditableText service. A real difficulty however results from the fact that the text that is visible on the screen might in fact be part of very different parts of the document. If a document for instance contains headers and footers for pages, text from a footer, a header and text from the the body region of two different pages might be visible simultaneously. And things might become more complex if columns, text-boxes and footnotes get involved.
Beside text that might be contained in very different objects, the view of a Writer document might also contain tables, images, drawings and OLE objects.
This draft first gives an overview of the accessible object tree for Writer documents, followed by some explanations what reasons led to the structure. After that, a detailed specification follows.
The root accessible object (i.e. the one of the window that contains view of the Writer document) has a child object for
the body text of the document (i.e. the text that is distributed to the pages of a text document), except for the very rare cases where no body text is visible at all;
every page header and every page footer that is visible;
every footnote and endnote that is visible; and
every text-box, image, OLE object and drawing that is visible;
Teseobjects are called area objects within this proposal.
There is neither an object for pages nor for columns. This especially means that there is exactly one object for body text that is visible currently, even if the text appears on different pages.
With the exception of text-boxes, images, OLE objects and drawings the only service area objects support is AccessibleContext. This especially means that there is no AccessibleComponent service available and therefor no geometrical information.
Text-boxes, images, OLE objects and drawings however do support the AccessibleComponent service. They are children of the root object, regardless whether they in fact are bound to a page, a paragraph, etc.
With the exception of images, OLE objects and drawings, area objects have children that are either paragraph fragment objects or table fragment object.
A paragraph fragment object supports the AccessibleEditableText and AccessibleComponent services1. It either represents the text of a paragraph or the text of a part of a paragraph, if the paragraph contains page or column breaks. In the later case it represents exactly the paragraph's text that is displayed at certain page or within a certain column. In other words, a single paragraph might be represented by more than one paragraph fragment object if and only if contains page or column breaks. But a paragraph fragment object never contains text from more than one paragraphs.
A table fragment object supports the AccessibleTable service. It represents the fragment of a table that is displayed on one page or in one column. The table cell objects themselves have paragraph fragment objects as children again.
All area objects contain children only for paragraphs and tables (or fragments of them) that are at least partially visible. If a page header for instance contains two paragraphs, but only one of them is visible, then the page header's area object has one child only2.
Paragraph as well as table fragment objects that are partially visible contain their off-screen parts, too. This means that a paragraph fragment object contains text that is not displayed currently and a table fragment object contains cells that are not displayed currently.
2. Design Influences
This section describes some concepts and issues that influenced the accessible object context tree described in the previous section.
As said above, the text that is shown in the view of a Writer document might in fact be contained in different unrelated parts of a document. Within this proposal, these parts are called text flows. The following text flows exist:
foot- and endnotes
text contained in drawings
body text i.e. the text that is distributed to the page's body regions.
On the screen, the different text flows can be distinguished by gaps or lines between them, or by other hints like background colors. For the accessibility API a simple way to distinguish them should exist, too. This for instance enables voice tools to read the body text of a document without mixing it up with headers, footers, footnotes and so on.
An appropriate way to get a differentiation between text flows seems to be the parent/child relation the XAccessibleContext interface offers. This requires that
a single object that supports the AccessibleEditableText service must not represent text from different text flows.
A single object that supports the AccessibleContext service does not have children that represent text from different text flow.
The area objects exactly make use of these parent/child relation to differ between text flows. To get a unique access to the text flow's text is seems to be reasonable to use area objects even if if they contain one child only.
Pages And Columns
If a document is not displayed in the Online Layout mode, then its body text flow is distributed to several pages. On the screen, the different pages are visualized by a gray bars between them.
There seems to be no requirement for having an accessible object for pages itself. But in many cases, it also seems not to be convenient that text that is accessible by a single AccessibleEditableText service contains a page break, that is, represents text from two different pages. This is the case if the page before the page break contains a footer and/or footnotes, or if the page behind the page break contains a header. If the AccessibleEditableText for the body text would represent the text of both pages, then its bounding box would overlap with the ones of the headers, footers or footnotes.
Moreover it does not seem to be convenient that an AccessibleEditableText represents text of different columns. The simple reason for this is that an AccessibleEditableText that contains text of more than one column would require more than one bounding box.
Like for pages, there also seems to be no requirement for having an accessible object for columns themselves.
The third thing that has to taken into account when defining the accessible object tree are paragraphs. On the one hand, paragraphs divide text into fragments at reasonable positions. On the other hand, and that's more important, they assign certain semantics to text, like being a heading or an item of a bullet list. These semantics put a structure at the document that is at least helpful for navigation in a document. Therefor it seems to be reasonable to not have text of more than one paragraph within one accessible object, and to include the structural information a paragraph carries into either the object's description or role.
The opposite however does not hold. A paragraph's text has to be distributed to more than one object if the paragraph contains a page or column break. Therefor there will be accessible objects for paragraph portions in fact, the paragraph fragment objects.
Though the guidelines state that text can be splitted into on- and off-screen parts, it seems not to be convenient to do that for paragraph portions. This means that text that is not visible on the scroll might be accessible through an AccessibleEditableText service, provided that also some text of the same service is displayed. There are two reasons for that, that both have to do with the fact the each paragraph fragment object also supports the AccessibleComponent service that gives access to its screen coordinates. First of all, that's the same behavior has for any other accessible object that supports the AccessibleCpmponet service. Secondly, and more important, hiding the non visible text fragments would have the result that a simple scroll action might change the objects text.
To be continued ...
1Editor's note: There is a open issue with read-only and generated text within paragraphs, like fields and generated hyphens. Therefor it might be necessary that the paragraph fragment objects do not support AccessibleEditableText themselves, but have children that either support AccessibleText or AcessibleEditableText.
2That's in fact the reason these objects are called header area objects instead of header objects. They do not represent a header, but an area on the screen there a header is displayed.