Guide to HTML5 Hiccups

HTML5 is the next generation of web markup, and the first web markup language created in the era of web applications. While we view many aspects of it quite favorably, we believe the specification could benefit from a few changes and clarifications, outlined below.

Validation of XHTML and HTML

The spec should clarify that an author can use either XHTML or HTML syntax, and that the choice is a coding style preference. It would be great if Henri could add a toggle to the validator that checks for syntax, something along the lines of “Also check for XHTML syntax validity.”

While more specific toggles (e.g. “Check for quoted attributes”) may benefit some power users, the largest benefit would come from providing at least a high-level “XHTML syntax” toggle, which could (perhaps with progressive-disclosure help text) document the additional checks it performs, such as:

quoted attributes
self-closed empty elements (e.g. <meta />, <link />, <img />, <br /> etc.)
explicit closing tags (for non-empty elements)
named entities defined by XHTML 1.0
... etc.

We want the ability to check for stricter syntax “at validation time” to be conducive to more efficient authoring/development. Every good programmer knows bugs are much more quickly caught and fixed at compile-time than at run-time. Validation is the compile-time of HTML.

All of the above syntax checking (and more) is currently provided by the W3C Validator when validating XHTML 1.0 DOCTYPE documents served as text/html. We want the option for the same (or better?) level of XHTML syntax checking when validating HTML5 documents as text/html.
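
As a rough illustration of what such an “XHTML syntax” toggle might check, here is a minimal sketch using Python’s standard html.parser module. The class name XhtmlStyleChecker and the message strings are hypothetical, and this is not how the W3C Validator or any existing validator works. It covers only two of the checks listed above (quoted attributes and self-closed empty elements) and ignores explicit closing tags and named entities.

```python
import re
from html.parser import HTMLParser

# Empty (void) elements that XHTML-style markup writes as <br />, <img />, etc.
VOID_ELEMENTS = {"meta", "link", "img", "br", "hr", "input"}

# Quoted attribute values are blanked out first, so that an "=" inside a
# quoted value is not mistaken for an unquoted attribute.
QUOTED_VALUE = re.compile(r'"[^"]*"|\'[^\']*\'')
UNQUOTED_VALUE = re.compile(r'=\s*[^\s"\'>]')


class XhtmlStyleChecker(HTMLParser):
    """Hypothetical checker: collects (line, message) pairs."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def _check_quotes(self, tag):
        # get_starttag_text() returns the raw source of the current start tag.
        raw = self.get_starttag_text() or ""
        if UNQUOTED_VALUE.search(QUOTED_VALUE.sub('""', raw)):
            self.problems.append(
                (self.getpos()[0], f"unquoted attribute in <{tag}>"))

    def handle_starttag(self, tag, attrs):
        # Called for tags written without a trailing slash, e.g. <br>.
        self._check_quotes(tag)
        if tag in VOID_ELEMENTS:
            self.problems.append(
                (self.getpos()[0], f"<{tag}> is not self-closed"))

    def handle_startendtag(self, tag, attrs):
        # Called for tags written with a trailing slash, e.g. <br />.
        self._check_quotes(tag)


checker = XhtmlStyleChecker()
checker.feed('<p class=intro>Hi<br><img src="a.png" alt="" /></p>')
checker.close()
for line, message in checker.problems:
    print(line, message)
```

A real validator would work on the parsed token stream rather than on regular expressions, so this should be read only as a way of making the proposed checks concrete.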

We don’t think that there needs to be a toggle in the language itself to indicate preferred syntax.

New elements and attributes

Heading scope

We are excited about the ability in HTML5 to scope headings via the section element. This promises a significant improvement in the fluidity of content reuse and eases the burden of creating mashups.

Adding New Elements

We would like to encourage spec authors to be conservative in including new tags, and to do so only when the addition of the tag allows for significant gains in functionality. For example, article and section are identical except that article allows a pubdate attribute. We would suggest that article be dropped and section be adapted to allow an optional pubdate attribute or, even better, more explicit metadata.

Overall, we feel confident that the semantics of many of the new elements proposed by the HTML5 spec are already in widespread use by content authors as semantic class names in published HTML4 and XHTML1 documents. However, we did note that the usage suggested by the spec occasionally varies from the way web developers currently understand and use the element. Content authors, confident that they understand the meaning of a tag which matches a class name they have been using for years, will likely use the element incorrectly, causing a disparity between the element as specified and as used in practice. There is a worry that the spec becomes effective fiction if it doesn’t document current practice accurately (when current practice is advisable).

We are concerned that the footer element as defined is more restrictive than the header element. It violates the “Principle of least surprise”: people will see the footer element, “know what it’s about,” and use it for the footer of the page, not the footers of sections.

Content authors have followed the taxonomy of an HTML page to mark up templates, including class names or ids such as #header, #body, and #footer. For example, “footer” is commonly used to denote a portion of the template at the bottom of the page. Footers may well contain copyright or other meta information, but a footer is generally understood to simply be content which is secondary to that which is within the #body. It may contain tertiary navigation, sections, or even headings.

Review of Data

We propose two reviews of the test data used to suggest the addition of footer and header, to better determine how content authors are currently using “footer.”

A distribution of how deeply nested nodes are that contain the footer class/ID. Our hypothesis is that footer is much more frequently a very near child of the body element, which would indicate that content authors consider the footer to be a template-level element rather than a meta-data container for deeply nested sections.
On the pages tested, how frequently elements with the class/id footer contain block level elements and headings, thus effectively being inconsistent with the current specification. Our hypothesis is that current trending seems to be leaning toward the footer being considered a content area, just slightly less important than the main body content of a section or page.
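
The first of these reviews could be approximated as follows. This is a minimal sketch under several assumptions: it uses Python’s standard html.parser module, the class name FooterDepthSurvey is hypothetical, the markup is assumed to be reasonably well nested (omitted closing tags such as an unclosed p will skew the depths), and it does not cover the second review of what footers contain.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect depth.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "param", "source", "track", "wbr"}


class FooterDepthSurvey(HTMLParser):
    """Hypothetical survey: counts class/id "footer" by depth below body."""

    def __init__(self):
        super().__init__()
        self.stack = []          # names of currently open elements
        self.body_depth = None   # stack size at the moment body was opened
        self.depths = Counter()  # depth below body -> number of footers

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        self.stack.append(tag)
        if tag == "body":
            self.body_depth = len(self.stack)
            return
        attributes = dict(attrs)
        names = (attributes.get("class") or "").split()
        names.append(attributes.get("id") or "")
        if "footer" in names and self.body_depth is not None:
            # A depth of 1 means the footer is a direct child of body.
            self.depths[len(self.stack) - self.body_depth] += 1

    def handle_endtag(self, tag):
        # Pop back to the matching open element; ignore stray end tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


page = ('<html><body><div id="page"><div class="post">'
        '<p class="footer">Posted by A.</p></div></div>'
        '<div id="footer"><h2>About</h2></div></body></html>')
survey = FooterDepthSurvey()
survey.feed(page)
survey.close()
for depth in sorted(survey.depths):
    print(depth, survey.depths[depth])
```

Run over a corpus of pages, a distribution heavily weighted toward depth 1 or 2 would support the hypothesis that authors treat the footer as a template-level element.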

Proposed Solution

Assuming our hypotheses are deemed accurate, we propose the following. The footer is an important element, but either its content model needs to be changed to match that of header, so that it may be used concurrently as a template level element and (the new use) a sub-division of section, or the name needs to be changed so that it will not seem related to how footer is currently used by content authors.

To allow for the broader use case, the content model of the footer should be:

Flow content, but with no header or footer element descendants.

The textual description of footer should be updated from:

The footer element represents a footer for its nearest ancestor sectioning content. A footer typically contains information about its section such as who wrote it, links to related documents, copyright data, an[d] the like

...to...

The footer element represents a footer for its nearest ancestor sectioning content. A footer typically contains information about its section such as who wrote it, links to related documents, tertiary navigation, copyright data, and the like, but may contain any information which is secondary to the main content in its nearest ancestor sectioning content. If that is the body, then the footer is secondary content to the document as a whole.

hgroup

We don’t see the added value of this element and would instead add a boolean attribute to the heading element which allows content authors to specify if that particular heading should be included in the outline.
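
To make the alternative concrete, here is a minimal sketch of an outline builder that honors such a boolean attribute. It assumes Python’s standard html.parser module; the attribute name nooutline and the class name OutlineBuilder are purely hypothetical and appear in no draft of the specification. It also produces a flat list of heading levels rather than the full sectioning outline the spec describes.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
SKIP_ATTR = "nooutline"  # hypothetical boolean attribute, not in any spec


class OutlineBuilder(HTMLParser):
    """Hypothetical: collects (level, text) for headings kept in the outline."""

    def __init__(self):
        super().__init__()
        self.outline = []
        self._current = None  # (level, text parts) while inside a kept heading

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            if SKIP_ATTR in dict(attrs):
                self._current = None  # heading opted out of the outline
            else:
                self._current = (int(tag[1]), [])

    def handle_data(self, data):
        if self._current is not None:
            self._current[1].append(data)

    def handle_endtag(self, tag):
        if tag in HEADINGS and self._current is not None:
            level, parts = self._current
            self.outline.append((level, "".join(parts).strip()))
            self._current = None


builder = OutlineBuilder()
builder.feed("<h1>Dr. Strangelove</h1>"
             "<h2 nooutline>Or: How I Learned to Stop Worrying</h2>"
             "<h2>Cast</h2>")
builder.close()
print(builder.outline)
```

Here the subtitle is masked from the outline without a wrapper element, which is the effect hgroup is meant to achieve.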

We would like the specification authors to clarify the note:

The point of hgroup is to mask an h2…

Is the note specific to the example? It looks like it may apply to the hgroup element in general.

Aside

The use cases for aside are too limited to warrant its inclusion in the specification. We were also concerned about potentially duplicating content within an aside.

Canvas

It is exciting to see Canvas stretch the boundaries of what is possible in web applications and websites. We want universal access to the canvas element to be defined in the specification and implemented, so that we can use it effectively without damaging the accessibility of our projects. We are happy to see the Canvas Accessibility API task force making quick progress, even while an accessibility solution is still being sought. Within the specification, please include universal access examples, explaining how you anticipate content authors using this element to create both simpler content (interactive charts, graphics, etc.) and full-fledged applications like Bespin.

The current implementations of the canvas element are similar to the state of accessibility of Flash in 1998; with the impressive skills of developers today, we can do better than that! Providing universal access to the canvas element is certainly an engineering challenge, and we look forward to it being a reality.

ARIA

While ARIA was not included in the version of the specification we reviewed, recent drafts propose how to integrate ARIA. We like where this is headed and believe this issue should be resolved before Last Call to ensure adoption, implementation, and integration of ARIA into HTML5.

Dialog

A new element <dialog> does not provide added value over current markup. In addition, <dt> and <dd> seem like a stretch for this particular application.

Time

There are a few use cases which are not possible with an implementation of the time element as currently specified. For example, many people choose to indicate their birth month and day on a social networking site while being unwilling to divulge the year, or to divulge only the year and not the month and day. The birthday examples research documents specific sites and URLs.

An application that extracted friends’ birthdays to create a graph of events in the upcoming year should be able to determine the day and month, even in the absence of the exact year (especially for recurring events).

The simple fix: allow the time element’s datetime attribute to accept standard year values and month-day values, both of which are allowed by ISO 8601: YYYY and --MM-DD respectively.
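
A consuming application could handle the two proposed forms alongside full dates roughly as follows. This is a minimal sketch, not a statement of how any user agent parses the datetime attribute; the names PartialDate and parse_partial_date are hypothetical, and yearless values are validated against a leap year on the assumption that --02-29 should be accepted.

```python
import datetime
import re
from typing import NamedTuple, Optional


class PartialDate(NamedTuple):
    """Hypothetical result type: missing components are None."""
    year: Optional[int]
    month: Optional[int]
    day: Optional[int]


FULL_DATE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")
YEAR_ONLY = re.compile(r"[0-9]{4}")
MONTH_DAY = re.compile(r"--([0-9]{2})-([0-9]{2})")


def parse_partial_date(value: str) -> PartialDate:
    """Parse YYYY-MM-DD, YYYY, or --MM-DD; raise ValueError otherwise."""
    match = FULL_DATE.fullmatch(value)
    if match:
        year, month, day = (int(part) for part in match.groups())
        datetime.date(year, month, day)  # raises ValueError if invalid
        return PartialDate(year, month, day)
    if YEAR_ONLY.fullmatch(value):
        return PartialDate(int(value), None, None)
    match = MONTH_DAY.fullmatch(value)
    if match:
        month, day = (int(part) for part in match.groups())
        # Validate against a leap year so that --02-29 is accepted.
        datetime.date(2000, month, day)
        return PartialDate(None, month, day)
    raise ValueError(f"unsupported date value: {value!r}")


print(parse_partial_date("1973"))
print(parse_partial_date("--05-17"))
```

With a result like this, the birthday application described above can place --05-17 on next year’s calendar without ever learning the birth year.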

We are also imagining an application that might input Wikipedia data and output an annotated visual timeline. For movements or trends rather than events, it would need to output rough dates and date ranges like 2001-2003, rather than exact dates. This may require extending the time element, or perhaps some compound markup that uses one or more time elements.

Details

We suggest that legend be replaced by a new element, because current UA implementations require that legend be part of a form or fieldset, which would make the element effectively unusable for the foreseeable future.

We agree that creating new elements is undesirable but in this case, it appears to be necessary for pragmatic reasons.

Figure

We suggest that, by default, UAs render a figure no wider than the video, image, or canvas that it contains. UAs should calculate this width and flow the legend/label content within it. We also suggest that legend be replaced by label, because current UA implementations require that legend be part of a form or fieldset, which would make the element effectively unusable for the foreseeable future.

Progress and Meter

The only obvious difference between these two elements is that meter contains a start value. Are they perhaps different visual displays of the same information?

A (type icon)

Please add a use case which demonstrates how alt text would be included.

Changed elements and attributes

Style

We propose removing the words “if applied” from section 4.2.6 to avoid disparate behaviors across different UAs.

Otherwise, the specified styles must, if applied, be applied to the entire document.

We would also encourage content authors to sandbox code from unknown sources in an iframe if there are concerns about security, code integrity, or style collisions.

Placeholder links (a element without href attribute)

Are placeholder links active elements? If yes, we are concerned that someone could tab to a placeholder link but not be able to activate it. Please clarify expected user agent behavior.

Small element

We support Remy’s proposal. We had some discussion of it being a block level element versus inline. What DOM does it create? From some of our initial tests, when small is used as a block level element, some browsers break the small element into a series of inline elements. Backwards compatibility will be essential to making this element usable for developers.

Alternatively, if the small element cannot be changed to match its new updated meaning, it should, like its companion, the big element, be dropped from the spec.

Cite

Review of Data

We propose a test to determine current usage of the cite element. Are document authors using it to mark up only titles of books and other works, or do they also use it to refer to spoken work by people, e.g.

<cite>@t</cite> <q>Plato used shadows of sock puppets.</q>

div

We are concerned that the div element seems to be a second class citizen as implied by this note:

Note: Authors are strongly encouraged to view the div element as an element of last resort, for when no other element is suitable. Use of the div element instead of more appropriate elements leads to poor accessibility for readers and poor maintainability for authors.

Limiting the number and complexity of new semantic elements means that div plus a class name or ARIA role will still be an appropriate way to mark up content in many contexts.

Absent elements and attributes

Most of the element removals seem appropriate, even if their loss makes some of us feel rather nostalgic. We would like the specification authors to clarify, with an example, how to mark up multi-dimensional tables. There is some concern about the abbr attribute on th elements going away. We are glad to see that the headers and scope attributes remain.

The HTML5 Super Friends