
The harsh truth about HTML5’s structural semantics (part 3)

By Luke Stevens | HTML, Web Design, Web Development | Jan 14, 2013

In the first part of this series we looked at the failings that led to the structural elements new to HTML5; in the second part we looked in detail at the consequences of those failings; in this final part we’ll look for a way forward and draw some conclusions about the current state of play.

This is not an abstract problem: people actually have to teach this stuff. The next generation of web designers and developers will be taught with HTML5 as a starting point. I feel sorry for whoever has to try to explain outlining and sectioning to the current and future crop of students. Perhaps they’ll wisely stick to the simple format we already have, which still works fine: divs with either an ID or class/es.

It’s reasonable to suggest that perhaps user agents in the future, such as browsers and screen readers, will do more with HTML5’s structural elements, and that will make them more palatable to us as authors.

Opera’s Bruce Lawson suggested just that on the WHATWG mailing list in 2009:

After all, I know of no user agents that can use time, section, footer, datagrid etc but we mostly expect there to be soon.

And here’s what Hickson, the HTML5 editor, said in reply:

I don’t. Most of the new elements are just meant to make styling easier, so that we don’t have to use classes.

All that, and the editor doesn’t see user agents ever even using these elements. They are there, he says, to save us from using classes. Put another way, the creator of these elements seems unsure why they’re even in the spec, save ‘making styling easier’.

 

Do we need more semantics in HTML?

There’s a school of thought that says we only need a handful of new semantics anyway. After all, the spec would become bloated if there were tens or hundreds more new terms added.

On the one hand, I agree. In terms of structuring a basic web page, I’d say we’d be better off without HTML’s sectioning elements altogether. What was once a straightforward exercise in using divs has become a complicated mess in HTML5 for no net gain.

However, in terms of semantics in general, there are cases where more meaning can be added on top of the structure of our web page (be it HTML4, 5 or whatever comes next) using additional attributes on our existing elements.

 

ARIA for accessibility

One of the easiest things to implement on an existing or new site is ARIA landmarks. (ARIA stands for Accessible Rich Internet Applications, a specification that defines ways of making web applications and web pages more accessible.)

ARIA landmarks are a subset of the overall ARIA specification: a simple type of metadata that allows blind and vision-impaired users with screen readers to jump around to the significant parts of a page, i.e. the ‘landmarks’. We simply add role="landmark-name" to an existing element, in the same way we would add class="class-name" to an element. The ARIA landmarks are discussed in comparison to HTML5 here.

ARIA landmarks are a much better match for what we currently do. The naming is a little wonky, but at least they reflect actual web authoring practice. For example, we can use:

  • banner for the overall page header.
  • navigation for navigation.
  • complementary for side bars.
  • contentinfo for the footer.
  • main for the main content area.

(Bear in mind that banner, main and contentinfo should each be used only once per document.)

ARIA landmarks are a simple solution that improves navigation options for screen reader users, without wading into the murky territory of HTML’s document outlining. They’re quick and easy to implement, and really should be a part of your basic HTML project template.
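As a rough sketch, here is what that template markup might look like: plain divs with ids, plus a role attribute for each landmark. It is wrapped in a small Python check, built only on the standard library’s html.parser, that lists the landmark roles on a page and flags any of banner, main or contentinfo that appears more than once. The ids, the LandmarkCollector class and the check_landmarks function are illustrative assumptions, and a script like this is no substitute for trying the page in an actual screen reader.

```python
from collections import Counter
from html.parser import HTMLParser

# Landmark roles that should appear at most once per document.
SINGLE_USE_ROLES = {"banner", "main", "contentinfo"}


class LandmarkCollector(HTMLParser):
    """Records every role="..." value found on any element, in document order.

    Simplification: a role attribute is treated as a single role value.
    """

    def __init__(self):
        super().__init__()
        self.roles = []  # (tag name, role value) pairs

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases attribute names; valueless attributes give None.
        for name, value in attrs:
            if name == "role" and value:
                self.roles.append((tag, value))


def check_landmarks(markup):
    """Return (landmarks found, single-use roles that appear more than once)."""
    collector = LandmarkCollector()
    collector.feed(markup)
    collector.close()
    counts = Counter(role for _, role in collector.roles)
    duplicates = sorted(r for r in SINGLE_USE_ROLES if counts[r] > 1)
    return collector.roles, duplicates


# Hypothetical page skeleton: the usual divs, each given a landmark role.
PAGE = """
<div id="header" role="banner">...</div>
<div id="nav" role="navigation">...</div>
<div id="content" role="main">...</div>
<div id="sidebar" role="complementary">...</div>
<div id="footer" role="contentinfo">...</div>
"""

if __name__ == "__main__":
    roles, duplicates = check_landmarks(PAGE)
    for tag, role in roles:
        print(f"<{tag}> role={role}")
    print("duplicates:", duplicates)
```

Note that navigation and complementary are deliberately left out of the single-use set, since a page can reasonably have more than one of each.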

 

Search engines

So we have more semantics for accessibility, but we also have more semantics available for search engines too.

First, let me be absolutely clear that HTML5 elements have no benefit for SEO whatsoever. It’s a myth, and we need to put it to bed. Using an article tag will not help you or your client rank higher than the next guy. You can trust that Google’s figured out how to find and rank your content by now.

However, search engines are keen to understand how best to display (note: not rank) web content in a more structured way.

Therefore, the major search engines have, over the years, put forth or adopted existing standard ways to mark up structured data in a web page so they can extract the appropriate information. For example, when searching for reviews you may have noticed star ratings appear in the top Google results. This is a case of search engines being able to extract the review score in a standardized way, and improve the display of their results. Recipes are another example, where cooking time is listed for each result. While such data does not improve a site’s rank, the more detailed result may encourage more users to click through, so there is some potential benefit there for a site, and it is often necessary in an arms race situation where all your competitors are doing it anyway.

While this kind of rich data has been around for a while in various guises, in 2011 the major search engines launched a vast new array of standards for this sort of structured data. They’re calling them ‘schemas’, and they’re housed at Schema.org. They use HTML’s microdata standard, and have already been implemented by the likes of eBay, IMDB, Rotten Tomatoes and more.
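To illustrate what “extracting the review score in a standardized way” means in practice, here is a minimal Python sketch using only the standard library’s html.parser. The markup uses schema.org’s Review and Rating types with HTML microdata attributes (itemscope, itemtype, itemprop). The MicrodataReader class and read_microdata helper are hypothetical, and this is a deliberately simplified reader (no itemref, no URL-valued properties, well-formed markup assumed), not how any search engine actually parses pages.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they never go on the element stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class MicrodataReader(HTMLParser):
    """Minimal microdata reader: records (itemtype, itemprop, value) triples.

    Simplifications: assumes every non-void element is explicitly closed,
    takes values from a content attribute or from text, and ignores itemref,
    itemid and URL-valued properties such as href/src.
    """

    def __init__(self):
        super().__init__()
        self.stack = []  # one entry per open non-void element
        self.types = []  # itemtype of each open itemscope, innermost last
        self.props = []  # collected (itemtype, itemprop, value) triples

    def _current_type(self):
        return self.types[-1] if self.types else None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        prop = a.get("itemprop")
        opens_scope = "itemscope" in a
        if prop and opens_scope:
            # The property's value is a nested item, e.g. reviewRating -> Rating.
            self.props.append((self._current_type(), prop, a.get("itemtype")))
            prop = None
        elif prop and "content" in a:
            # e.g. <meta itemprop="worstRating" content="1">
            self.props.append((self._current_type(), prop, a["content"]))
            prop = None
        if tag in VOID_TAGS:
            return
        # The property belongs to the scope that was open *before* this element.
        entry = {"scope": opens_scope, "prop": prop,
                 "type": self._current_type(), "text": []}
        if opens_scope:
            self.types.append(a.get("itemtype"))
        self.stack.append(entry)

    def handle_data(self, data):
        # Text counts toward every open element that is collecting a property.
        for entry in self.stack:
            if entry["prop"]:
                entry["text"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.stack:
            return
        entry = self.stack.pop()
        if entry["prop"]:
            value = " ".join("".join(entry["text"]).split())
            self.props.append((entry["type"], entry["prop"], value))
        if entry["scope"]:
            self.types.pop()


def read_microdata(markup):
    reader = MicrodataReader()
    reader.feed(markup)
    reader.close()
    return reader.props


# Hypothetical review snippet marked up with schema.org microdata.
REVIEW = """
<div itemscope itemtype="https://schema.org/Review">
  <span itemprop="name">A solid little bistro</span>
  <div itemprop="reviewRating" itemscope itemtype="https://schema.org/Rating">
    Rated <span itemprop="ratingValue">4</span> out of
    <span itemprop="bestRating">5</span>
  </div>
</div>
"""

if __name__ == "__main__":
    for itemtype, prop, value in read_microdata(REVIEW):
        print(itemtype, prop, value, sep=" | ")
```

The point is that the extra meaning rides on attributes added to ordinary divs and spans; no new elements are needed for a machine to pick out the rating of 4 out of 5.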

This is the biggest leap towards a more semantic web in years, and yet it was done behind closed doors with little fanfare and no standards process whatsoever. It was dropped on us without warning, and has since mostly flown under the web design community’s radar. There’s a lot of overlap with HTML5 semantics too: for example, there are schemas for articles, web pages and web page elements, among far more schemas that include everything from TV episodes to medical conditions. It’s also the recommended way of describing videos on the web.

After all the debate about HTML’s semantics (and lack thereof), the search engines have made it clear they do want vastly more semantic data in our markup, but it’s going to happen on top of existing structures, not with new elements.

But surely for us as a community interested in semantics and web standards, neither HTML’s limited, flawed approach to semantics nor the closed, dropped-out-of-nowhere approach of the major search engines is the best path forward.

In some senses, the HTML5 semantics horse has bolted; it’s just up to us to contain the damage. As for schema.org, it’s a whole new world, and one we should be scrutinizing very closely, or another small group is going to determine what’s in our — and the web’s — best interests for us. In fact, it may have already happened.

 

Conclusions

Let’s wrap up with some conclusions.

  • Headings matter: first, we should really care about the heading structure of our pages to help out blind and vision-impaired users trying to get around with screen readers. The venerable h1-h6 elements may be limited, but they’re relied on heavily by screen reader users.
  • HTML5 structure is a mess: we should probably ignore HTML structural elements altogether. They’ve been a bit of a disaster — we have essentially forked the spec, created plenty of broken outlines, and wasted too much time already trying to get our heads around a fundamentally broken system. Long live divs.
  • Consider ARIA landmarks: adding ARIA landmarks to your site is another simple, effective way of helping screen reader users.
  • Consider schema.org and the future of semantics: schema.org has the backing of the major search engines, and while it’s certainly a mixed bag at the moment, it seems to be the future for structured data on the web.

There are lots of good parts in the HTML5 specification, from new forms features to the History API and Canvas implementation. In fact, without the WHATWG, who have been the driving force behind HTML5, we’d still be stuck with HTML4/XHTML 1.0 while we waited for the W3C to get their act together. Nevertheless, just because HTML5 and all the related technology around it is new and exciting, it doesn’t mean we have to accept everything we’re given.

Sometimes it’s worth seeing how the HTML sausage is made, so we know what we’re eating. And in the case of HTML’s structural semantics, I’d rather pass.

 

Hungry for more? Luke’s book “The Truth About HTML5” is available for a limited time through our sister site MightyDeals.com at an amazing 50% off.

Have you used ARIA landmarks or Schema.org? Do you see a future for HTML5’s structural elements? Let us know in the comments.

Featured image/thumbnail, uses structure image via Shutterstock.

Comments
  • http://www.blackbookoperations.com/ Black Book Operations

    As with every new/old language (like HTML or HTML5 or whatever) there will always be pros and cons. The good thing about technologies concerning the web is: they can always be changed later on with updates/versions/… HTML5, even though already here for quite some time, is still an infant and is slowly turning into an adult language which will shed some baby fat along the way. Keep the discussion going, change never comes easy but is necessary nonetheless.

  • http://www.pixelcrayons.com/ PixelCrayons

    Nice article! I really appreciate your post. Insightful information for HTML5 website developers. Thanks for sharing!

    • http://twitter.com/lukestevens lukestevens

      Glad you enjoyed it! :)

  • Steve Fenton

    I’m using Schema.org, but with RDFa Lite, which is an existing W3C standard: http://www.w3.org/TR/rdfa-lite/

    I still think the structural elements are useful. We need to improve the specifications where they are unclear – which any of us can do on the public mailing list. When these structural elements have implicit aria roles, a bunch of sites will suddenly get a work-free upgrade. I think it is fair to say that aria-roles will be missed in many cases where people will use structural elements and if we teach people how to use structural elements better this will benefit a whole bunch of users.

    • http://twitter.com/lukestevens lukestevens

      If only it were that simple! ;) Improving the WHATWG specification requires convincing the Editor, Ian Hickson, to make changes regarding those elements, and over the last ~8 years no one has had much success there. That said, as Steve Faulkner points out in the comments on Part 1 (http://www.webdesignerdepot.com/2013/01/the-harsh-truth-about-html5s-structural-semantics-part-1/#comment-762181909) you can contribute to the W3C’s version of HTML5. But the issue of forked HTML5 specifications is one big can of worms in its own right.

  • http://twitter.com/lukestevens lukestevens

    Thanks for the thoughtful comment. I can understand the desire to opt for a kind of “sectioning-lite” approach, but even simplifying sectioning takes us down a different path, and has the consequence of forking the spec, as discussed in part 1 (http://www.webdesignerdepot.com/2013/01/the-harsh-truth-about-html5s-structural-semantics-part-1/). It’s a tricky problem!

    • Umber

      Hmm, I’m not opposed to forking the spec. One of the most astounding revelations I’ve had recently was when I read this interview with Mr. Hickson: http://html5doctor.com/interview-with-ian-hickson-html-editor/ In it he states he opposes non-centrally developed technologies. Which made it clear to me why he is so “dominant” when it comes to spec-related issues. I do not agree with his views on this issue.
      When designers and developers “fork” (a part of) the spec, it may lead (hopefully) to a change in the spec. At least if the fork is compatible, and making a “lite-version” would be.