Web Standards Test: Top 100 Sites

While working on the third edition of Designing With Web Standards, I decided to visit Alexa’s Top 100 US Sites to see how many of the top 100 use valid markup, how many nearly validate (i.e. would validate if not for an error or two), and which DOCTYPEs predominate. Even with a fistful of porn sites in the mix, it was dull work: click a link, load the home page, run a validation bookmarklet, record the result.

I had no expectations. I made no assumptions. I just clicked and tested.

Such tests tell us little

I make no claims about what I found. If all the home pages of the top 100 sites were valid, it would not mean that the pages beneath the home page level were valid, nor would it prove that the sites were authored semantically. (An HTML 4.0 table layout with no semantics can validate; so can a site composed entirely of non-semantic divs with presentational labels.)

Validation is not the be-all of standards-based design; it merely indicates that the markup, whatever its semantic quality may be, complies with the requirements of a particular standard. Conversely, lack of validation does not prove lack of interest in web standards: ads and other third-party content can wreck a once-valid template, as can later third-party development work.

Moreover, nothing causal or predictive can be determined from these results. If 25% of the top 100 sites validated in my test, it would not mean that 25% of all sites on the web validate.

And I got nothing like 25%.

Enough disclaimers. On with the test.

Seven percent validate

On this day, in this test, seven out of 100 “top US” sites validated:

  1. MSN (#7 in Alexa’s list) validates as XHTML 1.0 Strict. Who’d a thunk it? (Validation link)
  2. Craigslist (#10) validates as HTML 4.01 Transitional. I’ll buy that! (Validation link)
  3. WordPress (#22) validates as XHTML 1.0 Transitional. The power of the press, baby! (Validation link)
  4. Time Warner RoadRunner (#39) validates as XHTML 1.0 Transitional. Meep-Meep! (Validation link)
  5. BBC Newsline Ticker (#50) validates as XHTML 1.0 Strict. Cheers, mates! (Validation link)
  6. The US Internal Revenue Service (#58) validates as HTML 4.01 Transitional. Our tax dollars at work! (Validation link)
  7. TinyPic (#73) (“Free Image Hosting”), coded by ZURB, validates as XHTML 1.0 Transitional. (Validation link)
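
To see which DOCTYPEs predominate among the seven, the list above can be tallied in a few lines. This is only a sketch for checking the arithmetic, not the bookmarklet used in the test; the names `validating`, `SITES_TESTED`, `doctype_counts`, and `share_valid` are made up for illustration, and the data is copied by hand from the list.

```python
from collections import Counter

# Validating home pages and their DOCTYPEs, copied by hand from the list above.
validating = {
    "MSN": "XHTML 1.0 Strict",
    "Craigslist": "HTML 4.01 Transitional",
    "WordPress": "XHTML 1.0 Transitional",
    "Time Warner RoadRunner": "XHTML 1.0 Transitional",
    "BBC Newsline Ticker": "XHTML 1.0 Strict",
    "US Internal Revenue Service": "HTML 4.01 Transitional",
    "TinyPic": "XHTML 1.0 Transitional",
}

# Number of home pages tested (Alexa's Top 100 US Sites).
SITES_TESTED = 100

# Count how many validating sites declare each DOCTYPE.
doctype_counts = Counter(validating.values())

# Share of tested home pages that validated on the day of the test.
share_valid = len(validating) / SITES_TESTED

for doctype, count in doctype_counts.most_common():
    print(f"{doctype}: {count}")
print(f"Valid home pages: {share_valid:.0%}")
```

By this count, XHTML 1.0 Transitional leads with three sites, while XHTML 1.0 Strict and HTML 4.01 Transitional have two each. With only seven sites in the sample, that is a tally, not a trend.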

Also-rans (one or two errors)

  1. Wikipedia (#8) almost validates as XHTML 1.0 Strict (two errors).
  2. Apple (#29) almost validates as HTML 4.01 Transitional (two errors).
  3. LinkedIn.com (#45) almost validates as HTML 4.01 Transitional (one error).
  4. AWeber Communications (#83) almost validates as XHTML 1.0 Transitional (one error: an onClick attribute).

Sui generis

The Pirate Bay (#68), “the world’s largest BitTorrent tracker,” goes in and out of validation. When it validates, it’s a beautiful thing, and it belongs on the list. But when it goes out of validation, it can quickly stack up ten errors or more. (Validation link)


Google (#1) does not validate or declare a DOCTYPE.

Yahoo (#2) does not validate or declare a DOCTYPE.

YouTube (#3) does not validate but at least declares that it is HTML 4.01 Transitional. Progress!

A surprising number of sites that do not come close to validating declare a DOCTYPE of XHTML 1.0 Strict. For instance, Twitter (#93) is authored in XHTML 1.0 Strict, although it contains seven errors.
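
Declaring a DOCTYPE and validating against it are separate things: the declaration is only a claim the page makes about itself. As a rough illustration, here is one way that claim could be read out of a page's markup. It is a minimal sketch, not a validator and not the tool used in this test; `declared_doctype` and both regular expressions are hypothetical helpers, and the pattern matching is naive (it would also match a DOCTYPE that appears inside a comment, for example).

```python
import re

# Finds the first DOCTYPE declaration in the markup, case-insensitively.
DOCTYPE_RE = re.compile(r"<!DOCTYPE\s+([^>]*)>", re.IGNORECASE)

# Pulls the quoted public identifier out of a declaration, if it has one.
PUBLIC_ID_RE = re.compile(r'PUBLIC\s+"([^"]+)"', re.IGNORECASE)


def declared_doctype(markup):
    """Return what the page claims to be, or None if it declares no DOCTYPE.

    The result is the public identifier when one is present, otherwise the
    raw text of the declaration. Nothing here checks whether the markup
    actually conforms to the declared standard.
    """
    match = DOCTYPE_RE.search(markup)
    if match is None:
        return None
    declaration = match.group(1).strip()
    public = PUBLIC_ID_RE.search(declaration)
    return public.group(1) if public else declaration


# A page that declares XHTML 1.0 Strict, whatever its markup really does.
strict_page = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
    "<html><head><title>Example</title></head><body></body></html>"
)

# A page with no DOCTYPE at all.
bare_page = "<html><head><title>Example</title></head><body></body></html>"

print(declared_doctype(strict_page))
print(declared_doctype(bare_page))
```

A page can return a Strict identifier here and still rack up errors in a validator, which is the mismatch described above.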

This preference for Strict among non-validating sites suggests that at one point these sites were made over by standards-aware developers, but that any standards improvements made to them were lost by subsequent developers. (It doesn’t prove this; it merely suggests it.) Another possibility is that some developers use tools that are more standards-aware than they are. (For instance, a developer with little to no knowledge of web standards might use a tool that defaults to the XHTML 1.0 Strict DOCTYPE.)

Some sites that used to validate (such as Blogger.com, previously designed by Douglas Bowman, and Reference.com, previously designed by Happy Cog) no longer do so; maintaining standards or design compliance may not have been important to new owners or new directors.
