fixing web pages on the fly

I was thinking about the fact that browsers have to fix up a page internally in order to render it correctly when elements are incorrectly nested or left unclosed. Since we know that Mozilla does a pretty good job of this, and its fix-up engine is open source, shouldn't it be possible to build a web proxy that feeds a tag-soup HTML document through the fixer-upper and outputs a valid XHTML document?

I'd love to see a web service that would take a URL, pipe the page through Mozilla and then through HTML Tidy, and (eventually, since I presume this would be a slow process) spit out valid XHTML on the other side.
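To make the idea concrete, here is a rough sketch of the fix-up step in Python. It does not use Mozilla's parser or HTML Tidy. Instead, Python's standard-library `html.parser` stands in for both, and the names `SoupFixer` and `fix` are made up for illustration. It only closes unclosed elements, repairs misnested ones, and escapes text, so the result is well-formed markup, not necessarily valid XHTML.

```python
from html import escape
from html.parser import HTMLParser

# HTML elements that never take a closing tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}


class SoupFixer(HTMLParser):
    """Toy stand-in for a browser's fix-up pass (an assumption, not Mozilla's code)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # pieces of repaired markup, in order
        self.stack = []  # names of elements that are currently open

    def handle_starttag(self, tag, attrs):
        # Quote every attribute value; valueless attributes become name="name".
        attr_text = "".join(
            f' {name}="{escape(value if value is not None else name)}"'
            for name, value in attrs
        )
        if tag in VOID:
            self.out.append(f"<{tag}{attr_text} />")
        else:
            self.out.append(f"<{tag}{attr_text}>")
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray closing tag with no matching open element: drop it
        # Close everything opened after the matching element, then the element.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def fix(html_text):
    """Return a well-formed version of a tag-soup HTML fragment."""
    fixer = SoupFixer()
    fixer.feed(html_text)
    fixer.close()
    # Close whatever the author left open, innermost element first.
    while fixer.stack:
        fixer.out.append(f"</{fixer.stack.pop()}>")
    return "".join(fixer.out)
```

Calling `fix("<p><b>bold <i>both</b> text")` returns `<p><b>bold <i>both</i></b> text</p>`. A real service would fetch the page first and would need far more than this (implied elements, duplicate attributes, doctype and namespace handling), which is exactly the work I'd hope to borrow from Mozilla and HTML Tidy.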

My secret ulterior motive for all of this is that I want to see people start working on conferring benefits on sites that use proper XHTML, by offering richer indexing, search, transformation or presentation opportunities. And perhaps the best way to demonstrate these new opportunities on existing, invalid pages would be to have an easy way to create machine-made valid versions. Granted, the transformations would be imperfect, but they might be close enough to show the potential applications.

I'm Anil Dash, and I've been blogging here since 1999, writing about how culture is made. Contact me at anil@dashes.com, at +1 646 833 8659, or at anildash on Twitter or IM.
