August 19, 2013

Google’s Gumbo HTML5 Parsing Library Goes Open Source


Google has long been a supporter of HTML5 and is also known to be a proponent of open source. With that in mind, it’s not too surprising that the company recently open sourced its HTML parsing library, Gumbo. Written in C, Gumbo implements the HTML5 parsing algorithm and passes all html5lib-0.95 tests.

Google’s support of the latest HTML revision goes back a long way, with one of the biggest steps coming in 2011, when the company decided to support only browsers capable of handling HTML5. The goal was to help Web applications mature quickly to the point where they could compete with traditional software, a goal that has been more or less realized.

More recently, Google, along with Microsoft and Netflix, began efforts to get proper DRM (digital rights management) incorporated within HTML5. While DRM seems contrary to Google’s tendency toward openness, it is a prerequisite for media companies to adopt the Web standard.

As for Gumbo, at the time of its open sourcing it had been tested on 2.5 billion pages from Google’s index. It gives developers a lightweight and dependable HTML parsing library that has no outside dependencies and can be called from most languages. Gumbo could be useful in webpage validators, static analyzers, templating languages and refactoring tools, to name a few.

While Google has described Gumbo as being “robust and resilient to bad input,” the company still doesn’t recommend maintaining pointers to parts of its internal data structures, as the API is still being worked on and may change. Despite the library’s relatively early state, though, the API is considered stable enough that only comments from users stand between it and a version 1.0 release, which is likely to happen soon.
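To make the advice about pointers concrete, here is a minimal sketch in C of the safer pattern: copy the data you need out of the parse tree, then release the tree. The `DemoNode` type and every `demo_*` function are hypothetical stand-ins written for this illustration. They are not Gumbo’s actual API, which the article does not show.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a parser-owned tree node (NOT a Gumbo type). */
typedef struct DemoNode {
    const char *tag;         /* element name; a string literal in this sketch */
    char *text;              /* text content owned by the tree, may be NULL */
    struct DemoNode *child;  /* first child */
    struct DemoNode *next;   /* next sibling */
} DemoNode;

/* Allocate a node; the tree keeps its own private copy of `text`. */
DemoNode *demo_node_new(const char *tag, const char *text) {
    assert(tag != NULL);
    DemoNode *n = calloc(1, sizeof *n);
    if (!n) return NULL;
    n->tag = tag;
    if (text) {
        n->text = malloc(strlen(text) + 1);
        if (!n->text) { free(n); return NULL; }
        strcpy(n->text, text);
    }
    return n;
}

/* Free a node, its siblings, and all of their descendants. */
void demo_tree_free(DemoNode *n) {
    while (n) {
        DemoNode *next = n->next;
        demo_tree_free(n->child);
        free(n->text);
        free(n);
        n = next;
    }
}

/* Depth-first search for the first element with the given tag name. */
const DemoNode *demo_find(const DemoNode *n, const char *tag) {
    for (; n; n = n->next) {
        if (strcmp(n->tag, tag) == 0) return n;
        const DemoNode *hit = demo_find(n->child, tag);
        if (hit) return hit;
    }
    return NULL;
}

/* Return a caller-owned copy of the <title> text, or NULL if absent.
   The copy stays valid after the tree is freed; a pointer into the
   tree would not. The caller releases it with free(). */
char *demo_copy_title(const DemoNode *root) {
    const DemoNode *title = demo_find(root, "title");
    if (!title || !title->text) return NULL;
    char *copy = malloc(strlen(title->text) + 1);
    if (copy) strcpy(copy, title->text);
    return copy;
}
```

A caller would build or obtain a tree, call `demo_copy_title`, and then free the tree right away. Because the returned string is a separate allocation, later changes to how the parser lays out its internal structures would not invalidate it.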

Future features to look forward to in Gumbo include support for recent HTML5 spec changes that add the template tag, support for fragment parsing, full-featured error reporting, and bindings in other languages.




Edited by Alisen Downey




