Diffstat (limited to 'libraries/html5lib/README')
-rw-r--r-- | libraries/html5lib/README | 15 |
1 files changed, 4 insertions, 11 deletions
diff --git a/libraries/html5lib/README b/libraries/html5lib/README
index e97c619b21..7e57438059 100644
--- a/libraries/html5lib/README
+++ b/libraries/html5lib/README
@@ -1,12 +1,5 @@
-html5lib (HTML parser based on the HTML5 specification)
+html5lib is a pure-python library for parsing HTML. It is designed to
+conform to the WHATWG HTML specification, as is implemented by all
+major web browsers.
 
-HTML parser designed to follow the HTML5 specification. The parser is
-designed to handle all flavours of HTML and parses invalid documents
-using well-defined error handling rules compatible with the behaviour of
-major desktop web browsers.
-
-Output is to a tree structure; the current release supports output
-to DOM, ElementTree and lxml tree formats as well as a simple
-custom format.
-
-Optional: datrie, python-chardet, lxml and genshi
+Optional dependencies: genshi and lxml