author | Lars Magne Ingebrigtsen <larsi@gnus.org> | 2010-09-10 18:44:35 +0200 |
---|---|---|
committer | Lars Magne Ingebrigtsen <larsi@gnus.org> | 2010-09-10 18:44:35 +0200 |
commit | 381408e2192b8fd606babaa8c9a103186589d708 (patch) | |
tree | 488a49b786d5cffcd0b068a527ec1ebe8339114a /doc/lispref | |
parent | 36f7d3666905e1447a2e80957735a1ade23c894c (diff) |
Add support for the libxml2 library.
This adds the html-parse-string and xml-parse-string functions in the
new file src/xml.c, as well as autoconf detection of the library.
Diffstat (limited to 'doc/lispref')
-rw-r--r-- | doc/lispref/text.texi | 44 |
1 files changed, 44 insertions, 0 deletions
diff --git a/doc/lispref/text.texi b/doc/lispref/text.texi
index 142a071f494..ff4e65d299f 100644
--- a/doc/lispref/text.texi
+++ b/doc/lispref/text.texi
@@ -59,6 +59,7 @@ the character after point.
                        position stored in a register.
 * Base 64::            Conversion to or from base 64 encoding.
 * MD5 Checksum::       Compute the MD5 "message digest"/"checksum".
+* Parsing HTML::       Parsing HTML and XML.
 * Atomic Changes::     Installing several buffer changes "atomically".
 * Change Hooks::       Supplying functions to be run when text is changed.
 @end menu
@@ -4106,6 +4107,49 @@ using the specified or chosen coding system. However, if coding
 instead.
 @end defun
 
+@node Parsing HTML
+@section Parsing HTML
+@cindex parsing html
+@cindex parsing xml
+
+Emacs provides an interface to the @code{libxml2} library via two
+functions: @code{html-parse-buffer} and @code{xml-parse-buffer}. The
+HTML function will parse ``real world'' HTML and try to return a
+sensible parse tree, while the XML function is somewhat stricter about
+syntax.
+
+They both take a two optional parameter. The first is a buffer, and
+the second is a base URL to be used to expand relative URLs in the
+document, if any.
+
+Here's an example demonstrating the structure of the parsed data you
+get out. Given this HTML document:
+
+@example
+<html><hEad></head><body width=101><div class=thing>Foo<div>Yes
+@end example
+
+You get this parse tree:
+
+@example
+(html
+ (head)
+ (body
+  (:width . "101")
+  (div
+   (:class . "thing")
+   (text . "Foo")
+   (div
+    (text . "Yes\n")))))
+@end example
+
+It's a simple tree structure, where the @code{car} for each node is
+the name of the node, and the @code{cdr} is the value, or the list of
+values.
+
+Attributes are coded the same way as child nodes, but with @samp{:} as
+the first character.
+
 @node Atomic Changes
 @section Atomic Change Groups
 @cindex atomic changes
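The tree format described in the new manual section can be explored without calling the parser at all. The following Emacs Lisp is a minimal sketch that works on the literal parse tree from the example in the diff: it separates attribute entries (names starting with ":") from child nodes and collects the text of a node. All names prefixed with my- are hypothetical helpers introduced for illustration; they are not part of this commit or of Emacs.

```elisp
;; Parse tree copied from the documentation example in the diff.
(defvar my-example-tree
  '(html
    (head)
    (body
     (:width . "101")
     (div
      (:class . "thing")
      (text . "Foo")
      (div
       (text . "Yes\n"))))))

(defun my-attribute-p (item)
  "Return non-nil if ITEM is an attribute entry such as (:width . \"101\")."
  (and (consp item)
       (symbolp (car item))
       ;; Attributes are marked by a leading colon in the name.
       (string-prefix-p ":" (symbol-name (car item)))))

(defun my-node-attributes (node)
  "Return the attribute entries of element NODE, in document order."
  (let (attrs)
    (dolist (item (cdr node))
      (when (my-attribute-p item)
        (push item attrs)))
    (nreverse attrs)))

(defun my-node-text (node)
  "Return the concatenated text found in NODE and its descendants."
  (cond
   ;; Attribute entries carry no document text.
   ((my-attribute-p node) "")
   ;; A leaf such as (text . "Foo"): the cdr is the string itself.
   ((stringp (cdr node)) (cdr node))
   ;; An element: the cdr is a list of attributes and child nodes.
   (t (mapconcat #'my-node-text (cdr node) ""))))

;; The body element is the third item of the html node.
(my-node-attributes (nth 2 my-example-tree)) ; => ((:width . "101"))
(my-node-text my-example-tree)               ; => "FooYes\n"
```

Because attributes and children share one list, a single attribute value can also be looked up with assq on the cdr of the element, for example (cdr (assq :width (cdr (nth 2 my-example-tree)))).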