author: Lars Magne Ingebrigtsen <larsi@gnus.org> 2010-09-10 18:44:35 +0200
committer: Lars Magne Ingebrigtsen <larsi@gnus.org> 2010-09-10 18:44:35 +0200
commit: 381408e2192b8fd606babaa8c9a103186589d708
tree: 488a49b786d5cffcd0b068a527ec1ebe8339114a /doc/lispref
parent: 36f7d3666905e1447a2e80957735a1ade23c894c
Add support for the libxml2 library.
This adds the html-parse-string and xml-parse-string functions in the new file src/xml.c, as well as autoconf detection of the library.
Diffstat (limited to 'doc/lispref')
-rw-r--r-- doc/lispref/text.texi 44
1 files changed, 44 insertions, 0 deletions
diff --git a/doc/lispref/text.texi b/doc/lispref/text.texi
index 142a071f494..ff4e65d299f 100644
--- a/doc/lispref/text.texi
+++ b/doc/lispref/text.texi
@@ -59,6 +59,7 @@ the character after point.
position stored in a register.
* Base 64:: Conversion to or from base 64 encoding.
* MD5 Checksum:: Compute the MD5 "message digest"/"checksum".
+* Parsing HTML:: Parsing HTML and XML.
* Atomic Changes:: Installing several buffer changes "atomically".
* Change Hooks:: Supplying functions to be run when text is changed.
@end menu
@@ -4106,6 +4107,49 @@ using the specified or chosen coding system. However, if
coding instead.
@end defun
+@node Parsing HTML
+@section Parsing HTML
+@cindex parsing html
+@cindex parsing xml
+
+Emacs provides an interface to the @code{libxml2} library via two
+functions: @code{html-parse-buffer} and @code{xml-parse-buffer}. The
+HTML function will parse ``real world'' HTML and try to return a
+sensible parse tree, while the XML function is somewhat stricter about
+syntax.
+
+Both functions take two optional parameters. The first is a buffer, and
+the second is a base URL to be used to expand relative URLs in the
+document, if any.
+
+Here's an example demonstrating the structure of the parsed data you
+get out. Given this HTML document:
+
+@example
+<html><hEad></head><body width=101><div class=thing>Foo<div>Yes
+@end example
+
+You get this parse tree:
+
+@example
+(html
+ (head)
+ (body
+  (:width . "101")
+  (div
+   (:class . "thing")
+   (text . "Foo")
+   (div
+    (text . "Yes\n")))))
+@end example
+
+It's a simple tree structure, where the @code{car} for each node is
+the name of the node, and the @code{cdr} is the value, or the list of
+values.
+
+Attributes are coded the same way as child nodes, but with @samp{:} as
+the first character of the name.
+
@node Atomic Changes
@section Atomic Change Groups
@cindex atomic changes
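
Note on the parse-tree format documented in the new node: the layout (node name in the car, attributes as pairs whose name starts with `:`, text as `(text . "...")` pairs) can be walked with a few lines of Lisp. The following is only a sketch, not part of this commit. The names `my-example-tree`, `my-node-attributes` and `my-node-text` are hypothetical, and the tree is copied from the example in the patch rather than produced by calling the parser.

```elisp
;; Hypothetical helpers for illustration only; not part of this commit.
;; The tree literal is the example parse tree from the patch above.
(defvar my-example-tree
  '(html
    (head)
    (body
     (:width . "101")
     (div
      (:class . "thing")
      (text . "Foo")
      (div
       (text . "Yes\n")))))
  "Example parse tree, copied from the documentation in the patch.")

(defun my-node-attributes (node)
  "Return the attribute pairs of NODE, in document order.
An attribute is any entry whose car is a symbol starting with `:'."
  (let (attrs)
    (dolist (item (cdr node) (nreverse attrs))
      (when (keywordp (car item))
        (push item attrs)))))

(defun my-node-text (node)
  "Return all text found beneath NODE, concatenated depth-first."
  (mapconcat
   (lambda (item)
     (cond
      ;; Attributes contribute no text.
      ((keywordp (car item)) "")
      ;; A text entry is a pair of the symbol `text' and a string.
      ((and (eq (car item) 'text) (stringp (cdr item))) (cdr item))
      ;; Anything else is assumed to be a child element.
      (t (my-node-text item))))
   (cdr node)
   ""))
```

With the example tree, `(my-node-attributes (nth 2 my-example-tree))` picks out the `body` attributes and `(my-node-text my-example-tree)` collects the text of both `div` elements. Attributes and children share one list here, so every consumer has to make the same `keywordp` check to tell them apart.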