XML Parsing Development Guide
Author: anonymous    Source: unknown    Updated: 2007-9-2


1       Revision History

Version    Date         Author       Description                                        Development time (h)
0.1        2007-8-14    LevinSoft    Created the document's basic structure and flow    5

2       Introduction

This document covers XML-based development.
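It covers basic XML concepts, a comparison of XML middleware, an introduction to XPath, development examples, and reference resources.

I believe that mastering a method of learning matters far more than studying without order. So in each part of this document, for the important or harder-to-understand points, I offer some insights drawn from my own development work, including what I learned and how I progressed. Detailed examples are given, with comments added.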

3       Basic Concepts

XML (Extensible Markup Language) is a flexible way to create common information formats and share both the format and the data on the World Wide Web, intranets, and elsewhere. XML can be used by any individual or group of individuals or companies that wants to share information in a consistent way. XML, a formal recommendation from the World Wide Web Consortium (W3C), is similar to the language of today's Web pages, the Hypertext Markup Language (HTML). Both XML and HTML contain markup symbols to describe the contents of a page or file. HTML, however, describes the content of a Web page (mainly text and graphic images) only in terms of how it is to be displayed and interacted with. For example, a <P> starts a new paragraph. XML describes the content in terms of what data is being described. For example, a <PHONENUM> could indicate that the data that followed it was a phone number. This means that an XML file can be processed purely as data by a program, or it can be stored with similar data on another computer, or, like an HTML file, it can be displayed. For example, depending on how the application in the receiving computer wanted to handle the phone number, it could be stored, displayed, or dialed.

3.1    XML

·         XML stands for eXtensible Markup Language.

·         XML is a meta-language: you define your own tags.

·         XML is derived from the Standard Generalized Markup Language (SGML).

·         XML is only a markup language, but it can be used to build applications in areas such as web sites, electronic data interchange, vector graphics, genealogy, real-estate listings, object serialization, remote procedure calls, voice-mail systems, and more.

·         XML is a creation of the World Wide Web Consortium (W3C), http://www.w3.org

3.2    Why XML

·         Data and presentation can be separated, so developers can concentrate on the data rather than on how to display it.

·         Exchanging data is more flexible.

          ·         XML can be stored as text, so it is platform independent.

·         XML is widely used in the B2B business model.

·         Many databases support XML, so you can import data into them and export data from them as XML.

·         XML can be used to create new languages, such as WML (Wireless Markup Language).

3.3    What is XML?

·         Simple text formatted to follow a well-defined set of rules.

·         XML documents consist primarily of tags and text, similar to an HTML document.

·         The tags define keys that have text values and may contain data on any topic, that is, name-value pairs. Example: City = Beijing, Country = China

·         This text may be stored or represented as:

          ·         A normal file stored on disk.

          ·         A message being sent over HTTP.

          ·         A character string in a programming language.

          ·         A text BLOB (binary large object) in a database.

          ·         Any other form in which textual data can be used.

3.4    XML Family Overview

3.5    XSL

XSL is a family of recommendations for defining XML document transformation and presentation. It consists of three parts:

·         XSL Transformations (XSLT): a language for transforming XML

·         The XML Path Language (XPath): an expression language used by XSLT to access or refer to parts of an XML document (XPath is also used by the XML Linking specification)

·         XSL Formatting Objects (XSL-FO): an XML vocabulary for specifying formatting semantics

An XSLT stylesheet specifies the presentation of a class of XML documents by describing how an instance of the class is transformed into an XML document that uses a formatting vocabulary, such as (X)HTML or XSL-FO. For a more detailed explanation of how XSL works, see the What Is XSL page.

For background information on style sheets, see the Web style sheets resource page. XSL is developed by the W3C XSL Working Group (members only), whose charter is to develop the next version of XSL. XSL is part of W3C's XML Activity, whose work is described in the XML Activity Statement.

3.5.1            XSLT

The XSL Transformations (XSLT) specification describes a language for transforming XML documents into other XML documents or other text output. It was defined by the W3C XSL Working Group.

The XSLT 1.0 Recommendation is located at:

http://www.w3.org/TR/1999/REC-xslt-19991116
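
As a rough illustration of what an XSLT 1.0 transformation looks like from Java, here is a minimal sketch that uses the JDK's built-in javax.xml.transform API. The class name XsltSketch, the sample document, and the stylesheet are all invented for this example and do not come from the original article.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltSketch {
    // Hypothetical input document, invented for this illustration.
    public static final String XML =
        "<persons><person name='Alice'/><person name='Bob'/></persons>";

    // Hypothetical XSLT 1.0 stylesheet: writes each person's name followed by ';'.
    public static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/persons'>"
      + "<xsl:for-each select='person'>"
      + "<xsl:value-of select='@name'/><xsl:text>;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the stylesheet to the XML text and returns the result as a string. */
    public static String transform(String xml, String xslt) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(XML, XSLT));
    }
}
```

The select attributes in the stylesheet (person, @name) are XPath expressions, which is the connection to the next chapter.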

4       XPath

4.1    Introduction

4.1.1            Abstract

XPath is a language for addressing parts of an XML document, designed to be used by both XSLT and XPointer.

XPath 2.0 is an expression language that allows the processing of values conforming to the data model defined in [XQuery/XPath Data Model (XDM)]. The data model provides a tree representation of XML documents as well as atomic values such as integers, strings, and booleans, and sequences that may contain both references to nodes in an XML document and atomic values. The result of an XPath expression may be a selection of nodes from the input documents, or an atomic value, or more generally, any sequence allowed by the data model. The name of the language derives from its most distinctive feature, the path expression, which provides a means of hierarchic addressing of the nodes in an XML tree. XPath 2.0 is a superset of [XPath 1.0], with the added capability to support a richer set of data types, and to take advantage of the type information that becomes available when documents are validated using XML Schema. A backwards compatibility mode is provided to ensure that nearly all XPath 1.0 expressions continue to deliver the same result with XPath 2.0; exceptions to this policy are noted in [I Backwards Compatibility with XPath 1.0].

4.1.2            Abbreviated Syntax

Here are some examples of location paths using abbreviated syntax:

·         para selects the para element children of the context node

·         * selects all element children of the context node

·         text() selects all text node children of the context node

·         @name selects the name attribute of the context node

·         @* selects all the attributes of the context node

·         para[1] selects the first para child of the context node

·         para[last()] selects the last para child of the context node

·         */para selects all para grandchildren of the context node

·         /doc/chapter[5]/section[2] selects the second section of the fifth chapter of the doc

·         chapter//para selects the para element descendants of the chapter element children of the context node

·         //para selects all the para descendants of the document root and thus selects all para elements in the same document as the context node

·         //olist/item selects all the item elements in the same document as the context node that have an olist parent

·         . selects the context node

·         .//para selects the para element descendants of the context node

·         .. selects the parent of the context node

·         ../@lang selects the lang attribute of the parent of the context node

·         para[@type="warning"] selects all para children of the context node that have a type attribute with value warning

·         para[@type="warning"][5] selects the fifth para child of the context node that has a type attribute with value warning

·         para[5][@type="warning"] selects the fifth para child of the context node if that child has a type attribute with value warning

·         chapter[title="Introduction"] selects the chapter children of the context node that have one or more title children with string-value equal to Introduction

·         chapter[title] selects the chapter children of the context node that have one or more title children

·         employee[@secretary and @assistant] selects all the employee children of the context node that have both a secretary attribute and an assistant attribute
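
To try a few of these abbreviated paths, the following sketch counts the nodes each one selects. It assumes the JDK's built-in javax.xml.xpath API (XPath 1.0) rather than dom4j, so that it runs without extra libraries. The class name AbbreviatedPathSketch and the sample document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class AbbreviatedPathSketch {
    // Hypothetical document, invented for this illustration.
    public static final String XML =
        "<doc>"
      + "<chapter><title>Introduction</title>"
      + "<para type='warning'>a</para><para>b</para></chapter>"
      + "<chapter><para type='warning'>c</para></chapter>"
      + "</doc>";

    /** Counts the nodes an expression selects, with the document root as the context node. */
    public static int count(String xml, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
            expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        String[] paths = {
            "//para",                              // every para in the document
            "//para[@type='warning']",             // paras whose type attribute is warning
            "/doc/chapter[title='Introduction']",  // chapters with a matching title child
            "/doc/chapter[2]/para[1]",             // first para of the second chapter
            "/doc/*/para[last()]"                  // last para of each child of doc
        };
        for (String path : paths) {
            System.out.println(path + " selects " + count(XML, path) + " node(s)");
        }
    }
}
```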

The most important abbreviation is that child:: can be omitted from a location step. In effect, child is the default axis. For example, a location path div/para is short for child::div/child::para.

There is also an abbreviation for attributes: attribute:: can be abbreviated to @. For example, a location path para[@type="warning"] is short for child::para[attribute::type="warning"] and so selects para children with a type attribute with value equal to warning.

// is short for /descendant-or-self::node()/. For example, //para is short for /descendant-or-self::node()/child::para and so will select any para element in the document (even a para element that is a document element will be selected by //para since the document element node is a child of the root node); div//para is short for div/descendant-or-self::node()/child::para and so will select all para descendants of div children.

NOTE: The location path //para[1] does not mean the same as the location path /descendant::para[1]. The latter selects the first descendant para element; the former selects all descendant para elements that are the first para children of their parents.
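
The difference described in this note is easy to check by experiment. The sketch below is one way to do so, again assuming the JDK's javax.xml.xpath API instead of dom4j. The class name FirstParaSketch and the sample document are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FirstParaSketch {
    // Hypothetical document: two div elements, each with its own first para.
    public static final String XML =
        "<doc>"
      + "<div><para>a</para><para>b</para></div>"
      + "<div><para>c</para></div>"
      + "</doc>";

    /** Returns the text content of every node the expression selects. */
    public static List<String> texts(String xml, String expression) throws Exception {
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
            expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Every para that is the first para child of its parent.
        System.out.println(texts(XML, "//para[1]"));
        // Only the first para among all descendants of the root.
        System.out.println(texts(XML, "/descendant::para[1]"));
    }
}
```

With this sample document, //para[1] matches one para in each div, while /descendant::para[1] matches a single node.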

A location step of . is short for self::node(). This is particularly useful in conjunction with //. For example, the location path .//para is short for

self::node()/descendant-or-self::node()/child::para

and so will select all para descendant elements of the context node.

Similarly, a location step of .. is short for parent::node(). For example, ../title is short for parent::node()/child::title and so will select the title children of the parent of the context node.

4.1.3            Abbreviations

Note: when the path has to span several nested levels (more than two), use // in the expression as well.

[10]   AbbreviatedAbsoluteLocationPath   ::=   '//' RelativeLocationPath
[11]   AbbreviatedRelativeLocationPath   ::=   RelativeLocationPath '//' Step
[12]   AbbreviatedStep                   ::=   '.' | '..'
[13]   AbbreviatedAxisSpecifier          ::=   '@'?

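4.2    Usage Tips

1. These abbreviations are often used in combination.

2. Using the common abbreviations flexibly can greatly improve development efficiency.

3. There is no need to memorize every abbreviation; look them up in the reference when needed.

4. Do memorize the most common ones, though. Consulting the reference every time is slow, and knowing them by heart improves both your development speed and your sense of accomplishment.

4.3    XPath Examples

The examples here use dom4j to illustrate XPath syntax.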

4.3.1             para[@type="warning"]

para[@type="warning"] selects all para children of the context node that have a type attribute with value warning

 

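Method 1: select the attribute directly in the path.

       // Look up the value attribute of the property whose name attribute matches.
       // name is concatenated into the expression, so it must not contain a single quote.
       Node propertyNode = userDoc.selectSingleNode(
               "/persons/personList/property[@name='" + name + "']/@value");
       if (propertyNode == null) {
           return "";   // no matching property
       }
       return propertyNode.getText();

Method 2: select the element first, then read the attribute relative to it.

       Node node = configFile.selectSingleNode(
               "/para-config/paraMapping[@para='" + paraValue + "']");
       // node is null when no paraMapping matches; check it before use in real code.
       Node mappedIdNode = node.selectSingleNode("@mappedId");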

4.3.2             Getting a list of nodes

List personList = docInfo.selectNodes("/persons/personList/privilage");  

4.3.3             //

//para selects all the para descendants of the document root and thus selects all para elements in the same document as the context node

 

List<Element> nodeList = orderRelationNode.selectNodes("//person");
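
One consequence of this definition is worth spelling out: a path that starts with // is absolute, so under XPath 1.0 it searches from the document root even when it is evaluated against an inner node. To restrict the search to the descendants of the context node, use .//person. The sketch below demonstrates the difference with the JDK's javax.xml.xpath API. It is not dom4j code, although dom4j's selectNodes is expected to follow the same XPath 1.0 semantics. The class name DescendantScopeSketch and the sample document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DescendantScopeSketch {
    // Hypothetical document: three person elements spread over two groups.
    public static final String XML =
        "<persons>"
      + "<group id='a'><person/><person/></group>"
      + "<group id='b'><person/></group>"
      + "</persons>";

    /** Selects a context node with contextPath, then counts what expression selects from it. */
    public static int countFrom(String xml, String contextPath, String expression)
            throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        Node context = (Node) xpath.evaluate(contextPath, doc, XPathConstants.NODE);
        NodeList hits = (NodeList) xpath.evaluate(expression, context, XPathConstants.NODESET);
        return hits.getLength();
    }

    public static void main(String[] args) throws Exception {
        String groupB = "/persons/group[@id='b']";
        // Absolute path: ignores the context node and searches the whole document.
        System.out.println(countFrom(XML, groupB, "//person"));
        // Relative path: searches only below the context node.
        System.out.println(countFrom(XML, groupB, ".//person"));
    }
}
```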

4.3.4            /web-app/servlet[1]/servlet-name

Here [1] selects the first servlet node.

private static final String DEMO_XML =
        "<?xml version='1.0' encoding='ISO-8859-1'?>\n" +
        "<web-app>\n" +
        "<servlet>\n" +
        "<servlet-name>snoop</servlet-name>\n" +
        "<servlet-class>SnoopServlet</servlet-class>\n" +
        "</servlet>\n" +
        "</web-app>";

// parseText throws DocumentException if the text is not well-formed XML.
Document demoDocument = DocumentHelper.parseText(DEMO_XML);
// valueOf returns the string value of the node the expression selects.
String servletName = demoDocument.valueOf("/web-app/servlet[1]/servlet-name");

Another approach:

public String getSingleNodeValue(String nodeName) {
        String xPathExpression = "/Package/" + nodeName;
        return getDocument().valueOf(xPathExpression);
}

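4.3.5             Setting XML attribute values

When an attribute of a node is assigned null, dom4j's underlying implementation does not give the attribute any value. In other words, the attribute is omitted, and the client never sees it.

For example, given this XML:

<Response status="0" message="Personal subscription query succeeded">
<param name="lines" value="2"/>
<person userId="1002" userName="P1001"> </person>
</Response>

If the processing logic sets an attribute to null, for example userName, the client receives a person element without that attribute:

<param name="lines" value="1"/>
<param name="personList">
<person userId="05797220830"/>
</param>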

4.4    More Good Examples

http://www.zvon.org/xxl/XPathTutorial/

The site also provides a large collection of XML resources:

·         XLab - interactive XPath experiments

·         XML tutorial

·         DTD tutorial

·         XML Namespace tutorial

·         XHTML 1.0 reference

·         XHTML Basic reference

A few examples taken from the site follow.

4.4.1            Example1

Values of attributes can be used as selection criteria. The function normalize-space removes leading and trailing spaces and replaces sequences of whitespace characters with a single space.

//BBB[@name='bbb']
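
The effect of normalize-space can be seen by comparing the plain attribute test with a normalized one. The sketch below assumes the JDK's javax.xml.xpath API. The class name NormalizeSpaceSketch and the sample document are hypothetical, and the second expression, //BBB[normalize-space(@name)='bbb'], is added here for comparison.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NormalizeSpaceSketch {
    // Hypothetical document: one name value is padded with spaces.
    public static final String XML =
        "<AAA>"
      + "<BBB id='b1'/>"
      + "<BBB name=' bbb '/>"
      + "<BBB name='bbb'/>"
      + "</AAA>";

    /** Counts the nodes an expression selects, with the document root as the context node. */
    public static int count(String xml, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
            expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        // Exact comparison: the padded value ' bbb ' does not match.
        System.out.println(count(XML, "//BBB[@name='bbb']"));
        // Normalized comparison: surrounding spaces are stripped before comparing.
        System.out.println(count(XML, "//BBB[normalize-space(@name)='bbb']"));
    }
}
```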