XML/Introduction
Version as of 12:52, 13 Oct 2004
Learning objectives
By the end of this chapter, you will be able to
- define the distinct purposes of SGML, HTML, and XML
- create a simple XML document using NetBeans
Introduction
There are four central problems in data management: capture, storage, retrieval, and exchange. The purpose of this book is to address XML, a technology for managing data exchange. Data exchange has long been an issue, but the Internet has elevated its importance. Electronic data interchange (EDI), the traditional data exchange standard for large organizations, is giving way to XML, which is likely to become the data exchange standard for all organizations, irrespective of size.
EDI supports the electronic exchange of standard business documents. A structured format is used to exchange common business documents (e.g., invoices and shipping orders) between trading partners. In contrast to the free form of e-mail messages, EDI supports the exchange of repetitive, routine business transactions. Standards mean that routine electronic transactions can be concise and precise. The main standard used in the United States and Canada is ANSI ASC X12, and the major international standard is EDIFACT. Firms following the same standard can electronically share data.
The Internet is a global network potentially accessible by nearly every firm, with communication costs typically lower than those of traditional EDI. Consequently, the Internet has become the electronic transport path of choice between trading partners. The simplest approach is to use the Internet as a means of transporting EDI documents. Another approach is to reexamine the technology of data exchange, since EDI was developed in the 1960s. A result of this rethinking is XML, but before considering XML, we need to learn about SGML, the parent of XML.
SGML
For a typical U.S. firm, it is estimated that document management consumes up to 15 percent of its revenue, nearly 25 percent of its labor costs, and anywhere between 10 and 60 percent of an office worker’s time. The Standard Generalized Markup Language (SGML) is designed to reduce the cost and increase the efficiency of document management.
A markup language embeds information about a document within the document's text. In the following example, the markup tags indicate that the text contains details of a city. Note also that the city's name, state, and population are identified by specific tags. Thus, the reader, whether a person or a computer, is left in no doubt as to the meaning of Athens, Georgia, or 100,000. The latitude and longitude of the city are likewise explicitly identified with appropriate tags. SGML's usefulness rests on recording both text and the meaning of that text.
Table 1: Markup language
<city><cityname>Athens</cityname> is located 60 miles northeast of Atlanta, <statename> Georgia</statename>. Home of the University of Georgia, it has a population of just over <population> 100,000</population>. Athens' location is <latitude>33° 57' 39" N</latitude>, <longitude> 83° 22' 42" W</longitude></city>
SGML is a vendor-independent International Standard (ISO 8879) that defines the structure of documents. Published in 1986 as a meta language, SGML is the parent of both HTML and XML. Because SGML documents are standard text files, SGML provides cross-system portability. When technology is rapidly changing, SGML provides a stable platform for managing data exchange. Furthermore, SGML files can be transformed for publication in a variety of media. The use of SGML preserves textual information independent of how and when it is presented. Organizations reap long-term benefits when they can store documents in a single, independent standard that can then be converted for display in any desired medium.
SGML has three major advantages for data management:
- Reuse: Information can be created once and reused many times.
- Flexibility: SGML documents can be published in any format. The same content can be printed, presented on the Web, or delivered by speech synthesis. Because SGML is content-oriented, presentation decisions can be delayed until the output format is decided.
- Revision: SGML supports revision and version control. With content version control, a firm can readily track the changes in documents.
A short section of SGML demonstrates clearly the features and strength of SGML (see Table 2). The tags surrounding a chunk of text describe its meaning and thus support presentation and retrieval. For example, the pair of tags <airline> and </airline> surrounding “Delta” identify the airline making the flight.
Table 2: SGML example
<flight><airline>Delta</airline><flightno>22</flightno><origin>Atlanta</origin><destination>Paris</destination> <departure>5:40pm</departure><arrival>8:10am</arrival></flight>
The preceding SGML code can be presented in several ways by applying a stylesheet to the file. For example, it might appear as
Delta flight 22 flies from Atlanta to Paris leaving 5:40pm and arriving 8:10am
or as
Airline | Flight | Origin | Destination | Dep | Arr |
Delta | 22 | Atlanta | Paris | 5:40pm | 8:10am |
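The two presentations above can be sketched in code. The following is a minimal illustration, not an SGML stylesheet: it assumes the Table 2 markup is also well-formed XML (which it is), parses it with Python's standard `xml.etree.ElementTree` module, and uses two hypothetical functions, `as_sentence` and `as_table`, to stand in for two different stylesheets.

```python
import xml.etree.ElementTree as ET

# The markup from Table 2, held in a string for illustration.
FLIGHT_MARKUP = (
    "<flight><airline>Delta</airline><flightno>22</flightno>"
    "<origin>Atlanta</origin><destination>Paris</destination> "
    "<departure>5:40pm</departure><arrival>8:10am</arrival></flight>"
)


def flight_fields(markup):
    """Return the tagged values as a dict keyed by tag name."""
    flight = ET.fromstring(markup)
    return {child.tag: child.text for child in flight}


def as_sentence(f):
    """First presentation: a single sentence (hypothetical 'stylesheet')."""
    return (
        f"{f['airline']} flight {f['flightno']} flies from {f['origin']} "
        f"to {f['destination']} leaving {f['departure']} "
        f"and arriving {f['arrival']}"
    )


def as_table(f):
    """Second presentation: a header line and one row of a text table."""
    header = "Airline | Flight | Origin | Destination | Dep | Arr |"
    order = ("airline", "flightno", "origin", "destination",
             "departure", "arrival")
    row = " | ".join(f[tag] for tag in order) + " |"
    return header + "\n" + row


fields = flight_fields(FLIGHT_MARKUP)
print(as_sentence(fields))
print(as_table(fields))
```

The content is stored once and neither function changes it; only the choice of function decides how it looks.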
If the data are stored in HTML format (as in Table 3), then the meaning of the data has to be inferred by the reader. This is generally quite easy for humans, but very difficult for machines. Furthermore, the presentation format is fixed and can only be altered by rewriting the HTML.
Table 3: HTML example
<html>
Meaning and presentation should be independent, and this is an important reason why SGML is more powerful than HTML.
Section summary: SGML is a markup language that defines the structure of documents. It is preferred to HTML because it keeps meaning separate from presentation, so the same content can be transformed for a variety of media.
Ashok Daniel
XML
The purpose of eXtensible Markup Language (XML) is to make information self-describing. Based on SGML, XML is designed to support electronic commerce. XML, whose definition was completed in early 1998 by the World Wide Web Consortium (W3C), is a meta language, that is, a language for generating languages. XML should steadily replace HTML on many Web sites because of some key advantages. The major differences between XML and HTML are captured in the following table.
XML | HTML |
Information content | Information presentation |
Extensible set of tags | Fixed set of tags |
Data exchange language | Data presentation language |
Greater hypertext linking | Limited hypertext linking |
The eXtensible in XML means that a new data exchange language can be created by defining its structure and tags. For example, the OpenGIS Consortium designed the Geography Markup Language (GML) to facilitate the electronic exchange of geographic information. Similarly, the Open Tourism Consortium is working on the definition of TourML to support the exchange of tourism information. Another good example of XML in action is NewsML™.
In this text, we will cover all the features of XML, but at this point let us introduce a few of the key features.
Key features of XML
- Elements have both an opening and a closing tag
- Elements follow a strict hierarchy with only one root element
- Elements cannot overlap other elements
- Element names must obey XML naming conventions
- XML is case sensitive
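These rules can be tried out with any XML parser. The sketch below assumes Python's standard `xml.etree.ElementTree` module; the helper `is_well_formed` and the sample strings are made up for illustration, one per rule listed above.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# One hypothetical sample per rule; only the first obeys all of them.
samples = {
    "well-formed": "<book><title>XML</title></book>",
    "missing closing tag": "<book><title>XML</book>",
    "two root elements": "<book/><book/>",
    "overlapping elements": "<b><i>text</b></i>",
    "illegal element name": "<1book>XML</1book>",
    "case mismatch": "<Book>XML</book>",
}

for label, markup in samples.items():
    print(f"{label}: {is_well_formed(markup)}")
```

Only the first sample is accepted. The last one fails because `<Book>` and `</book>` are different names to a case-sensitive parser.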
XML will improve the efficiency of data exchange in several important ways, which include
- write once and format many times: Once an XML file is created it can be presented in multiple ways by the application of an XML stylesheet. For instance, the information might be displayed on a Web page or printed in a book.
- hardware and software independence: XML files are standard text files, which means they can be read by any operating system.
- write once and exchange many times: Provided an industry agrees on an XML standard for data exchange, data can be readily exchanged between all members using this standard.
- faster and more precise Web searching: When the meaning of information can be determined by a computer (by reading the tags), Web searching will be enhanced. For example, if you are looking for a specific book title, it is far more efficient for a computer to search for text between the pair of tags <booktitle> and </booktitle> than to search an entire file looking for the title. Furthermore, spurious results should be eliminated.
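The searching point can be made concrete with a small sketch. The catalog below is invented for illustration (only the `<booktitle>` tag comes from the text), and the two functions are hypothetical helpers built on Python's standard `xml.etree.ElementTree` module. One searches only between `<booktitle>` tags; the other scans the whole file as plain text.

```python
import xml.etree.ElementTree as ET

# A made-up catalog; the review of the first book mentions the second title.
CATALOG = """<catalog>
  <book>
    <booktitle>Data Management</booktitle>
    <review>A companion to XML for Beginners.</review>
  </book>
  <book>
    <booktitle>XML for Beginners</booktitle>
    <review>Clear and short.</review>
  </book>
</catalog>"""


def find_by_title(markup, title):
    """Return the <book> elements whose <booktitle> equals the title."""
    root = ET.fromstring(markup)
    return [book for book in root.iter("book")
            if book.findtext("booktitle") == title]


def naive_text_hits(markup, title):
    """Count every occurrence of the title anywhere in the file."""
    return markup.count(title)


print(len(find_by_title(CATALOG, "XML for Beginners")))
print(naive_text_hits(CATALOG, "XML for Beginners"))
```

The tag-aware search finds one book, while the plain text search reports two hits, one of which is the spurious mention inside a review.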
The major XML elements
The major XML elements are
- XML schema: A schema is an XML file that describes the structure of a document and its tags.
- XML file: An XML file is a file containing XML code.
- XML stylesheet: A stylesheet is an XML file containing formatting instructions for an XML file.
In the next few chapters, you will learn how to create and use each of these elements of XML.
XML at United Parcel Service (UPS)
“UPS is a service company and it is all about scale and speed,” says Geoff Chalmers, Project Leader at UPS eSolutions Department. In 2003, UPS had annual revenue of $33.5 billion and 357,000 employees worldwide. Six percent of the United States Gross Domestic Product (GDP) on any given day is in UPS’ system.
Section summary: XML is an extensible meta language that supports electronic commerce by making information self-describing, provided that documents follow a small set of strict rules.
Creating a markup file
Any text editor can be used to create a markup file (e.g. an HTML file). In this book, we use the text editor within NetBeans, an open source Integrated Development Environment (IDE) for Java, because NetBeans supports editing and validation of XML files. Before proceeding, you should download and install NetBeans from www.NetBeans.org. When the install is complete, take the following actions.
NetBeans IDE 4.0 Beta 2 instructions
- Launch NetBeans
- Make yourself familiar with the IDE by opening Help > Help Contents and reading the material in Getting Started
- Create a new project by File > New Project... (Ctrl+Shift+N)
- Under "Choose a Project:" select under Categories: Standard
- Select under Projects: Java Application
- Hit Next
- Name the project appropriately
- Save to the appropriate location
- Un-check "Set as Main Project" and un-check "Create Main Class"
- Hit Finish
- Create a new file by File > New File (Ctrl+N)
- Select the appropriate project
- Under "Choose a File Type" select under Categories: XML
- Select under File Types: XML Document
- Name the file appropriately
- Save to the appropriate location
- Hit Next
- Select "Well-Formed Document"
- Hit Finish
- You should see the following skeleton XML file
<?xml version="1.0" encoding="UTF-8"?>
- Our goal is to create a generic markup file, rather than an XML file, so replace the lines in the skeleton XML file with the markup in Table 1
- Check the markup file is well-formed by clicking on the green triangle (Alt+F9) on the tool bar. It should pass the check
- Delete the tag </city> and check the file again. This time you should get an error indicating that an end tag is missing
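The last two steps can also be reproduced outside the IDE. The sketch below is not what NetBeans runs internally; it assumes Python's standard `xml.etree.ElementTree` parser as a stand-in, and the helper `check` is hypothetical. The wording of the error message will differ from the one NetBeans shows.

```python
import xml.etree.ElementTree as ET

# The markup from Table 1, held in a string for illustration.
CITY_MARKUP = (
    "<city><cityname>Athens</cityname> is located 60 miles northeast of "
    "Atlanta, <statename> Georgia</statename>. Home of the University of "
    "Georgia, it has a population of just over <population> 100,000"
    "</population>. Athens' location is <latitude>33° 57' 39\" N</latitude>, "
    "<longitude> 83° 22' 42\" W</longitude></city>"
)

# The same markup with the closing </city> tag deleted.
BROKEN_MARKUP = CITY_MARKUP.replace("</city>", "")


def check(markup):
    """Report whether the parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return "well-formed"
    except ET.ParseError as err:
        return f"error: {err}"


print(check(CITY_MARKUP))
print(check(BROKEN_MARKUP))
```

The first call reports that the file is well-formed; the second reports a parse error because the root element is never closed.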
Section summary: Because NetBeans checks files against the XML rules, it is a convenient editor for creating a markup file.
Exercise 1
Use NetBeans to create a markup file describing a restaurant. The markup should identify the name and address of the restaurant and the type of food or foods in which it specializes.
Exercise 2
Let's assume we want to create a personnel file for a small company named 'Exercises inc.' What kind of data do we have? Analyse the data in the following table. Our goal is to transform it into a markup file.
firstname | lastname | street | city | country | date_of_birth | phone number | department | title |
Tobias | Boeswald | Laxenburger str. 384 | Vienna | Austria | 02/07/1974 | 0431/3445346 | finance and accounting | Accountant |
Dimitri | Felber | Neuburger str. 19a | Passau | Germany | 05/12/1967 | 00498510/523456 | finance and accounting | CFO |
Stefan | Meyer | Breite Gasse 10 | Nuremberg | Germany | 10/09/1972 | 00499110/45365 | human resources | HR Manager |
All data is fictitious. Any similarity between the people described and any real person is purely coincidental. :)
Why is this book not an XML document?
If you have accepted the ideas presented in this chapter, the question is very pertinent. The simple answer is that we have been unable to find the technology to support the creation of an open textbook in XML. We need several pieces of technology:
- An XML language for describing a book. DocBook is such a language, but the structure of a book is quite complex, and DocBook (reflecting this complexity) cannot be quickly mastered
- A Wiki that works with a language such as DocBook
- An XML stylesheet that converts XML into HTML for displaying the book's content
There is a project to create WikiMl (Wiki Markup Language), and this might be used at some point.
References
Initiating author Richard T. Watson, University of Georgia