XML/Introduction

Version of 12:52, 13 Oct 2004

Programming in XML


Learning objectives

At the end of this chapter, you will be able to

  • define the distinct purposes of SGML, HTML, and XML
  • create a simple XML document using NetBeans

Introduction

There are four central problems in data management: capture, storage, retrieval, and exchange. The purpose of this book is to address XML, a technology for managing data exchange. Data exchange has long been an issue, but the Internet has elevated its importance. Electronic data interchange (EDI), the traditional data exchange standard for large organizations, is giving way to XML, which is likely to become the data exchange standard for all organizations, irrespective of size.

EDI supports the electronic exchange of standard business documents. A structured format is used to exchange common business documents (e.g., invoices and shipping orders) between trading partners. In contrast to the free form of e-mail messages, EDI supports the exchange of repetitive, routine business transactions. Standards mean that routine electronic transactions can be concise and precise. The main standard used in the United States and Canada is known as X.12, and the major international standard is EDIFACT. Firms following the same standard can electronically share data.

The Internet is a global network potentially accessible by nearly every firm with communication costs typically less than with traditional EDI. Consequently, the Internet has become the electronic transport path of choice between trading partners. The simplest approach is to use the Internet as a means of transporting EDI documents. Another approach is to reexamine the technology of data exchange, since EDI was developed in the 1960s. A result of this rethinking is XML, but before considering XML, we need to learn about SGML, the parent of XML.

SGML

For a typical U.S. firm, it is estimated that document management consumes up to 15 percent of its revenue, nearly 25 percent of its labor costs, and anywhere between 10 and 60 percent of an office worker’s time. The Standard Generalized Markup Language (SGML) is designed to reduce the cost and increase the efficiency of document management.

A markup language embeds information about a document within the document's text. In the following example, the markup tags indicate that the text contains details of a city. Note also that the city's name, state, and population are identified by specific tags. Thus, the reader, whether a person or a computer, is left in no doubt as to the meaning of Athens, Georgia, or 100,000. The latitude and longitude of the city are likewise explicitly identified with appropriate tags. SGML’s usefulness is based upon recording both text and the meaning of that text.

Table 1: Markup language

<city><cityname>Athens</cityname> is located 60 miles northeast of Atlanta, <statename> Georgia</statename>. Home of the University of Georgia, it has a population of just over <population> 100,000</population>. Athens' location is <latitude>33º 57' 39" N</latitude>, <longitude> 83º 22' 42" W</longitude></city>
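
To see how a program can rely on these tags, consider the following minimal sketch. It assumes Python and its standard xml.etree.ElementTree module, which are our choice for illustration and are not used elsewhere in this book; the variable names are hypothetical. It reads the markup from Table 1 and lists the value recorded under each tag.

```python
import xml.etree.ElementTree as ET

# The markup from Table 1, stored as a string.
CITY_MARKUP = """<city><cityname>Athens</cityname> is located 60 miles northeast of Atlanta, <statename> Georgia</statename>. Home of the University of Georgia, it has a population of just over <population> 100,000</population>. Athens' location is <latitude>33º 57' 39" N</latitude>, <longitude> 83º 22' 42" W</longitude></city>"""

city = ET.fromstring(CITY_MARKUP)

# Collect the text inside every child tag, trimming stray spaces.
facts = {child.tag: child.text.strip() for child in city}

for tag, value in facts.items():
    print(f"{tag}: {value}")
```

The program never has to guess which number is the population or which word is the state; the tags say so.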

SGML is a vendor-independent International Standard (ISO 8879) that defines the structure of documents. Developed in 1986 as a meta language, SGML is the parent of both HTML and XML. Because SGML documents are standard text files, SGML provides cross-system portability. When technology is rapidly changing, SGML provides a stable platform for managing data exchange. Furthermore, SGML files can be transformed for publication in a variety of media. The use of SGML preserves textual information independent of how and when it is presented. Organizations reap long-term benefits when they can store documents in a single, independent standard that can then be converted for display in any desired media.

SGML has three major advantages for data management:

  • Reuse: Information can be created once and reused many times.
  • Flexibility: SGML documents can be published in any format. The same content can be printed, presented on the Web, or delivered as synthesized speech. Because SGML is content-oriented, presentation decisions can be delayed until the output format is decided.
  • Revision: SGML supports revision and version control. With content version control, a firm can readily track the changes in documents.

A short section of SGML demonstrates clearly the features and strength of SGML (see Table 2). The tags surrounding a chunk of text describe its meaning and thus support presentation and retrieval. For example, the pair of tags <airline> and </airline> surrounding “Delta” identify the airline making the flight.

Table 2: SGML example

<flight><airline>Delta</airline><flightno>22</flightno><origin>Atlanta</origin><destination>Paris</destination> <departure>5:40pm</departure><arrival>8:10am</arrival></flight>

The preceding SGML code can be presented in several ways by applying a stylesheet to the file. For example, it might appear as

Delta flight 22 flies from Atlanta to Paris leaving 5:40pm and arriving 8:10am

or as

Airline  Flight  Origin   Destination  Dep     Arr
Delta    22      Atlanta  Paris        5:40pm  8:10am
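
The two presentations can be sketched in code. The example below is not an SGML stylesheet; it is a rough Python stand-in, assuming the standard xml.etree.ElementTree module, and the function names (read_flight, as_sentence, as_table) are hypothetical. It shows one tagged record feeding two different layouts.

```python
import xml.etree.ElementTree as ET

# The flight record from Table 2.
FLIGHT_MARKUP = (
    "<flight><airline>Delta</airline><flightno>22</flightno>"
    "<origin>Atlanta</origin><destination>Paris</destination> "
    "<departure>5:40pm</departure><arrival>8:10am</arrival></flight>"
)

def read_flight(markup):
    """Return a dict mapping each tag name to its text."""
    return {child.tag: child.text for child in ET.fromstring(markup)}

def as_sentence(f):
    """Present the record as a line of running text."""
    return (f"{f['airline']} flight {f['flightno']} flies from {f['origin']} "
            f"to {f['destination']} leaving {f['departure']} "
            f"and arriving {f['arrival']}")

def as_table(f):
    """Present the same record as a two-line table with aligned columns."""
    header = ["Airline", "Flight", "Origin", "Destination", "Dep", "Arr"]
    row = [f["airline"], f["flightno"], f["origin"],
           f["destination"], f["departure"], f["arrival"]]
    widths = [max(len(h), len(v)) for h, v in zip(header, row)]
    lines = []
    for cells in (header, row):
        lines.append("  ".join(c.ljust(w) for c, w in zip(cells, widths)).rstrip())
    return "\n".join(lines)

flight = read_flight(FLIGHT_MARKUP)
print(as_sentence(flight))
print(as_table(flight))
```

The stored data never changes; only the function applied to it does.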

If the data are stored in HTML format (as in Table 3), then the meaning of the data has to be inferred by the reader. This is generally quite easy for humans, but difficult and error-prone for machines. Furthermore, the presentation format is fixed and can only be altered by rewriting the HTML.

Table 3: HTML example

<html>
 <body>
Delta flight 22 flies from Atlanta to Paris leaving 5:40pm and arriving 8:10am
 </body>
</html>

Meaning and presentation should be independent, and this is an important reason why SGML is more powerful than HTML.

Section summary: SGML is a markup language that defines the structure of documents and is preferred to HTML as it can be transformed into a variety of media.


Ashok Daniel

XML

The purpose of eXtensible Markup Language (XML) is to make information self-describing. Based on SGML, XML is designed to support electronic commerce. XML, whose definition was completed in early 1998 by the World Wide Web Consortium (W3C), is a meta language: a language for generating languages. XML should steadily replace HTML on many Web sites because of some key advantages. The major differences between XML and HTML are captured in the following table.

XML                        HTML
Information content        Information presentation
Extendable set of tags     Fixed set of tags
Data exchange language     Data presentation language
Greater hypertext linking  Limited hypertext linking

The eXtensible in XML means that a new data exchange language can be created by defining its structure and tags. For example, the OpenGIS Consortium designed the Geography Markup Language (GML) to facilitate the electronic exchange of geographic information. Similarly, the Open Tourism Consortium is working on the definition of TourML to support exchange of tourism information. Another good example of XML in action is NewsML™.

In this text, we will cover all the features of XML, but at this point let us introduce a few of the key features.

Key features of XML

  • Elements have both an opening and a closing tag
  • Elements follow a strict hierarchy with only one root element
  • Elements cannot overlap other elements
  • Element names must obey XML naming conventions
  • XML is case sensitive
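
These rules can be tried out with any XML parser. The sketch below assumes Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample snippets are invented for illustration. It feeds one valid snippet and five rule-breaking snippets to the parser and records which ones are accepted.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup):
    """Return True if the parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# One snippet per rule; only the first obeys all of them.
samples = {
    "valid": "<note><to>Ann</to><from>Bob</from></note>",
    "missing closing tag": "<note><to>Ann</note>",
    "two root elements": "<note>one</note><note>two</note>",
    "overlapping elements": "<b><i>text</b></i>",
    "invalid name": "<1note>text</1note>",
    "case mismatch": "<Note>text</note>",
}

results = {label: is_well_formed(markup) for label, markup in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'rejected'}")
```

Only the first snippet should be accepted; each of the others breaks exactly one of the rules listed above.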

XML will improve the efficiency of data exchange in several important ways, which include

  • Write once and format many times: Once an XML file is created, it can be presented in multiple ways by the application of an XML stylesheet. For instance, the information might be displayed on a Web page or printed in a book.
  • Hardware and software independence: XML files are standard text files, which means they can be read by any operating system.
  • Write once and exchange many times: Provided an industry agrees on an XML standard for data exchange, data can be readily exchanged between all members using this standard.
  • Faster and more precise Web searching: When the meaning of information can be determined by a computer (by reading the tags), Web searching will be enhanced. For example, if you are looking for a specific book title, it is far more efficient for a computer to search for text between the pair of tags <booktitle> and </booktitle> than to search an entire file looking for the title. Furthermore, spurious results should be eliminated.
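
The searching point can be illustrated with a small sketch. It assumes Python's standard xml.etree.ElementTree module, and the catalog, its <catalog>, <book>, and <review> tags, and the function name find_by_title are all invented for this example. A search restricted to <booktitle> finds only the book itself, whereas a plain text search also matches a review that merely mentions the title.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog; every tag except <booktitle> is invented here.
CATALOG = """<catalog>
  <book><booktitle>Data Management</booktitle><review>Better than Moby Dick</review></book>
  <book><booktitle>Moby Dick</booktitle><review>A long voyage</review></book>
</catalog>"""

def find_by_title(markup, title):
    """Return the <book> elements whose <booktitle> text equals title."""
    root = ET.fromstring(markup)
    return [book for book in root.iter("book")
            if book.findtext("booktitle") == title]

tagged_hits = find_by_title(CATALOG, "Moby Dick")

# A plain text search counts every line that mentions the phrase.
text_hits = [line for line in CATALOG.splitlines() if "Moby Dick" in line]

print(len(tagged_hits), "match by tag;", len(text_hits), "lines match as plain text")
```

The extra plain-text match is the kind of spurious result that tag-aware searching avoids.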

The major XML elements

The major XML elements are

  • XML schema: A schema is an XML file that describes the structure of a document and its tags.
  • XML file: An XML file is a file containing XML code.
  • XML stylesheet: A stylesheet is an XML file containing formatting instructions for an XML file.

In the next few chapters, you will learn how to create and use each of these elements of XML.

XML at United Parcel Service (UPS)

“UPS is a service company and it is all about scale and speed,” says Geoff Chalmers, Project Leader in the UPS eSolutions Department. In 2003, UPS had $33.5 billion in annual revenue and 357,000 employees worldwide. Six percent of the United States Gross Domestic Product (GDP) on any given day is in UPS’s system.

UPS uses technology extensively. The Information Systems department employs 4,000 people. The Company Web site has 166 different country home pages and is supported by 44 applications.

UPS delivers around 13 million packages every day, and customers can track these shipments via the UPS Web site, which receives around 200 million hits daily. Nineteen of the applications within ups.com are XML OnLine Tool (Web services) applications.

UPS’s online tools are developed specifically to be integrated with customers’ applications. This makes the customer’s task simpler, easier, and faster. UPS confirmed the importance of simplicity and speed with CampusShip, one of its most successful products of the last 10 years. UPS CampusShip® is a Web-based, UPS-hosted shipping system. Using an Internet connection, employees can ship their own packages and letters from any desktop, while management maintains overall control of shipping activities. UPS CampusShip® thus combines shipper autonomy with managerial cost control within the organization. The product has been successful because no installation or software maintenance is required and it is quick to implement. XML OnLine Tools enabled cheap and fast evolution of CampusShip®.



UPS favors XML especially because it is agnostic: it is independent of platform and programming language. These features make XML very flexible and powerful. It is also decoupled and scalable. XML has enabled UPS to target a broader market and reduce customer interaction, and thus the cost of customer service. Another positive feature of XML is that it is backward compatible. The adoption of XML has reduced maintenance, implementation, and usage costs significantly within UPS.

However, these advantages don’t come without a price. “XML is inefficient in so many ways,” says Geoff Chalmers. XML takes more CPU and bandwidth than other technologies. Yet bandwidth and CPU are cheap and getting cheaper every day, so this is a gradually disappearing problem.
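
The bandwidth cost is easy to see on a small scale. The following sketch is our own illustration in Python and does not reflect any UPS measurement. It encodes the flight record from Table 2 once as XML and once as a hypothetical comma-delimited line, then compares the sizes.

```python
# The flight record from Table 2, encoded two ways.
xml_record = (
    "<flight><airline>Delta</airline><flightno>22</flightno>"
    "<origin>Atlanta</origin><destination>Paris</destination>"
    "<departure>5:40pm</departure><arrival>8:10am</arrival></flight>"
)
# Hypothetical positional encoding: meaning depends on field order alone.
csv_record = "Delta,22,Atlanta,Paris,5:40pm,8:10am"

xml_bytes = len(xml_record.encode("utf-8"))
csv_bytes = len(csv_record.encode("utf-8"))

print(f"XML: {xml_bytes} bytes, delimited: {csv_bytes} bytes")
print(f"The XML encoding is {xml_bytes / csv_bytes:.1f} times larger")
```

The delimited line is smaller, but it is only meaningful to a receiver who already knows the order of the fields; the extra bytes in the XML version are the price of making the record self-describing.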

Nevertheless, Geoff Chalmers also thinks that XML doesn’t work well in databases. He says that it is too wordy and it is an exchange medium rather than a database medium. There were some early attempts to tightly integrate XML and databases. Because databases do supply structure and identification to data as does XML, the value-add of XML-database integration is limited to applying hierarchical structure. On the other hand, if data is to be stored as a blob, then XML makes sense. Another problem that he points out about XML is that business rules cannot be expressed in XML schemas.

Finally, raw XML programming and debugging can be challenging. Therefore, UPS’s enterprise customers are starting to explore the code generators and embedded facilities to be found in .Net and BEA. However, hand coding by experienced in-house engineers is a must for the high availability, scalability, and performance that UPS requires for the UPS OnLine Tools.


Section summary: XML is a meta language that supports electronic commerce. Documents that follow its rules can be exchanged between systems and converted for presentation in many formats.

Creating a markup file

Any text editor can be used to create a markup file (e.g. an HTML file). In this book, we use the text editor within NetBeans, an open source Integrated Development Environment (IDE) for Java, because NetBeans supports editing and validation of XML files. Before proceeding, you should download and install NetBeans from www.NetBeans.org. When the install is complete, take the following actions.

NetBeans IDE 4.0 Beta 2 instructions

  1. Launch NetBeans
  2. Make yourself familiar with the IDE by opening Help > Help Contents and reading the material in Getting Started
  3. Create a new project by File > New Project... (Ctrl+Shift+N)
  4. Under "Choose a Project:" select under Categories: Standard
  5. Select under Projects: Java Application
  6. Hit Next
  7. Name the project appropriately
  8. Save to the appropriate location
  9. Un-check "Set as Main Project" and un-check "Create Main Class"
  10. Hit Finish
  11. Create a new file by File > New File (Ctrl+N)
  12. Select the appropriate project
  13. Under "Choose a File Type" select under Categories: XML
  14. Select under File Types: XML Document
  15. Name the file appropriately
  16. Save to the appropriate location
  17. Hit Next
  18. Select "Well-Formed Document"
  19. Hit Finish
  20. You should see the following skeleton XML file
<?xml version="1.0" encoding="UTF-8"?>


<!--
Document  : FILE_NAME.xml
Created on : October 5, 2004, 4:37 PM
Author  : Brad
Description:
Purpose of the document follows.
-->

<root>

</root>


  21. Our goal is to create a generic markup file, rather than an XML file, so replace the lines in the skeleton XML file with the markup in Table 1
  22. Check that the markup file is well-formed by clicking on the green triangle (Alt+F9) on the tool bar. It should pass the check
  23. Delete the tag </city> and check the file again. This time you should get an error indicating that an end tag is missing
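
The same check can be reproduced outside the IDE. The sketch below is not how NetBeans performs its check; it simply assumes Python's standard xml.etree.ElementTree parser, uses a shortened version of the Table 1 markup, and introduces a hypothetical helper named check.

```python
import xml.etree.ElementTree as ET

# A shortened version of the Table 1 markup.
city_markup = "<city><cityname>Athens</cityname>, <statename>Georgia</statename></city>"

def check(markup):
    """Return 'well-formed' or the parser's error message."""
    try:
        ET.fromstring(markup)
        return "well-formed"
    except ET.ParseError as err:
        return f"error: {err}"

first_check = check(city_markup)
# Delete the closing </city> tag, as in the final step above.
second_check = check(city_markup.replace("</city>", ""))

print(first_check)
print(second_check)
```

The first check passes. The second reports a parser error because the document ends before the city element is closed; the exact wording of the message depends on the parser.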

Section summary: Because NetBeans checks files against the XML rules, it is a convenient tool for creating a markup file.

Exercise 1

Use NetBeans to create a markup file describing a restaurant. The markup should identify the name and address of the restaurant and the type of food or foods in which it specializes.


Exercise 2

Let's assume we want to create a personnel file for a small company. What kind of data do we have? Analyse the data in the following table. Our goal is to transform it into a markup file. Name of the company: 'Exercises inc.'

firstname  lastname  street                city       country  date_of_birth  phone_number     department              title
Tobias     Boeswald  Laxenburger str. 384  Vienna     Austria  02/07/1974     0431/3445346     finance and accounting  Accountant
Dimitri    Felber    Neuburger str. 19a    Passau     Germany  05/12/1967     00498510/523456  finance and accounting  CFO
Stefan     Meyer     Breite Gasse 10       Nuremberg  Germany  10/09/1972     00499110/45365   human resources         HR Manager

All data is fictitious. Any similarity between the people described and any real person is purely coincidental. :)

Why is this book not an XML document?

If you have accepted the ideas presented in this chapter, the question is very pertinent. The simple answer is that we have been unable to find the technology to support the creation of an open textbook in XML. We need several pieces of technology:

  • An XML language for describing a book. DocBook is such a language, but the structure of a book is quite complex, and DocBook (reflecting this complexity) cannot be quickly mastered
  • A Wiki that works with a language such as DocBook
  • An XML stylesheet that converts XML into HTML for displaying the book's content

There is a project to create WikiMl (Wiki Markup Language), and this might be used at some point.

References

Initiating author Richard T. Watson, University of Georgia