Thursday, April 05, 2007

Native XML Support in DB2 Databases [DB2 9 for z/OS]

One of the biggest technological advances in DB2 V9 is the ability to combine the management of structured and unstructured data. Basically, V9 will allow you to store data as native XML. This capability has already been introduced into V9 of DB2 for Linux, Unix, and Windows. Many of you may well ask “Hey, what’s the big deal here? Can’t we already use the XML Extender and store XML data in DB2 prior to V9?” Yes, but V9 changes the game. You will be able to search and analyze structured data in a relational data repository and unstructured data in an XML repository without the need to reformat it. So your regular “relational” data gets stored as always; and XML data gets stored in its native format without the need to shove it into a CLOB or shred it into “relational” columns. The approach is novel in that DB2 will now support native XML via dual storage engines – the traditional SQL/relational engine and a new XML engine. DB2 9 for z/OS handles XML as a new data type that is stored in a natural hierarchy - different from relational data. For those of you not familiar with XML, you need to know that there are big differences between XML data and typical DB2 data. Foremost among these differences is that XML data is hierarchical, whereas “relational” DB2 data is basically “flat.” Additionally, XML data is self-describing. XML tags identify and name the data elements in the XML document. This capability concentrates both the data and its structure into a single document. So, in essence, the XML document becomes self-describing. This is important to keep in mind because a single XML document can have many different types of data, whereas “relational” DB2 data is defined in the system catalog by its column definition. And all data in the same column must have the same data type (e.g. you cannot store a name in an integer column). Finally, XML data is ordered, whereas “relational” DB2 data is not. The order in which data items are specified in the XML document is relevant. There is often no other way to specify order within an XML document. For relational data, the order of the rows is not guaranteed unless you specify an ORDER BY clause on one or more columns. OK, now, just how would you support XML data in DB2 V9 then? Think of XML as just another data type. You would use the XML data type in a CREATE TABLE statement to define a column to be of type XML. Each column of type XML can hold one XML document for every row of the table. Even though the XML documents are logically associated with a row, XML and “relational” columns are stored differently. The “relational” columns are stored in the traditional structures we all know and love. The XML data is stored in hierarchical structures. Don’t let that scare you. IBM has seamlessly integrated XML with relational data to simplify application development while optimizing search performance with highly optimized XML indexes. Here is a quick example walkthrough from the DB2 XML Guide manual that creates a simple table with an XML column. First, as with triggers, when you create tables with XML in SPUFI be sure to set the SQL terminator to a character other than a semicolon, for example, the pound sign (#). This is done so that your SQL can have embedded semicolons. Also, you’ll probably want to set CAPS OFF in SPUFI to preserve lower case. Then, create a table like this: CREATE TABLE MYCUSTOMER (CID BIGINT, INFO XML) # This creates a two columns table, the first column as a big integer and the second for the XML data. Next, we’ll build an index over XML data. We will assume that the XML documents to be stored in the INFO column will have a root element named customerinfo with an attribute named Cid. So, here is the DDL for the unique index on the Cid attribute:
CREATE UNIQUE INDEX MYCUT_CID_XMLIDX ON MYCUSTOMER(INFO)
GENERATE KEY USING XMLPATTERN
‘declare default element namespace
   "http://posample.org"; /customerinfo/@Cid’
AS SQL DECFLOAT
#
The XML pattern defining the index is case-sensitive. The element and attribute names in the XML pattern must match the element and attribute names in the XML documents exactly. Now we can insert a couple of XML documents into the INFO column, such as:
INSERT INTO MYCUSTOMER (CID, INFO) VALUES (1000,
’<customerinfo xmlns="http://posample.org" cid="1000">
  <name>Kathy Smith</name>
  <addr country="Canada">
    <street<5 Rosewood</street>
    <city>Toronto</city>
    <prov-state<Ontario</PROV-STATE>
    <pcode-zip<M6W 1E6</PCODE-ZIP>
  </addr>
  <phone type="work">416-555-1358</phone>
 </customerinfo>’)
#
INSERT INTO MYCUSTOMER (CID, INFO) VALUES (1002,
’<customerinfo xmlns="http://posample.org" cid="1002">
  <name>Jim Noodle</name>
  <addr country="Canada">
    <street>25 EastCreek</street>
    <city>Markham</city>
    <prov-state>Ontario</PROV-STATE>
    <pcode-zip>N9C 3T6</PCODE-ZIP>
  </addr>
  <phone type="work">905-555-7258</phone>
  <phone type="cell">905-555-7254</phone>
 </customerinfo>’)
#
Then you can issue a SELECT statement against this table and thereby verify that the XML documents were successfully inserted. For example: SELECT CID, INFO FROM MYCUSTOMER # V9 also supports XPath to query elements within an XML document, as well as catalog extensions to support definitions of XML schemas. Furthermore, the IBM DB2 utilities have been extended such that they can be used to administer XML data, too. To my mind, though, one of the problems with XML in DB2 9 for z/OS is the lack of support for XQuery. XQuery is an XML query language capable of traversing XML documents. Just like SQL is the query language for native DB2 data, XQuery is the query language for native XML data. DB2 9 for Linux, Unix, and Windows supports XQuery, but DB2 9 for z/OS does not. For an independent tutorial on XQuery, click on this link or for an IBM tutorial on using XQuery in DB2 LUW click on this link instead. So, how do you retrieve XML data using DB2 9 for z/OS? You can use SQL to retrieve entire XML documents from XML columns just like you would retrieve any other column. But if you need to retrieve portions of that XML document you will need to specify XPath expressions, through SQL with XML extensions. For an independent tutorial on XPath, click on this link . Here is an example of using XPath to identify data within our XML data: DELETE FROM MYCUSTOMER WHERE XMLEXISTS ( ’declare default element namespace "http://posample.org"; /customerinfo/phone[@type="cell"]’ PASSING INFO) # This should DELETE any XML document that has cell phone information, and for the purposes of this example, that would be CID 1002. I do not wish to go into any detailed description of XPath in this blog, but you can use XML functions with XPath expressions to traverse the XML document for data. One final note: some of the IBM documentation could be clearer. For example, I take exception with this paragraph lifted directly out of the “What’s New” manual (GC18-9856-00): “Support for XML capabilities and functions span the entire DB2 family. Version 8 of DB2 for z/OS and Version 8 of DB2 for Linux, Unix, and Windows provide basic support for storing, retrieving, and querying XML documents. DB2 9 for Linux, UNIX and Windows continues the work by delivering rich support of XML, including an XQuery interface to the data. Now, DB2 V9.1 for z/OS expands on similar support by delivering seamless integration of XML data and relational data in the DB2 database.” Anyone reading that paragraph would be completely justified in expecting DB2 V9 for z/OS to include XQuery support. It seems to have been written using intentionally misleading wording in order to avoid admitting that XQuery is not supported on z/OS. At least, that is what it seems like to me, I could be wrong. I’m also interested in how many folks out there in DB2-mainframe-land expect to use the XML capabilities of DB2 for z/OS? Please sign in and leave a comment expressing whether or not you plan to use DB2’s XML support. Thanks, and that is all for today. Keep an eye out for future DB2 9 for z/OS related posts as I plan to continue adding to this series on new V9 features over the course of the next month or so (at least). Cheers!

1 comment:

bb14742 said...

Craig:

The company I work for has various locations which utilize DB2 LUW, z/OS, Oracle and so forth. Most of these shops are encouraged to use DB2 either on LUW or z/OS. In the shop where I work we predominantly use DB2 for z/OS. We definitely plan to use XML in DB2 for z/OS. I recently attended IDUG North America and while it was informative and beneficial I was hoping to acquire more information on DB2 V9 and XML.

We already store some XML content in VARCHAR columns within DB2 V8. However, this approach has its limitations.

It is not really a question of whether we use the XML features in DB2 V9, but how. I’m a data modeler and these new features essentially allow a table or multiple tables (okay they’re hierarchical, but essentially still tables) to be stored inside the column of a table. It will be very easy to misuse these new features from a data modeling perspective. The worst case scenario being a situation with a table which contains a primary key and an XML column. The XML column would contain all of the table columns and various relationships. This is a topic with which I would like to read more about. I do feel that these new features can be very powerful though.