XML::Edifact - an approach towards XML/EDI as a prototype in perl release 0.35 - normalisation, namespaces, xml2edi Michael Koehne, ( kraehe@bakunin.north.de ) v0.35 release XML::Edifact is a set of perl scripts, hopefully becoming a module, for translating EDIFACT into XML. This 0.35 version is mainly a bug fix, thanks to Detlef Lammermann from Dr. Materna GmbH, who found that ??' was missinterpreted. ______________________________________________________________________ Table of Contents: 1. Introduction 2. Release Notes: 2.1. Edi2SGML-0.1: About the beauty of plain text 2.2. XML-Edifact-0.2: Its a hard work to cook a second version. 2.3. XML-Edifact-0.3x: About normalisation, namespaces and xml2edi 3. Installation 4. Known Bugs 4.1. Double namespace declarations 4.2. Stating level in Syntax identifier. 4.3. Explicit Indication of Nesting 4.4. XML::Edifact is slow! 5. Roadmap 6. Legal stuff 7. Download ______________________________________________________________________ 1. Introduction EDIFACT often called " nightmare of paper less office " once you show a programmer the standard draft. Those 2700 pages of horror full advisory board English has cursed many programmers with headaches. EDIFACT is trying the impossible: a single form for the real world. Orders, invoices, fright papers, ..., always look different, if they come from different companies. EDIFACT tries to fulfill all needs of commercial messages regardless of branch and origin. Of course those 99% real world is neither simple nor complete. Nevertheless its important for the top companies and their suppliers, you know those who can pay a mainframe and a pack of gurus, and in use for years. XML/EDI is will provide a simpler (KISS) format that can be translated from and into EDIFACT, to allow smaller companies to avoid slaughtering forests and retyping stupid lines into a computer keyboard printed by other computers. This is NOT XML/EDI, its certainly not KISS. The edifact03.dtd reflects the original words of the EDIFACT standard as close as possible on a segment, composite and element level. This DTD simplifies EDI in so much as it doesnt distinct between e.g. INVOICE or PRICAT but only defines a generic message type called edifact:message. The benefit is of course that its possible to convert any EDI message into edifact. The drawback is that the dtd is realy relaxed. Validation of EDIFACT message design can therefore not be done by a validating XML parser. Message designers will still need knowledge about EDIFACT message design and EDIFACT tools. But once the message is designed its simpler to read it with XML. 2. Release Notes: 2.1. Edi2SGML-0.1: About the beauty of plain text Standards should be based on standards. EDIFACT is based on ASCII and documentation is available from WWW.Premenos.Com as plain text. Well the original contains some PCDOS characters. I took the freedom, to replace them with ASCII in this distribution to improve readability. I don't talk about human readability here. A friend at SAP joked that plain paper is the only platform independent format in that case. But I disliked to retype them. And plain text is more flexible, as I'm a programmer. Unlike the 0.1 distribution, following distributions will only contain those documents I need to parse by the scripts. Download the 0.1 for a complete set, or surf at Premenos. 2.2. XML-Edifact-0.2: Its a hard work to cook a second version. As usual. Second versions claim to be better documented and tested, but the truth - they contain more features. So talk about features: First of all: Its looking like a module. "use strict" and the package concept is a usefull thing. But it'll take a lot of RTFM for me to understand the perl way of doing it. The XML/Edifact.pm doesnt export anything, and its not even neccessary to "perl Makefile.PL; make install". A 0.2 version is not intendet to become installed, its a test case. So talk about the test case: Run ./bin/make_test.sh from here, and anything should be fine. Still it need some RTFM for me to understand the perl way of regession test. But the ./bin/make_test.sh is the one this version offers ,-) I'm now using a tied hash for speeding startup. I've deceided to use SDBM as this DBM comes with any perl, and a small DBM is better in this case. I've provided a document type definition. And its now possible to use a validating parser like SP from James Clark. You may also notice the renaming from Edi2SGML to XML::Edifact. This namechange reflects that my script is now producing XML and not SGML, and the name should point where in cpan hirachy this package belongs. 2.3. XML-Edifact-0.3x: About normalisation, namespaces and xml2edi You may notice the major change in the DBM design. While the old DBM files had been modeled closely to the batch directory. This version has been partly normalised to improve coding. Its also denormalised for some perlish reasons. Unloading of this DBM into a relational database would be possible with varchars, but the semantics of the 2nd element in segments and composite could only be expressed with some wired object relational databases like PostgreSQL. Also the DTD changed for namespace reason. The 0.2 need to add the word literal, where element names clashed segment names of the standard. And it droped the composite informations. Now trsd:party.name means the segment, while tred:party.name points to the element. This allows to parse the XML message to produce a EDI message without an backtracking parser. The event based parser used for xml2edi is quite new, and certainly contain some bugs. Please dig around your real life messages, translate them with edi2xml, back with xml2edi and compare the original with the double translation. I've tried a robust solution, that doesnt croak with codes from the unknown namespace, I hope. Version 0.30 and 0.31 used edicooked:message as namespace, versions 0.32 and up will use edifact:message for the main namespace. The technical reason is quite simple. The namespace prefix of a message does not mean anything. Its only a shorthand for the provided URI in the xmlns attribute. So any distinct XML message can claim to be in the edifact: namespace, if the URI is distinct. So if other projects starts becoming implemented, they can claim to be in the edifact: namespace for the same right. Version 0.33 first of all solves a bug shown up with xml2edi and a TeleOrdering message translated by edi2xml. I just did forgot to encode less than and ampersand, if they occure as translation in a code list. So NAD+OB+0091987:160:16' will now translated using Dun & Bradstreet, which is right. Two other improvements are for major notice. The brand new Perl5 version 005.60 contains a profiler, and hunting the hot spots and optimising the SDBM by further denormalisation improved performance of edi2xml by factor 12. I hope nobody did use the SDBM internals so far. The last major improvement is, that I'm getting familar with ExtUtils::MakeMaker, File::Spec and friends. Version 0.33 is the first that did install - at least on my Linux box :-) Version 0.34 introduced coding of UN/EDIFACT code list extensions by XML-Edifact namespace migration. Version 0.34 fixed a bug concerning the release indicatior. As a minor improvement the edi2xml and xml2edi scripts now have pod documentation. A last note about change of 0.2 to 0.30. Treat this number as 0.3.0 translated to perl canon. The 0.3 is not finished, coming versions claiming to be any 0.3x will be step stones to what I think the 30% XML::Edifact solution should contain. 3. Installation I've included my modified documents, so others can be able to rebuild the DBM files. You may need a Unix like system because of newline conventions. $ perl Makefile.PL I know I should check for those 99 possible places, but I prefer to ask :-) URL for public documents [http://localhost/xml-edifact] Directory on this system [/usr/local/apache/htdocs/xml-edifact] Writing Makefile for XML::Edifact $ make perl perl Makefile.PL will first ask two questions. The reason is that XML::Edifact wants to install his document type definition on a web server to allow validation XML parser to grep the DTD. If you dont have a web server, anwer "." as the URL and "/tmp/xml- edifact" as a directoryname. You may change those decissions later by reperling the Makefile.PL, or by editing the XML::Edifact::Config module in your SITE_PERL. Make will take a while and you hopefully have a working database. This database covers the 96b version of the UN/EDIFACT batch directory and will become installed as XML::Edifact::d96b later. $ make test The regression test will translate any .edi file found in the examples directory to xml and translate the xml back to EDIFACT. The result should not change. $ make install This will install the XML::Edifact module the D96B batch directory, various files for the URL and two scripts: edi2xml and xml2edi You can now try own UN/EDIFACT files. I really want to know how your EDI messages look like, do they break anything, what about your code list extension, ... Testing different real examples should show some bugs, I hav'nt thought about. Think about the O'Reilly invoice or the Dubbel:Test and you should catch the clue. I've tried to implement the UNA right, but this may need some additional debugging. Take a look at the difference between the editeur.edi file from Frankfurt and the Springer message. The last one is using newline as a 9th character in UNA, so its nearly human readable. A last word - I hope this complex installlation will work on most Unix look likes, but I'm quite sure that it'll break on Windows and Mac. If you have such a system, and problems during installation, drop me a mail. You are granted my help, as I need your help to make the installation portabel across different platforms. 4. Known Bugs 4.1. Double namespace declarations Namespace declaration was redefined in January 1999. XML::Edifact 0.30 produced both the old and the new declarations. XML::Edifact 0.31 droped the depreciated declartions! If you have an old browser, you may have to download XML::Edifact 0.30 and to edit the actual XML::Edifact. Search for HERE_ and adopt the headers to your browsers preferences. 4.2. Stating level in Syntax identifier. This has to be parsed. The stating level in EDIFACT speak is called charset encoding in XML speak, and its of course important if you thing about non US/UK products. See un_edifact/unsl. 4.3. Explicit Indication of Nesting This has not been coded yet, as no example messsages are available to me. 4.4. XML::Edifact is slow! The example real life message teleord.edi still needs about 36 seconds on a Sun3/60 running NetBSD. Even as newer computers are faster, XML::Edifact would not be able to handle the daily batches of large UN/EDIFACT routers like TeleOdering UK. The solution of this problem will become delayed till version 1.2, when parts of the module will be recoded in C. 5. Roadmap I'm using even and odd numbering to distinct from stable and experimental version. Well 0.2 was not as stable as an even number suggests. And I hope this 0.3x is stable enough as as often a third version, will be the first usefull one. Be warned: Anythink here is pure vaporware. I'm writing XML::Edifact in my spare time, and I hope to complete one version per month. 0.3x This version is under development: It should integrate better into the XML::Parser environment, and use some XML::Parser to translate XML::Edifact-0.3x messages back into UN/EDIFACT. Only even numbers 0.302468 can be cound on CPAN. Odd versions are published by eMail only. As a warning different 0.31 exist. Some eMail's I got, caused imediate code changes and a reply to test them. If you receive a 0.3-913579 file by eMail: Do not distribute it widely, those versions are internal only. 0.4x This version will focus on portability. While Perl ensures portability across the unix'es, MacOS and Win32 will cause some problems. The 0.4 version will also be the first one intended to become installed. As installation also means configuration of non Perlish paths e.g. for webserver, mime.types, mailcap, dtds and databases, XML::Config.pm will be discussed in the perlxml list. 0.5x The next important step will be a reverse engineering of the document type definition of the original EDI standard draft. This version will provide segment groups for defined document types like orders and invoices. Most important will be the introduction of a XML format for defining code list extensions. This format will probately some RDF. 0.6x Stabilisation by disscussion and consens about the XML DTDs introduced with 0.5. 0.7x EdiCooked is far from being KISS. This release will try on a smarter DTD called EdiLean. EdiLean will focus on PRICAT, ORDERS, ORDRSP, ORDCHG and INVOICE. If a consens about a KISS XML/EDI already exist, EdiLean will try to implement it. 0.8x Stabilisation by disscussion and consens about the XML DTDs introduced with 0.7. 0.9x Its important for me that authentication and authorisation will be provided before I call it final 1.0. Some Edifact messages contain medical informations (MED*), other contain personal informations (JOB*). Most messages contain viable information for running a bussiness. Only cryptography on a document level would preserve authentication and authorisation once a message stored on a disk. Alf O. Watt ( alfwatt@pacbell.net ) proposed a simple solution using namespaces and processing instructions at the perlxml mailing list in December 1998. The beauty of this aproach is, that the secure document is still wellformed and valid of the same document type. It could even translated back to UN/EDIFACT to obtain a message with crypted segments. 1.0 I hope that any consens have been found on that road, so the DTDs wont change in further releases. Those versions may focus on using XML::Edifact in real life applications. I can think about an SQL interface, a Cobol interface, a message designer, a DOM/CORBA wrapper, and much more. Once I think to have XML::Edifact complete, I have to think about speed. Perl is a perfect language for prototyping, but profiling and using a low level language like C for hot spots, will be necessary to handle large batches. 6. Legal stuff Programs provided with this copy called XML-Edifact-0.32.tgz can be used, distributed and modified under terms of the GNU General Public License. Files in the ./examples directory are from varios sources and free of claims as far as I know. Files inside the ./un_edifact_d96b directory are based on EDI batch directory and are therefore copyrighted by the United Nations. See un_edifact_d96b/LICENAGR.TXT. Files that are produced during the bootstrap process and placed in XML::Edifact::d96b are based on the original UN/EDIFACT standard and therefore not covered by GPL, but likely copyrighted by the United Nations. The same is with the text tables produced during Bootstrap.PL. Besides the GPLed Edition a Custom Edition, exist if you dislike GPL. Drop me an eMail and ask for price and conditions. You can hire me as a consultant within Europe, if you think that the author of a tool will probately the best one for teaching your programmers. 7. Download I just got a message from PAUSE that I can upload it to : $CPAN/authors/id/K/KR/KRAEHE XML::Edifact requires XML::Parser, so to download and install type: $ perl -MCPAN -e shell cpan> install XML::Parser cpan> install XML::Edifact or ftp directly at: ftp://ftp.cpan.org/pub/perl/CPAN/modules/by-module/XML/XML-Parser-*.tar.gz ftp://ftp.cpan.org/pub/perl/CPAN/modules/by-module/XML/XML-Edifact-*.tar.gz The canon source of the XML::Edifact project is now: http://www.xml-edifact.org/ This site contain varius example files, research papers and a complete UN/EDIFACT batch directory.