a 'mooh' point

clearly an IBM drone

Struck by the Wrath of Roy "Kahn" Schestowitz

As the real work of maintaining OOXML in ISO has begun, I have had some time to ponder over events throughout the last year - starting with the BRM in Geneva in February 2008.

Being in Geneva was really hard work, negotiating all day in a 120-seat plenum while in the evening preparing suggestions in cooperation with delegates from other countries. It was fun, but hard, nevertheless. I remember sitting on my bed in the hotel room trying to sort out everything while trying to keep up with the debates happening outside our meeting room (a de facto radio silence had been initiated voluntarily by the more prominent bloggers around the world, so no information was being released to the people desperate for the slightest amount of information).

One of the tools I used was to keep track of the sites referring to my blog and one evening as I sat eating Swiss chocolate on my bed in the hotel, I noticed a new referral from Google Groups.


Versioning of OOXML (thank you for all the fish)

One of the most pressing matters we had to deal with in Okinawa was a question raised by quite a few people including members of the national body of Switzerland as well as hAl on the blogs of Alex Brown, Doug Mahugh and yours truly:

How can you tell if a document is generated using the original set of schemas or the new (improved) ones?

The truth is: you can’t.

Well, at least not at the moment. You can get a hint from sniffing at various parts of the document, but there is no definitive way to do it. We all agreed that we had to come up with a solution, and we discussed (at length in session as well as during breaks, dinners and sight-seeing) what to do.

Roughly speaking, there are a few ways we could do it, including

  • Changing the namespace-name of the schemas
  • Expanding the conformance attribute to indicate version of OOXML
  • Adding an optional version attribute to the root elements of the documents (WordprocessingML, SpreadsheetML and PresentationML) defaulting to the original edition of ECMA-376.

Version attribute

Let me start with the last option, since it is the easiest one to explain and understand.

ODF has a “version”-attribute in the root element of ODF-documents. It is defined in the urn:oasis:names:tc:opendocument:xmlns:office:1.0-namespace, so when creating e.g. an ODF spreadsheet using OOo 3, you will see the following xml-fragment:

[code:xml]<?xml version="1.0" encoding="UTF-8"?>
<office:document-content
  xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" (...)
  office:version="1.2">
</office:document-content>[/code]

The above would tell you to use version 1.2 of the ODF-spec – currently being drafted by OASIS.

We could do a similar thing with OOXML, that is, having an optional version-attribute with the version number of the applied flavor of OOXML. This approach would have some clear advantages. First and foremost, existing applications supporting OOXML would need absolutely no changes to their code base to keep reading and processing OOXML-files in ECMA-376 1st Ed format. They could also keep using any existing schema-validation of content, and all existing files in ECMA-376 would still be perfectly valid.
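To make the "optional attribute with a default" idea concrete, here is a minimal sketch in Python. It is an illustration under assumptions, not anything defined by ECMA-376 or IS29500: the attribute name `version`, the default value "1.0" and the helper `document_version` are all hypothetical, and the root element is a bare SpreadsheetML `workbook` with nothing in it.

```python
import xml.etree.ElementTree as ET

# Hypothetical: neither the attribute name "version" nor the values used
# here are defined by any edition of OOXML. They are placeholders.
DEFAULT_VERSION = "1.0"  # stands for ECMA-376 1st Ed in this sketch

SPREADSHEET_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def document_version(xml_text):
    """Return the root element's version attribute, or the default if absent."""
    root = ET.fromstring(xml_text)
    return root.get("version", DEFAULT_VERSION)


# An existing file carries no version attribute and is read as the default.
old_doc = f'<workbook xmlns="{SPREADSHEET_NS}"/>'
# A newer file would state its version explicitly.
new_doc = f'<workbook xmlns="{SPREADSHEET_NS}" version="1.1"/>'

print(document_version(old_doc))
print(document_version(new_doc))
```

The point of the default is that a file written before the attribute existed needs no changes to be read as the original edition.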

Expanding the conformance attribute

Another thing to do would be to expand the new conformance attribute. At the BRM in Geneva a new conformance attribute was added to the root elements to display to which version of OOXML the document conforms. You will perhaps recognize this XML-fragment
[code:xml]<w:document conformance="strict">
</w:document>[/code]
We could also use this attribute and add version information to it. A way to do it would be
[code:xml]<w:document conformance="transitional-1.0">
</w:document>[/code]
for the ECMA-376 1st Ed and something else for any subsequent versions.
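A consumer would then have to split such a combined value into its two parts. The sketch below shows one way that could look. The "class-version" layout is only the example value from above, not an agreed syntax, and the helper name `parse_conformance` is made up for illustration.

```python
from typing import Optional, Tuple


def parse_conformance(value: str) -> Tuple[str, Optional[str]]:
    """Split a conformance value at the first hyphen.

    'transitional-1.0' gives ('transitional', '1.0').
    A value without a version suffix, such as 'strict', gives ('strict', None).
    """
    conformance_class, separator, version = value.partition("-")
    return conformance_class, (version if separator else None)


print(parse_conformance("strict"))
print(parse_conformance("transitional-1.0"))
```

One consequence of overloading the attribute this way is that every reader needs a small parser like this one, whereas a separate version attribute can be read directly.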

Fixing or solving?

The problem with the two alternatives mentioned above is that they provide an immediate fix, but they are in no way panaceas for the issue of versioning. In Geneva we split up OOXML into 4 distinct parts and tried the best we could to make sure that they were “islands” within themselves. So in the original submission’s Part 2 dealing with OPC, there were dependencies on WordprocessingML (AFAIK) and these were removed. The result is that you can now refer to ISO/IEC 29500-2 should you in your implementation need a packaging format where OPC suits your needs. The basic idea was exactly this: to provide a way for other standards to be able to “plug in” to OOXML and reuse specific parts of it.

The two fixes described above provide a fix for the problem with versioning of “the document stuff”; text documents, spreadsheets and presentations – but they do nothing for Part 2 and Part 3 (under the assumption that Part 4 will not change). The trouble is - this is not only a theoretical problem. ECMA TC46 working with XPS (XML Paper Specification) has based the package format for XPS on OPC. But it is difficult for them to refer to ISO/IEC 29500-2 OPC since it is not possible to distinguish the namespace name from its predecessor ECMA-376 1st Ed. So unless we figure out a solution, they will have to refer to ECMA-376 1st Ed (and it was my impression that they’d prefer to refer to ISO OPC instead).

This is kind of annoying or maybe even embarrassing. We (the ISO process) chose to split up OOXML to allow reuse – but the first time someone knocks on our door and wishes to do exactly that – we (unless we find a solution to this problem) will have to say: “Well, we didn’t actually mean it”.

Change the namespace-name

An entirely different approach would be to change the namespace name(s) of IS29500. The original names were along the lines of

http://schemas.openxmlformats.org/package/2006/content-types
http://schemas.openxmlformats.org/package/2006/relationships
http://schemas.openxmlformats.org/spreadsheetml/2006/main
(…)

So an alternative solution would be to change the values of the namespace name. The names above could be changed to

http://schemas.openxmlformats.org/package/IS29500-2008/content-types
http://schemas.openxmlformats.org/package/IS29500-2008/relationships
http://schemas.openxmlformats.org/spreadsheetml/IS29500-2008/main

(I would have liked to use a colon as separator between the ISO project number and year, but according to http://www.w3.org/TR/REC-xml/#sec-common-syn, it seems colons are not allowed in namespace names.)

What would be the consequence of this?

The up-side

Basically, changing the namespace name would solve the problem with distinguishing between ECMA-376 1st Ed and IS29500:2008. It would be trivial to distinguish content based on either standard and it would apply to all parts of the specification. Actually, it would apply to all schemas in the specification, so it would enable someone to create a document based on ECMA-376 OPC, IS29500 WordprocessingML and ECMA-376 DrawingML (even though this is permitted in the current version of OOXML). It would also give us the chance to have a fresh start with IS29500:2008 and give us a clean slate for our further work.
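How trivial the distinction becomes can be sketched in a few lines of Python. Keep in mind that the "IS29500-2008" namespace name is only the proposal from this post, not a name defined by any standard, and that the lookup table and the helper `edition_of` are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# The first name is the existing one. The second is the hypothetical
# renamed form proposed in this post, not a name any standard defines.
EDITION_BY_NAMESPACE = {
    "http://schemas.openxmlformats.org/spreadsheetml/2006/main":
        "ECMA-376 1st Ed",
    "http://schemas.openxmlformats.org/spreadsheetml/IS29500-2008/main":
        "IS29500:2008",
}


def edition_of(xml_text):
    """Map the root element's namespace name to an edition, or None if unknown."""
    root = ET.fromstring(xml_text)
    # ElementTree spells a qualified tag as "{namespace-name}localname".
    if not root.tag.startswith("{"):
        return None
    namespace = root.tag[1:].split("}", 1)[0]
    return EDITION_BY_NAMESPACE.get(namespace)


for ns in EDITION_BY_NAMESPACE:
    print(edition_of(f'<workbook xmlns="{ns}"/>'))
```

Because the check looks only at the namespace name of an element, the same approach would work for the OPC schemas in Part 2, which is exactly what the attribute-based fixes cannot offer.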

The down-side

Changing the namespace is sadly not a silver bullet – unfortunately the free lunch comes with nausea as well. The trouble is – by changing the namespace, applications that support ECMA-376 will break if they try to load documents based on IS29500 since the namespace will be foreign to them.
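A small sketch of what "break" means in practice, assuming a reader that has the ECMA-376 namespace name hard-coded. The new namespace name is again the hypothetical one from this post, the function names are invented, and the workbook is heavily simplified (a real `sheet` element carries more attributes).

```python
import xml.etree.ElementTree as ET

OLD_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
# Hypothetical renamed namespace, as proposed in this post.
NEW_NS = "http://schemas.openxmlformats.org/spreadsheetml/IS29500-2008/main"


def count_sheets(xml_text):
    """A reader written against the ECMA-376 namespace name only."""
    root = ET.fromstring(xml_text)
    return len(root.findall(f"{{{OLD_NS}}}sheets/{{{OLD_NS}}}sheet"))


def workbook(namespace):
    """Build a simplified workbook with one sheet in the given namespace."""
    return (
        f'<workbook xmlns="{namespace}">'
        '<sheets><sheet name="Sheet1"/></sheets>'
        '</workbook>'
    )


print(count_sheets(workbook(OLD_NS)))  # the sheet is found
print(count_sheets(workbook(NEW_NS)))  # nothing matches the old names
```

The reader does not crash here. It simply finds nothing it recognizes, because every element now belongs to a vocabulary it has never heard of.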

The question is, though: shouldn’t they?

The purpose of XML namespaces is to identify the vocabulary of the elements of an XML-fragment. So the real question could be: are we talking about a new vocabulary when going from ECMA-376 to IS29500:2008? Are the changes from the BRM so drastic that we wouldn’t expect applications supporting ECMA-376 to be able to load documents conforming to IS29500?

Well, it was of importance to ECMA and most of the delegates at the BRM to ensure that whatever we did to change the specification did not render existing documents nonconformant. We succeeded quite well in doing just this, so one could argue that the changes were not that big. However, this just concerns the transitional schemas. If you remember, the changes in schema structure were quite big. We divided one big chunk of schemas into two categories, “strict” and “transitional” and I would indeed argue that we changed the vocabulary by doing just that. We changed it from defining a vocabulary with a complete mess of legacy-stuff and new stuff into two separate piles with one “going-forward-vocabulary” and one “going-backwards-vocabulary”. Isn’t that big enough to change the namespace name?

Do it right the first time

At the WG4-meeting I was actually advocating for a simple addition of a version attribute and for solving the bigger namespace problem at a later time in a revision of OOXML, but the more I think about it, the more I am convinced this is the wrong way. We are in a position right now where there are no applications out there supporting the full set of IS29500. Not changing the namespace name will not make the problem go away – it will just postpone the issue, and if we wait, the problem will grow as applications surface with support for IS29500. The problem will be even bigger if you have a long list of supporting applications and not – as now – not a single one.

The more I think about it, the more I am sure the right way to do it is

  1. Add a new version attribute to the root elements defaulting to “1.0” which would be ECMA-376 1st Ed. IS29500:2008 would have version “1.1”.
  2. Change the namespace name for IS29500 in a manner as outlined above.

Vendors in the process of implementing IS29500 will then have to add some code to their application to support this.

But – I am in no way sure I have covered all angles. Am I missing something here?

:-)

Post WG4-meetings in Okinawa

 

Last week (week 4 of 2009) we had the first face-2-face meeting in SC34/WG4 on the Japanese island of Okinawa. Since there is quite a big overlap between the participants of WG4 and those of WG5, the two groups meet at the same time and place to minimize travel costs and time away.

Quite a lot of people had chosen to take the "small" trip to Okinawa, and at roll-call the first day, a total of 22 people sat around the table in the meeting room. Of these, 6 were from ECMA and 14 represented various national bodies (3 of whom were employed by Microsoft).

How's that for full disclosure, eh?

The purpose of the meeting was to get started maintaining OOXML and to discuss what to do in the future. We were also to discuss the already submitted DRs and see what we could do about these.

One of the first things I realized on that morning was that by participating in standardization in ISO (and from what I hear, also most other standardisation organisations) you need to accept following a certain number of rules. As it turns out, we are in no way free to fix problems in the spec, and we are in no way free to make new additions to the spec etc. There are rules constraining all of these activities. So the project editor (Rex Jaeschke) took us on a lengthy trip down "ISO-regulation-lane". The idea was to give us all some knowledge of the rules and terms (as in 'nouns') used in the directives so that we would all be on the same, first page moving forward. The basis for the walk-through was a document prepared by the editor and it is available on WG4's website.

DRs

Quite a lot of DRs were submitted to WG4 before the meeting. I think the total number was about 25-30, and they ranged from fixing spelling errors to clarification of the text and schema changes. The first thing we discussed was how to categorize the DRs. The "buckets" were "defects" and "amendments", and we also discussed how to distinguish between editorial defects and technical defects. We quickly agreed that focus should initially be to verify and approve any DRs relating to decisions from Geneva that had not made it into the final text. ECMA also had quite a big batch of DRs submitted before the meetings, but since they were not submitted in time for everyone to look at them, we did not make any decisions about these - ECMA just went through them in detail and we discussed each of them.

Details we discussed were certainly of world-changing importance, such as the difference between the text fragments "nearest thousands of bytes" and "nearest thousand bytes", the allowed content of string-literals and intricate details of the xml:space-attribute in an XML-element based on the XML 1.0 specification. Still, it was quite entertaining and it was delightful to sit back and simply overhear the discussions of people that really know what they were talking about.

Comment collection form

ECMA has set up a comment collection form to submit DRs from interested national bodies. It has already been put to use by the Japanese national body and it seems to serve its purpose just fine. Hopefully it will enable us to improve the data quality of the incoming DRs. We gave feedback on the application to Doug Mahugh from ECMA and hopefully he will see to it that the suggestions are implemented (especially mine!)

:-)

We discussed at length the concept of "openness" and how we should apply it to our work, and I will cover my feelings for this in detail in a top-post a bit later.

Last minute impressions

This was my second trip to Japan and I must say that I am getting more and more excited about it with every trip. The culture is fantastic and it is a good challenge to be in a part of the world where you don't speak the language and are incapable of reading almost any signs. I did get a bit of a "Lost in Translation"-feeling on my trip back (+40 hrs!), but it was really a good trip. Two thumbs up for the convener, Murata-san, who showed us how a splendid host acts and shows their guests a great time.

All in all I also think we had some productive days on Okinawa. We managed to deal with quite a few DRs and to set up work-processes for the future and I am sure we will benefit in the near future from the work we did. It was also interesting to watch the "arm-wrestling" between the national bodies and ECMA. We were on the same page in most cases, but it was interesting to be part of the discussions where we were not. It will be interesting to see how this will evolve in the future. ISO is a bit different from, say, OASIS because of the involvement of national bodies. Where the basis for most of the groups in OASIS is "vendors", it is quite orthogonal to this in ISO where this concept does not really exist. Some of you may remember Martin Bryan's angry words at the plenary in Kyoto about vendor participation and "positions" vs. "opinions" and I am looking forward to taking part in these discussions in WG4 as well as here.

 


Additional resources

Below are a couple of links that might be of interest to you

SC34 WG4 public website

SC34 website

(and for Okinawa-related activities)

Alex Brown's write-up about day 0, 1, 2 and 3-4 of the meetings

Doug Mahugh's summary of what took place

Pictures taken by the secretariat

Picture-stream from Doug Mahugh

Picture stream from Alex Brown

Picture stream from Jesper Lund Stocholm (me!)

Twitter stream from Doug Mahugh

Twitter stream from Alex Brown (notice the l33t-speek Twitter-tag Alex uses!)

Twitter stream from Jesper Lund Stocholm

Bonus for those of you waiting for the credits at the end of the movie:

The day I arrived I was met by Murata-san and Alex Brown in the lobby of the hotel. They were on their way to dinner at a restaurant called "Kalahaai" in the "American Village" of Naha. The dinner took place in a restaurant with live Japanese music from a group called "Tink Tink". Their music was really amazing. The last evening we went there again, and Shawn and I were listening completely baffled to the music and on-stage talks of the performers. It was an amazing experience to sit in the restaurant not understanding a single word they said - and still not being able to stop listening to them.



(courtesy of Doug Mahugh)

And look at this picture. Thanks to Doug's tele/wide/fish-eye-whatever-lens on his camera, I look like an absolutely mad-/maniac man! No girls were hurt during this, I should point out.


(courtesy of Doug Mahugh)

:-)