a 'mooh' point

clearly an IBM drone

Is "interoperability" a transitive characteristic?

Way back when I was a math major at university, we were taught about "operations on sets". A set could simply be "the natural numbers", which could be defined as all non-negative integers, that is, the positive integers together with the number 0. An operation on this set could be addition of numbers, multiplication of numbers and so forth. An operation can have a lot of characteristics, e.g. "commutative" or "associative", and a relation on the set can be "transitive". An "associative" operator means that you can group the operands any way you want and a "commutative" operator means that you can change the order of the operands. Confused? Well, it's not that complex when you think of it. The mathematical operator "addition" is an "associative" operator since (1+2) + 3 = 6 and 1 + (2+3) = 6. The operator "divide" is not associative since (1/2) / 3 = 1/6 whereas 1 / (2/3) = 3/2. Addition is also a commutative operator since you can change the order of the numbers being added together. This is evident since 1+2+3 = 6 and 3+2+1 = 6. Similarly "subtraction" is not a commutative operator since 1-2-3 = -4 whereas 3-2-1 = 0.
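For those who prefer to see the arithmetic run, here is a minimal Python sketch of the examples above. The helper names `associates_on` and `commutes_on` are made up for this illustration. Note that checking a single set of operands can only refute a property, never prove it in general.

```python
from fractions import Fraction
from operator import add, sub, truediv


def associates_on(op, a, b, c):
    """True if grouping does not matter for these particular operands."""
    return op(op(a, b), c) == op(a, op(b, c))


def commutes_on(op, a, b):
    """True if operand order does not matter for these particular operands."""
    return op(a, b) == op(b, a)


# Exact fractions avoid floating-point noise in the division example.
one, two, three = Fraction(1), Fraction(2), Fraction(3)

print(associates_on(add, one, two, three))      # (1+2)+3 compared with 1+(2+3)
print(associates_on(truediv, one, two, three))  # (1/2)/3 compared with 1/(2/3)
print(commutes_on(add, one, two))               # 1+2 compared with 2+1
print(commutes_on(sub, one, two))               # 1-2 compared with 2-1
```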

The transitive characteristic is a bit different from this, and the "everyday equivalent" would be when we infer something. So think of transitivity as a mathematical formulation of what we do when we infer.

The relation "is greater than" is transitive, as is "is equal to". Basically, a relation (say, "is greater than") being transitive means that if A > B and B > C then A > C.
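A brute-force check over a small sample makes the definition concrete. This is only a sketch: `is_transitive` is a hypothetical helper, and passing on a finite sample shows consistency with the definition on that sample, nothing more. The relation `differs_by_one` is included as a made-up example of something that is not transitive.

```python
from itertools import product


def is_transitive(relation, elements):
    """Check that relation(a, b) and relation(b, c) always imply relation(a, c)."""
    return all(
        relation(a, c)
        for a, b, c in product(elements, repeat=3)
        if relation(a, b) and relation(b, c)
    )


sample = list(range(-3, 4))


def greater_than(a, b):
    return a > b


def equal_to(a, b):
    return a == b


def differs_by_one(a, b):
    # 0 and 1 differ by one, 1 and 0 differ by one, but 0 and 0 do not.
    return abs(a - b) == 1


print(is_transitive(greater_than, sample))
print(is_transitive(equal_to, sample))
print(is_transitive(differs_by_one, sample))
```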

The latter popped into my mind the other day when I was pondering over interoperability between implementations of document formats.

Ever since Rob's ingenious article "Update on OpenOffice.org Calc ODF interoperability", I haven't been able to get it out of my head.



Extending OOXML

This article will have two topics - one about extending OOXML using the built-in extension mechanisms and one about extending OOXML itself.

Using built-in mechanisms

As I have written about earlier, OOXML has a (fun) part containing mechanisms for extending OOXML with vendor/domain-specific extensions. That part is "Part 3 - Markup Compatibility and Extensibility". The part describes different techniques for extending OOXML - most interesting are probably the sections about "Markup Compatibility Attributes and Elements" describing ways to extend OOXML while preserving compatibility with e.g. earlier/current versions of the specification.

So if you were a vendor wanting to add something to the spec - but couldn't wait for the slow ISO pace or simply needed the competitive edge of not revealing anything about future software releases to your competitors ... what could you do?

The first thing you should do is to decide if you want your new stuff to eventually make it into the spec. If you don't want that - you're done already.

Assuming you want it into the spec, here are a couple of hints to how you might approach it:

  1. Document your extensions thoroughly
  2. Present these extensions to SC34/WG4 with justification for how and why you want them in the spec
  3. Work with us to polish the nitty-gritty details that you overlooked
  4. Make sure there are no legal or technical barriers to implementing these new features for your competitors
  5. Wait for the stuff to eventually be included in IS29500

So the real target of this is - if you haven't already guessed it - Microsoft. So to be even more specific, here's a little list of things to do for Microsoft - in case they want to extend IS29500:

You will probably have some additions to IS29500 in your implementation of Office 14. Assuming that you will at some point want these to be added to IS29500, this is what you should do:

  1. Document your extensions thoroughly. Remember, the quality of the documentation will be under the same scrutiny as the text of DIS29500 so please do it right the first time.
  2. Add the documentation of your extensions to your "Implementer's notes" on the DII-website. 
  3. Present these extensions to SC34/WG4 with justification for how and why you want them in the spec.
  4. Work with us to polish the nitty-gritty details that you overlooked.
  5. Include the extensions and the documentation for it in your OSP.
  6. Wait for the stuff to eventually be included in IS29500.

Remember, the minute the first public beta of Office 14 hits the web, the documentation of the extensions as well as inclusion in OSP should be finished. Not a month later, not a week later - on day one!

Extending IS29500 itself

There has been a lot of talk lately about how IS29500 will be extended in the future. Specifically, how - and where - will new additions be included? IS29500 comprises two schema sets - a strict set and a transitional set. Currently the strict set is created from the transitional set, so strict is in fact a proper subset of the transitional set.

However - there is no guarantee that this will always be so.

My gut feeling is that transitional should be preserved as the "reflection" of the existing Microsoft Office documents (until March 2008) - in other words in keeping with the scope of IS29500. I think that any new stuff should be added to the strict schema set only. The term "transitional" clearly implies this. As I recall the feeling in Geneva at the BRM, the idea behind the transitional set was that eventually it would no longer be needed and hence removed from the standard - at some point in the future. If we continue to add new features to the transitional set, we will never get to the point where we can honor the sentiment of this particular issue.

...  now at the moment, we haven't decided anything yet ... so right now anything goes.

But what are your thoughts?

IBM: Thumbs up for OOXML!

Today news broke that ANSI (the US national standardisation guys) recently voted on the subject of approving OOXML as an "American National Standard".

The text of the ballot was:

Approval to Adopt the International Standards listed below as American National Standards:

  • ISO/IEC 29500-1:2008 (...) Part 1: Fundamentals and Markup Language Reference
  • ISO/IEC 29500-2:2008 (...) Part 2: Open Packaging Conventions
  • ISO/IEC 29500-3:2008 (...) Part 3: Markup Compatibility and Extensibility
  • ISO/IEC 29500-4:2008 (...) Part 4: Transitional Migration Features

A total of 18 organisations/entities were on the ballot, and the result was

  • Approve: 12
  • No: 0
  • Abstain: 2
  • Not voted: 4

The details are here:

Date        Organization                      Yes  No  Abstain       Not Yet
03/16/2009  Adobe Systems                                            X
04/13/2009  Apple Inc                         X
04/15/2009  Department of Homeland Security   X
03/16/2009  DMTF                                                     X
04/09/2009  Electronic Industries Alliance    X
03/16/2009  EMC                               X
03/16/2009  Farance, Incorporated                                    X
03/16/2009  Google                                                   X
04/15/2009  GS1 US                                     X (comments)
04/13/2009  Hewlett Packard Co                X
03/24/2009  IBM Corp                          X
04/15/2009  IEEE                                       X (comments)
04/08/2009  Intel                             X
03/18/2009  Lexmark International             X
03/17/2009  Microsoft                         X
03/16/2009  NIST                              X
03/19/2009  Oracle                            X
03/16/2009  US Department of Defense          X
TOTAL                                         12   0   2             4

An interesting vote here is naturally the vote of "International Business Machines Corp", otherwise known as IBM. It seems they now support OOXML - good for them.

I think it is an extremely positive move from IBM and I salute them for finally getting their act together and supporting OOXML. I also hope IBM will tread in the footsteps of Microsoft in terms of TC-participation and join us in SC34/WG4 to contribute to the work we do. I think it is positive for the industry that Microsoft finally joined OASIS ODF TC last summer, and I hope IBM will do the same with SC34/WG4 - we need other vendors besides Microsoft at the table. I also hope this means that IBM will speed up support for OOXML in either Lotus Symphony or OpenOffice.org. The support for OOXML in other applications than Microsoft Office 2007 is ridiculously low.

Thank you, IBM - you really made my day.


PS: I apologize for the colors of the table above


The actual work we did in Prague

I thought I’d try to outline a bit what we actually did and what constituted our work in Prague.

The agenda framing our work throughout these three days was this:

  1. Opening 2009-03-24 09:00
  2. Roll call of delegates
  3. Adoption of the agenda
  4. Schedule for publication of reprints or Technical Corrigenda
  5. Defect reports
  6. Future meeting (face-to-face and teleconferences)
  7. Any other business
  8. Extension proposals from member bodies and liaisons
  9. Conformance testing
  10. Closing

The vast majority of our work was in item number 5 on the agenda and each and every single minute was used discussing the defect reports – including in lavatories, on our way to work, on our way back from work, during lunch, dinner, breaks and drinks … in short – we discussed DRs 24/7. This was as it was supposed to be – this was really the reason for all of us being in Prague.

The initial list of DRs we discussed was this (just to give you an idea of what we talked about):


I think it’d be fair to say that we have come a long way since the time we were discussing if it was possible to use XSLT to simulate bit-switching or if an OOXML-file was “proper XML”.

For each of the DRs we covered we discussed if the DR was a technical defect or an editorial defect, what the possible implications of the DR would be to existing documents and existing implementations and if the DR belonged in a corrigendum (COR) or if it was an amendment (AMD). It was quite tedious work, but we managed to cover quite a lot of ground in the three days.

Corrigendum or amendment?

One of the first things to accept when working in ISO is that there are quite a number of rules to comply with. As it turns out, it is not our prerogative to decide if a DR goes into “the COR bucket” or if it goes into “the AMD bucket” – there are rules for this. Section 2.10.2 of the ISO directives states that

A technical corrigendum is issued to correct [...] a technical error or ambiguity in an International Standard, a Technical Specification, a Publicly Available Specification or a Technical Report, inadvertently introduced either in drafting or in printing and which could lead to incorrect or unsafe application of the publication

If the above is not the case, the modification should be handled as an amendment.

Still, there are quite a lot of DRs that fall into the more gray outskirts of this definition. So to facilitate our work we made some guiding principles, and these principles were discussed at the SC34 plenary in Prague:

[…] in the interest of resolving minor omissions in a timely fashion, WG4 plans to apply the following criteria for deciding that the unintentional omission or restriction of a feature may be resolved by Corrigendum rather than by Amendment. All of the following criteria should be met for the defect to be resolved by Corrigendum:

  1. WG 4 agrees that the defect is an unintentional drafting error.
  2. WG 4 agrees that the defect can be resolved without the theoretical possibility of breaking existing conformant implementations of the standard.
  3. WG 4 agrees that the defect can be resolved without introducing any significant new feature.

Unless all the above criteria are met, the defect should be resolved by Amendment.
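Written out as code, the rule is a plain "all three or nothing" test. The sketch below is only an illustration with made-up names (`DefectAssessment`, `resolution_route`). Each boolean stands for an agreement reached by WG4 in discussion, which is the part no code can do for us.

```python
from dataclasses import dataclass


@dataclass
class DefectAssessment:
    """Outcome of a WG4 discussion about one defect report (hypothetical model)."""
    unintentional_drafting_error: bool
    cannot_break_conformant_implementations: bool
    no_significant_new_feature: bool


def resolution_route(assessment):
    """Corrigendum only if all three criteria are met, otherwise Amendment."""
    criteria = (
        assessment.unintentional_drafting_error,
        assessment.cannot_break_conformant_implementations,
        assessment.no_significant_new_feature,
    )
    return "Corrigendum" if all(criteria) else "Amendment"


print(resolution_route(DefectAssessment(True, True, True)))
print(resolution_route(DefectAssessment(True, False, True)))
```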

Of course we will still have to do an assessment for each and every DR we look at, but it is our view that these principles will help us quite a bit along the way and give us a more expeditious workflow. Notice also the wording “WG4 agrees”. Only a very small number of DRs fall clearly into the COR- or AMD-bucket, so it is not possible to regard these principles as a mere algorithm with a deterministic result. The principles require WG4 to agree to the categorization of DRs so we’ll actually have to sit down and talk everything through.

On the first day (or was it second?) we also touched briefly upon the subject of modifying decisions made at the BRM. The delegates at the BRM were nothing but normal people, and due to the short timeframe of the meeting, errors likely occurred. At some point or another, someone will discover we made a mistake and put a DR on our table. At this point we will have to figure out if we think the decisions made at the BRM are now cast in stone or if they should be treated by the same criteria as the other DRs we receive. As I said, we just touched upon the subject and didn’t reach any conclusions on this. If you have any thoughts regarding this, please let me (and us) know. My personal opinion on this subject is that we in WG4, at this point in time, should be extremely careful when thinking about reversing decisions made at the BRM.

And finally, I thought I’d give you some pointers about what is in the pipeline of blog entries. I don’t have a sophisticated system like some people, so I’ll just give you a small list of topics at the top of my mind these days:

  • Markup Compatibility and Extensibility
  • Conformance class whatnots
  • Namespace changes and the considerations about doing it or not
  • Why should we care about XPS?
  • Why I like the ISO model
  • Maintenance of IS26300 in ISO


WG4 meetings in Prague

Wow – this has been a tough week. I arrived at the hotel here in Prague (I am currently waiting in Prague Airport for my flight back to Copenhagen) at around 21:00. I met Doug in Copenhagen and flew with him to Prague and in the airport we ran into Kimmo. After 15 minutes in my hotel room I went down to the bar to get a “welcome to Prague”-beer. After another 15 minutes I crawled back to my room completely devastated due to a flu I hadn’t been able to get rid of. 5 seconds later Florian called and ordered me to get my ass down in the basement wine-bar where he was having drinks with Doug and Megan. I went back to my room when the bar closed at around half past midnight, did some last-minute updates/tweets and almost cried myself to sleep because of near-death-like fatigue.

… and the meetings hadn’t actually started yet.

The next morning the meetings started with a joint session between WG4 and WG5 at the Czech Standardisation Institute. A total of 31 delegates attended this initial meeting. Apart from the SC34 officers (SC34 chair, SC34 secretariat, WG4 convener), there were delegates from Canada, China, Czech Republic, Denmark, ECMA, Finland, France, Germany, Korea, Norway, South Africa, UK and USA. We had quite a lot of work on our table for these three days, and we immediately got to work after the initial pleasantries. A rough list of categories to be dealt with was “Defect reports”, “Rules of engagement” (or “Prime directives”), “Future work”, “Roadmap for future editions/corrections” and “Planning of future meetings and tele-conferences”.

If you’ve been following my twitter-feed (and the ones of Alex, Doug and Inigo) you’ll already have a notion of the insanely interesting things we talked about. But for those not following me (and you should!!!) we talked about sexy things like whether “named ranges” in spreadsheets were defined on the workbook-level or the worksheet-level, whether a reference to Unicode 5 implied a dependency on XML 1.1, whether xml:space applied to whitespace-only-nodes or just to trailing- and leading whitespace in element content, whether font-substitution algorithms in OOXML had a bias for Panose-fonts and if “Panose” really meant “Panose1”, and subtle differences between the Panose-edition of Hewlett-Packard and the one of Microsoft (as far as I understood it, anyway).

Can you imagine all the fun we had?

And you know what? We didn’t stop talking about it during lunch, dinner or breaks. As Doug noted in one of his tweets, the only difference between session and breaks was that during session, only one person talked at any given time.

Well, apart from all this fun, we made an enormous amount of progress. A total of about 169 defect reports have been submitted to us up to this point, and we processed almost all of them. We didn’t close all of them, but we managed to process the most important ones and prepare ourselves for our first teleconference in mid April. We laid down some ground principles upon which we will make decisions in the future and we talked about a set of “Prime directives” to form a mental basis for our work (think: The three Laws of Robotics).

In short – it was a good week. I’ll post a series of blog posts in the next weeks outlining the results we achieved (and did not achieve) including both the extremely boring ones as well as the more controversial ones. So watch this space …

PS: I almost forgot. Microsoft sponsored a dinner/buffet for the participating experts on Wednesday. But what was even cooler was that they had lined up a bunch of Ferraris and Lamborghinis for us outside the restaurant, and we could just take a pick to choose a car to take home. Mine was red! Is that wicked or what?

To the nitwits from <no>ooxml.org: Take it home, boys!

Struck by the Wrath of Roy "Kahn" Schestowitz

As the real work of maintaining OOXML in ISO has begun, I have had some time to ponder over events throughout the last year - starting with the BRM in Geneva in February 2008.

Being in Geneva was really hard work, negotiating all day in a 120-seat plenum while in the evening preparing suggestions in cooperation with delegates from other countries. It was fun, but hard, nevertheless. I remember sitting on my bed in the hotel room trying to sort out everything while trying to keep up with the debates happening outside our meeting room (a de facto radio silence had been initiated voluntarily by the more prominent bloggers around the world, so no information was being released to the people desperate for the slightest amount of information).

One of the tools I used was to keep track of the sites referring to my blog and one evening as I sat eating Swiss chocolate on my bed in the hotel, I noticed a new referral from Google Groups.




Versioning of OOXML (thank you for all the fish)

One of the most pressing matters we had to deal with in Okinawa was a question raised by quite a few people including members of the national body of Switzerland as well as hAl on the blogs of Alex Brown, Doug Mahugh and yours truly:

How can you tell if a document is generated using the original set of schemas or the new (improved) ones?

The truth is: you can’t.

Well, at least not at the moment. You can get a hint from sniffing at various parts of the document, but there is no definitive way to do it. We all agreed that we had to come up with a solution, and we discussed (at length in session as well as during breaks, dinners and sight-seeing) what to do.

Roughly speaking, there are a few ways we could do it, including

  • Changing the namespace-name of the schemas
  • Expanding the conformance attribute to indicate the version of OOXML
  • Adding an optional version attribute to the root elements of the documents (WordprocessingML, SpreadsheetML and PresentationML) defaulting to the original edition of ECMA-376.

Version attribute

Let me start with the last option, since it is the easiest one to explain and understand.

ODF has a “version”-attribute in the root element of ODF-documents. It is defined in the urn:oasis:names:tc:opendocument:xmlns:office:1.0-namespace, so when creating e.g. an ODF spreadsheet using OOo 3, you will see the following xml-fragment:

[code:xml]<?xml version="1.0" encoding="UTF-8"?>
<office:document-content
  xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" (...)
  office:version="1.2">

The above would tell you to use version 1.2 of the ODF-spec – currently being drafted by OASIS.

We could do a similar thing with OOXML, that is, having an optional version-attribute with the version number of the applied flavor of OOXML. This approach would have some clear advantages. First and foremost it would allow all the existing applications supporting OOXML to do absolutely nothing to their existing code base to continue to be able to read and process OOXML-files in ECMA-376 1st Ed format. It would also enable them to use any existing schema-validation of content and all existing files in ECMA-376 would still be perfectly valid.
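To illustrate why existing files would keep working, here is a minimal Python sketch of how a consumer might read such an attribute. Everything about the attribute is an assumption made for illustration: the unqualified name `version`, the values "1.0" and "1.1", and the helper `ooxml_version` are hypothetical, since nothing has been decided. Only the WordprocessingML namespace name is the one actually used by ECMA-376 1st Ed.

```python
import xml.etree.ElementTree as ET

# Namespace name used by WordprocessingML in ECMA-376 1st Ed.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def ooxml_version(xml_text, default="1.0"):
    """Return the (hypothetical) version attribute of the root element.

    A missing attribute falls back to the default, so existing
    ECMA-376 1st Ed documents need no change at all.
    """
    root = ET.fromstring(xml_text)
    return root.get("version", default)


existing_doc = f'<w:document xmlns:w="{W_NS}"/>'
newer_doc = f'<w:document xmlns:w="{W_NS}" version="1.1"/>'

print(ooxml_version(existing_doc))
print(ooxml_version(newer_doc))
```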

Expanding the conformance attribute

Another thing to do would be to expand the new conformance attribute. At the BRM in Geneva a new conformance attribute was added to the root elements to indicate which flavor of OOXML (strict or transitional) the document conforms to. You will perhaps recognize this XML-fragment:
[code:xml]<w:document conformance="strict">
We could also use this attribute and add version information to it. A way to do it would be
[code:xml]<w:document conformance="transitional-1.0">
for the ECMA-376 1st Ed and something else for any subsequent versions.
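A consumer would then have to pick the attribute value apart. The sketch below assumes the value takes the form "class" or "class-version", as in the example above. The helper `parse_conformance` and the exact value syntax are hypothetical.

```python
def parse_conformance(value):
    """Split a conformance value into (conformance class, version or None)."""
    conformance_class, separator, version = value.partition("-")
    return conformance_class, (version if separator else None)


print(parse_conformance("strict"))
print(parse_conformance("transitional-1.0"))
```

One thing the sketch makes visible is that an application comparing the attribute against the literal strings "strict" or "transitional" would no longer recognise the expanded values.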

Fixing or solving?

The problem with the two alternatives mentioned above is that they provide an immediate fix, but they are in no way panaceas for the issue of versioning. In Geneva we split up OOXML into 4 distinct parts and tried the best we could to make sure that they were “islands” within themselves. So in the original submission’s Part 2 dealing with OPC, there were dependencies on WordprocessingML (AFAIK) and these were removed. The result is that you can now refer to ISO/IEC 29500-2 should you in your implementation need a packaging format where OPC suits your needs. The basic idea was exactly this: to provide a way for other standards to be able to “plug in” to OOXML and reuse specific parts of it.

The two fixes described above provide a fix for the problem with versioning of “the document stuff” (text documents, spreadsheets and presentations) – but they do nothing for Part 2 and Part 3 (under the assumption that Part 4 will not change). The trouble is - this is not only a theoretical problem. ECMA TC46 working with XPS (XML Paper Specification) has based the package format for XPS on OPC. But it is difficult for them to refer to ISO/IEC 29500-2 OPC since it is not possible to distinguish the namespace name from that of its predecessor ECMA-376 1st Ed. So unless we figure out a solution, they will have to refer to ECMA-376 1st Ed (and it was my impression that they’d prefer to refer to ISO OPC instead).

This is kind of annoying or maybe even embarrassing. We (the ISO process) chose to split up OOXML to allow reuse – but the first time someone knocks on our door and wishes to do exactly that – we (unless we find a solution to this problem) will have to say: “Well, we didn’t actually mean it”.

Change the namespace-name

An entirely different approach would be to change the namespace name(s) of IS29500. The original names were along the lines of


So an alternative solution would be to change the values of the namespace name. The names above could be changed to

http://schemas.openxmlformats.org/package/IS29500-2008/relationships

(I would have liked to use a colon as separator between the ISO project number and year, but according to http://www.w3.org/TR/REC-xml/#sec-common-syn, it seems colons are not allowed in namespace names.)

What would be the consequence of this?

The up-side

Basically, changing the namespace name would solve the problem with distinguishing between ECMA-376 1st Ed and IS29500:2008. It would be trivial to distinguish content based on either standard and it would apply to all parts of the specification. Actually, it would apply to all schemas in the specification, so it would enable someone to create a document based on ECMA-376 OPC, IS29500 WordprocessingML and ECMA-376 DrawingML (even though this is permitted in the current version of OOXML). It would also give us the chance to have a fresh start with IS29500:2008 and give us a clean slate for our further work.
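To show how trivial the distinction would become, here is a minimal Python sketch that looks at the namespace name of the root element of a relationships part. The "2006" name is the one used by ECMA-376 1st Ed. The "IS29500-2008" name is only the hypothetical one floated in this post, and the helper `edition_of` is made up for the example.

```python
import xml.etree.ElementTree as ET

EDITION_BY_NAMESPACE = {
    # Name used by ECMA-376 1st Ed relationship parts.
    "http://schemas.openxmlformats.org/package/2006/relationships":
        "ECMA-376 1st Ed",
    # Hypothetical new name, as suggested in the text above.
    "http://schemas.openxmlformats.org/package/IS29500-2008/relationships":
        "IS29500:2008",
}


def edition_of(xml_text):
    """Map the root element's namespace name to an edition, or None if unknown."""
    root = ET.fromstring(xml_text)
    # ElementTree renders qualified tags as "{namespace}localname".
    if not root.tag.startswith("{"):
        return None
    namespace = root.tag[1:].partition("}")[0]
    return EDITION_BY_NAMESPACE.get(namespace)


old_part = (
    '<Relationships xmlns='
    '"http://schemas.openxmlformats.org/package/2006/relationships"/>'
)
new_part = (
    '<Relationships xmlns='
    '"http://schemas.openxmlformats.org/package/IS29500-2008/relationships"/>'
)

print(edition_of(old_part))
print(edition_of(new_part))
```

An application whose lookup table only contains the first entry would get `None` for the second document, which is exactly the breakage a namespace change implies for existing consumers.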

The down-side

Changing the namespace is sadly not a silver bullet – unfortunately the free lunch comes with nausea as well. The trouble is – by changing the namespace, applications that support ECMA-376 will break if they try to load documents based on IS29500 since the namespace will be foreign to them.

The question is, though: shouldn’t they?

The purpose of XML namespaces is to identify the vocabulary of the elements of an XML-fragment. So the real question could be: are we talking about a new vocabulary when going from ECMA-376 to IS29500:2008? Are the changes from the BRM so drastic that we wouldn’t expect applications supporting ECMA-376 to be able to load documents conforming to IS29500?

Well, it was of importance to ECMA and most of the delegates at the BRM to ensure that whatever we did to change the specification did not render existing documents nonconformant. We succeeded quite well in doing just this, so one could argue that the changes were not that big. However, this just concerns the transitional schemas. If you remember, the changes in schema structure were quite big. We divided one big chunk of schemas into two categories, “strict” and “transitional” and I would indeed argue that we changed the vocabulary by doing just that. We changed it from defining a vocabulary with a complete mess of legacy-stuff and new stuff into two separate piles with one “going-forward-vocabulary” and one “going-backwards-vocabulary”. Isn’t that big enough to change the namespace name?

Do it right the first time

At the WG4-meeting I was actually advocating for a simple addition of a version attribute and solving the bigger namespace problem at a later time in a revision of OOXML, but the more I think about it, the more I am convinced this is the wrong way. We are in a position right now where there are no applications out there supporting the full set of IS29500. Not changing the namespace name will not make the problem go away – it will just postpone the issue, and if we wait, the problem will become increasingly bigger as applications surface with support for IS29500. The problem will be even bigger if you have a long list of supporting applications and not – as now – not a single one.

The more I think about it, the more I am sure the right way to do it is

  1. Add a new version attribute to the root elements defaulting to “1.0” which would be ECMA-376 1st Ed. IS29500:2008 would have version “1.1”.
  2. Change the namespace name for IS29500 in a manner as outlined above.

Vendors in the process of implementing IS29500 will then have to add some code to their application to support this.

But – I am in no way sure I have covered all angles. Am I missing something here?


Post WG4-meetings in Okinawa


Last week (week 4 of 2009) we had the first face-2-face meeting in SC34/WG4 on the Japanese island of Okinawa. Since there is quite a big overlap between the participants of WG4 and those of WG5, the two groups meet at the same time and place to minimize travel costs and time away.

Quite a lot of people had chosen to take the "small" trip to Okinawa, and at roll-call the first day, a total of 22 people sat around the table in the meeting room. Of these, 6 were from ECMA and 14 represented various national bodies (3 of whom were employed by Microsoft).

How's that for full disclosure, eh?

The purpose of the meeting was to get started maintaining OOXML and to discuss what to do in the future. We were also to discuss the already submitted DRs and see what we could do about these.

One of the first things I realized on that morning was that by participating in standardization in ISO (and from what I hear, also most other standardisation organisations) you need to accept following a certain number of rules. As it turns out, we are in no way free to fix problems in the spec, we are in no way free to make new additions to the spec etc. As it turns out, there are rules constraining all of these activities. So the project editor (Rex Jaeschke) took us on a lengthy trip down "ISO-regulation-lane". The idea was to give us all some knowledge of the rules and terms (as in 'nouns') used in the directives so that we would all be on the same, first page moving forward. The basis for the walk-through was a document prepared by the editor and it is available on WG4's website.


Quite a lot of DRs were submitted to WG4 before the meeting. I think the total number was about 25-30, and they ranged from fixing spelling errors to clarification of the text and schema changes. The first thing we discussed was how to categorize the DRs. The "buckets" were "defects" and "amendments", and we also discussed how to distinguish between editorial defects and technical defects. We quickly agreed that focus should initially be to verify and approve any DRs relating to decisions from Geneva that had not made it into the final text. ECMA also had quite a big batch of DRs submitted before the meetings, but since they were not submitted in time for everyone to look at them, we did not make any decisions about these - ECMA just went through them in detail and we discussed each of them.

Details we discussed were certainly of world-changing importance, such as the difference between the text fragments "nearest thousands of bytes" and "nearest thousand bytes", the allowed content of string-literals and intricate details of the xml:space-attribute in an XML-element based on the XML 1.0 specification. Still, it was quite entertaining and it was delightful to sit back and simply overhear the discussions of people that really knew what they were talking about.

Comment collection form

ECMA has set up a comment collection form for interested national bodies to submit DRs. It has already been put to use by the Japanese national body and it seems to serve its purpose just fine. Hopefully it will enable us to improve the data quality of the incoming DRs. We gave feedback on the application to Doug Mahugh from ECMA and hopefully he will see to it that the suggestions are implemented (especially mine!)


We discussed at length the concept of "openness" and how we should apply it to our work, and I will cover my feelings for this in detail in a top-post a bit later.

Last minute impressions

This was my second trip to Japan and I must say that I am getting more and more excited about it with every trip. The culture is fantastic and it is a good challenge to be in a part of the world where you don't speak the language and are incapable of reading almost any signs. I did get a bit of "Lost in Translation"-feeling on my trip back (+40 hrs!), but it was really a good trip. Two thumbs up for the convener, Murata-san, who showed us how a splendid host acts and shows their guests a great time.

All in all I also think we had some productive days on Okinawa. We managed to deal with quite a few DRs and to set up work-processes for the future and I am sure we will benefit in the near future from the work we did. It was also interesting to watch the "arm-wrestling" between the national bodies and ECMA. We were on the same page in most cases, but it was interesting to be part of the discussions where we were not. It will be interesting to see how this will evolve in the future. ISO is a bit different than, say, OASIS because of the involvement of national bodies. Where the basis for most of the groups in OASIS is "vendors", it is quite orthogonal to this in ISO where this concept does not really exist. Some of you may remember Martin Bryan's angry words at the plenary in Kyoto about vendor participation and "positions" vs. "opinions" and I am looking forward to taking part in these discussions in WG4 as well as here.


Additional resources

Below are a couple of links that might be of interest to you

SC34 WG4 public website

SC34 website

(and for Okinawa-related activities)

Alex Brown's write-up about day 0, 1, 2 and 3-4 of the meetings

Doug Mahugh's summary of what took place

Pictures taken by the secretariat

Picture-stream from Doug Mahugh

Picture stream from Alex Brown

Picture stream from Jesper Lund Stocholm (me!)

Twitter stream from Doug Mahugh

Twitter stream from Alex Brown (notice the l33t-speek Twitter-tag Alex uses!)

Twitter stream from Jesper Lund Stocholm

Bonus for those of you waiting for the credits at the end of the movie:

The day I arrived I was met by Murata-san and Alex Brown in the lobby of the hotel. They were on their way to dinner at a restaurant called "Kalahaai" in the "American Village" of Naha. The dinner took place in a restaurant with live Japanese music from a group called "Tink Tink". Their music was really amazing. The last evening we went there again, and Shawn and I were listening completely baffled to the music and on-stage talks of the performers. It was an amazing experience to sit in the restaurant not understanding a single word they said - and still not being able to stop listening to them.

(courtesy of Doug Mahugh)

And look at this picture. Thanks to Doug's tele/wide/fish-eye-whatever-lens on his camera, I look like an absolutely mad-/maniac man! No girls were hurt during this, I should point out.

(courtesy of Doug Mahugh)


The complexity of SpreadsheetML - oh the sheer joy of it!

Having a bit of time on my hands while attending the SC34/WG4-meeting in Okinawa, I thought I'd write up a blog post I have wanted to write for quite some time.

The reason for me doing this was a requirement I am often presented with by CIBER's customers - export my data to Excel. The data they want us to export are traditionally grouped into three categories:

  • Text (strings)
  • Numbers
  • Dates

Creating cells with numbers and text is really a no-brainer in OOXML. It is a bit more complicated when it comes to dates, because dates in e.g. ISO 8601-format are not as such supported as "built-in cell data types" in SpreadsheetML. Instead, dates are presented by styling the content of number-cells. This means that to be able to display a date in SpreadsheetML, you need to know "a bit" about styling in spreadsheets.

Now, as some of you remember, representation of dates in spreadsheets using OOXML is done in "serial form", meaning that dates are stored as numbers. These numbers are also known as "Julian days" - not to be mistaken with the "Julian Calendar". In other words, a date is represented as the number of days since some starting point in time.

So if I wanted to store the date "December 20th 2009" in OOXML, I would have to convert it to a "julian representation" - in this case "40167". This is really just a minor annoyance - the conversion is trivial and a no-brainer. However - the fun has not started yet.
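The conversion can be sketched in a few lines of Python. This is only a sketch under an assumption that is not stated in the post: it uses the 1900 date system, where counting days from 1899-12-30 gives the expected serial for any date from March 1900 onwards. The names `EPOCH`, `to_serial` and `from_serial` are made up for illustration.

```python
from datetime import date, timedelta

# Assumption: 1900 date system. Counting from 1899-12-30 reproduces the
# serial numbers for dates on or after 1900-03-01.
EPOCH = date(1899, 12, 30)


def to_serial(d: date) -> int:
    """Return the number of days between the assumed epoch and d."""
    return (d - EPOCH).days


def from_serial(serial: int) -> date:
    """Return the calendar date for a serial number (inverse of to_serial)."""
    return EPOCH + timedelta(days=serial)


print(to_serial(date(2009, 12, 20)))  # 40167
print(from_serial(40167))             # 2009-12-20
```

The round trip shows why the stored number alone is ambiguous: nothing in "40167" says it is a date, which is where the styling comes in.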

If you look at the markup required, it would have to be like this:

  <row r="1">
    <c r="A1"><v>40167</v></c>
  </row>

So this will give me a cell with a serial representation of 2009-12-20. However, if I open this in an OOXML-compliant application, it will display "40167". As I mentioned above, it turns out that displaying the serial representation as a "proper date" requires styling of the cell content.

The key is an attribute on the <c>-element I omitted in the example above.

  <row r="1">
    <c r="A1" s="0"><v>40167</v></c>
  </row>

The "s"-attribute specifies the style for the given cell. The specification says this about this particular attribute:

The index of this cell's style. Style records are stored in the Styles Part.

Ok - cool so the good thing here is, that we now know what the attribute is used for. The bad thing is that we don't know anything about "how".

Styles for SpreadsheetML are described in section 3.8. The complete section is about 110 pages and it describes at length each element name and attribute, but again it answers "what" more than "how".

(I just talked to another delegate about whether a standard should describe both the hows and the whats, and it seems that the jury is still out on that one, so these are simply my personal observations of using the specification to solve a concrete problem).

So in figuring out how to do this, a good starting point would be to look at the list of valid child elements. These are defined as

[code:xml]<complexType name="CT_Stylesheet">
    <element name="numFmts" type="CT_NumFmts" minOccurs="0" maxOccurs="1"/>
    <element name="fonts" type="CT_Fonts" minOccurs="0" maxOccurs="1"/>
    <element name="fills" type="CT_Fills" minOccurs="0" maxOccurs="1"/>
    <element name="borders" type="CT_Borders" minOccurs="0" maxOccurs="1"/>
    <element name="cellStyleXfs" type="CT_CellStyleXfs" minOccurs="0" maxOccurs="1"/>
    <element name="cellXfs" type="CT_CellXfs" minOccurs="0" maxOccurs="1"/>
    <element name="cellStyles" type="CT_CellStyles" minOccurs="0" maxOccurs="1"/>
    <element name="dxfs" type="CT_Dxfs" minOccurs="0" maxOccurs="1"/>
    <element name="tableStyles" type="CT_TableStyles" minOccurs="0" maxOccurs="1"/>
    <element name="colors" type="CT_Colors" minOccurs="0" maxOccurs="1"/>
    <element name="extLst" type="CT_ExtensionList" minOccurs="0" maxOccurs="1"/>

The elements that should (ahem) draw attention to themselves are "cellStyles", "cellStyleXfs" and "cellXfs". So, if you want to apply formatting directly to a cell, look at e.g. the element <cellXfs> defined in section 3.8.10. It says (in abstract)

This element contains the master formatting records (xf) which define the formatting applied to cells in this workbook. These records are the starting point for determining the formatting for a cell. Cells in the Sheet Part reference the xf records by zero-based index.

The <cellXfs>-element has a child element called <xf>. The element is defined as

[code:xml]<complexType name="CT_Xf">
    <element name="alignment" type="CT_CellAlignment" minOccurs="0" maxOccurs="1"/>
    <element name="protection" type="CT_CellProtection" minOccurs="0" maxOccurs="1"/>
    <element name="extLst" type="CT_ExtensionList" minOccurs="0" maxOccurs="1"/>
  <attribute name="numFmtId" type="ST_NumFmtId" use="optional"/>
  <attribute name="fontId" type="ST_FontId" use="optional"/>
  <attribute name="fillId" type="ST_FillId" use="optional"/>
  <attribute name="borderId" type="ST_BorderId" use="optional"/>
  <attribute name="xfId" type="ST_CellStyleXfId" use="optional"/>
  <attribute name="quotePrefix" type="xsd:boolean" use="optional" default="false"/>
  <attribute name="pivotButton" type="xsd:boolean" use="optional" default="false"/>
  <attribute name="applyNumberFormat" type="xsd:boolean" use="optional"/>
  <attribute name="applyFont" type="xsd:boolean" use="optional"/>
  <attribute name="applyFill" type="xsd:boolean" use="optional"/>
  <attribute name="applyBorder" type="xsd:boolean" use="optional"/>
  <attribute name="applyAlignment" type="xsd:boolean" use="optional"/>
  <attribute name="applyProtection" type="xsd:boolean" use="optional"/>

The attribute you want here is "numFmtId". The attribute is described as "Id of the number format (numFmt) record used for this cell format".

(are we getting there soon?)

Anywho, going to the reference of numFmt will lead you to paragraph 3.8.30 numFmt (Number Format) and it will tell you, that some of the values of the attribute are implied. That's really just another way of saying "reserved values". 

 ID  formatCode
 1   0
 2   0.00
 3   #,##0
 4   #,##0.00
 9   0%
 10  0.00%
 11  0.00E+00
 12  # ?/?
 13  # ??/??
 14  mm-dd-yy
 15  d-mmm-yy
 16  d-mmm
 17  mmm-yy
 18  h:mm AM/PM
 19  h:mm:ss AM/PM
 20  h:mm
 21  h:mm:ss
 22  m/d/yy h:mm
 37  #,##0 ;(#,##0)
 38  #,##0 ;[Red](#,##0)
 39  #,##0.00 ;(#,##0.00)
 40  #,##0.00 ;[Red](#,##0.00)
 45  mm:ss
 46  [h]:mm:ss
 47  mmss.0
 48  ##0.0E+0
 49  @

It looks like id 15 could be the one we are looking for. So I'm gonna add this number format to the xf-element's numFmtId-attribute and create this xml-fragment:

[code:xml]<cellXfs count="2">
  <xf numFmtId="15" (...)  />

Behold - it actually works. When I load this in Microsoft Office 2007, it will display this:

So what have I learned here (apart from the astounding complexity of this relatively trivial task)? Well, to display a date using SpreadsheetML, you need to know a bit about SpreadsheetML styles. You will also need to do a fair amount of digging in the specification as well as in existing OOXML-files, since I could not find this information anywhere. Luckily for you, the content of this blog is licensed under Creative Commons attribution license, so feel free to use it however you should wish to do so.

To sum it all up, you will need the following items to display a date in a cell in SpreadsheetML:

1. The cell fragment

  <row r="1">
    <c r="A1" s="0"><v>40167</v></c>
  </row>

Notice that the cell is styled using the attribute "s" with a value of "0".

2. The style part

[code:xml]<styleSheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <cellXfs count="1">
    <xf numFmtId="15" (...) />

Notice that index "0" of the <cellXfs>-collection has a numFmtId-attribute with the value "15", resulting in the date being displayed correctly.
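The lookup chain in the walk-through (cell "s" index, then the xf record, then numFmtId, then the implied format) can be sketched from the reading side in Python. This is a rough illustration, not how any particular application does it. The `<v>` element carrying the serial, the stripped-down sample parts, the 1899-12-30 epoch and the names `resolve_cell` and `IMPLIED_FORMATS` are all assumptions made for the example, and only a handful of the implied formats from the table are included.

```python
import xml.etree.ElementTree as ET
from datetime import date, timedelta

NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

# A few of the implied number formats from the table above (not the full list).
IMPLIED_FORMATS = {14: "mm-dd-yy", 15: "d-mmm-yy", 16: "d-mmm", 17: "mmm-yy"}

# Hypothetical, stripped-down parts; <v> holding the serial is an assumption.
SHEET = """<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <sheetData>
    <row r="1"><c r="A1" s="0"><v>40167</v></c></row>
  </sheetData>
</worksheet>"""

STYLES = """<styleSheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <cellXfs count="1"><xf numFmtId="15"/></cellXfs>
</styleSheet>"""


def resolve_cell(sheet_xml: str, styles_xml: str, ref: str):
    """Follow cell -> s index -> xf -> numFmtId and return (serial, format, date)."""
    sheet = ET.fromstring(sheet_xml)
    styles = ET.fromstring(styles_xml)
    cell = sheet.find(f".//m:c[@r='{ref}']", NS)
    style_index = int(cell.get("s", "0"))            # zero-based index into cellXfs
    xf = styles.findall("m:cellXfs/m:xf", NS)[style_index]
    num_fmt_id = int(xf.get("numFmtId", "0"))
    serial = int(cell.find("m:v", NS).text)
    # Assumption: 1900 date system, valid for dates from March 1900 onwards.
    as_date = date(1899, 12, 30) + timedelta(days=serial)
    return serial, IMPLIED_FORMATS.get(num_fmt_id), as_date


print(resolve_cell(SHEET, STYLES, "A1"))
# (40167, 'd-mmm-yy', datetime.date(2009, 12, 20))
```

Note that the format code itself never appears in the package: the reader has to carry its own copy of the implied table.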

I have created a small test file based on the walk-through above and it is available here: test_dates.xlsx (2.25 kb).

And in other news:

So, you might ask, how is this done using other document formats? Well, it turns out to be drastically less complex.


  <table:table-cell office:value-type="date" office:date-value="2009-12-20">


  <row r="1">
    <c r="C4" t="d">

Both examples above should require no additional formatting.
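For comparison, producing the ODF cell from a date needs no serial conversion at all, since the attribute takes the ISO 8601 string directly. A minimal Python sketch follows; the namespace URIs are the ones I believe ODF uses for the "table" and "office" prefixes and do not appear in the fragment above, and `odf_date_cell` is a made-up helper name.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Assumed namespace URIs for the "table" and "office" prefixes.
TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
ET.register_namespace("table", TABLE_NS)
ET.register_namespace("office", OFFICE_NS)


def odf_date_cell(d: date) -> ET.Element:
    """Build a table-cell element that stores d as an ISO 8601 date value."""
    cell = ET.Element(f"{{{TABLE_NS}}}table-cell")
    cell.set(f"{{{OFFICE_NS}}}value-type", "date")
    cell.set(f"{{{OFFICE_NS}}}date-value", d.isoformat())
    return cell


cell = odf_date_cell(date(2009, 12, 20))
print(ET.tostring(cell, encoding="unicode"))
```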

You might also ask if this could have been done in any other way in OOXML. Well, as far as I read the specification, there is no way around the style-part-trouble. But you could create your own number format if you wish. I would actually prefer this angle, since it would be a step away from pre-determined (implied) values in styles and keep the package content self-contained.
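A rough sketch of what a self-declared number format could look like, generated with Python. Several things here are assumptions rather than anything taken from the walk-through: the id 164 (chosen only because it lies outside the implied values listed earlier), the format code "yyyy-mm-dd", and the helper name `styles_with_custom_format`. As with the "(...)" in the fragments above, everything else a real styles part may need is left out.

```python
import xml.etree.ElementTree as ET

MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
ET.register_namespace("", MAIN_NS)

CUSTOM_ID = 164  # assumption: an id outside the implied range listed above


def styles_with_custom_format(format_code: str, num_fmt_id: int = CUSTOM_ID) -> ET.Element:
    """Build a styleSheet fragment declaring one numFmt and one xf that uses it."""
    root = ET.Element(f"{{{MAIN_NS}}}styleSheet")
    num_fmts = ET.SubElement(root, f"{{{MAIN_NS}}}numFmts", count="1")
    ET.SubElement(num_fmts, f"{{{MAIN_NS}}}numFmt",
                  numFmtId=str(num_fmt_id), formatCode=format_code)
    cell_xfs = ET.SubElement(root, f"{{{MAIN_NS}}}cellXfs", count="1")
    # Index 0 of cellXfs points at the custom format declared above.
    ET.SubElement(cell_xfs, f"{{{MAIN_NS}}}xf",
                  numFmtId=str(num_fmt_id), applyNumberFormat="1")
    return root


print(ET.tostring(styles_with_custom_format("yyyy-mm-dd"), encoding="unicode"))
```

With this approach the format code travels inside the package, so a consumer does not need a built-in table to interpret the cell.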

You know, this could actually be the basis for a nice new defect report for WG4: "Remove all implied values in the specification and move them to the transitional Part 4".

Is there an end of it?

I know this was quite a lengthy post - but is it of any value at all - and would you like more of these investigative posts in the future?


JTC1/SC34 WG4 appointed Danish expert

On Friday, October 24th the Danish mirror-committee to JTC1/SC34 had its bi-monthly meeting. On the agenda was, amongst other things, assignment of participants to the newly created working groups in JTC1/SC34, WG4 and WG5.

For those of you not familiar with the establishment of these two groups, WG4 will deal with maintenance and development of OOXML. WG5 will work to "Develop principles of, and guidelines for, interoperability among documents represented using heterogeneous ISO/IEC document file formats." So the latter WG is not really about translating between document formats such as ODF and OOXML. No, it is about creating some guidelines that all (future or present) document formats could use as inspiration when designing the formats to be "interoperable".

I think the prospects of this could be really, really good and I hope as many stakeholders as possible choose to join the work. It would be great to have some kind of guidelines for interoperability comparable to the Accessibility-guidelines from W3C (those that were added to OOXML during the BRM in Geneva).

We did not get any confirmed pledges to participate from the members of the Danish committee, but I was very pleased to hear that both ORACLE Denmark as well as the Technical University of Denmark would investigate if they could join the working group.

More interesting to me was the assignment of participants for Working Group 4 to develop and maintain OOXML. Not surprisingly (since most of the participants of the committee are much more "anti-OOXML" than "pro-ODF"), this point of the agenda received far less attention. We have in CIBER Denmark discussed for quite some time if we should join the working group, and we have reached the conclusion that we would. We do this for the following reasons:

  1. We believe that we would be able to deliver some technical skills that would be valuable to the work around OOXML
  2. We believe that it is important that development and maintenance of OOXML is not done exclusively by ECMA under the "ISO brand"
  3. We believe that it is important to create a Danish "foot-print" on the development of the document format

So when the committee was asked if anyone would join, CIBER stepped up to the plate. I am happy to say that both the potential commitment of ORACLE Denmark and Technical University of Denmark and the confirmed commitment from CIBER received unanimous support from the other committee members.

So now what?

Well, the first draft of the agenda for the meeting in Okinawa has been posted on the SC34-website. At present the agenda is this:

Draft agenda

  1. Opening - 2009-01-28 10:00
  2. Roll call of Delegates
  3. Adoption of the Agenda
  4. Defect Reports
  5. Any other business
  6. Closing

I think we will also talk about what to actually do in the foreseeable future both with respect to handling of defect reports and future maintenance. One of the things I will not accept (and I hope nor will the other appointed experts) is that the working group primarily spends its time on defect handling - all while ECMA works on new stuff for OOXML and eventually dumps this on our table. So we will need to establish some sort of agreement around this.

Also we will need to talk about future places to meet. The next meeting will likely be held in Prague, and I would like to somehow make sure that future meetings are held in cities near major airport hubs around the world. It will take me about 24 hours to travel from Copenhagen to Okinawa, and that travel time would be cut in half if the meeting was held in e.g. Tokyo or Kyoto. This is not a criticism of the Japanese decision to have the meeting in Okinawa, but I believe we would indirectly encourage more participation if the required travelling was not so extensive.

Oh ... and did anyone notice that I was only mentioned in the "Small news"-section of Alex Brown's recent post "More Standards news"? This really helps keep both feet solidly on the ground and not think too much of myself.