a 'mooh' point

clearly an IBM drone

Correct according to spec or implementation?

At the recent SC34 WG4 meeting in Stockholm, validators quickly became the talk of the town - so to speak. As I am sure you all know, Alex Brown made the office-o-tron some time ago - a validator targeting both ODF and OOXML in their ISO editions. A few weeks ago I made a validator myself - but mine only targets OOXML in its "latest-improved-and-approved-transitional-version". Alex Brown's is written in Java and mine is written in C#.

Anyway - both Alex and I had some lengthy discussions with Microsoft about our validators and the errors they report. The thing is - there is a bug in the OOXML specification in how it specifies the relationship type for the part containing document properties like "author", "created on", "last modified on" etc. This part is a "central" part in OOXML, and to the best of my knowledge, there is not a single implementation out there that doesn't use this part for storing these so-called "core properties".

If you have tried to validate an OOXML file in my validator, you have probably encountered this error:

Checking relationshiptype http://schemas.openxmlformats.org/package/2006/relationships/metadata/core-properties ...

RelationshipType http://schemas.openxmlformats.org/package/2006/relationships/metadata/core-properties is not valid. It should have been http://schemas.openxmlformats.org/officedocument/2006/relationships/metadata/core-properties.

In OOXML the "glue" tying the document and its various parts together is "relationship types". So for a given media-type (content type), a relationship type has to be used to properly register it in the package. A few relationship types are defined for the common parts of OOXML documents, i.e. for wordprocessing files, for spreadsheets, for presentations, for headers, footers etc. Some of these are defined in Part 1, section 15 and this is where the bug is. It is obviously a typo, and it has already been included in our list of fixes for the next batch.
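As an illustration only (this is not the code of either validator), a relationship-type check of this kind might look roughly like the Python sketch below. The two core-properties URIs are the ones quoted in the error message above; the function name, the sample relationships part and its `Target` value are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# XML namespace of the <Relationships> element in a package relationships part.
RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"

# The type the error message above says the spec text requires.
SPEC_CORE_PROPS = ("http://schemas.openxmlformats.org/officedocument/2006/"
                   "relationships/metadata/core-properties")

# The type actually found in the documents being validated.
SEEN_CORE_PROPS = ("http://schemas.openxmlformats.org/package/2006/"
                   "relationships/metadata/core-properties")


def check_relationship_types(rels_xml, allowed_types):
    """Return one message per relationship whose Type is not in allowed_types."""
    root = ET.fromstring(rels_xml)
    messages = []
    for rel in root.findall(f"{{{RELS_NS}}}Relationship"):
        rel_type = rel.get("Type")
        if rel_type not in allowed_types:
            messages.append(f"RelationshipType {rel_type} is not valid.")
    return messages


# Hypothetical relationships part, reduced to the single entry discussed here.
SAMPLE_RELS = (
    f'<Relationships xmlns="{RELS_NS}">'
    f'<Relationship Id="rId1" Type="{SEEN_CORE_PROPS}" Target="docProps/core.xml"/>'
    f'</Relationships>'
)

# Strict reading of the spec text: the relationship is flagged.
strict_result = check_relationship_types(SAMPLE_RELS, {SPEC_CORE_PROPS})

# Accepting what implementations write: nothing is flagged.
lenient_result = check_relationship_types(SAMPLE_RELS, {SEEN_CORE_PROPS})
```

The only difference between the two calls is the set of allowed types, which is exactly where a typo in the spec turns into an error for every document that contains this part.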

The catch is that this has rather drastic consequences, at least from a validation point of view, because a typo in this area affects almost every implementation of OOXML that persists these basic data chunks.

The thing is ... each and every document created by Microsoft Office will likely fail validation due to this bug in the specification.

So what are you gonna do?

Well, we discussed several different approaches.

One was simply to correct my validator to not report this error. I don't really like this idea, since it opens a floodgate of other scenarios where small corrections should take place. Also, if I did want to go down that road, it would require a strategy for handling these things, since I wouldn't want to correct any one error based on what Microsoft Office does - being an IBM drone and all. As of yet, I haven't been able to come up with such a strategy.

A second was to report warnings instead of errors in areas where "known bugs" are already on our to-do list of future corrections. I am not sure I like this either, since it makes the validator almost impossible to maintain and it muddies the results: no-one will be able to figure out whether a warning is simply a warning or a "down-graded error".
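To make the objection concrete, here is a rough Python sketch of what such a downgrade rule could look like. It is purely hypothetical: the issue identifiers, the `KNOWN_SPEC_BUGS` list and both function names are invented for the example and do not come from any real validator.

```python
# Hypothetical identifiers for spec defects already queued for correction.
KNOWN_SPEC_BUGS = {"core-properties-relationship-type"}


def severity(issue_id, known_spec_bugs=KNOWN_SPEC_BUGS):
    """Return 'warning' for findings that match a known spec defect, else 'error'."""
    return "warning" if issue_id in known_spec_bugs else "error"


def report(findings):
    """Format (issue_id, text) pairs as 'SEVERITY: text' lines."""
    return [f"{severity(issue_id).upper()}: {text}" for issue_id, text in findings]


lines = report([
    ("core-properties-relationship-type", "relationship type is not valid"),
    ("unknown-element", "element is not allowed here"),
])
```

The first line comes out as a plain `WARNING:` with nothing to show that it started life as an error, and the `KNOWN_SPEC_BUGS` list has to be kept in sync by hand with every batch of corrections.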

A third option is to do nothing.

I like that.

If you have tried to validate the same document using my validator and Alex's, you have probably noticed that Alex's validator emits many more errors than mine. This is because I use the schemas with the first batch of corrections (the so-called COR1 set). I'll update the schemas whenever the next batch of corrections is approved by either SC34 or JTC1. Alex's validator uses the schemas that were in the originally approved version of ISO/IEC 29500:2008. So my validator is already pretty "graceful" as it is.

Another reason that I like the idea of "doing nothing" is that it emphasizes a crucial point: a document should be valid according to the spec and not according to whatever implementation one considers the "reference". There are other standards out there where we have a strange mixture of behaviour defined in the specification and behaviour buried in a "reference implementation". I don't know about you - but I'd rather have the spec be "the truth" than several gigs of source code from whatever implementation is the pet app du jour at the moment.

Additionally, this shows us that all the implementations that handle this have failed to feed their experience back to the standardisation organisation maintaining the specification. They will all have encountered this issue but failed to report it ... unless, of course:

  • they haven't looked in the spec at all [0]
  • they haven't bothered to validate their documents

The puzzling thing is - Alex and Gareth discovered this bug in January 2010, and Alex's validator has been reporting this error for months now. I guess the answer to why none of the implementers of OOXML has reported this bug is ... blowing in the wind.

So what I am trying to say is this: my validator stays the way it is - validating documents according to the spec. If any vendor discovers a problem that is clearly an error in the spec, they should prioritize notifying us about it so we can correct it (which we will).


[0] Truth be told, prioritizing "make it work with the most important implementation" is not unheard of. I myself, when I created my first ODF files, didn't look in the ODF spec. I reverse-engineered ODF documents created by OOo, since I only cared about whether OOo would eat them or not. Other implementations insist on not "supporting OOXML" but "supporting Microsoft Office output".