a 'mooh' point

clearly an IBM drone

Standards and dining

I just wanted to share with you a sign we noticed at the bistro we dined at yesterday.

 



See my Google map.


And we have lift-off (white noise)

It is only a matter of minutes until the BRM starts here in Geneva. It will be a tough week with long meetings during the day and preparation in the evening (goodbye, Red Light District!). So, to be able to concentrate fully on the task at hand, I will shut down this blog for the week ... but don't worry - I'll be back soon.


Santa Claus is coming to town

Now there are only a few days until I jump on a plane and head south to Geneva, Switzerland, for the ISO/IEC SC34 Ballot Resolution Group Meeting, known amongst laymen primarily as "The BRM meeting". I cannot make up my mind whether I am excited or worried about the outcome of the meeting ... thinking primarily about the enormous workload awaiting us down there. We will have to work through about 1000 unique dispositions of comments from ISO/IEC editor Rex - scattered over about 3500 comments in total. It's a daunting task indeed - not least for BRM convenor Alex Brown from BSI UK. Adding to the workload is the small matter that there will be 120 delegates dealing with it. It truly is breath-taking, and I cannot help but feel like a mountain climber standing at the foot of Mount Everest waiting to start the journey upwards. I expect the days to consist of work at the BRM during normal working hours and work in the evening at the hotel, sifting through the results of the day and preparing for the next.

I am also thinking quite a bit about what will actually take place at the meetings in Geneva. As I understand the ISO rules (and please note, I have been wrong before), after the BRM is done, the standard to approve is the original submission with the changes made in Geneva. In other words - if not a single disposition can be agreed upon, the standard stands as it did when it was submitted in Spring 2007. I really hope that the delegates opposing OOXML do not try to paralyze the BRM with a massive DoS attack on the process. As Alex Brown points out, it is the responsibility of the Heads of Delegation (HoD) to make sure this does not happen, and judging from what we have been told by the Danish HoD, it is clear to me that they actually have a lot of future credibility in standards work vested in this. If they are not able to perform in an orderly manner at the BRM, their influence in all the other work they are doing will be diminished. I hope this will keep the lid on most of the fanatical outbursts.

I am also looking forward to meeting some of the people I met in Kyoto in December 2007. Of course it is always nice to talk to people you agree with, but I sometimes get a bit bored with the "echo-chamber" feeling of spending too much time with people who share your opinion. So I am looking forward even more to conversations with the delegates who are a bit more on the negative side of DIS 29500 (and, yes, even with the people of Open Forum Europe, who I have been told will be cheering us along in the corridors of the meeting). It will be interesting to see what they think.

Ooh ... and on Saturday I will go see Dinosaurs

Wanna join? 

 

OOXML is defective #2 (depends on proprietary technologies)

A standard is not "free enough" if implementing it depends on the existence of a proprietary technology on the specific platform. Ideally it should be possible simply to buy the specification and implement it without any other financial requirements.

This is where OOXML fails.

OOXML depends heavily on "Object Linking and Embedding" technology, also known as "OLE technology". Section 9.3.3 of the specification deals with how objects are embedded in the file format. The section is divided into two parts, the first of which specifies how to embed documents otherwise defined in this standard. These documents are:

  • Formulas
  • Charts
  • Spreadsheets
  • Text documents
  • Drawings
  • Presentations

This is one of the clear cases where it is obvious that Microsoft continuously tries to preserve their main cash cow: the Microsoft Office ecosystem! Not only does OOXML depend on Microsoft's proprietary technology OLE, the specification itself also makes it easier to embed its own "cousins" than any other file format. Talk about "first class citizens" of OOXML!

The section goes on telling us about binary objects:

Objects that do not have an XML representation. These objects only have a binary representation [...] (see [OLE]).

WTF? Once again a reference to, and a requirement to use, proprietary technologies like OLE! What if I want to embed my own JLSObjectType? What if I want to embed some object from the Linux world like Bonobo elements or KParts? The schema elements only emphasize my point:

<draw:object/> and <draw:object-ole/>

Are you also puzzled by this? Well, I don't blame you. To wrap up - we can embed "our own documents" and we can embed everything else. There are even two separate elements from the draw namespace that specify this for us: <draw:object/> and <draw:object-ole/>. The entire schema fragment is included here for your pleasure:

<define name="draw-object">
    <element name="draw:object">
        <ref name="draw-object-attlist"/>
        <choice>
            <ref name="common-draw-data-attlist"/>
            <ref name="office-document"/>
            <ref name="math-math"/>
        </choice>
    </element>
</define>

<define name="draw-object-ole">
    <element name="draw:object-ole">
        <ref name="draw-object-ole-attlist"/>
        <choice>
            <ref name="common-draw-data-attlist"/>
            <ref name="office-binary-data"/>
        </choice>
    </element>
</define>
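To see how little the two elements actually tell a consumer, here is a minimal Python sketch that walks a content fragment and sorts embedded objects into "our own documents" (<draw:object/>) and "everything else" (<draw:object-ole/>). The namespace URI bound to the draw prefix is a hypothetical placeholder, the function name is my own, and I assume the xlink:href attribute sits directly on the object element (the schema references common-draw-data-attlist there, which I take to carry the link).

```python
import xml.etree.ElementTree as ET

# Hypothetical placeholder: use whatever URI your documents bind to "draw:".
DRAW_NS = "urn:example:draw"
# The standard W3C XLink namespace.
XLINK_NS = "http://www.w3.org/1999/xlink"

# Map the two qualified element names onto the two "kinds" of embedding.
OBJECT_KINDS = {
    f"{{{DRAW_NS}}}object": "document",  # <draw:object/>: "our own documents"
    f"{{{DRAW_NS}}}object-ole": "ole",   # <draw:object-ole/>: everything else
}


def classify_embedded_objects(xml_text: str) -> list[tuple[str, str]]:
    """Return (kind, href) pairs for each embedded object, in document order.

    Assumes the link lives in an xlink:href attribute on the object element.
    """
    root = ET.fromstring(xml_text)
    found = []
    for elem in root.iter():
        kind = OBJECT_KINDS.get(elem.tag)
        if kind is not None:
            found.append((kind, elem.get(f"{{{XLINK_NS}}}href", "")))
    return found


sample = f"""<doc xmlns:draw="{DRAW_NS}" xmlns:xlink="{XLINK_NS}">
  <draw:frame><draw:object xlink:href="./Object 1"/></draw:frame>
  <draw:frame><draw:object-ole xlink:href="./Object 2"/></draw:frame>
</doc>"""
print(classify_embedded_objects(sample))
# [('document', './Object 1'), ('ole', './Object 2')]
```

The sketch can tell you which of the two elements was used, but for the "ole" case it has no idea what to do with the bytes at the other end of the link.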

This is yet another example of Microsoft on the one hand claiming "openness" and on the other hand forcing everyone to use their own proprietary, undocumented technology.

But we're not done:

The embedded object is referenced through an XLink attribute in the enclosing frame-element. The behaviour is described as (bold typeface is my addition, /JLS):

The xlink:href attribute links to the object representation, as follows:

  • For objects that have an [OO]XML representation, the link references the sub package of the object. The object is contained within this sub page exactly as it would as it is a document of its own.
  • For objects that do not have an XML representation, the link references a sub stream of the package that contains the binary representation of the object.

Wow - wait a minute: Is this it? Don't you think a bit of clarification would be in order?

The file format for the physical file is a Zip archive with a number of files and folders in it. But this archive also contains a "TOC" list of the files and the MIME type of the entire package. The latter is not an XML file - where do I put it? Where do I put the TOC file? What if my spreadsheet contains an image? Since the image is not in XML format (it's binary) ... would my entire spreadsheet qualify as having "an XML representation"? And did you notice the part "the link references a sub stream of the package that contains the binary representation of the object."? A stream? Binary representation? Again, totally unspecified behaviour, and no one will ever be able to implement this apart from Microsoft and Microsoft Office 2007.
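To show how many decisions the two bullets leave to the implementer, here is a minimal Python sketch that resolves an xlink:href value against a Zip package. Every rule in it is my own assumption (a "sub package" is a folder containing a content.xml entry; any other existing entry is a "binary stream"), and so are the function names, the entry names and the hypothetical MIME type in the demo package.

```python
import io
import zipfile


def classify_href(package: zipfile.ZipFile, href: str) -> str:
    """Guess what an object link points at inside a Zip package.

    My own assumptions, not the specification's:
    - a "sub package" is a folder that contains a content.xml entry;
    - any other existing entry is an opaque "binary stream".
    """
    target = href.removeprefix("./").rstrip("/")
    names = set(package.namelist())
    if f"{target}/content.xml" in names:
        return "sub package"
    if target in names:
        return "binary stream"
    return "missing"


def demo_package() -> io.BytesIO:
    """Build a tiny in-memory package with one object of each kind."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/x-hypothetical")  # not an XML file
        zf.writestr("Object 1/content.xml", "<content/>")
        zf.writestr("Object 1/picture.png", b"\x89PNG...")  # binary inside an "XML" object
        zf.writestr("Object 2", b"\x00\x01\x02")
    buf.seek(0)
    return buf


with zipfile.ZipFile(demo_package()) as zf:
    print(classify_href(zf, "./Object 1"))  # sub package
    print(classify_href(zf, "./Object 2"))  # binary stream
```

Under these rules a spreadsheet folder that also holds a binary image still counts as having an XML representation, which is exactly the kind of call the text leaves open.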

Microsoft had a good chance to specify this properly from the beginning. They could have made an open format that enabled competition or a format that stifled competition. So what does Microsoft do? Yup, the anti-competitive choice. Anyone surprised?

Interoperability - between what?

What is interoperability, really?

Well, when it comes to document formats, some people seem to think that interoperability is the ability to transform one format into another, and that high-fidelity interoperability can only be achieved when it is possible to perform a complete translation/conversion of format X to format Y.

The basic problem with this premise is that if you were able to do this conversion, it would be the same as being able to make a 1-1 mapping between the functionality and features of format X and format Y (and vice versa). However - this effectively means that format X is actually just a permutation of format Y ... making format X and format Y the same format (pick up your favorite book on mathematical topology to see the details).
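The argument can be made concrete with a toy model: treat each format as a set of features and a lossless two-way converter as a mapping between them. A full conversion in both directions then requires that mapping to be a bijection. The function and the feature names below are hypothetical, a sketch of the reasoning rather than of any real converter.

```python
def is_bijection(mapping: dict[str, str], x_features: set[str], y_features: set[str]) -> bool:
    """True if `mapping` pairs every feature of X with exactly one feature of Y and vice versa."""
    if set(mapping) != x_features:         # some X feature has no counterpart in Y
        return False
    targets = list(mapping.values())
    if len(set(targets)) != len(targets):  # two X features collapse into one Y feature
        return False
    return set(targets) == y_features      # every Y feature is reached, nothing extra


# Hypothetical feature names, purely for illustration.
x = {"bold", "italic", "x-only-feature"}
y = {"bold", "italic"}
print(is_bijection({"bold": "bold", "italic": "italic"}, x, y))
# False: nothing in Y to map x-only-feature to
print(is_bijection({"bold": "bold", "italic": "italic", "x-only-feature": "bold"}, x, y))
# False: on the way back, bold and x-only-feature can no longer be told apart
```

As soon as one format has a feature the other lacks, one of the three checks fails, and the round trip loses information.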

When it comes to ODF and OOXML, the case is pretty clear - the two formats are not the same. Sure - they can both define bold text, but there are quite a few differences between the formats. A list of some of them can be found at the ODF-Converter website. I think that list is the best argument that a complete conversion of ODF to OOXML (and back) cannot be done. This was also one of the conclusions of the Fraunhofer/DIN work in Germany, which concluded that a full 1-1 mapping between the two formats could not be done.

The key question here is: Is interoperability diminished by this fact?

If you ask Rob's posse, they will almost certainly say "Yes". They will say something like "Microsoft chose not to make OOXML interoperable with the existing ISO-standard ODF and therefore OOXML is a blow to interoperability".

If you ask me, I will say "No". I will say no because the term "interoperability" has been hijacked by the anti-OOXML lobby in much the same way the SVG namespace was hijacked by the ODF TC. I will say "No" because interoperability means something radically different. The meaning is not rocket science, really ... and most people usually agree on the basic definition of interoperability. A few of those definitions are:

Computer Dictionary Online: 

http://www.computer-dictionary-online.org/interoperability.htm?q=interoperability

The ability of software and hardware on multiple machines from multiple vendors to communicate.

IEEE: 

http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?tp=&isnumber=4683&arnumber=182763&punumber=2267

the ability of two or more systems or components to exchange information and to use the information that has been exchanged

US e-Government Act of 2002:

http://frwebgate.access.gpo.gov/cgi-bin/getdoc.cgi?dbname=107_cong_public_laws&docid=f:publ347.107.pdf

ability of different operating and software systems, applications, and services to communicate and exchange data in an accurate, effective, and consistent manner.

If you also look at the enormous list of definitions from Google, you will see that none of them talk about the ability to convert formats. Instead they talk about communication between machines, platforms and networks. This is very close to my definition of interoperability when it comes to document formats.

The interoperability gained by using a specific document format is based on the possibility of implementing the format on any kind of platform, in any kind of software, on any kind of operating system. It depends on how concise and clear the language of the specification is, and on how well thought out the specification is.

It has nothing, nothing, nothing to do with the possibility of converting the format to any other format. 

A cry for help

Working almost every day with implementing solutions that support ODF and OOXML, I am naturally tasked (or more appropriately: challenged) with ambiguities in the aforementioned specifications. At first glance ODF has an appealing simplicity and form, and reading the specification is almost like reading a book in natural "prose". However - the ease of reading sadly comes at the expense of clear language. So - as always when implementing any specification, you need somewhere to go to ask your technical questions about how to implement the damn thing, or about how to read the devil.

And therein lies my problem:

Where do I go to get answers to these questions about ODF? Where is the website for ODF development?

I have tried the forums at opendocument.xml.org, but the groups there are almost dead.

I have tried the mailing list for the OpenDocument TC, but it is also almost dead.

So please help me - where do I go?

Update: I almost forgot - I have also prowled the Danish blogosphere, where the ongoing battle between OpenXml and ODF usually takes place, but no one has been able to give me any pointers to where they usually get their information about implementing ODF.

(or have I been so heavily stigmatised by being pro-choice that no one wants to help me?)


OOXML is defective #1 (Password hashing)

OOXML has been accused of being rushed through not only the writing itself but also certification in both ECMA and ISO. It's a quick accusation to make, but sometimes it can be really tricky to figure out if a statement is true or false. But you know, sometimes you stumble over something that really shows you that the specification was rushed through not only preliminary editing but also certification in ISO.

The one thing I noticed was password hashing. As with other document formats, document protection can be defined in multiple ways. There is of course protection of the document itself, but most document formats also allow protection of specific parts of the document, or even read-only protection of the document. The way it's usually done is to ask the user for a password, hash it and store the hash in the document. When the document is opened the next time, the user is prompted for a password, and if its hash matches the stored value, the protection of the document (or parts of it) is released.
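The store-and-compare workflow just described fits in a few lines. This is a minimal Python sketch, and every concrete choice in it (SHA-1, UTF-8 encoding of the password, an optional salt prepended to it, Base64 output) is my own assumption, which is rather the point: the producer has to pick all of these, and every consumer has to pick exactly the same ones. The function names are hypothetical.

```python
import base64
import hashlib
import hmac


# Assumed choices, NOT taken from any specification: SHA-1 over
# salt + UTF-8 password bytes, serialized as Base64.
def protection_key(password: str, salt: bytes = b"") -> str:
    """Hash a password into the string that gets stored in the document."""
    digest = hashlib.sha1(salt + password.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")


def unlock(stored_key: str, attempt: str, salt: bytes = b"") -> bool:
    """Release the protection only if the attempt hashes to the stored value."""
    # compare_digest avoids leaking timing information during the comparison
    return hmac.compare_digest(stored_key, protection_key(attempt, salt))


stored = protection_key("secret")  # written into the document when protecting
print(unlock(stored, "secret"))    # True
print(unlock(stored, "guess"))     # False
```

Change any one of those assumptions on the consumer side (another algorithm, another text encoding, a salt, hex instead of Base64) and the correct password will no longer unlock the document.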

Now, this is defined, amongst other places, in section 4.4.1 (Section attributes) where it deals with protection of sections. The text says:

A section can be protected, which means that a user can not edit the section. The text:protected attribute indicates whether or not a section is protected. The user interface must enforce the protection attribute if it is enabled.

This is more or less what I wrote above. It also says:

A user can use the user interface to reset the protection flag, unless the section is further protected by a password. In this case, the user must know the password in order to reset the protection flag. The text:protection-key attribute specifies the password that protects the section. To avoid saving the password directly into the XML file, only a hash value of the password is stored.

And that's it.

WTF? Nothing more? Nothing about how to specify the hashing algorithm? Nothing about how to specify initialization vectors, prepending of zeroes ... nothing?

But wait - what if we look in the schema itself - maybe it's just the descriptive text that is a bit ... ahem ... limited. Ok - the schema says:

<define name="sectionAttr" combine="interleave">
 <optional>
  <attribute name="text:protection-key">
   <ref name="string"/>
  </attribute>
 </optional>
</define>

Dammit - nothing here either. Notice also that there is no way to specify how the hash value is persisted. Is it a bit sequence? A hex-encoded bit sequence? A Base64 sequence? Nothing!
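Even if we generously assume the algorithm is SHA-1 (the schema only says "string"), the same digest can be persisted in each of the forms just listed. This Python sketch (function name hypothetical, SHA-1 assumed) produces all three for one password; a consumer that finds one of these strings in text:protection-key has no way of knowing which convention, if any, the producer used.

```python
import base64
import hashlib


def candidate_encodings(password: str) -> dict[str, str]:
    """Serialize one SHA-1 digest three plausible ways (algorithm and encodings assumed)."""
    digest = hashlib.sha1(password.encode("utf-8")).digest()  # 20 raw bytes
    return {
        "bits": "".join(f"{byte:08b}" for byte in digest),   # 160 characters of 0/1
        "hex": digest.hex(),                                  # 40 characters
        "base64": base64.b64encode(digest).decode("ascii"),  # 28 characters
    }


for name, value in candidate_encodings("abc").items():
    print(f"{name:>6}: {value}")
```

Three different strings, one and the same hash: without a word in the specification about which one to write, a consumer can only guess.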

But wait again - let's look into the file of an actual document with read-only protection and see what is stored in the document. Well, the XML fragment reads:

<table:table
 table:name="Ark1"
 table:style-name="ta1"
 table:protected="true"
 table:protection-key="PnKGfjzdfrt6XxQxdTcQVqbmA/7Ro="
 table:print="false"
>

Any clever suggestions for me, as a document consumer, as to what to do with this value? This is truly amazing. On the one hand the authors talk about their document format being able to provide true and pure interoperability ... but they haven't specified something as common as document protection. I wonder how they can claim this with a straight face. Interoperability is certainly not enabled by limiting the details of the specification to as little as this ... but maybe they just hope no one will use this feature and thereby achieve "interoperability by rejection".

I cannot help but wonder: who in their right mind would put forward for standardisation a document format that was unspecified in such a central feature as "document protection"? This must be one of those places where

Ratification trumps perfection 

Yeah, well ...

Word of recognition from an unexpected side

Today - or was it yesterday? - Patrick Durusau issued an open letter regarding the standardization of OOXML. It is an interesting read - especially for those of us who have worked endless hours in NSBs processing the dispositions of comments from ISO/IEC editor Rex Jaeschke. I will not dig too much into the details of the statement, since I am sure others will do so; I will just quietly note that it is nice once in a while to be appreciated, and not only picked at because of our "lack of qualifications" and accusations of being angle-grapping, bribed, paid-for puppets acting only at the will of Microsoft.

Thank you, Patrick!


I will only quote this:

The OpenXML project has made a large amount of progress in terms of the openness of its development. Objections that do not recognize that are focusing on what they want to see and not what is actually happening with OpenXML

Ooh - and one prediction: I think the anti-OOXML lobby will try to drop this like a hot potato. The pro-choice side will naturally salute this - and the pro-ODF side will quietly wait out the storm, mumbling "Nothing to see here, please pass along".

Yes, some of them might even use some of the skills they learned in the third part of the course they took, Hypocrisy 101.

"Talk is silver, but silence is gold"