a 'mooh' point

clearly an IBM drone

Conformance of ODF-documents

Ever since the now infamous article by Alex Brown, the blogosphere has been filled with interpretations of the, really not so surprising, results - that the OOXML document containing the original ECMA-376 spec does not conform to IS 29500.

The, really not so surprising, conclusions have been "Office 2007 does not even produce valid OOXML" followed closely by statements like "This shows that Microsoft Office 2007 should not be allowed since it does not produce valid OOXML".

Hmmm ... ok.

As some of you might remember, I participated in some lab tests with OOXML/ODF interop in Fall 2007. Basically I sat in a small room with guys from IBM, Microsoft, Novell and some guys from the Danish National IT- and Telecom Agency sifting through documents, converting them and examining the resulting XML generated. The documents we worked on were supplied by different parts of the Danish public sector. They were basically told to use some of their existing documents as basis for the parts of the tests they participated in. So these documents were real-world-documents.

One of the things we tested was whether the documents were in compliance with their respective specs. The original OOXML-documents we tested were all compliant with the ECMA-376 spec ... but it was a different case with the ODF-documents. So the other day I tried to validate all the original ODF-documents supplied to us.

The results are illustrated in the table below:

| File name | Generator | Conclusion |
|---|---|---|
| DFFE_Afgået svar til Jane Doe.odt | OpenOffice.org/2.3 | not valid |
| DFFE_SJ_(1) - 15-06-2007 Foreløbig Høring om forslag.odt | OpenOffice.org/2.0 | valid |
| GRIBSKOV_bek-281(BS).odt | OpenOffice.org/2.0 | valid |
| GRIBSKOV_Standardbrev ifm ITST pilotprojekt.odt | OpenOffice.org/2.2 | valid |
| GRIBSKOV_Udkast til Forslag til Lokalplan.odt | OpenOffice.org/2.1 | not valid |
| ITST standardbrev ODT.odt | OpenOffice.org/2.0 | valid |
| ITST Testdokument ODT.odt | OpenOffice.org/2.2 | not valid |
| RM Kursusmateriale.odt | OpenOffice.org/2.0 | not valid |
| RM Standardbrev 2s.odt | OpenOffice.org/2.3 | not valid |

The table contains information about the file name of the original document, the application that generated it (from the META-file in the ODF-package) and if the document passed the test.
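For those who want to repeat the "Generator" column on their own documents, here is a minimal sketch of how the value can be read. It assumes the package contains a meta.xml with a meta:generator element in the standard ODF meta namespace; the helper build_sample_package is a hypothetical stand-in I made up so the example runs without a real document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by meta.xml in an ODF package.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
META_NS = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0"


def read_generator(odf_file):
    """Return the text of <meta:generator> from an ODF package, or None."""
    with zipfile.ZipFile(odf_file) as package:
        root = ET.fromstring(package.read("meta.xml"))
    element = root.find(f".//{{{META_NS}}}generator")
    return element.text if element is not None else None


def build_sample_package(generator):
    """Hypothetical helper: build a tiny in-memory package with only meta.xml."""
    meta_xml = (
        f'<office:document-meta xmlns:office="{OFFICE_NS}" xmlns:meta="{META_NS}">'
        f"<office:meta><meta:generator>{generator}</meta:generator></office:meta>"
        "</office:document-meta>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("meta.xml", meta_xml)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(read_generator(build_sample_package("OpenOffice.org/2.3")))
```

Note that this only reads the generator string. It does not validate anything; the validation itself needs the ODF schema and a schema validator.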

Overall conclusion of this was:

| Application | Creates consistently valid ODF? |
|---|---|
| OpenOffice.org/2.0 | no |
| OpenOffice.org/2.1 | no |
| OpenOffice.org/2 | |
| OpenOffice.org/2.3 | no |

So should we demand that OOo not be used at all? Of course not, but we should keep the pressure on the OOo-team to fix their code ... just as we should with Microsoft and Microsoft Office.

Custom XML in ODF (XForms) Part 1

For some time now I have had an urge to see how to do "Custom XML" or "Custom Schemas" in ODF. Saturday evening after the BRM in Geneva I was sitting in the bar at the Kempinski Hotel at the lake - naturally talking tech-stuff. We talked about the usual ODF/OOXML-stuff and touched upon the subject of Custom XML. I was told that ODF would not have Custom XML capabilities since the ODF TC thought it was good enough to do it with XForms.

Cool, I thought ... I need to test this.

For this first test I have used the UI of OpenOffice.org 2.4 to create an XForms-enabled document with some basic data in it. I will dig deeper into the technicalities later, but the OOo UI will do for now. I have been searching high and low for tutorials on XForms and their usage in ODF, and finally I found this article by J. David Eisenberg on xml.org. The article is from 2006. I have made a simpler document for this test - available here: xforms.odt (8,98 kb).

I have created a small form to enable the user to type in some basic data, e.g. "name", "phone" and "email".


The idea is to be able to map the typed-in data to an XML-structure. In my case the structure is this:

<xforms:instance id="clubData">
    <club>
        <name />
        <contact>
            <name />
            <phone />
            <email />
            <city />
        </contact>
    </club>
</xforms:instance>

I have set up the document to do more or less what the original article did so let's look at what is really persisted in the ODF-package. An XForm is basically a connection between "input fields" like "text boxes", "radio buttons" and "drop-down menus" and some XML in the document. Look at the content of a part of the content.xml-file below (some details have been removed to enhance readability):




So 1) puts a control (input field) next to the text "Club name" with control-id "control1". This control is further defined in 2) where the XForms "bind"-attribute 3) tells the application to bind the contents of the control to the XML specified with the XPath expression in 4).
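To make the binding idea a bit more tangible, here is a rough sketch of what "bind the contents of a control to the node an XPath expression selects" boils down to. It is not how OOo does it internally. It uses the club structure from above (without the xforms:instance wrapper), the function name apply_binding and the typed-in values are made up for illustration, and the paths are relative to the club element because Python's ElementTree only supports a limited XPath subset.

```python
import xml.etree.ElementTree as ET

# The instance data from the form, without the <xforms:instance> wrapper.
INSTANCE_XML = """
<club>
    <name />
    <contact>
        <name />
        <phone />
        <email />
        <city />
    </contact>
</club>
"""


def apply_binding(instance_root, nodeset, value):
    """Write a control's value into the node selected by a path relative to the root."""
    target = instance_root.find(nodeset)
    if target is None:
        raise KeyError(f"no node matches {nodeset!r}")
    target.text = value


club = ET.fromstring(INSTANCE_XML)

# Hypothetical values, as if typed into the "Club name" and "email" controls.
apply_binding(club, "name", "Chess Club")
apply_binding(club, "contact/email", "jane@example.org")

if __name__ == "__main__":
    print(ET.tostring(club, encoding="unicode"))
```

Note that "name" and "contact/name" select two different nodes, which is exactly why the binding is expressed as a path and not just as an element name.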

It's really cool and nicely set up.

But what about persistence of the data entered in the form fields? Well, you add a button and attach an action to it. I called my button "Persist". What this action does is defined with the XForms "Submission" element.



In short the above describes that the content of the form fields should be persisted in a file on my local hard drive. Other methods could be to post it to a webserver or URI somewhere. This is very similar to how InfoPath works.

But at this moment I have two outstanding issues - and here I could use the help of you guys:

  1. I would really like to persist the data in the ODF-package - but I cannot get my head around making OOo do it. Is it at all possible?
  2. When I click the button nothing happens - the data is not saved to disk. What am I missing here?


This is my first attempt to work with XForms in ODF, and to me it really seems kind of nifty.

So the guy tossing down beers at the bar in the hotel was kinda right - it is possible to do some kind of CustomXML-embedding in ODF using XForms. I also think, however, that it doesn't make a whole lotta sense to compare XForms with the CustomXML-implementation in OOXML - especially if it is not possible to use XForms actions to save the data directly in the package. In this case it seems to me that XForms should be compared to InfoPath instead.

 

So guys, what do you think? What are your experiences with XForms in ODF?

:-)

Formulas in ODF-supporting applications

Some time ago I noticed that Fredrik e. Nielsen had posted a link in a Norwegian debate to a website comparing spreadsheet formula interop using ODF. The article is from 2005 comparing formula interop between OOo Calc 1.9.117 and KSpread from KOffice 1.4.1. The article is interesting since it highlights one of the more serious problems with lacking spreadsheet formula definitions in ODF. Some of the pictures in the article are missing and because the article is 2.5 years old, I thought it'd be interesting to take it for a spin again and see what has happened since 2005 in terms of interop between the two major ODF-implementations. I have done exactly the same as in the original article and have additionally added a bit of research to see where the problem really lies.

What did I do?

Well, on my brand new Ubuntu 8.04 installation I installed KSpread 1.6.3 in addition to OOo 2.4 that came pre-installed with the system.

I created a spreadsheet using OOo 2.4 Calc with the data from the original article: formula OOo.ods (7.58 kb).

 



I then tried to open it using KSpread. This is what it looked like:

 



As in the original article I modified the formula to fit in KSpread and the result was:

The file is available here: formula KSpread.ods (5.44 kb)

When saving this file and opening it in OOo again, this was the result:

So there has actually (phew) been some improvement in spreadsheet formula interop for applications using ODF Spreadsheets since 2005 ... thank God! At least now OOo is able to show the formula created by KSpread.

To take a deeper look at what caused the problems, I added some information to the original spreadsheet. Since OOo can read the formulas from KSpread, I have opened the file using KSpread to demonstrate the problem:



The file is available here: formula OOo exp.ods (9.30 kb)

So what should we conclude from this very basic test? Well, you tell me ... but at least, when someone next time tells you, that lacking formula spec in ODF is not a practical problem but only a theoretical problem ... please tell them that they are wrong.

(D)IS 29500 ISO process F.A.Q.

Due to the still overwhelming interest in the now completed ISO DIS 29500 process, ISO has created a small F.A.Q. to answer some of the more frequently asked questions.

My excerpts from the F.A.Q. are listed here:

Q: How could a 6,000-page document be fast-tracked?

Because the information technology (IT) sector is fast-moving, the joint technical committee ISO/IEC JTC 1, Information technology, introduced the "fast track" process for the adoption as ISO/IEC standards of documents originating from the IT sector on which substantial development has already taken place.

(...)

The number of pages of a document is not a criterion cited in the JTC 1 Directives for refusal. It should be noted that it is not unusual for IT standards to run to several hundred, or even several thousand pages.

ISO/IEC 29500 has spent a total of 15 months being processed within the ISO/IEC system, from its submission in December 2006 to the deadline of 29 March 2008 approving it.

Q:  Why would ISO and IEC allow two standards for the same subject?

(...)

In this particular case, some claim that the Open Document Format (ODF), which is also an ISO/IEC standard (ISO/IEC 26300) and ISO/IEC 29500 are competing solutions to the same problem, while others claim that ISO/IEC 29500 provides additional functionalities, particularly with regard to legacy documents.

The ability to have both as International Standards was something that needed to be decided by the market place. ISO and IEC and their national members provided the JTC 1 infrastructure that facilitated such a decision by the market players.

Q: What about hidden patent issues?

(...)

Microsoft, the holder of patents involved in the implementation of ISO/IEC 29500, has made such a declaration to ISO and IEC. If, after publication of the standard, it is determined that licenses to all required patents are not so available, one option would be to withdraw the International Standard.

Q: What about contradictions with other ISO and IEC Standards?

(...)

A number of such claimed contradictions were identified during the one-month JTC 1 fast-track review period, prior to its release for voting and comment. The submitter, Ecma International, responded to these comments at the end of the review period.

Some of these comments were reflected in national body comments on the fast-track Draft International Standard (DIS). These comments, e.g. the non-alignment with ISO 8601, Data elements and interchange formats – Information interchange – Representation of dates and times, were dealt with in the ballot resolution meeting (BRM).

It is possible that others may still remain, but these can be taken care of during the maintenance of the standard.  In all cases, the final decision on whether there are contradictions and how to resolve them rests with the national members of ISO and IEC.

Q: Will ISO and IEC review how ISO/IEC 29500 was adopted?

We reviewed the process before it started, all the while during its course and afterwards as well. While the voting on ISO/IEC 29500 has attracted exceptional publicity, it needs to be put in context. ISO and IEC have collections of more than 17 000 and 7 000 successful standards respectively, these being revised and added to every month. This suggests that the standards development process is credible, works well and is delivering the standards needed, and widely implemented, by the market. (...)

Object-embedding in OOXML with Microsoft Office 2007

(updated 2008-04-14, added links to external resources) 

Now that the ISO-vote and approval of OOXML is done with, it is time to continue the coverage of implementing OOXML as well as ODF – this time about OOXML, Microsoft Office 2007 and embedded objects.

As I have previously said, there are always quirks when it comes to implementations of any standard in large applications. I have covered a few of these already regarding mathematical content [0], [1] and it is no different with regards to object embedding. I should say that a source of inspiration to this article was Stéphane Rodriguez’ article about binary Parts of an OOXML-file (OPC-package).

Now, embedding objects in an OOXML-file is pretty straightforward: Simply add the object somewhere in the package and make a reference to the location and specify what kind of file you are embedding. This is very similar to how it is done in ODF.

(note: the specific schema-fragments defining how to do this were dealt with and changed at the BRM, so I will not include these until the final version of IS 29500 is released. I will update this article according to the revised spec).

As I have noted earlier, interoperability happens at application-level, so it is worth pondering a bit on how the specification is implemented in the major implementations of it. So let’s see how Microsoft Office acts when embedding objects.

What I did was this: 

I used Microsoft Office 2007, created a text-document and I embedded an object in it – in this case an OpenOffice.org Calc Spreadsheet. The spreadsheet is also inspired by one of Stéphane Rodriguez’ articles, the infamous “OOXML is defective by design”.

 

The object is inserted and displayed in the document. When activating the object, I can edit it as if it was in OOo Calc itself. Actually it is OOo Calc itself. It is invoked using OLE and as a side-note it shows a cool thing about OLE – or similar other object linking techniques. Microsoft Office 2007 does not know anything about OpenOffice.org, yet it is still able to invoke the application and edit the embedded object.

 

Ok – now let’s look at the OOXML-file created. In the file document.xml the following fragment is located:


The <v:shape>-element is part of the nasty VML-dependency that luckily was dealt with at the BRM. This will be replaced by DrawingML in the final IS 29500. The <o:OLEObject>-element specifies the type of the embedded object (“opendocument.CalcDocument.1”) and the location of it (“rId5”). There is really nothing platform dependent here in the OOXML-markup. What is more interesting, though, is looking at the Calc-object after it is embedded. By navigating through the relationship-model of the OPC-package, the embedded object is located.
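Navigating the relationship-model can be sketched roughly like this: take the part that holds the reference, open its .rels part and look up the relationship id. The relationships namespace and the _rels naming are from the OPC conventions. The target file name oleObject1.bin and the helper build_sample_package are my own assumptions for the example, and the sketch ignores external targets.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def resolve_relationship(package, source_part, rel_id):
    """Return the package path of the part that rel_id points to from source_part."""
    folder, name = posixpath.split(source_part)
    # Relationships of "word/document.xml" live in "word/_rels/document.xml.rels".
    rels_part = posixpath.join(folder, "_rels", name + ".rels")
    root = ET.fromstring(package.read(rels_part))
    for rel in root.findall(f"{{{REL_NS}}}Relationship"):
        if rel.get("Id") == rel_id:
            # Targets are relative to the source part unless they start with "/".
            target = posixpath.join(folder, rel.get("Target"))
            return posixpath.normpath(target).lstrip("/")
    raise KeyError(rel_id)


def build_sample_package():
    """Hypothetical helper: a minimal in-memory package with one relationship."""
    rels = (
        f'<Relationships xmlns="{REL_NS}">'
        '<Relationship Id="rId5" '
        'Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/oleObject" '
        'Target="embeddings/oleObject1.bin"/>'
        "</Relationships>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", "<document/>")
        package.writestr("word/_rels/document.xml.rels", rels)
        package.writestr("word/embeddings/oleObject1.bin", b"")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    with zipfile.ZipFile(build_sample_package()) as package:
        print(resolve_relationship(package, "word/document.xml", "rId5"))
```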

 

One might think that this file was simply the Calc-file renamed, but sadly this is not so. This file is actually the Calc-file wrapped in an OLE2 Compound file (“CF”). The CF-file is basically a stream wrapper which allows a number of streams to be persisted in a file as well as information about these streams. Using one of the many CF-viewers you can get the data of the wrapped file itself as well as the persisted information of it, here “com.sun.star.comp.Calc.SpreadsheetDocument _   Embedded Object _   opendocument.CalcDocument.1”.
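If you just want to know whether an embedded part is the raw file or a CF-wrapped one, looking at the first bytes is usually enough. A small sketch: an OLE2 Compound File starts with the signature D0 CF 11 E0 A1 B1 1A E1, while a non-empty zip-based package (like an ODF file) starts with "PK" followed by 03 04. The function name sniff_container is made up for the example.

```python
# Signature at the start of every OLE2 Compound File.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Local file header signature at the start of a non-empty zip archive.
ZIP_MAGIC = b"PK\x03\x04"


def sniff_container(data):
    """Classify the first bytes of an embedded part."""
    if data.startswith(OLE2_MAGIC):
        return "ole2-compound-file"
    if data.startswith(ZIP_MAGIC):
        return "zip-package"
    return "unknown"


if __name__ == "__main__":
    print(sniff_container(OLE2_MAGIC + b"\x00" * 8))
    print(sniff_container(b"PK\x03\x04" + b"\x00" * 8))
```

This only tells you which kind of container you are holding. Getting the actual Calc-file out of the CF-wrapper still requires a Compound File reader.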

 

 

Technically this is really not a big deal – there are well-known ways to manipulate these files on all platforms and most programming languages and extracting the required data should really be a no-brainer. OpenOffice.org is licensed under LGPL, so you can use the source-code from this to figure out how to do it on the platforms supported by OpenOffice.org. It is also pretty evident why Microsoft Office 2007 works this way. Microsoft Office 2007 is the latest incarnation of the Microsoft Office Suite – a suite that has depended on this file format since at least 1999 … and of course on OLE itself as well. So if you want to implement a document consumer, this is simply something to be aware of when consuming OOXML-files.

From the perspective of a developer, however, this is really annoying. I would definitely opt for Microsoft Office 2007 embedding the objects simply as the objects they are – and not wrapping them in a CF-wrapper. This is how it is done in OpenOffice.org. Granted, this suite does other weir(d) things like renaming the files and not being entirely clear how to embed all object types, but the objects are embedded as they are (unless they are OpenDocument objects). This is a benefit to me as a developer when examining OOXML-files, because I can simply extract the object in question from the document package and verify the file.

So this might be the first new post-vote change-modification to IS 29500:

 

When embedding objects an application shall not modify or wrap the embedded object in any way before embedding it in the package. When a document consumer encounters an embedded object, this shall not be converted to another object type without knowledge-based confirmation by the user.

 

This (or similar wording in standard-lingo) would prevent Microsoft Office from wrapping objects in CF-wrappers, but it would also prevent applications like OpenOffice.org on SUSE from converting embedded Excel-objects to Calc-spreadsheets. FYI, this kills interop too.

A final request: Microsoft, please, as you must already be implementing the changes from the BRM for Office 2007, would you be so kind as to make this change to the application as well? It should really be a no-brainer, and if there should be any requirements in your code for the CF-files, feel free to load the objects, wrap them in an in-memory CF-file and take it from there.

:-)

Three monkeys - one was Håkon Lie

(corrected quote of Håkon Lie) 

After the demonstration in Oslo yesterday (damn I wish I had been there) the CTO of Opera Software, Håkon Wium Lie was interviewed by Norwegian newspaper VG. The interview is in Norwegian but let me translate a bit for you:

Håkon Lie: What might happen if Microsoft gets this [OOXML ISO-approval] [OOXML added to the list of approved mandatory document formats in Norway, JLS addition ] through is that Norwegian authorities may be forced to use it, and this means that if you receive an email with an attachment and you don't have a program to read this attachment - it could be a message from a teacher of your child that attends a Norwegian school - when you cannot open this attachment, you will have to buy software from Microsoft. So this is really a "Microsoft-tax" that may be the consequence if Microsoft wins here. We are against this.

Dear Håkon, I love the software you guys make - I use it every day on my cell-phone ... but are you out of your mind? I would expect those kinds of arguments from the typical Tux-f**kers (or in reverse, from the usual Microsoft fan-boys whose coding-skills revolve around point-and-click in Visual Studio Web Developer). I would not expect this from the CTO of the third-largest browser-producer in the world and your argument here makes it all so much clearer for me why Standard Norge discarded your arguments.

I am sure Gene Amdahl would be proud of you.

:-)

 

We shall gather at the riiiiiiiveeeer!

Today it happened ... the world lost its perspective.

 





Have You Read at Least 1 Page?

 

(I suppose this is as good a time as any to post this)

:-)

OOXML is now IS 29500

Long awaited, the votes on the ISO/IEC DIS 29500 have been counted and verified. The unofficial result began circulating between the national bodies yesterday but the result was not made public until today, Wednesday April 2nd 2008.

The results are pretty clear: OOXML has now been approved as an ISO/IEC international standard.

Result of voting

P-Members voting: 24 in favour out of 32 = 75 % (requirement >= 66.66%)


(P-Members having abstained are not counted in this vote.)


Member bodies voting: 10 negative votes out of 71 = 14 % (requirement <= 25%)


Approved
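For the ones who like to check the math themselves, the two criteria can be sketched as a few lines of arithmetic. The numbers are the ones from the result above. The function name ballot_passes is made up, and I have written the first threshold as two thirds, which is what the "66.66%" stands for.

```python
def ballot_passes(p_in_favour, p_voting, negative, total_votes):
    """Return (share of P-members in favour, share of negative votes, approved?)."""
    p_share = p_in_favour / p_voting
    negative_share = negative / total_votes
    # Criterion 1: at least two thirds of the voting P-members in favour.
    # Criterion 2: no more than one quarter of all votes cast negative.
    approved = p_share >= 2 / 3 and negative_share <= 1 / 4
    return p_share, negative_share, approved


if __name__ == "__main__":
    p_share, negative_share, approved = ballot_passes(24, 32, 10, 71)
    print(f"P-members in favour: {p_share:.0%}")
    print(f"Negative votes:      {negative_share:.0%}")
    print("Approved" if approved else "Not approved")
```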

I think this pretty much summarizes it.

It's been a good one - thanks to all I have worked with throughout this process the last year or so - it's been great getting to know you. Also thanks to everyone contributing their valuable input to this process.

Crucial days in Denmark - behind the curtains

Wow - this week has been truly 1800-UNBELIEVABLE (to use the phrasing of Andrew Dice Clay). Almost a week ago we sat down in the Danish National Committee to try to reach consensus about guidance to Dansk Standard to help them decide the Danish vote on DIS 29500. As reported by Dansk Standard in their press-release, we failed to do so.

Dansk Standard: After the Ballot Resolution Group meeting the committee was unable to reach consensus as to whether it was decided to incorporate all Danish comments into the final standard. Another point of disagreement was the state of maturity of ISO/IEC DIS 29500 OOXML as an ISO/IEC standard.

The meeting took the better part of 8 hours and was, at least to me, extremely tough and exhausting. I am sure we all know the feeling and energy-level after pulling an all-nighter at work, and as the sun rises in the morning, the team decides to go home, catch a few hours of sleep and meet again for lunch. The feeling you have when you step out of the building in the cold morning air - this was exactly how I felt when the meeting was done. Add to this the sensation that "I'm not sure we're gonna make it after all". It was not good. This was Wednesday evening. On Thursday evening the mood was remarkably better since during the day, we had come to the conclusion that it was not that bad after all and that we had done everything we could - given the circumstances at the meeting.

The only thing regarding consensus we could agree on was this (my translation):

The committee requests that Dansk Standard, as best as they possibly can, honors the technical work that the committee has done. The committee asks that Dansk Standard takes note of the fact that the committee did not reach a consensus regarding if Denmark should change its vote on March 29th 2008.

Friday - damn! I had actually expected the anti-OOXML-mob to perform a DOS-attack on the process during the last 14 days before the ISO/IEC deadline. Surprisingly, this did not happen. Well, in Denmark it happened on Friday morning. Dansk Standard had promised to notify the committee by email before making their decision public - but they had said nothing about when. So Friday was quite an anxious day. It began with an interview with Morten Messerschmidt (Dansk Folkeparti), where he basically told Dansk Standard to maintain the original "Disapprove"-vote if no consensus could be reached. The debate on the two primary IT-websites in Denmark increased during the morning hours and information from the meeting began to leak to the media. Countless emails were exchanged between delegates from the BRM to figure out what was happening on a global scale. At 12.10 the email arrived in my mailbox.

The rest, you could say, is history.

Almost immediately the conspiracy-theories started to flow and the influx of leaked (and sometimes false) information increased. Even before the announcement it was obvious to me that the anti-OOXML-lobby, in case they lost, would attack the process, and I was sadly correct. Within minutes after the announcement, they started attacking Dansk Standard. Friday afternoon the vice director of Dansk Standard, Jesper Jerlang, was interviewed and he denied any allegations that the process was not carried out in a proper manner. He commented on a couple of things in the interview. First he commented on the basis of changing the Danish vote:

Background info: Denmark had 168 comments that we went to Geneva to fight for. All these comments were approved (with one, small, outstanding issue) and will become part of the IS 29500, if it is accepted.

Jesper Jerlang: So even though there is no consensus as to whether the 168 suggestions have been fully implemented, we believe that we are so well on the way that the demands for an approval have been met

He also commented on the process, saying that (my translation):

Jesper Jerlang: So even though there is no consensus that all 168 comments have been fully implemented, we believe that we have come so far that the conditions for an approval have been met

One of the flanks of criticism has been whether undue influence by major companies had taken place, and he had the following to say about this (my translation):

Jesper Jerlang: There has been a lot of political focus on the process, but the process was carried out completely by the rules, so there have been no deviations. We have naturally taken care that the discussion was focused on the content and not the process, but it is clear that there is commercial politics in this matter, and it is also the reason that the committee, at the final hour, does not reach consensus - but we knew this from the beginning: That at this point, there would be two sides that would each fight for their views. This is why we have made a great effort to manage the process by the rules, so that we have been completely comfortable saying that, on the basis of the process we have been through, we have been able to decide how we best take care of the Danish interests, as they are written in the list of comments from the committee

So what do we do now? Well, first we await the final tally from ISO/IEC and then, regardless of the outcome, we all get back to work. In Dansk Standard we continue with the next subjects at hand, most prominently ODF v1.2, should OASIS decide to do this in ISO.

Oooh - and the blame-game will likely continue for a couple of weeks.

:-)