a 'mooh' point

clearly an IBM drone

What's up with OLE?

A few weeks back I made an article about how Microsoft Office 2007 dealt with password-protection of an OPC-package, since this feature is not a part of the OOXML-specification. The answer I found was that Microsoft Office 2007 persists the password-protected file as a OLE2 Compound File ... more commonly known as a "OLE-file". I also concluded that using OLE2 Compound Files is not a problem - and certainy not an issue regarding OOXML.

Now - the whole topic around OLE has been at the front row of the worldwide debates regarding OOXML. My personal opinion is that the people jumping up and down screaming about problems with OLE ... really haven't understood what OLE is.

So let me start by making a small recap' of what it is really all about.

... there is OLE and then there is OLE 

First of all:

there is "OLE" and then there is ... "OLE"

... or put in another way:

there is the "OLE-technology" and then there is the "OLE-file"

or in a third, more correct, way:

there is the "OLE application technology"  and then there are "Compound Files".

The foremore mentioned is the technology that - on the Windows platform - enables a program to use the UI of another program ... without launching the entire application itself. I mostly use this when editing MS Visio-documents in Word but other usages of this is using an Excel spreadsheet in an MS Word application. The OLE-technology itself is a tool on the Windows-platform that all applications can - and do - use to enable "utilizing other applications in their own applications". It is here important to understand, that there is (today) nothing really revolutionary about OLE. Another similar technology on the Windows-platform is DDE and on the Linux-platform it could be KParts and Bonobo. These technologies simply enable one program to communicate with another (simply put).

But what about these OLE-files?

Well, Compound Files are actually not dependant of OLE-technology. Or put in another way: you don't need OLE-technology to read and use the contents of a Compound File. Compound Files are just files. A Compound File is a collection of persisted streams - actually much like a ZIP-archive. Most commonly it is used because it brings the ability to "utilize a file system within a file". Of course you will need to know how to use the contents of the file, be it created by OpenOffice, Corel Draw, Adobe Acrobat or any application that might store its files using Compound Files. But this is seperate from being able to read and write to the contents of a Compound File.

Ok - I will not bother you any more with this. You should check out the original article about OLE and also look into the specification of the binary formats for Microsoft Office95 - Office2007, avilable from Microsoft. It is actually quite interesting. Just remember that OLE-technology and Compound Files are not the same thing.

And now for something completely different (kindof)

In the lab-tests I have been part of for the Danish Government (National IT and Telecom Agency) we have tested OLE-interoperability. It is important since it is quite normal to embed e.g. a spreadsheet file in a Text-processing file. So it is important that the contents of the file is actually usable when receiving it and opening using another application or on another platform.In this setup we only tested Compound File interop and not interop between OOXML and ODF.

What we did was this:

We created a ODF-file using OpenOffice where we embedded a Excel-spreadsheet (binary .DOC-file) (on the Windows-platform)

We sent this file to a number of different platforms and applications

  • Windows XP using OpenOffice.org 2.3 DA
  • Windows XP using OpenOffice Novell Edition
  • Linux using OpenOffice Novell Edition
  • Linux (SLED) using IBM Lotus Notes 8

We tried to open the file and documented what happened.

#
Setup  What happened? 
1
Windows XP using OpenOffice.org 2.3 DA OpenOffice.org opened the document and correctly displayed the contents of the spreadsheet. It was possible to edit the spreadsheet and save it back into the ODF-container
2
Windows XP using OpenOffice Novell Edition OpenOffice Novell Edition opened the document and correctly displayed the contents of the spreadsheet. It was possible to activate the spreadsheet but only in "read-only"-mode
3 Linux using OpenOffice Novell Edition OpenOffice Novell Edition opened the document and correctly displayed the contents of the spreadsheet. It was possible to activate the spreadsheet but only in "read-only"-mode
4
Linux (SLED) using IBM Lotus Notes 8 Lotus Notes 8 opened the document and correctly displayed the contents of the spreadsheet. When activating the spreadsheet the user was prompted to convert the spreadsheet. When accepting this it became editable and when saving it back into the ODF-container, the spreadsheet was persisted as an Open Document Spreadsheet.


So what we saw was basically 3 different approaches to handling the embedded object. In general the Excel-object (Compound file) itself was not a problem - regardless of application and platform. All combinations had no problems with opening the file and displaying the contents - even on platforms without OLE-technology present. The difference was in the applications and their handling of the object. OpenOffice.org presented the approach that most people would expect: it allowed editing the embedded object and saving it back into the container. OpenOffice Novell Edition allowed activating the embedded object but not saving it back into the container and Lotus 8 took the approach of converting the Excel-object to an Open Document Spreadsheet.

A conclusion?

Well, we took great care not to conclude much - that was not for us to do, we merely provided the technical background for post-lab conclusions. However - the pattern emerging from the description above was similar to a pattern we saw a lot. The problems were not in incompatibility between the formats but instead in how the applications and converters dealt with the formats. We also saw no indications that any of the formats were tied to a specific platform. There were no problems with roundtripping - or to put more clearly: the problem we saw when round-tripping documents were not caused by incompatibilities between the platforms (e.g. Linux and Windows) but between different behaviour in the applications implemented on either platform.

So is this good or bad news? Well, as always, truth lies in the eyes of the beholder ... but I think it is good news. 

Where did my line go?

When we started doing our tests in the lab and started thinking about what we thought we would be seeing, we had a very clear understanding that it would not all be blue-sky conversions and that we would identify problems - some more severe than others. We were also pretty aware, that there would be areas, where conversion was just not possible.

But - I am pretty sure I speak for the rest of the group - we were quite surprised to see which areas this concerned.

On area where absolutely nothing could be converted was ... lines. Not only line art, not only complex line drawings ... but simply - lines.

Lines are done in OOXML as either VML or DrawingML and in ODF it is done using a SVG-derivative. The puzzling thing is, that this area is apparently simply left out in either of the converters. We made some simple documents (line.docx 10,47 kb) and (line.odt 6,60 kb)  [I have re-made these for this article]. When converting these files using CleverAge 1.0 on Microsoft Office 2003 and 2007, Novell OOXML Translator (on Windows and SLED) or IBM Lotus Notes 8 (on SLED), the lines are simply removed. They are not altered, they are not just hidden, they are not moved to a different location in the document ... they are just removed.

This is another example of the overall observation from our tests ... the quality of the converters are simply not good enough today. If you look at the XML in either of the files above, you will see, that even though they look different, they basically specify the same thing (start and end-point for the line drawn), so technically it should pose no problem to be able to do a better conversion.

It is often said, that the main problem with converting from ODF to OOXML (and vice versa) is incompatibilities between the formats. This example is by first glance suporting this argument, but if you dig a bit deeper into the technicality of it, is simply boils down to a problem with bad converters.

Conclusion: The world is seldom black/white ... even if people are trying to convince you so. More often, the world is grey and depressing as a rainy day. 

Vejledning fra IT- og Telestyrelsen

I går eftermiddags offentliggjorde IT- og Telestyrelsen deres vejledningsmateriale for anvendelse af åbne dokumentformater i den offentlige sektor. Hvis du er journalist eller IT-ansvarlig i en offentligt myndighed, så er deres vejledning et "must-read" for dig - den er fuld af værdifuld information.

Se den på http://dokumentformater.oio.dk 

What is a conversion, really?

I have been part of some work for the the Danish National Telecom and IT Agency (IT- og Telestyrelsen). They have coordinated quite a few projects around the country to evaluate the usage of ODF and OOXML and possible problems with co-existance of the two document standards. The website for this work is at http://dokumentformater.oio.dk .

The basic setup for the projects and tests has been:

How does a particular department handle the two document formats and possible conversion between them?

Which problems will arise given their current software install-base?

Is it possible to provide some guidance to the departments regarding which specific features of a document format to avoid since they cause problems?

In other words it has been a rather pragmatic approach based on trying to answer the question: "Why do you experience the problems you see?"

Observations

The first thing we realized during the very first day was something quite crucial:

We were not testing compatibility between two formats - instead we were testing quality of converter-tools and compatibility between the specific format and the internal object model the format is loaded into.

Converter-tools

Both OOXML and ODF are rather immature document formats in the market today since neither of them has a broad market penetration as such. Despite the document count on Google, ODF is not widely used and most people still save their work in .DOC-files -even though they have Microsoft Office 2007 installed. This means that conversion between them is also rather immature and this affects the quality of the converters and the results of converting between one format and another. The ODF-Converter project has an extensive list of the differences between the formats themselves and also a list of features currently not supported by the converter and similar lists exist of features not supported by the other tools used. Luckily it seems that the quality of the converters are drastically improving for each incremental new release.

We also noted that a converter is not "just a converter". It lives and breathes on the application it is installed. This was of particular interest when looking at the ODF-Converter Office Add-In and the SUN OOXML-converter. They are both add-ons to existing Office applications but the application behaviour we saw was in principle the same when using OpenOffice.org, IBM Lotus Notes 8 or OpenOffice Novell Edition.

The problem lies in the fact, that every application has an internal object model that determines how a document is persisted in memory in the application. The binary format for Microsoft Office files were essentially a binary dump of the current memory in the application and this basically counts for at lot of applications with binary file formats. Anyway - regardless of how a document is "converted" or "transformed" using another application than the originator, at the end of the day it has to be loaded into the internal object model for the receiving application. This essentially means, that unless there is a 100% air-tight 1-1-mapping of the document format and the internal object model ... information will be lost. This was one part of the problem - the other was the sequence of conversion. Take a look at the sequence listed here:

Sequence 01  Sequence 02
   
load original format Load original format
 ↓
Convert format to new format Load original format into internal object model
 ↓
load new format into internal object model (make changes)
 ↓
(make changes) Persist as new document format
 
Persist as new document format  

It is not entirely evident that this will produce the same output, and we have seen no evidence that any of applications tested did actually have a 1-1 mapping between (any) document format and their internal object model. This also counts for Microsoft Office and its corresponding file types and OpenOffice itself. In short, this was a fact that we had to deal with in our tests.

On a funny note:

The conversion tools we used were all based on XSLT-transformation between the document formats. They are both XML-formats, so it is a good choice. However, we heard rumours that Novell would dump their OOXML-converter (based on XSLT) and develop their own converter based on the internal object model. It will be interesting to see, if it brings greater quality to the converters.

On a lighter note:

We saw in our tests that using the binary Microsoft Office file format as a middle-man when converting from OOXML to ODF (and back) actually produced the best results ... by a long shot. Having this step and using the binary Office file format as a type of "Lingua Franca", was more or less the key to "flaw-less conversion". If you stop and think about it, it makes perfect sense why we saw this. The Microsoft Office Binary file format is well established in the market (not thanks to Microsoft, but to reverse engineering) and the format has been arround for a long time. Basically, all applications can read it and all applications can write it. But why is this interesting? Well, OOXML is an XML-version of the binary Office file format, so since there are "no problems" with converting from the binary format to ODF, it should be technically relatively easy to convert from OOXML to ODF, since OOXML is a binary version of the binary file format.

It is just a matter of time ... and continious improvement of the format converters. 

Første kommentarer fra ISO/IEC

Det er pudsigt som man kan have ret. I forgårs (18. november 2007) kom de første rettelser fra ISO/IEC. De har udsendt svarende på i alt 19% af det samlede antal. Deres pressemeddelelse kan ses på ECMAs hjemmeside, hvor den kan studeres nærmere. I korte træk er ISO/IEC-editor Rex Jaeschke begyndt at sende de første rettelser ud. Det sker via et website, hvor de enkelte nationale standardiseringsråd har mulighed for at hente rettelserne til deres forslag. Jeg ved ikke, om fx det franske standardiseringsråd har adgang til rettelserne til de danske, men foreløbigt ser det ud til, at det bliver tilfældet. Indholdet på Hjemmesiden er kun tilgængeligt for de enkelte NB'er (nationale råd), men ifølge Microsofts Brian Jones skyldes det ISO-regler, hvor der specificeres, at kommentarer til forslag og svar på disse er et anliggende imellem ISO og de enkelte lande. I Danmark er Dansk Standard og arbejdet i det underlagt nogle begrænsninger i forhold til fortrolighed, så her skal man nok ikke vente for meget information til at begynde med - men så vidt jeg ved er det diametralt modsat i den engelske pendant til Dansk Standard, så måske skulle man holde øje med de informationer, der kommer ud derfra. Eller - deres regler er ret meget lig de danske, men de har valgt en anden tilgangsvinkel i forhold til arbejdet med OOXML, hvor de bla. har anvendt Wiki'er til at indsamle information, og de har også offentliggjort deres kommentarer til DIS 29500 ligesom vi har gjort i Danmark. Igen er valget af format faldet på en wiki.

Jeg har tidligere refereret til ordsproget "May you live in interesting times" ... og mon ikke de er på vej igen?

Smile

Det begynder at ligne noget

Aktiviteten i OOXML/ODF-sfæren er ved at gå op i et højere gear. Bemærkelsesværdigt nok har de fleste af Microsofts OOXML-resourcer været stort set fraværende på deres blogs siden afstemningen i september, men de er nu begyndt at titte frem igen. Man må håbe, at fraværet var været brugt konstruktivt på feedback til ECMA og Rex omkring kommentarerne til DIS 29500.

Alex Brown (ordstyrer til BRM-konferencen i Geneve i februar 2008) har postet et link til en FAQ, der opridser proceduren for afholdelsen af BRM-konferencen. Alex er i øvrigt medlem af den britiske udgave af Dansk Standard. Som Dansk Standard stemte de vist også Nej (med kommentarer) til DIS 29500. FAQ'en er relativt god læsning, så hvis man er interesseret i processen, er den klart læseværdig. Af specielt interessante områder er punkterne om, hvad der ikke bliver diskuteret på mødet samt, hvordan selve mødet vil blive styret. Der er også nogle kommentarer som hele "P-medlem/O-medlem"-diskussionen, som er blevet misforstået generelt over hle linien. Endelig er det beskrevet bestemmelserne for, om en delegation kan "mixe" antallet af deltagere. Der er et øvre loft på 120 mennesker til møderne, og det giver nogle praktiske udfordringer i forhold til antal delegerede fra hvert land.

Læs den ... du kunne jo blive klogere ...

Smile