a 'mooh' point

clearly an IBM drone

Do your math - OOXML and OMML (Updated 2008-02-12)

As I promised in my latest article about ODF and MathML, I have worked a bit with the ECMA-equivalents of ODF and MathML: OOXML and OMML (Office Math ML).

A bit of introduction is probably a good idea:

In OOXML, mathematical content is structured using the internal markup language Office Math ML, or OMML for short. OMML is closely tied to the structure of WordprocessingML and the look-and-feel is very similar. In contrast to the ODF-way, OMML is usually inserted inline in the WordprocessingML, whereas in ODF it is kept in a separate part of the package.

Ok - now that that is done with - let's get on with the good stuff!

As in my previous article, I'll work with the same base equation:



Now, as I wrote in the other article, learning MathML is like learning a new (programming)-language, and I can tell you, it is no different with OMML. MathML arranges the mathematical elements by position whereas OMML arranges the mathematical elements by their explicit meaning, so a fraction is created in MathML as (simplified)

<math:mfrac>
  <math:mi>π</math:mi>
  <math:mn>4</math:mn>
</math:mfrac>

and in OMML it is created as (simplified)

<m:f>
  <m:num>
    <m:r>π</m:r>
  </m:num>
  <m:den>
    <m:r>4</m:r>
  </m:den>
</m:f>

So when dealing with MathML and e.g. fractions, we look at a fraction with "something at the top and something at the bottom". When dealing with OMML, we deal with "numerators" and "denominators". It is rather clear to me that any skills learned in MathML are not directly applicable to OMML - and vice versa. It took me about the same amount of time to "get" MathML as it did to "get" OMML. In both cases, I had not worked with the specific ML before. It has taken me about a day to research and write each article.
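To make the "position versus named role" difference concrete, here is a minimal Python sketch that maps a two-child MathML fraction onto the OMML structure. The function name mfrac_to_omml is made up for this illustration, and the sketch assumes both children are plain token elements (no nested rows); it is not how Word does it.

```python
import xml.etree.ElementTree as ET

MML = "http://www.w3.org/1998/Math/MathML"
OMML = "http://schemas.openxmlformats.org/officeDocument/2006/math"
ET.register_namespace("m", OMML)


def mfrac_to_omml(mfrac):
    """Map a MathML <mfrac> with two token children onto an OMML <m:f>."""
    # MathML is positional: the first child is on top, the second below.
    top, bottom = list(mfrac)
    frac = ET.Element(f"{{{OMML}}}f")
    for role, token in (("num", top), ("den", bottom)):
        # OMML names the role explicitly and wraps the text in a run.
        part = ET.SubElement(frac, f"{{{OMML}}}{role}")
        run = ET.SubElement(part, f"{{{OMML}}}r")
        text = ET.SubElement(run, f"{{{OMML}}}t")
        text.text = (token.text or "").strip()
    return frac


mathml = (
    f'<math:mfrac xmlns:math="{MML}">'
    "<math:mi>π</math:mi><math:mn>4</math:mn>"
    "</math:mfrac>"
)
omml_fraction = mfrac_to_omml(ET.fromstring(mathml))
print(ET.tostring(omml_fraction, encoding="unicode"))
```

The only information used from the MathML side is child order; everything on the OMML side is spelled out by element name.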

Anyway - back to the plot.

As always I work with my friend, "the minimal OOXML-file". It is an OOXML-file stripped of all the junk and cut down to the bare minimum - not even a single unused namespace declaration is left behind. You can see the minimal file here: Minimal OOXML.docx (1,16 kb).
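The linked file is not reproduced here, but the general shape of such a package can be sketched in Python. The three parts below (content types, package relationships, main document) are the usual minimum for a WordprocessingML package; the "Hello" paragraph and the function name build_minimal_docx are placeholders for this sketch, so the bytes will not match the author's file.

```python
import io
import zipfile

CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    "</Types>"
)

PACKAGE_RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>'
    "</Relationships>"
)

DOCUMENT = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    "<w:body><w:p><w:r><w:t>Hello</w:t></w:r></w:p></w:body>"
    "</w:document>"
)


def build_minimal_docx():
    """Return the bytes of a three-part WordprocessingML package."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", CONTENT_TYPES)
        package.writestr("_rels/.rels", PACKAGE_RELS)
        package.writestr("word/document.xml", DOCUMENT)
    return buffer.getvalue()


docx_bytes = build_minimal_docx()
print(zipfile.ZipFile(io.BytesIO(docx_bytes)).namelist())
```

Writing docx_bytes to a file with a .docx extension gives something to open and compare against the real minimal file.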

So my task was a two-step task: since OOXML is rather new, there is not that much information about OMML out there. So as a first step I created a sample equation using Word 2007 to get a feeling of what it's all about. Then I found Part 4 of the OOXML-spec, located section 7 and started to put the OMML together. The OMML I ended with was this:

<m:oMathPara>
  <m:oMath>
    <m:r>
      <w:rPr>
        <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
      </w:rPr>
      <m:t>cos</m:t>
    </m:r>
    <m:f>
      <m:num>
        <m:r>
          <w:rPr>
            <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
          </w:rPr>
          <m:t>π</m:t>
        </m:r>
      </m:num>
      <m:den>
        <m:r>
          <m:t>4</m:t>
        </m:r>
      </m:den>
    </m:f>
    <m:r>
      <m:t>=</m:t>
    </m:r>
    <m:f>
      <m:num>
        <m:rad>
          <m:radPr>
          </m:radPr>
          <m:deg/>
          <m:e>
            <m:r>
              <m:t>2</m:t>
            </m:r>
          </m:e>
        </m:rad>
      </m:num>
      <m:den>
        <m:r>
          <m:t>2</m:t>
        </m:r>
      </m:den>
    </m:f>
  </m:oMath>
</m:oMathPara>

I bet you are now thinking what I was thinking: what the f***? That's a lot of markup! Well, the reason why there is so much markup is that each piece of text/data in the equation is encapsulated in a "run"-element that enables additional styling. If all this additional markup including other property-markup is removed, the result is this:

<m:oMathPara>
  <m:oMath>
    cos
    <m:f>
      <m:num>π</m:num>
      <m:den>4</m:den>
    </m:f>
    =
    <m:f>
      <m:num>
        <m:rad>
          <m:e>2</m:e>
        </m:rad>
      </m:num>
      <m:den>2</m:den>
    </m:f>
  </m:oMath>
</m:oMathPara>

Ain't that purdy?

The OOXML-file with the equation is available here: minimal ooxml with math.docx (1,25 kb). It displays like this in Microsoft Office 2007:

Why not just use MathML?

Before I go into the details with converting from MathML to OMML, I think it is appropriate to pause and look at how MathML and OMML differ from each other. As I noted above there is quite a lot of "overhead" in OMML with everything being encapsulated in "runs". But there is a reason for this. The overhead enables us to do a couple of things that we cannot do with MathML.

Everything fits

You can put virtually everything into an OMML-formula that you can put into a normal WordprocessingML-fragment. As Murray Sargent puts it:

Word needs to allow users to embed arbitrary span-level material (basically anything you can put into a Word paragraph) in math zones and MathML is geared toward allowing only math in math zones. A subsidiary consideration is the desire to have an XML that corresponds closely to the internal format, aiding performance and offering readily achievable robustness. Since both MathML and OMML are XMLs, XSLTs can (and have) been created to convert one into the other. So it seems you can have your cake and eat it too. Thank you XML!

MathML allows some styling of the individual text fragments in the equations, but that's basically it.

WordprocessingML look-and-feel is preserved

To me it is really nice to work with markup for equations that is similar to the markup surrounding it. If I were to use MathML inline instead of OMML, the markup would be completely different from the markup around it. You can say that using MathML enables you to reuse any MathML-skills you might have in advance. Similarly you can say that using OMML for equations enables you to reuse the skills you have from working with WordprocessingML. It's kind of a "give-and-take"-situation.

Revision-control (change-tracking) is possible

Having the overhead enables change-tracking on the same granular level as with your regular text. You can track changes in your equations on a character-by-character basis. In Word 2007 it looks like this when I make a modification to the equation (multiply the second fraction with "2" and remove the cosine-function from the first fraction).

 

 

The markup enabling this is here (for removing the cosine function, where "w:del" means "delete"):

<w:del w:id="0" w:author=" Jesper Lund Stocholm" w:date="2008-01-30T10:41:00Z">
  <m:r>
    <w:rPr>
      <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
    </w:rPr>
    <m:t>cos</m:t>
  </m:r>
</w:del>
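Because the tracked deletion is ordinary markup wrapped around the math run, it can be picked out with a few lines of Python. This is only a sketch: the fragment is a cut-down version of the markup above (run properties dropped, leading space in the author name trimmed), and deleted_math_text is a name invented for the example.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
M = "http://schemas.openxmlformats.org/officeDocument/2006/math"

fragment = f"""<m:oMath xmlns:m="{M}" xmlns:w="{W}">
  <w:del w:id="0" w:author="Jesper Lund Stocholm" w:date="2008-01-30T10:41:00Z">
    <m:r><m:t>cos</m:t></m:r>
  </w:del>
  <m:r><m:t>=</m:t></m:r>
</m:oMath>"""


def deleted_math_text(root):
    """Return (author, date, text) for every tracked deletion under root."""
    changes = []
    for deletion in root.iter(f"{{{W}}}del"):
        # Collect the text of all math runs inside this deletion.
        text = "".join(t.text or "" for t in deletion.iter(f"{{{M}}}t"))
        changes.append(
            (deletion.get(f"{{{W}}}author"), deletion.get(f"{{{W}}}date"), text)
        )
    return changes


changes = deleted_math_text(ET.fromstring(fragment))
print(changes)
```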

This is not at all possible when using MathML out-of-the-box. You cannot merge the MathML with other markup like this, and if you use MathML as it is done in ODF (i.e. not "inline") it is simply impossible (at least as far as I can see). MathML in ODF is treated as an external object, which means that it is encapsulated in an OpenDocument Draw frame. The markup for one of the files I used in the other article is like this:

<text:p text:style-name="Standard">
 <draw:frame
   draw:style-name="fr1"
   draw:name="Objekt1"
   text:anchor-type="as-char"
   svg:width="2.418cm"
   svg:height="1.034cm"
   draw:z-index="0"
 >
  <draw:object
    xlink:href="./MathML"
    xlink:type="simple"
    xlink:show="embed"
    xlink:actuate="onLoad"
  />
  <draw:image
    xlink:href="./ObjectReplacements/MathML"
    xlink:type="simple"
    xlink:show="embed"
    xlink:actuate="onLoad"
  />
 </draw:frame>
</text:p>

If I wanted to change some text like "Display equation below" to "Disrply equation below" (add an 'r' and delete an 'a') in ODT, it would look something like this:

<text:p>
  Dis<text:change-start text:change-id="ct102825880"/>
  r<text:change-end text:change-id="ct102825880"/>
  pl<text:change text:change-id="ct102844952"/>
  y equation below
</text:p>

So registration of the changes is - as with OOXML - merged into the text being modified. I think you could mark the whole equation as "modified" in ODF by putting a <text:change-start>-element around the complete <draw:object>-element, but I am not sure it would work. Also, OpenOffice.org doesn't seem to register changes to MathML-zones at all. Using OpenOffice.org it looks like this:

 

(I changed the denominator of the first fraction to "54") 

 

I cannot say that there are (or are not) other areas where MathML just doesn't cut it - these were just a couple of those that I have experienced myself. I do believe, though, that the examples above warrant the simple question:

Why the hell did OASIS ODF TC decide to use MathML in the first place?

Interoperability

Interoperability is clearly what the young kids want these days - so let's see what we can do with mathematical content. MathML and OMML are clearly two different markup languages, but is it possible to convert between them? Fortunately it is. Microsoft Office 2007 allows copy/paste of MathML into OMML-equations and it can even export OMML to MathML. Luckily for us the logic around this is not embedded into some fancy place in Microsoft Office 2007 - it is done using simple XSLT-transformations. They have made the stylesheets OMML2MML.xsl and MML2OMML.xsl, and if you apply these to either your OMML or MathML, it is translated to the other. Just for the fun of it I tried to convert the OMML-version of the equation to MathML. All I did was to find the OMML2MML.XSL and insert a single line in the XML-file document.xml

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<?xml-stylesheet type="text/xsl" href="OMML2MML.XSL"?>
<w:document
  xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"
  xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
  >
  <w:body>
    <w:p>
      <m:oMathPara>
        <m:oMath>
          <m:r>
            <w:rPr>
              <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
            </w:rPr>
            <m:t>cos</m:t>
          </m:r>
...

(and then I processed the file using my favorite XSLT-translator)

I'm sure - if you are a "technical" person - you have found yourself using/writing some code and just before you press "Compile" or "Run" you think: "This is sooo not gonna work". This was one of those situations for me - but you know what, it actually worked on the first try. The MathML generated is this

<?xml version="1.0" encoding="utf-8"?>
<mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML">
  <mml:mi mathvariant="italic">cos</mml:mi>
  <mml:mfrac>
    <mml:mrow>
      <mml:mi>π</mml:mi>
    </mml:mrow>
    <mml:mrow>
      <mml:mn>4</mml:mn>
    </mml:mrow>
  </mml:mfrac>
  <mml:mo>=</mml:mo>
  <mml:mfrac>
    <mml:mrow>
      <mml:mroot>
        <mml:mrow>
          <mml:mn>2</mml:mn>
        </mml:mrow>
        <mml:mrow />
      </mml:mroot>
    </mml:mrow>
    <mml:mrow>
      <mml:mn>2</mml:mn>
    </mml:mrow>
  </mml:mfrac>
</mml:math>

... and it validates as well (using Amaya and changing the XML-file from a UTF-16 file to UTF-8)
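Microsoft's stylesheet is not reproduced here. For readers without an XSLT processor at hand, the shape of the translation can be imitated with a small hand-written Python walker. This is a stand-in under heavy assumptions, not the stylesheet's logic: it covers only runs, fractions and radicals, ignores the radical's degree and emits <msqrt> where the output above has <mroot> with an empty degree, does not add the mathvariant attribute seen above, and guesses the token element (mi/mn/mo) from the text with a crude heuristic. All function names are invented for the sketch.

```python
import xml.etree.ElementTree as ET

M = "http://schemas.openxmlformats.org/officeDocument/2006/math"
MML = "http://www.w3.org/1998/Math/MathML"
ET.register_namespace("mml", MML)


def m(tag):
    return f"{{{M}}}{tag}"


def mml(tag):
    return f"{{{MML}}}{tag}"


def token_for(text):
    """Pick a MathML token element for run text (rough heuristic)."""
    if text.isdigit():
        return "mn"
    if text in ("=", "+", "-"):
        return "mo"
    return "mi"


def convert(node, parent):
    """Append the MathML counterpart of one OMML node to parent."""
    if node.tag == m("r"):
        text = "".join(t.text or "" for t in node.iter(m("t")))
        ET.SubElement(parent, mml(token_for(text))).text = text
    elif node.tag == m("f"):
        frac = ET.SubElement(parent, mml("mfrac"))
        for role in ("num", "den"):
            row = ET.SubElement(frac, mml("mrow"))
            for child in node.find(m(role)):
                convert(child, row)
    elif node.tag == m("rad"):
        root = ET.SubElement(parent, mml("msqrt"))
        for child in node.find(m("e")):
            convert(child, root)
    # Property elements such as m:radPr are skipped in this sketch.


def omml_to_mathml(omath):
    math = ET.Element(mml("math"))
    for child in omath:
        convert(child, math)
    return math


sample = f"""<m:oMath xmlns:m="{M}">
  <m:r><m:t>cos</m:t></m:r>
  <m:f>
    <m:num><m:r><m:t>π</m:t></m:r></m:num>
    <m:den><m:r><m:t>4</m:t></m:r></m:den>
  </m:f>
  <m:r><m:t>=</m:t></m:r>
  <m:f>
    <m:num><m:rad><m:deg/><m:e><m:r><m:t>2</m:t></m:r></m:e></m:rad></m:num>
    <m:den><m:r><m:t>2</m:t></m:r></m:den>
  </m:f>
</m:oMath>"""
mathml = omml_to_mathml(ET.fromstring(sample))
print(ET.tostring(mathml, encoding="unicode"))
```

For anything beyond a toy equation the real stylesheets are the better route; the sketch only shows that the mapping is a plain recursive tree rewrite.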

Et voilà

Now, wouldn't it be cool if the MathML generated from the OMML could be used in an ODT-document? You know what ... it can! I took the MathML above and inserted it into the MathML-zone of one of the ODF-packages I made for the ODF/MathML-article. The file is available here: minimal-mathml-omml-inject.odt (1,31 kb).
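Since an ODF package is a ZIP archive, the injection can be scripted. The Python sketch below copies a package and swaps one member. The member name MathML/content.xml is an assumption derived from the xlink:href="./MathML" reference shown earlier, the demo package is a stub without a manifest (so it is not a loadable ODT), and replace_member is a name made up for the example.

```python
import io
import zipfile


def replace_member(package_bytes, member, new_content):
    """Return a copy of a ZIP package with one member's content replaced."""
    source = zipfile.ZipFile(io.BytesIO(package_bytes))
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w") as target:
        for info in source.infolist():
            if info.filename == member:
                data = new_content
            else:
                data = source.read(info.filename)
            # Keep member order and compression method; ODF wants the
            # "mimetype" entry first and stored uncompressed.
            target.writestr(info.filename, data, compress_type=info.compress_type)
    return out.getvalue()


def build_demo_package():
    """Build a stub package with the same layout idea as an ODT file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr(
            "mimetype",
            "application/vnd.oasis.opendocument.text",
            compress_type=zipfile.ZIP_STORED,
        )
        package.writestr("content.xml", "<placeholder/>",
                         compress_type=zipfile.ZIP_DEFLATED)
        package.writestr("MathML/content.xml", "<old/>",
                         compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


patched = replace_member(build_demo_package(), "MathML/content.xml", "<math/>")
print(zipfile.ZipFile(io.BytesIO(patched)).namelist())
```

With a real ODT file, package_bytes would come from reading the file from disk and new_content would be the generated MathML.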

The result of opening the file using OpenOffice.org:

In the words of Murray Sargent, I guess you can have your cake and eat it too after all.


Update:

When writing my post about where to get help for ODF-development I suddenly remembered that I missed a part of this article: "The quirks". Because - naturally there are quirks with using OMML with Microsoft Office 2007 ... just as there were with MathML and OpenOffice.org.

Now, if you take another look at the OMML/XML-fragment I created, there were two parts I really couldn't figure out a way to remove:

<m:oMathPara>
  <m:oMath>
    <m:r>
      <w:rPr>
        <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
      </w:rPr>
      <m:t>cos</m:t>
    </m:r>
    <m:f>
      <m:num>
        <m:r>
          <w:rPr>
            <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
          </w:rPr>
          <m:t>π</m:t>
        </m:r>
      </m:num>

Now, the <w:rPr>-elements should have absolutely nothing to do with the content of the <m:t>-element - or more correctly, the visibility of the text in the <m:t>-element should not depend on the existence of a <w:rPr>-element. But if the two <w:rPr>-sections are omitted, the "cos"-text as well as the π-sign are not displayed. I really have no idea why this is so - if you do, please let me know. Maybe one of the Microsoft Office 2007-Math guys could step in here?

Do your math - ODF and MathML

When I studied at DTU (Technical University of Denmark) I basically lived in the Department of Mathematics. I did my bachelor project there and I did my thesis there. I think it would be fair to say that math is really in my blood (or was).

Of course - in those days we wrote our equations in LaTeX (not the suit) and I remember how we laughed diabolically at our co-students that did their papers in e.g. Microsoft Word and had to use the really, really annoying "Equation Editor" (shudder). I remember how we also laughed at the students that did pictures and graphs in e.g. Adobe PhotoShop or Visio (before it was acquired by Microsoft, afaik), coz everybody knew that it had to be done using xFig ... the program with the worst possible UI ever ... at least in those days.

For the purpose of these articles (an article about Microsoft Office 2007 and OMML will follow shortly) I dug into my thesis and looked at how math was displayed using LaTeX. I created a "reference equation" to use when trying to display some math in either ODF or OOXML. The test equation I made was this:

\begin{equation}
    \cos\Big(\frac{\pi}{4}\Big) = \Big(\frac{\sqrt{2}}{2}\Big)
\end{equation}
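As a quick sanity check that the reference equation is a true identity and not just a pretty picture, both sides can be evaluated numerically. This tiny Python sketch is an illustration only; floating-point values are compared with a tolerance rather than for exact equality.

```python
import math

lhs = math.cos(math.pi / 4)   # left-hand side: cos(pi/4)
rhs = math.sqrt(2) / 2        # right-hand side: sqrt(2)/2
print(lhs, rhs, math.isclose(lhs, rhs))
```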

For those of you not speaking LaTeX fluently - you should consult the "Not so short introduction to LaTeX" chapter 3 - or simply behold the equation below:

  

In ODF mathematical notations are done using MathML (section 12.5) - a W3C-standard for displaying mathematical content. The mathematical content is embedded in the ODF-package as an object and as far as I can see, it is not possible to use MathML inline in the content of the paragraphs of the document itself. I have earlier talked about ODF being vague and this is imo one of the places where some clarity could help.

But - learning MathML is like learning a new language ... it doesn't really make sense in the beginning. So I started to poke around a bit on the W3C-website in search of some tools or tutorials that would help me figure out what MathML is all about. I eventually found a W3C tool called Amaya. It's a MathML/SVG-tool developed by W3C and I used this tool to create the MathML for the base equation above. In Amaya it looks like this:

 

 

The interesting part, of course, is the MathML created by Amaya. The MathML (slightly modified, but validated) is listed below

<?xml version="1.0" encoding="utf-8" ?>
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mrow>
    <mtext>cos</mtext>
    <mo>(</mo>
    <mfrac>
      <mi>&pi;</mi>
      <mn>4</mn>
    </mfrac>
    <mo>)</mo>
    <mi>=</mi>
    <mo>(</mo>
    <mfrac>
      <msqrt>
        <mn>2</mn>
      </msqrt>
      <mn>2</mn>
    </mfrac>
    <mo>)</mo>
  </mrow>
</math>

If you look at the XML, it is pretty easy to identify the different parts of the equation.

So - in theory I should be able to put this into an ODF-document and it would be displayed when opening the document using OpenOffice.org - the reference implementation of ODF. 

Let's see


Step 1

Create an ODF-document using OpenOffice.org with a mathematical formula embedded.

Now, this was the easy part. I cannot figure out how to insert a regular "Pi"-sign in the formula, but the formula looks just fine. The file is available here: math.odt (9,72 kb). It looks like this:

 


 

Step 2

Clean the file of all the disturbing crap that the application puts in by default

This was a bit more tricky, since somehow it seems that the mathematical formula can only be contained in a file called "content.xml" - otherwise OpenOffice.org simply shuts down. Also, I have removed all meta-data, styling, extra namespace-declarations, embedded thumbnails and the graphical representation of the formula. The cut-down ODT-file is available here: math-minimal.odt (1,43 kb). The visual representation is exactly like that of the original file.

Step 3

Inspect the MathML in the application created MathML-file

The MathML created by OpenOffice.org looks like this: 

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE math:math PUBLIC "-//OpenOffice.org//DTD Modified W3C MathML 1.01//EN" "math.dtd">
<math:math xmlns:math="http://www.w3.org/1998/Math/MathML">
  <math:semantics>
    <math:mrow>
      <math:mi>cos</math:mi>
      <math:mrow>
        <math:mfenced math:open="" math:close="">
          <math:mfrac>
            <math:mi math:fontstyle="italic">pi</math:mi>
            <math:mn>4</math:mn>
          </math:mfrac>
        </math:mfenced>
        <math:mo math:stretchy="false">=</math:mo>
        <math:mfenced math:open="" math:close="">
          <math:mfrac>
            <math:msqrt>
              <math:mn>2</math:mn>
            </math:msqrt>
            <math:mn>2</math:mn>
          </math:mfrac>
        </math:mfenced>
      </math:mrow>
    </math:mrow>
    <math:annotation math:encoding="StarMath 5.0">cos left ( pi over 4 right )  = left (sqrt{2}  over 2  right )</math:annotation>
  </math:semantics>
</math:math>

There are a couple of things to note about this. Firstly, I don't understand the DOCTYPE declaration

<!DOCTYPE math:math PUBLIC "-//OpenOffice.org//DTD Modified W3C MathML 1.01//EN" "math.dtd">

The doctype should not matter at all - and why they chose to use a "DTD Modified W3C MathML 1.01" is beyond me. I'm not saying it's an error - I just don't get it. Enlighten me, please.

Secondly, the MathML created looks different from the MathML created by Amaya. However - just as the same paragraph can be presented in all sorts of ways using HTML, and the same equation can be written in different ways (e.g. sin²(x) + cos²(x) = 1 is basically the same as a² + b² = c²), the same equation can be created in an endless myriad of ways using MathML.

Thirdly, there are two distinct ways in which the OOo MathML differs from the MathML of Amaya. Notice how it uses the <mfenced>-element to make a parenthesis instead of <mo>)</mo>. There is really no difference - however I tend to think that using the <mfenced>-element is slightly more sophisticated than the <mo>-element, but it's just a personal belief. Also, look at the usage of the <semantics>- and <annotation>-elements. This is actually really cool. The <semantics>-elements are used to provide "meaning" to the MathML-markup, and the content of the <annotation>-elements directly maps the MathML markup to the corresponding expression tree. Also, OpenOffice.org allows you to type in the annotation directly, thereby enabling some of the ease of writing LaTeX directly by hand.
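The annotation is easy to get at programmatically. The following Python sketch pulls the StarMath source out of a cut-down copy of the OpenOffice.org markup above (DOCTYPE omitted, presentation part shortened). The function name starmath_annotation is invented for the example, and whitespace in the annotation is collapsed for readability.

```python
import xml.etree.ElementTree as ET

MML = "http://www.w3.org/1998/Math/MathML"

document = """<math:math xmlns:math="http://www.w3.org/1998/Math/MathML">
  <math:semantics>
    <math:mrow>
      <math:mi>cos</math:mi>
    </math:mrow>
    <math:annotation math:encoding="StarMath 5.0">cos left ( pi over 4 right )  = left (sqrt{2}  over 2  right )</math:annotation>
  </math:semantics>
</math:math>"""


def starmath_annotation(root):
    """Return the StarMath annotation text with whitespace collapsed, or None."""
    for annotation in root.iter(f"{{{MML}}}annotation"):
        # OpenOffice.org namespace-prefixes the attribute as well (math:encoding).
        if annotation.get(f"{{{MML}}}encoding") == "StarMath 5.0":
            return " ".join((annotation.text or "").split())
    return None


source = starmath_annotation(ET.fromstring(document))
print(source)
```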

Step 4

Validate the MathML-file using W3C-validator or Amaya 

The picture below shows the content.xml loaded and displayed in Amaya. The green dot in the bottom right corner indicates that the MathML is valid. I have also made a test with embedding the MathML in an HTML-document and validated it against the W3C-validator, and the result is the same.

 

 

Super!

Step 5

Insert the MathML created by Amaya into the ODT-file and open the file using OpenOffice.org

Now, I have previously created the formula using Amaya and I just have to inject it into the ODT-file. I did and the file is available here: mathml-minimal-error.odt (1,23 kb). The result is, however, not as I expected

 


 

Ok - but as you might have noticed, all elements in the OOo MathML-file were namespace-prefixed, so maybe this will do the trick. I tried this as well but with the same result. File is available here: mathml-minimal-nsprefix-error.odt (1,24 kb).

Final step

Figure out what the hell is wrong 

I finally figured out what is wrong with the way OpenOffice.org handles MathML-content. It turns out that if I took the Amaya MathML (without ns-prefix) and inserted the MathML into the original content.xml-file but preserved the DOCTYPE-declaration, it works almost as expected. File is available here: mathml-minimal-inject-succes.odt (1,30 kb).



Well, some errors are introduced. The π-character is not displayed and the equation is displayed in bold. The equal-sign has disappeared as well.

Just for the fun of it I took the MathML-file generated by OpenOffice.org and removed the <semantics>-element as well as the <annotation>-element. File is available here: mathml-minimal-inject-no-semantics.odt (1,35 kb). The result when opening it in OpenOffice.org is .. well ... sad:



I have absolutely no idea of why it displays it like this. Removing the <semantics>-element and <annotation>-element should have no effect on the visual representation of the equation.
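For anyone who wants to repeat the experiment, the removal can be scripted instead of done by hand. The Python sketch below unwraps each <semantics> element, keeping its presentation content and dropping the annotations. It is an assumption about what "removed" means here (the file above was edited manually), and strip_semantics is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

MML = "http://www.w3.org/1998/Math/MathML"
ET.register_namespace("math", MML)

SEMANTICS = f"{{{MML}}}semantics"
ANNOTATIONS = (f"{{{MML}}}annotation", f"{{{MML}}}annotation-xml")


def strip_semantics(math_root):
    """Replace each <semantics> wrapper by its non-annotation children."""
    for parent in list(math_root.iter()):
        for child in list(parent):
            if child.tag == SEMANTICS:
                position = list(parent).index(child)
                parent.remove(child)
                kept = [node for node in child if node.tag not in ANNOTATIONS]
                for offset, node in enumerate(kept):
                    parent.insert(position + offset, node)
    return math_root


sample = (
    '<math:math xmlns:math="http://www.w3.org/1998/Math/MathML">'
    "<math:semantics>"
    "<math:mrow><math:mi>cos</math:mi></math:mrow>"
    '<math:annotation math:encoding="StarMath 5.0">cos</math:annotation>'
    "</math:semantics>"
    "</math:math>"
)
stripped = strip_semantics(ET.fromstring(sample))
print(ET.tostring(stripped, encoding="unicode"))
```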

Conclusion?

Well, I don't really know what to conclude. Most of the things I have shown above are imo due to errors in the implementation of OpenOffice.org, where MathML is clearly not implemented correctly. It seems that there are some unwritten rules to how MathML is supposed to be used when working with it in OpenOffice.org, but they seem rather unclear and weird to me.

But how OpenOffice.org behaves is really not important to me - some implementations of ODF are better than others, and maybe other implementations do a better job at displaying MathML. The point should be how the specification says it should be used. Luckily the ODF-spec only talks about how MathML is used in a single place - section 12.5 Mathematical Content. It says that "Mathematical content is represented by MathML 2.0 (see [MathML])". The RelaxNG-snippet provided also tells us that you can put everything into a "math area", <math:math>:

<?xml version="1.0" encoding="UTF-8" ?>
<define name="math-math">
    <element name="math:math">
        <ref name="mathMarkup" />
    </element>
</define>
<!-- To avoid inclusion of the complete MathML schema, anything -->
<!-- is allowed within a math:math top-level element -->
<define name="mathMarkup">
    <zeroOrMore>
        <choice>
            <attribute>
                <anyName />
            </attribute>
            <text />
            <element>
                <anyName />
                <ref name="mathMarkup" />
            </element>
        </choice>
    </zeroOrMore>
</define>

So basically, all bets are off. I can only begin to wonder how other implementations of ODF use MathML.

And a small appetizer:

As soon as I get the time for it, I'll write an article like this one about Office 2007 and OMML. I will investigate how to mark up mathematical content using OMML and I will also try to use the XSL-files provided by Microsoft in Office 2007 to create XSLT-translations of my base equation from OMML to MathML and vice versa.

... stay tuned ... 


Embrace and extend - SVG in ODF revisited

One of the attack-vectors on OOXML has been the lack of reuse of existing standards. Specifically it lands directly in the discussion of DrawingML vs. SVG and OMML vs. MathML ... both of which are relatively interesting subjects. The argument has been why Microsoft chose not to reuse SVG and created DrawingML instead - and likewise with MathML and OMML.

Now, some of the arguments for reusing existing standards are:

  • Reuse of other people's code
    As a programmer, I love this - there is nothing more satisfying than being able to reuse something that others have made an effort to produce
  • Increase quality
    If something is an existing standard, someone else has probably reviewed it and the worst bugs have likely been removed.
  • Brain cycle reuse
    If you reuse some work already defined, you will probably be able to find someone in your organization that has skills in this area - and you avoid the costs of re-educating them to use a new tool.

So, with respect to ODF, it has tried to reuse as many standards as possible, so e.g. mathematical content is done using MathML and vector graphics are supposedly done using SVG. Microsoft has chosen a different path where they have created their own new formats, so mathematical content is done using OMML (Office Math Markup Language) and vector graphics are done using DrawingML.

A couple of weeks ago I heard some rumours that ODF had not only used SVG as its vector graphics format but had even extended it beyond the standardized format. My initial response was that it had to be wrong information. One of the cornerstones of ODF is, after all, that it reuses existing standards and that there is a "clean cut" between ODF and the standard it utilizes. This way I would be able to buy/acquire some library that supports SVG and simply incorporate it in my product implementing ODF. But if the referenced standard is extended - I will either experience less functionality due to extensions not being parts of the standard or I could experience crashing code when I try to pass the extended format to the external library - at least if it performs e.g. DTD/schema validation and finds out that invalid elements are present in the input.

So what did I do?

Basically I started by doing a random text-search in the ODF-spec for occurrences of "[SVG]". One of the first things that caught my attention was the paragraph in section 1.3 Namespaces, Table 2 where it says:

Prefix: svg
Description: For elements and attributes that are compatible to elements or attributes defined in [SVG].
Namespace: urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0


The term "compatible to elements or attributes" seems quite odd to me, since it should not be necessary to specify this if the referenced standard is not extended. I did another quick search and I stumbled upon these sections of the specification:

  • 14.14.2 SVG Gradients
  • 15.13.13 Line Join

Let me quickly walk through the contents of each section.

14.14.2 SVG Gradients

Section 14.14.2 says, amongst other things:

In addition to the gradients specified in section 14.14.1, gradient may be defined by the SVG gradient elements <linearGradient> and <radialGradient> as specified in §13.2 of [SVG].

Cool!

Now, the section goes on as

The following rules apply to SVG gradients if they are used in documents in OpenDocument format:

  • The gradients must get a name. It is specified by the draw:name attribute.
  • For <linearGradient>, only the attributes gradientTransform, x1, y1, x2, y2 and spreadMethod will be evaluated.
  • For <radialGradient>, only the attributes gradientTransform, cx, cy, r, fx, fy and spreadMethod will be evaluated.
  • The gradient will be calculated like having a gradientUnits of objectBoundingBox, regardless what the actual value of the attribute is.
  • The only child element that is evaluated is <stop>.
  • For <stop>, only the attributes offset, stop-color and stop-opacity will be evaluated.

 So, to be able to determine if ODF is only referencing SVG, we need to look at section 13.2 in SVG spec. It says:

<!ELEMENT %SVG.linearGradient.qname; %SVG.linearGradient.content; >
<!-- end of SVG.linearGradient.element -->]]>
<!ENTITY % SVG.linearGradient.attlist "INCLUDE" >
<![%SVG.linearGradient.attlist;[
<!ATTLIST %SVG.linearGradient.qname;
    %SVG.Core.attrib;
    %SVG.Style.attrib;
    %SVG.Color.attrib;
    %SVG.Gradient.attrib;
    %SVG.XLink.attrib;
    %SVG.External.attrib;
    x1 %Coordinate.datatype; #IMPLIED
    y1 %Coordinate.datatype; #IMPLIED
    x2 %Coordinate.datatype; #IMPLIED
    y2 %Coordinate.datatype; #IMPLIED
    gradientUnits ( userSpaceOnUse | objectBoundingBox ) #IMPLIED
    gradientTransform %TransformList.datatype; #IMPLIED
    spreadMethod ( pad | reflect | repeat ) #IMPLIED  
>

So it seems that at least the attribute gradientUnits is not used in the ODF-adapted version of SVG.

If we look at <radialGradient>, we need to cross reference with the corresponding  DTD in SVG. It says:

<!ENTITY % SVG.radialGradient.extra.content "" >
<!ENTITY % SVG.radialGradient.element "INCLUDE" >
<![%SVG.radialGradient.element;[
<!ENTITY % SVG.radialGradient.content
    "(( %SVG.Description.class; )*, ( %SVG.stop.qname; | %SVG.animate.qname;
    | %SVG.set.qname; | %SVG.animateTransform.qname;
    %SVG.radialGradient.extra.content; )*)"
>
<!ELEMENT %SVG.radialGradient.qname; %SVG.radialGradient.content; >
<!-- end of SVG.radialGradient.element -->]]>
<!ENTITY % SVG.radialGradient.attlist "INCLUDE" >
<![%SVG.radialGradient.attlist;[
<!ATTLIST %SVG.radialGradient.qname;
    %SVG.Core.attrib;
    %SVG.Style.attrib;
    %SVG.Color.attrib;
    %SVG.Gradient.attrib;
    %SVG.XLink.attrib;
    %SVG.External.attrib;
    cx %Coordinate.datatype; #IMPLIED
    cy %Coordinate.datatype; #IMPLIED
    r %Length.datatype; #IMPLIED
    fx %Coordinate.datatype; #IMPLIED
    fy %Coordinate.datatype; #IMPLIED
    gradientUnits ( userSpaceOnUse | objectBoundingBox ) #IMPLIED
    gradientTransform %TransformList.datatype; #IMPLIED
    spreadMethod ( pad | reflect | repeat ) #IMPLIED
>

So here the attribute gradientUnits is not used either.

But luckily the good guys at ODF TC have solved this mystery for us - since they have decided that the value of the (non-existing) attribute gradientUnits is calculated as having a value of "objectBoundingBox", regardless of the value passed as this parameter. It's a bit odd, but I suppose it has something to do with the way the SVG-fragments position themselves around the other objects in the document.
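Read literally, the quoted rules amount to an attribute filter plus a forced gradientUnits. The Python sketch below expresses that reading on plain dictionaries of attribute names (namespaces and the required draw:name are left out on purpose). EVALUATED and effective_gradient are names made up for the illustration, and no actual ODF consumer is claimed to work this way.

```python
# Attributes that the quoted ODF rules say "will be evaluated".
EVALUATED = {
    "linearGradient": {"gradientTransform", "x1", "y1", "x2", "y2", "spreadMethod"},
    "radialGradient": {"gradientTransform", "cx", "cy", "r", "fx", "fy", "spreadMethod"},
}


def effective_gradient(kind, attributes):
    """Return the attributes a consumer following the quoted rules would act on."""
    kept = {
        name: value
        for name, value in attributes.items()
        if name in EVALUATED[kind]
    }
    # The rules treat every gradient as objectBoundingBox, whatever was written.
    kept["gradientUnits"] = "objectBoundingBox"
    return kept


svg_input = {
    "id": "g1",
    "x1": "0",
    "x2": "100",
    "gradientUnits": "userSpaceOnUse",
}
effective = effective_gradient("linearGradient", svg_input)
print(effective)
```

A gradient authored for plain SVG with gradientUnits="userSpaceOnUse" therefore comes out with a different coordinate interpretation, which is exactly the loss of fidelity discussed here.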

15.13.13 Line Join

The contents of section 15.13.13 are:

The attribute draw:stroke-linejoin specifies the shape at the corners of paths or other vector shapes, when they are stroked. The values are the same as for [SVG]'s stroke-linejoin attribute, except that the attribute in addition to the values supported by SVG may have the value middle, which means that the mean value between the joints is used.

They have even been so kind to provide us with a schema fragment defining the possible usage of this feature in ODF:

<define name="style-graphic-properties-attlist" combine="interleave">
    <optional>
        <attribute name="draw:stroke-linejoin">
            <choice>
                <value>miter</value>
                <value>round</value>
                <value>bevel</value>
                <value>middle</value>
                <value>none</value>
                <value>inherit</value>
            </choice>
        </attribute>
    </optional>
</define>

Compare this with the DTD of SVG (Appendix A.1.7 Paint Attribute Model):

<!ENTITY % SVG.stroke-linejoin.attrib
    "stroke-linejoin ( miter | round | bevel | inherit ) #IMPLIED"
>

So the attribute value "middle"  is indeed an addition to SVG.
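The comparison can be done mechanically from the two listings above. This small Python sketch takes the value sets exactly as they appear in the ODF schema fragment and the SVG DTD fragment; on that reading, "none" shows up next to "middle" as a value the SVG DTD does not list. The helper svg_accepts is a hypothetical stand-in for a validating SVG library.

```python
# Values copied from the ODF schema fragment quoted above.
ODF_LINEJOIN = {"miter", "round", "bevel", "middle", "none", "inherit"}
# Values copied from the SVG DTD fragment quoted above.
SVG_LINEJOIN = {"miter", "round", "bevel", "inherit"}

# Values an ODF document may use that the SVG DTD does not list.
extra = sorted(ODF_LINEJOIN - SVG_LINEJOIN)
print(extra)  # ['middle', 'none']


def svg_accepts(value):
    """Mimic a strictly validating SVG consumer for this one attribute."""
    return value in SVG_LINEJOIN


print(svg_accepts("round"), svg_accepts("middle"))  # True False
```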

Conclusion 

You might be wondering if a couple of additions to/exclusions from SVG are really worth an entire article, and you kinda have a point. However, the devil lies in the details.

The modifications to SVG (even if they are minor) are bad enough as they are, because they basically kill high-fidelity interoperability when using existing SVG-libraries. When you limit the usage of some component (the limitations to the values of gradientUnits) you basically lose control of how existing data behaves. And when you enlarge a standard (addition of the value "middle" to the stroke-linejoin attribute) you lose control of how your own data behaves when it is used in other scenarios. You know, this is exactly what Microsoft did when they enlarged not only CSS but also JavaScript. Maybe the memory of the ODF-founders is not that great, but I certainly remember the loads of crap-work we had to do in the late nineties when creating web-pages for "IE5-compatible" browsers and "the rest". In fact - this nightmare still haunts us with the Microsoft additions to JavaScript. Maybe they just thought: "If Microsoft pulled it off, so can we". I think that's a bad choice.

Also, you should note that ODF does not use SVG "as such" at all. They use fragments of SVG, i.e. elements with same names and attributes and then they fit it into the overall architecture of ODF. This is hardly "just referencing". As the paragraph says above (stroke-linejoin), the elements specifying this are not SVG-elements. They are similar to SVG-elements and even extended beyond this. I actually find it really hard to see or understand how the ODF TC can claim - with a straight face - that ODF only references SVG. I suppose that if I made my own JLSMarkup for document formats and used an element called <body> I would also be able to claim that I was reusing W3C xHTML 1.0. I just don't find it the right thing to do.

My only surprise is why this has not surfaced until now and how anyone can sit down and read in ODF (as being both pro-ODF or pro-choice) and not be just a little confused about how they could claim "just referencing existing standards", is a bit mind-baffling to me. I suppose ECMA could do the same with OOXML and claim "reusage of HTML DOM in OOXML-architecture" since a WordProcessingML-document contains both a <body>-element as well as a <p>-element.

Post scriptum

On his blog, Brian Jones speculated in his last comment on the thread "Why all the secrecy?" whether you could take an existing SVG-drawing, put it into an ODF-document and expect it to work. Well, just like OOXML, ODF places no limitations on what kind of data you might want to put into it, so usage of SVG in an ODF-document is indeed possible from a technical/architectural point of view. It is not a format question but an implementation-specific question. However - will it work?

ODF has several ways to embed data into the document. The two relevant means are inclusion of an SVG-drawing as an image and inclusion of an SVG-image as an object. ODF supports two ways to embed an object, as stipulated in section 9.3.3:

A document in OpenDocument format can contain two types of objects, as follows:

  1. Objects that have an OpenDocument representation. These objects are:
    1. Formulas (represented as [MathML])
    2. Charts
    3. Spreadsheets
    4. Text documents
    5. Drawings
    6. Presentations
  2. Objects that do not have an XML representation. These objects only have a binary representation. An example for this kind of objects are OLE objects (see [OLE]).

 

Well, SVG is clearly XML but it is not an "OpenDocument representation" - but then again, neither is MathML, so I'll opt for using these two methods when trying to embed an SVG-drawing into an ODT-document:

  • Insert the SVG-drawing as an image
  • Insert the SVG-drawing as an XML part using the <draw:object>-element as specified in section 9.3.3 of the ODF spec.

I'll use the latest and greatest release of OOo, OpenOffice 2.3.1 DA, to try to display the files. You can see the SVG-file here: ex.svg (482,00 bytes)

Insert SVG as an image

I have created a small ODT-document and added the SVG-file to it. I have added an SVG-image to content.xml as a regular image and put the SVG-file in a folder by itself. The XML-file content.xml is displayed here below.

<?xml version="1.0" encoding="UTF-8" ?>
<office:document-content
 xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
 xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
 xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
 xmlns:xlink="http://www.w3.org/1999/xlink"
 xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"
>
 <office:body>
  <office:text>
   <text:p >Test of insertion of SVG-image in ODT-document</text:p>
   <text:p >
    <draw:frame
     draw:style-name="fr1"
     draw:name="grafik1"
     text:anchor-type="paragraph"
     svg:width="17cm"
     svg:height="13cm"
     draw:z-index="0">
     <draw:image
      xlink:href="SVG/ex.svg"
      xlink:type="simple"
      xlink:show="embed"
      xlink:actuate="onLoad" />
    </draw:frame>
   </text:p>
  </office:text>
 </office:body>
</office:document-content>

As can be seen, the SVG-image is simply added as a regular image using the ODF-modified version of SVG. The ODT-file is available here: test svg image.odt (1,48 kb). Anyone want to take a guess on what the result of opening this file will be?

 

 

Insert SVG as an "XML-object"

As noted above, ODF allows insertion of objects with an "XML-representation" as just a text file. The construction of the ODF-package is a bit more complicated, and I'd be happy if anyone could tell me whether I made a mistake - and what the correct way would be. As the basis for my file I have used an ODT-file with an embedded MathML formula, and so I'll again just show the contents of the content.xml-file below.

<?xml version="1.0" encoding="UTF-8"?>
<office:document-content
  xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
  xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
  xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"
  office:version="1.0">
  <office:body>
    <office:text>
      <text:p >Test of insertion of SVG in OOo</text:p>
      <text:p >
        <draw:frame
          draw:name="My SVG drawing [JLS]"
          text:anchor-type="as-char"
          svg:width="1.011cm"
          svg:height="0.467cm"
          draw:z-index="0"
        >
          <draw:object
            xlink:href="./SVG"
            xlink:type="simple"
            xlink:show="embed"
            xlink:actuate="onLoad"
           />
        </draw:frame>
      </text:p>
    </office:text>
  </office:body>
</office:document-content>

Again an xlink reference to the SVG-file is "simply" added to content.xml. The ODT-file is available here: test insert svg.odt (1,48 kb). Anyone want to take a guess on what the result of opening this file will be?

 

 

 

So it seems to recognize the SVG filetype - it just doesn't understand how to process it.

I have a feeling that I might have made an error in the manifest, so I'll include it here and hopefully someone can pinpoint if there is an error:

<?xml version="1.0" encoding="UTF-8"?>
<manifest:manifest xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0">
 <manifest:file-entry manifest:media-type="application/vnd.oasis.opendocument.text" manifest:full-path="/"/>
 <manifest:file-entry manifest:media-type="image/svg+xml" manifest:full-path="SVG/ex.svg"/>
 <manifest:file-entry manifest:media-type="application/vnd.oasis.opendocument.image" manifest:full-path="SVG/"/>
</manifest:manifest>
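For anyone who wants to poke at this, here is a minimal Python sketch that checks whether the two xlink:href values used in the content.xml files above can be found in this manifest. The helper names are my own, and the rule that "./SVG" should match the folder entry "SVG/" is an assumption about how a consumer resolves sub-documents, not something I have taken from the spec. It only checks that the references resolve; it says nothing about whether the media types are the right ones.

```python
import xml.etree.ElementTree as ET

MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"

# The manifest shown above, pasted in verbatim.
MANIFEST_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<manifest:manifest xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0">
 <manifest:file-entry manifest:media-type="application/vnd.oasis.opendocument.text" manifest:full-path="/"/>
 <manifest:file-entry manifest:media-type="image/svg+xml" manifest:full-path="SVG/ex.svg"/>
 <manifest:file-entry manifest:media-type="application/vnd.oasis.opendocument.image" manifest:full-path="SVG/"/>
</manifest:manifest>"""


def manifest_entries(manifest_xml):
    """Map each manifest:full-path to its manifest:media-type."""
    root = ET.fromstring(manifest_xml)
    entries = {}
    for entry in root.findall(f"{{{MANIFEST_NS}}}file-entry"):
        path = entry.get(f"{{{MANIFEST_NS}}}full-path")
        entries[path] = entry.get(f"{{{MANIFEST_NS}}}media-type")
    return entries


def lookup(href, entries):
    """Return the media type for an href, or None if it is not listed.

    Assumption: a leading "./" is dropped, and a reference without a
    trailing slash may point at a folder entry such as "SVG/".
    """
    path = href[2:] if href.startswith("./") else href
    return entries.get(path) or entries.get(path + "/")


entries = manifest_entries(MANIFEST_XML)

# The hrefs used by <draw:image> and <draw:object> in the two files above.
for href in ("SVG/ex.svg", "./SVG"):
    print(href, "->", lookup(href, entries))
```

If a reference comes back as None, the manifest is at least one place to start looking.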

OOo and SVG

I have said before that the devil lies in the details - but here it actually lies right up-front. You see - OpenOffice.org does not presently (version 2.3.1) support SVG. It doesn't support SVG as regular images and it does not support SVG as a source of vector graphics or "line art". You can import SVG-images with OOo, but they are converted to OpenDocument Draw, and OpenDocument Draw data can be exported to SVG. The import/export is not done by OOo itself but by a filter that converts the SVG into the internal ODF Draw format. Support for SVG is apparently the single most requested feature in OOo, so maybe it will soon be a part of OOo. Also take a look at the "General note" on the "Unsupported SVG features"-page of the filter:

SVG and what's named SVG-compatible in OpenDocument is really different. Therefore, the import filter can only approximate the SVG contents.

Ooh - and incidentally - the way ODF and OOo handle SVG is exactly the same way OOXML and Microsoft Office 2007 handle MathML.

:-)

ECMA has sent out the last responses

Yesterday was the day when the last responses from ECMA were made available to the national bodies around the world. With that, ECMA has responded to all of the roughly 3500 comments that came in during the processing of DIS 29500 in the summer/autumn of 2007.

While working on the standard and the discussions about it over the summer, I could not help thinking that a great many of the comments were pure nonsense or, at best, irrelevant. They seemed to be written on the principle "now that I am deliberately trying to misunderstand this - where is it easiest to do so?" (e.g. OLE). Clearly there were many good comments, but many of them were factually rubbish.

But I must admit, now that I sit looking at the result of the processing of the comments, that the combined mass of comments has resulted in a standard that is in many ways better than it was before. The standard has quite simply become more precisely worded and generally easier to use. That clearly deserves an appreciative nod to all the people who (whether they are for or against OOXML) have combed through the proposed standard. Thank you! It is worth emphasizing that the standard has not been completely redone - rather, it has been improved in a number of areas where it needed polishing. The architecture itself is the same, so the energy you may have spent on working with the existing ECMA-376 is certainly not wasted. The points where I think the biggest improvements have come are:

  • There is no longer any requirement to use VML in new documents
  • Country codes must now be specified as described in RFC-4646
  • It is clearer that OOXML should use existing, well-tested hash algorithms as specified by, among others, FIPS-180
  • The conformance requirements have become clearer
  • The famous "leap year bug" is now marked as deprecated
  • It is possible to use dates before 1900
  • The formula specifications for spreadsheets are now described in EBNF notation

And what about the rest of the many comments, such as those on "Compatibility-elements"? Well - I have only mentioned the parts that I think are the most important (and I have naturally no doubt forgotten some other important ones).

:-)

Another bucket of responses from ECMA

ECMA is now ready with another bucket of responses to the various countries in connection with the work on DIS 29500. From their press release it can be seen that ECMA has now responded to 92% of the 3500 comments received, and it looks as though they will manage to deliver all responses before the deadline on Monday 14 January 2008. Of the responses to the Danish comments, only a very few now remain to be dealt with, and it will be exciting to see what ECMA answers on the last points.

One thing I have had difficulty figuring out is whether ECMA will be allowed by ISO/IEC to publish a new consolidated standard with all corrections included. Do any of you readers have this information? As far as I read the JTC1 directives, they may not publish the individual dispositions by themselves, nor the comments from the countries, so the only way to get the responses to the comments out is presumably to publish the final, full report. Personally, I do not think ECMA will publish the full, revised standard until after the BRM in February - but regardless of the outcome it is a bit of a choice between cholera and plague.

I must honestly admit that I have enjoyed the peace to work during the last months since 2 September 2007, and especially since ECMA began circulating the responses to the individual countries. Clearly there has not been that much debate - in fact much less than I had expected - but the individual countries are also in an entirely different situation. In the first part of the 5-month ballot period it was, in my eyes, a clear advantage that OOXML was discussed so widely, because it uncovered a long list of deficiencies and inexpediencies in the standard. I doubt that the individual countries could have delivered the same work had it not been for GrokDoc, IBM, Andy and others who have combed the OOXML spec for errors. There was an imminent risk that the countries would simply have voted "abstain" because they could not understand the spec - just as they did with ODF in December 2006. Fortunately they did not, and the situation now is that the individual countries must see whether the responses from ECMA to the individual comments are good enough. That is of course work of an entirely different character, and it is my view that here we do not need the key people from the other side of the river.

But - it will be exciting to see the final result of ECMA TC45's work.

:-)