53878 – FILESAVE, EasyHack: Size of XML files can be reduced by using default namespaces on certain elements

Status:	NEW

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	LibreOffice
Version: (earliest affected)	3.6.0.4 release
Hardware:	All All

Importance:	lowest enhancement
Assignee:	Not Assigned

URL:
Whiteboard:
Keywords:	difficultyMedium, needsDevEval

Depends on:	53998
Blocks:	ODF-Flat

Reported:	2012-08-21 12:58 UTC by Nicholas Shanks
Modified:	2020-03-14 12:38 UTC
CC List:	2 users

See Also:
Crash report or crash signature:


Description Nicholas Shanks 2012-08-21 12:58:16 UTC

While creating my own flat spreadsheet file, based on files saved by LibreOffice and Google Spreadsheet, I realised it would save a lot of bytes to add a default namespace to the <table:table> element and remove all the table: prefixes from this element and its children.

i.e. go from this:

<table:table table:name="" table:style-name="">
 <table:table-column table:style-name="" table:default-cell-style-name="" table:number-rows-repeated="" table:number-columns-repeated=""/>
 <table:table-row table:style-name="">
  <table:table-cell table:style-name=""/>
 </table:table-row>
</table:table>

to this:

<table name="" style-name="" xmlns="urn:oasis:names:tc:opendocument:xmlns:table:1.0">
 <table-column style-name="" default-cell-style-name="" number-rows-repeated="" number-columns-repeated=""/>
 <table-row style-name="">
  <table-cell style-name=""/>
 </table-row>
</table>
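
One caveat on the rewrite above: under the Namespaces in XML rules, a default namespace declaration applies to unprefixed element names but not to unprefixed attribute names. The following is a minimal sketch using Python's standard xml.etree.ElementTree, not anything taken from LibreOffice itself. It compares the expanded names a namespace-aware parser sees in the two forms. The attribute values "Sheet1" and "ce1" and the helper name expanded_names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"

# Prefixed form, as in the "from this" example (values are illustrative).
PREFIXED = (
    f'<table:table xmlns:table="{TABLE_NS}" table:name="Sheet1">'
    '<table:table-row><table:table-cell table:style-name="ce1"/></table:table-row>'
    '</table:table>'
)

# Default-namespace form, as in the "to this" example.
DEFAULTED = (
    f'<table xmlns="{TABLE_NS}" name="Sheet1">'
    '<table-row><table-cell style-name="ce1"/></table-row>'
    '</table>'
)


def expanded_names(xml_text):
    """Return (element names, attribute names) in {namespace}local form."""
    root = ET.fromstring(xml_text)
    tags = [el.tag for el in root.iter()]
    attrs = [sorted(el.attrib) for el in root.iter()]
    return tags, attrs


p_tags, p_attrs = expanded_names(PREFIXED)
d_tags, d_attrs = expanded_names(DEFAULTED)

# Element names are identical once prefixes are resolved.
print("elements equal:", p_tags == d_tags)
# Attribute names differ: unprefixed attributes are in no namespace.
print("attributes equal:", p_attrs == d_attrs)
print(p_attrs[0], "vs", d_attrs[0])
```

If that holds for the consumer in question, dropping the prefix from element names is lossless, while dropping it from attributes changes their names, so a strictly namespace-aware reader may treat them as different attributes.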

Such optimisations are surely also available to other document types, not just spreadsheets.
I also set the style namespace as the default namespace on the office:styles, office:automatic-styles and office:master-styles elements, followed by removing style: prefixes from element names and attributes. I brought down a simple (14 cols, 15 rows) 29 KB file to 24 KB. Since I will be generating and serving these over the internet for our business application, this 17% reduction (before compression) is fairly significant.


For bonus credit, since the office: and text: prefixes are needed on all and most table cells respectively, shortening those namespace prefixes to "o" and "t" would bring down the file size further.
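
The prefix-shortening idea can be tried out with a rough sketch like the one below. It uses Python's standard xml.etree.ElementTree rather than LibreOffice's export code, builds a small hypothetical row of string cells, and serialises it once with the usual prefixes and once with "o" and "t". The helper names build_row and serialize and the cell values are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}


def build_row(values):
    """Build a table-row with one string cell per value (illustrative data)."""
    row = ET.Element(f"{{{NS['table']}}}table-row")
    for value in values:
        cell = ET.SubElement(
            row,
            f"{{{NS['table']}}}table-cell",
            {f"{{{NS['office']}}}value-type": "string"},
        )
        ET.SubElement(cell, f"{{{NS['text']}}}p").text = value
    return row


def serialize(root, prefixes):
    """Serialise root using the given prefix -> URI mapping.

    Note: ET.register_namespace changes module-wide state, so the most
    recent registration for a URI wins.
    """
    for prefix, uri in prefixes.items():
        ET.register_namespace(prefix, uri)
    return ET.tostring(root)


row = build_row(["a", "b", "c"])
long_form = serialize(row, NS)
short_form = serialize(
    row, {"o": NS["office"], "table": NS["table"], "t": NS["text"]}
)

print(len(long_form), "bytes with office:/text: prefixes")
print(len(short_form), "bytes with o:/t: prefixes")
```

Both outputs resolve to the same expanded names, so a namespace-aware parser should see identical content; only the byte count changes.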

Comment 1 Nicholas Shanks 2012-08-24 09:02:47 UTC

Not such a good idea after all.

Microsoft Excel cannot read documents with default namespaces (and maybe even documents where the namespace is not the normal string). I suspect they are not using a real XML parser :-(


The error "Excel found unreadable content in 'filename'. Do you want to recover the contents of this workbook?" is shown.
Accepting the offer to recover the document's contents strips out all formatting and styles.

I think Excel compatibility should be a higher priority than file size.

Perhaps there could be two FILESAVE codepaths, a 'pure' one for adhering to the specs and writing sexy XML, and another for emitting 'MS Office compatible' OpenDocument files (bug #53998 filed requesting such).
This feature request would then apply to the sexy code path only.

Comment 2 Joel Madero 2014-02-27 22:57:59 UTC

In order to limit the confusion between ProposedEasyHack and EasyHack and to make queries much easier we are changing ProposedEasyHack to NeedsDevEval.

Thank you and apologies for the noise

Comment 3 Robinson Tryon (qubit) 2014-12-22 07:20:28 UTC

(In reply to Nicholas Shanks from comment #1)
> Microsoft Excel cannot read documents with default namespaces (and maybe
> even documents where the namespace is not the normal string). I suspect they
> are not using a real XML parser :-(

Well that's unfortunate!

> I think Excel compatibility should be a higher priority than file size.
> 

True, but I think the devs are going to be reluctant to maintain "two FILESAVE codepaths". Some alternative ideas:

1) Perhaps the most recent version of MS-Office uses a real XML parser.
Knock on wood!

2) Microsoft attends ODF Plugfests (e.g. http://plugfest.opendocumentformat.org/2014-london/), so concerns such as this one could definitely be raised.

Let's test again w/MS-Office, and see where we stand. In any case, it's a neat idea for a space-saving enhancement. Let's change Status -> NEW

Comment 4 Robinson Tryon (qubit) 2014-12-22 07:23:03 UTC

(In reply to Nicholas Shanks from comment #1)
> Microsoft Excel cannot read documents with default namespaces (and maybe
> even documents where the namespace is not the normal string). I suspect they
> are not using a real XML parser :-(

Could you please upload a couple sample files? It would be great to have the "original" file (without default namespace) alongside the file that uses a default namespace, so we can check them side-by-side.

Thanks!

Comment 5 Robinson Tryon (qubit) 2015-12-14 06:11:43 UTC

Migrating Whiteboard tags to Keywords: (needsDevEval difficultyBeginner)
[NinjaEdit]

Comment 6 Nicholas Shanks 2016-06-16 08:30:06 UTC

Sorry for the delay on this one — I left the relevant company, no longer need to use/output OpenOffice files, and have been ignoring related emails. If you want to send me any current XML file I'll perform some manual optimisation of the XML and send it back. It's nothing that couldn't be done by any XML-savvy coder though.