74169 – support Apple pages format

Status:	RESOLVED FIXED

Alias:	None

Product:	Document Liberation Project
Classification:	Unclassified
Component:	libetonyek
Version: (earliest affected)	unspecified
Hardware:	All All

Importance:	low enhancement
Assignee:	David Tardon

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:	40090

Reported:	2014-01-29 04:15 UTC by Paul Wise
Modified:	2017-10-30 11:25 UTC
CC List:	4 users

See Also:
Crash report or crash signature:

Description Paul Wise 2014-01-29 04:15:28 UTC

A friend of mine who uses Windows recently asked me to convert an Apple Pages document to Word format, because a friend of theirs who uses Apple software had sent it in Apple Pages format, which nothing outside of Apple/Google can view. Luckily the format includes an embedded PDF file, which worked for this use case, but it would be nice if LibreOffice could natively open the Apple Pages format and save it in other formats.

There is a blog post here about the format:

http://xorglog.blogspot.com/2009/05/how-to-edit-mac-os-pages-documents-in.html

Basically this is the list of files inside:

buildVersionHistory.plist
index.xml (document)
QuickLook
QuickLook/Thumbnail.jpg (JPEG rendering of document)
QuickLook/Preview.pdf (PDF rendering of document)
thumbs
thumbs/PageCapThumbV2-1.tiff (weird TIFF image, not document)
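
The listing above can be reproduced with a short script. This is only a minimal sketch: it assumes the .pages file is an ordinary ZIP archive (the report does not say so explicitly) and takes the member name QuickLook/Preview.pdf from the listing. The function names are made up for illustration.

```python
import zipfile

# Member name taken from the file listing in the report.
PREVIEW_MEMBER = "QuickLook/Preview.pdf"


def list_members(package):
    """Return the sorted names of all entries in the package.

    Assumption: the package is a plain ZIP archive.
    `package` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(package) as archive:
        return sorted(archive.namelist())


def extract_preview(package):
    """Return the bytes of the embedded PDF preview, or None if absent."""
    with zipfile.ZipFile(package) as archive:
        if PREVIEW_MEMBER not in archive.namelist():
            return None
        return archive.read(PREVIEW_MEMBER)
```

Writing the bytes returned by extract_preview() to a .pdf file gives the same fallback the reporter used by hand.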

There are some samples hidden behind a login (use bugmenot.com) here:

https://www.stocklayouts.com/Templates/Free-Templates/Free-Sample-Apple-iWork-Pages-Template-Design.aspx

Probably some more samples here:

http://www.brighthub.com/computing/mac-platform/articles/109380.aspx

Comment 1 Tomaz Vajngerl 2014-01-29 10:35:58 UTC

Yes, this is really needed, as there is almost no program on Windows or Linux that can read or convert this format.

There is already rudimentary support for reading Keynote documents in the libetonyek library, so it might make sense (assuming the formats have a similar structure) either to extend this library or to make a new one using libetonyek as a base.

As we generally need to reverse engineer the format, it would be helpful to attach simple example documents made with iWork Pages for those who want to look at the format and don't have access to OSX (for example, a simple document with one or two paragraphs, a table and a picture would be a good start).

Comment 2 David Tardon 2014-01-29 13:18:05 UTC

libetonyek master has already got BIPU support for Pages (and Numbers too, but it just detects the format). The filter is on the GSoC ideas list.

Comment 3 Paul Wise 2014-01-30 08:05:50 UTC

I don't have access to OSX so I can't attach any documents, but you can find many samples on the Internet; I linked to some in the initial report.

The format is pretty simple and XML so I wouldn't say reverse engineering would be needed.

There is a project to convert Apple Pages to epub here:

https://github.com/immateriel/pages2epub/

There is a project to modify Keynote files here:

https://github.com/undees/snippetize

Comment 4 David Tardon 2014-01-30 09:22:56 UTC

(In reply to comment #3)
> The format is pretty simple

You only think that because you have never seen a document with complex layout, charts, tables and other objects in it.

> and XML so I wouldn't say reverse engineering
> would be needed.

You would be wrong. That the format is XML does not mean it is immediately obvious what the elements/attributes and their values mean, or how they work together. E.g., the parameters for parametrized shapes (stars, arrows, quote bubbles, etc.) are saved as opaque numeric values.

Or take tables. Most of the table-related elements and attributes are 1-3 letter abbreviations. What is even "better": only non-empty cells are actually saved. The position of the next filled cell (i.e., row and column) is saved, inventively, using a single attribute. (Of course, first you need to discover that this attribute--which is called sf:ct--has anything to do with the placement of cells in the grid...)
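
To illustrate why such sparse storage is not self-explanatory, here is a minimal sketch of one possible scheme. It is purely hypothetical: it assumes each filled cell carries a single row-major index that encodes both row and column. The comment above does not say how sf:ct actually encodes the position, so this should not be read as the real Pages encoding or as what libetonyek does.

```python
def decode_position(index, column_count):
    """Split a single row-major index into (row, column).

    Hypothetical encoding, chosen for illustration only.
    """
    row, column = divmod(index, column_count)
    return row, column


def build_grid(filled_cells, row_count, column_count):
    """Expand sparsely stored cells into a dense grid.

    filled_cells: iterable of (index, value) pairs for non-empty cells.
    Cells that were never stored stay None.
    """
    grid = [[None] * column_count for _ in range(row_count)]
    for index, value in filled_cells:
        row, column = decode_position(index, column_count)
        grid[row][column] = value
    return grid
```

Even in this simple scheme, the single number is meaningless without knowing the column count and the traversal order, which is exactly the kind of detail that has to be discovered by experiment.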

Comment 5 tommy27 2014-07-22 07:22:19 UTC

is this a duplicate of Bug 35361?
could anybody double check?

Comment 6 David Tardon 2014-07-22 10:58:15 UTC

(In reply to comment #5)
> is this a duplicate of Bug 35361?

No, it is not.

Comment 7 David Tardon 2015-05-11 07:38:12 UTC

Initial support for Pages <= 4 has been added.