Bug 77278 - Rewrite old Pocket Word (PWI) file format import filter.
Status: RESOLVED FIXED
Alias: None
Product: Document Liberation Project
Classification: Unclassified
Component: General
Version: unspecified (earliest affected)
Hardware: Other All
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Duplicates: 109095
Depends on:
Blocks:
 
Reported: 2014-04-10 17:10 UTC by kolubat
Modified: 2021-03-23 12:34 UTC
CC List: 3 users

See Also:
Crash report or crash signature:


Attachments
some psw files found on bz.appache.org, ... (237 bytes, text/plain)
2018-06-05 09:25 UTC, osnola
some .pwi files (1.79 KB, application/x-pocket-word)
2020-02-20 05:13 UTC, Mirppc
more .pwi files (2.36 KB, application/x-pocket-word)
2020-02-20 05:15 UTC, Mirppc
TAR of .pwi files (195.59 KB, application/x-7z-compressed)
2020-02-20 05:22 UTC, Mirppc
psw found on bz.appache.org, ... (74.72 KB, application/zip)
2020-04-12 09:51 UTC, osnola

Description kolubat 2014-04-10 17:10:11 UTC
PWI file format import for word processing does not currently exist. PWI is a file format used by Word for Windows CE, and can contain mixed handwriting and text. (Note that there are several other file formats listed at Microsoft's website: http://support.microsoft.com/kb/884182 but I am not reporting them separately because I am not familiar with them.)
Comment 1 David Tardon 2014-04-17 13:12:56 UTC
Pocket Word format used to be supported by LibreOffice, but the support was dropped in 4.0, IIRC. The code was in xmerge/source/pocketword.
Comment 2 Owen Genat (retired) 2014-07-18 13:50:23 UTC
(In reply to comment #1)
> Pocket Word format used to be supported by LibreOffice, but the support was
> dropped in 4.0, IIRC. The code was in xmerge/source/pocketword.

Removed by http://cgit.freedesktop.org/libreoffice/core/commit/?id=a5783fe922b2419b5b662eb5f544a1f401341dbf

As per comment 1, confirmed. Status set to NEW. Summary edited for clarity. David, if you are not happy with this, please feel free to set status accordingly, as you (or Fridrich / Laurent) know what the true likelihood of writing an import filter for this format will be.
Comment 3 David Tardon 2014-07-18 15:44:28 UTC
I can tell you what is the likelihood I am going to write the filter :-) But if anyone wants to work on it, sure, why not...
Comment 4 David Tardon 2018-01-22 12:40:49 UTC
*** Bug 109095 has been marked as a duplicate of this bug. ***
Comment 5 osnola 2018-06-05 09:18:18 UTC
Hello,
to rewrite this filter, some .pwi files will be needed: even if it is probably possible to write a new filter by reading the old code, the results will probably be catastrophic if we cannot test it on real files.
Comment 6 osnola 2018-06-05 09:25:15 UTC
Created attachment 142531
some psw files found on bz.appache.org, ...
Comment 7 Urmas 2018-06-08 00:53:32 UTC
Here's some code to dump the file contents:

https://pastebin.com/A6EZuSYT
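
The linked code is not reproduced here. As a rough, generic illustration of what dumping a file's contents can look like, the sketch below prints raw bytes as offset, hex and ASCII columns. It assumes nothing about the PWI layout, and the function name `hex_dump` is made up for this example.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Return an offset / hex / ASCII dump of `data`, `width` bytes per row."""
    pad = width * 3 - 1  # length of a full row of "xx xx xx ..." text
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{byte:02x}" for byte in chunk)
        # Show printable ASCII as-is and everything else as a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        rows.append(f"{offset:08x}  {hex_part:<{pad}}  {text_part}")
    return "\n".join(rows)
```

To inspect a sample, read it in binary mode and print the result, for example `print(hex_dump(open("sample.pwi", "rb").read()))`, where `sample.pwi` is a placeholder file name.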
Comment 8 official.mohammadshahin 2019-09-18 06:01:03 UTC Comment hidden (spam)
Comment 9 Mirppc 2020-02-20 05:13:31 UTC
Created attachment 158019
some .pwi files

I know this is an old bump, but if you need some PWI files from an old Pocket Word instance, I have a few I could upload from an old Jornada I used in 2006. I don't know the contents, as it has been so long.
Comment 10 Mirppc 2020-02-20 05:15:47 UTC
Created attachment 158020
more .pwi files

What ponderous titles these have. I have a slew; if I find someone from the Document Foundation at SCALE 18x once more, I will ask about the upload site that used to exist for odd and unusual files that LibreOffice could not open.
Comment 11 Mirppc 2020-02-20 05:22:17 UTC
Created attachment 158021
TAR of .pwi files

I am such a dork... I know how to archive! Here is every .pwi I had on that card.
Comment 12 osnola 2020-04-12 09:51:52 UTC
Created attachment 159510
psw found on bz.appache.org, ...

Oops, the previous attachment was bad, so here are the files that I found on bz.appache.org, ...
Comment 13 osnola 2020-04-17 08:52:13 UTC
The old filter is quite basic: it reads the font names, then uses some heuristics to retrieve paragraphs of text, retrieving:
- the main properties of the characters, with the notable exception of superscript and subscript,
- the properties of the following paragraph: first indent, left/right margin, left/center/.. alignment, and a flag indicating whether it is a bulleted list.
I have rewritten a "more robust" version of this code in libwps, but clearly there are many things that are not recovered (as I cannot guess what they mean).
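
For readers who want a concrete picture of the data listed above, here is a minimal sketch of a container for it. It is not the libwps code or the old xmerge code. All class and field names are hypothetical, the `bold` and `italic` fields are assumptions standing in for the unspecified "main properties" of characters, and the alignment values are limited to the two the comment names.

```python
from dataclasses import dataclass, field
from enum import Enum


class Alignment(Enum):
    # Only the values named in the comment; the source elides the others.
    LEFT = "left"
    CENTER = "center"


@dataclass
class CharacterProperties:
    font_name: str = ""
    bold: bool = False    # assumption: example of a "main" character property
    italic: bool = False  # assumption: example of a "main" character property
    # No superscript/subscript fields: the old filter did not recover them.


@dataclass
class ParagraphProperties:
    first_indent: float = 0.0
    left_margin: float = 0.0
    right_margin: float = 0.0
    alignment: Alignment = Alignment.LEFT
    is_bulleted: bool = False


@dataclass
class Paragraph:
    text: str
    characters: CharacterProperties = field(default_factory=CharacterProperties)
    layout: ParagraphProperties = field(default_factory=ParagraphProperties)


def describe(paragraph: Paragraph) -> str:
    """Summarize a paragraph on one line, e.g. for comparing against a PDF."""
    bullet = "bulleted" if paragraph.layout.is_bulleted else "plain"
    return (
        f"{paragraph.layout.alignment.value}, {bullet}, "
        f"font={paragraph.characters.font_name!r}: {paragraph.text}"
    )
```

A one-line summary per paragraph is a convenient way to check filter output against the PDF equivalent of a sample file.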

If you want to try it, I have updated the libwps version compiled with emscripten: http://libwps.sourceforge.net/convertWPS.html .

To improve this filter, it would be useful to have some Pocket Word files (and their PDF equivalents) that:
- [character properties] use superscripts and subscripts,
- [paragraph properties] have paragraphs with single/double/... line spacing, with a certain spacing before/after the paragraph, lines with fixed height, different types of lists,
- [general] contain header(s), footer(s), footnote(s), endnote(s), comment(s), image(s), table(s)…
- [page] are simple documents with different page sizes, different margins, some metadata...
- …
Comment 14 osnola 2021-03-23 12:34:57 UTC
A basic filter (from libwps) should be present in LibreOffice 7.1 and should make it possible to open basic .psw and .pwi files...
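
One way to try this on a batch of samples is LibreOffice's headless conversion mode. The sketch below is an illustration only: it assumes a `soffice` executable is on the PATH and that the installed version detects the .pwi/.psw filter automatically, which has not been verified here. The helper names are made up for this example.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_convert_command(source: Path, out_dir: Path,
                          soffice: str = "soffice") -> List[str]:
    """Build the argument list for a headless conversion to ODT."""
    return [
        soffice, "--headless",
        "--convert-to", "odt",
        "--outdir", str(out_dir),
        str(source),
    ]


def convert(source: Path, out_dir: Path) -> Path:
    """Run the conversion and return the path where the .odt is expected."""
    if shutil.which("soffice") is None:
        raise FileNotFoundError("soffice not found on PATH")
    subprocess.run(build_convert_command(source, out_dir), check=True)
    return out_dir / (source.stem + ".odt")
```

For example, `convert(Path("note.pwi"), Path("converted"))` would be expected to produce `converted/note.odt` if the filter accepts the file; `note.pwi` is a placeholder name.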