Bug 152227 - Add support for Zarnegar format
Summary: Add support for Zarnegar format
Status: NEW
Alias: None
Product: Document Liberation Project
Classification: Unclassified
Component: General
Version: unspecified (earliest affected)
Hardware: All All
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-11-25 23:24 UTC by Hossein
Modified: 2022-11-26 00:29 UTC (History)
0 users

See Also:
Crash report or crash signature:


Attachments
Sample Zarnegar75 file (5.94 KB, application/octet-stream)
2022-11-25 23:24 UTC, Hossein

Description Hossein 2022-11-25 23:24:13 UTC
Created attachment 183793
Sample Zarnegar75 file

Description:

Zarnegar was a commercial word processor from SinaSoft Co. for DOS and later Windows, with support for Persian and Arabic text. It was the most prominent word processor in Iran in the 1990s.
Libraries and other institutions may hold many Zarnegar files as heritage material. Being able to import and use these files would help preserve them.
More information about this format can be found here:

Zarnegar (word processor)
https://en.wikipedia.org/wiki/Zarnegar_(word_processor)

The original software can be downloaded from here:
http://sinasoft.com/Downloads.html

Two different versions of the program are available:
* Zarnegar 5.2 (Windows): Has problems with Windows 10. The software interface is Persian.
* Zarnegar 76 (DOS): Can be used with DOSBox
https://www.dosbox.com/

Character encoding and examples:

There are two common formats for Zarnegar: Zarnegar1 and Zarnegar75. Quoting from the Wikipedia article above:

"Zarnegar1 character set
Zarnegar used an Iran System-based character encoding system, named Zarnegar1, with text file formats for its early versions, up to the Zarnegar 75 version. The Zarnegar1 character set is a two-form left-to-right (visual) encoding, meaning that every Perso-Arabic letter receives different character codes based on its cursive joining form, but most letters receive only two forms, because of the limited code-points available."

A sample file from the Python Zarnegar1 converter:
https://github.com/persian-computing/python-zarnegar-converter/blob/e3482740c34cba14e7c372a675c9166d213629be/samples/zar1-sample-text-01.zar
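
To illustrate what "left-to-right (visual) encoding" with per-form character codes implies for an importer, here is a minimal sketch. It does not use the real Zarnegar1 code points: as an assumption for illustration only, Unicode Arabic presentation forms stand in for the form-specific Zarnegar1 codes, and the helper name visual_to_logical is hypothetical. The sketch reverses a visually ordered line into logical order and folds each positional form back to its base letter. It assumes a purely Perso-Arabic line; a real importer would also have to keep digits and Latin runs in their own order.

```python
import unicodedata


def visual_to_logical(visual_line: str) -> str:
    """Naively convert a purely Perso-Arabic, visually ordered line to logical order.

    Assumption: Unicode presentation forms stand in for the form-specific
    Zarnegar1 codes; the real Zarnegar1 byte values are not used here.
    """
    # Visual left-to-right storage puts the last logical letter first,
    # so reversing the line restores reading (logical) order.
    logical = visual_line[::-1]
    # NFKC folds positional presentation forms (initial, medial, final,
    # isolated) back to the base letters.
    return unicodedata.normalize("NFKC", logical)


# The word "salam" stored in visual order:
# isolated MEEM, final ALEF, medial LAM, initial SEEN.
visual = "\uFEE1\uFE8E\uFEE0\uFEB3"
logical = visual_to_logical(visual)
assert logical == "\u0633\u0644\u0627\u0645"  # SEEN, LAM, ALEF, MEEM
```

For actual Zarnegar1 data, the normalization step would be replaced by a lookup table from Zarnegar1 codes to Unicode letters, which has to be derived from the format itself.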

Also from the same article:

"Zarnegar75 character set
With the Zarnegar 75 version, a new character encoding system was introduced, and the file format was changed to a binary format. The Zarnegar75 character set is a four-form bidirectional visual encoding, meaning that every Perso-Arabic letter receives a one, two, or four character code, depending on its cursive joining form, and these letters are stored in the memory in the semantic order."

A sample file in the Zarnegar75 binary format is attached.

Converters:

Some converters are available for the Zarnegar file formats. For example, this one for the Zarnegar1 format is written in Python and provides some examples:
Converter for Zarnegar Encoding and File Format to Unicode Text
https://github.com/persian-computing/python-zarnegar-converter

There is another (closed-source) converter that can convert the Zarnegar format to RTF:
https://www.noorsoft.org/file/downloads/productfile/Zarnegar.To_.RTF_.Converter_[www.noorsoft.org].zip
Comment 1 Eike Rathke 2022-11-26 00:29:03 UTC
This could be a candidate for The Document Liberation Project. It's also tracked here, so I'm setting its product.
See https://www.documentliberation.org/

Note that the Python code is under GPLv3+ and can't be used.