Bug 98459 - LibreOffice does not recognize the encoding of Persian XML files, and the Persian parts of the document are completely unreadable
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): unspecified
Hardware: All Windows (All)
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Font-Rendering
Reported: 2016-03-06 04:20 UTC by zahra
Modified: 2023-06-03 19:06 UTC (History)
CC List: 3 users

See Also:
Crash report or crash signature:


Attachments
the xml file which contains some sayings of prophet mohammad and his progeny peace be upon them which is persian translation. (152.64 KB, text/xml)
2016-03-06 04:20 UTC, zahra
PDF showing how the text is rendered in LibreOffice 7.6 (818.94 KB, application/pdf)
2023-05-29 08:29 UTC, BogdanB

Description zahra 2016-03-06 04:20:42 UTC
Created attachment 123331 [details]
the xml file which contains some sayings of prophet mohammad and his progeny peace be upon them which is persian translation.

Hi everyone.
I have a valuable document that contains Persian translations of some sayings and narrations of Prophet Mohammad and his progeny, peace be upon them.
The file is in XML format.
When I try to open it with LibreOffice, the Persian parts of the document are completely unreadable and are not displayed in Persian characters!
It makes no difference whether I open LibreOffice Writer and use File > Open (Ctrl+O),
or select the file, right-click it, and choose LibreOffice Writer from the "Open with" submenu.
LibreOffice does not recognize the encoding, and it does not even ask which encoding it should use to open the file properly!
As a result, the Persian parts of my document are completely unreadable!
I tested the same file in OpenOffice.
When I selected the file, right-clicked it, and chose OpenOffice Writer from the "Open with" submenu, OpenOffice asked me about the encoding, font, and language, and then opened the file properly!
Steps to reproduce:
1/ Right-click the document I attached.
2/ From the "Open with" submenu, select LibreOffice Writer and open the document.
3/ Do the same with OpenOffice and compare the results.
Current behaviour: LibreOffice does not recognize the encoding of the Persian parts, cannot show them properly, and does not even ask which encoding to use.
Expected behaviour: LibreOffice recognizes the Persian encoding, or at least asks me which encoding it should use.
In OpenOffice: if I select Open in the File menu (or press Ctrl+O) and open this file, I get the error:
General input/output error.
When I right-click the file (by the way, I use the Application key) and choose OpenOffice Writer from the "Open with" submenu,
OpenOffice asks about the encoding, font, and language, and then shows the file properly.
I tested many versions of LibreOffice, and the result was the same.
This bug is critical for me.
Thanks, and I request divine mercy and blessings for you.
Comment 1 Robinson Tryon (qubit) 2016-03-06 06:27:05 UTC
TESTING with Ubuntu 14.04 64bit +
LO 5.2.0.0.alpha0+ (2016-02-24_23:58:47)

(In reply to zahra from comment #0)
> Document: attachment 123331 [details]
> ...
> when I try to open it with LibreOffice, the Persian parts of the document
> are completely unreadable and are not displayed in Persian characters!
> ...
> Steps to reproduce:
> 1/ Right-click the document I attached.
> 2/ From the "Open with" submenu, select LibreOffice Writer and open the
> document.

I'll use the alternate steps (File > Open, or Ctrl+O, from within Writer).

> ...
> Current behaviour: LibreOffice does not recognize the encoding of the Persian
> parts, cannot show them properly, and does not even ask which encoding to use.
> Expected behaviour: LibreOffice recognizes the Persian encoding, or at least
> asks me which encoding it should use.

I'm not sure which sections of the document are Persian/supposed to render as Persian.

zahra: Please attach a couple of screenshots showing 
1) How you expect the document to render (or at least the text)
2) How LibreOffice is rendering the document

Please feel free to crop the screenshots, add some (thin) red circles around important parts, etc. Anything that helps us focus in on the specific problem makes triage go much more quickly!

Bonus: If you can demonstrate the problem with a 1- or 2-page document, that really helps. Simple, simple examples are our favorite!

Status -> NEEDINFO
Comment 2 zahra 2016-03-06 07:20:19 UTC
Hi.
As I mentioned, LibreOffice does not recognize the encoding, and it does not even ask me which encoding it should use to open this XML file!
By the way, forgive me for forgetting to mention that the encoding of this document is UTF-8.
Compare it with OpenOffice, as I mentioned:
open the "Open with" submenu and select OpenOffice Writer,
then select the UTF-8 encoding and open the file.
You can observe the result and the difference.
Thanks for your attention.
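
For triage, the UTF-8 claim above can be checked outside of any office suite. The following is only a minimal Python sketch, not something from the original report: the function name `sniff_xml_encoding` and the sample bytes are made up for illustration, and the real attachment would be read with `open(path, "rb").read()`.

```python
import re

def sniff_xml_encoding(data: bytes) -> dict:
    """Report BOM presence, the declared encoding, and whether the bytes decode as UTF-8."""
    has_bom = data.startswith(b"\xef\xbb\xbf")
    body = data[3:] if has_bom else data
    # Look for encoding="..." inside a leading XML declaration, if there is one.
    match = re.match(rb"\s*<\?xml[^>]*encoding=[\"']([A-Za-z0-9._-]+)[\"']", body)
    declared = match.group(1).decode("ascii") if match else None
    try:
        data.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False
    return {"bom": has_bom, "declared": declared, "valid_utf8": valid_utf8}

# Hypothetical sample standing in for the attachment (Persian word written with escapes).
sample = '<?xml version="1.0" encoding="utf-8"?><Hadis>\u0633\u0644\u0627\u0645</Hadis>'.encode("utf-8")
result = sniff_xml_encoding(sample)
print(result)
```

If `valid_utf8` is true for the attachment, the file itself is fine and the problem lies in how the import filter picks an encoding.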
Comment 3 zahra 2016-03-06 07:22:04 UTC Comment hidden (obsolete)
Comment 4 Pedro 2016-03-06 10:49:46 UTC
(In reply to Robinson Tryon (qubit) from comment #1)

> I'm not sure which sections of the document are Persian/supposed to render
> as Persian.

Everything between an opening and a closing tag, e.g.

<Hadis>?? ???? ?????? ???? ?? ?????? ?? ???? ????? ????? ? ??? ?????? ?? ??????? ? ?? ?? ??? ??????.</Hadis>

You can open the xml file with any browser to check (tested with IE11 and Firefox ESR 38.6.1 under Win 10 x64)

> zahra: Please attach a couple of screenshots showing 
> 1) How you expect the document to render (or at least the text)
> 2) How LibreOffice is rendering the document

Zahra is blind, so she cannot work with images. The attached file is a good example.

I believe this is DocBook format. Opening it with the DocBook option selected results in a blank page. I believe this is indeed a bug in the DocBook filter.
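
As a way to see the expected text without a browser, the element contents can also be pulled out with a generic XML parser. This is a rough Python sketch under assumptions: only the `<Hadis>` and `<Source>` tag names appear in this report, so the `Root` and `Item` wrapper elements and the sample text below are hypothetical, and the real file's structure may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample text (Persian words written with escapes).
HADIS_TEXT = "\u0645\u062a\u0646 \u0646\u0645\u0648\u0646\u0647"
SOURCE_TEXT = "\u0645\u0646\u0628\u0639"

sample = (
    '<?xml version="1.0" encoding="utf-8"?>'
    "<Root><Item><Hadis>" + HADIS_TEXT + "</Hadis>"
    "<Source>" + SOURCE_TEXT + "</Source></Item></Root>"
).encode("utf-8")

def hadis_texts(xml_bytes: bytes) -> list:
    """Return the text of every <Hadis> element, in document order."""
    root = ET.fromstring(xml_bytes)  # honours the encoding in the XML declaration
    return [el.text or "" for el in root.iter("Hadis")]

texts = hadis_texts(sample)
print(texts)
```

Comparing this output with what Writer displays gives a text-only reference that does not depend on screenshots.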
Comment 5 QA Administrators 2017-03-06 15:50:42 UTC Comment hidden (obsolete)
Comment 6 QA Administrators 2019-12-03 14:58:46 UTC Comment hidden (obsolete)
Comment 7 Pedro 2019-12-04 13:55:50 UTC
Problem still occurs on LibreOffice 6.2.8 and 6.3.3
Comment 8 QA Administrators 2021-12-04 04:44:28 UTC Comment hidden (obsolete)
Comment 9 Michael Warner 2021-12-04 15:21:16 UTC
If I open it in Writer as text, I see the XML source. By default, it is all set in the font Liberation Mono. In this font, most characters appear to be rendered, but some are not and are shown as squares instead (you can see these between the <Source> tags in the first few entries). If I then change the font to Arial, all characters are rendered.

I would close as WFM, but this bug was reported against Windows specifically, and I am using macOS.

Version: 7.4.0.0.alpha0+ / LibreOffice Community
Build ID: e9332dcdc8f2ea268d1b17c73d43a8834cf75365
CPU threads: 10; OS: Mac OS X 12.0.1; UI render: Skia/Metal; VCL: osx
Locale: en-US (en_US.UTF-8); UI: en-US
Calc: threaded
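
Squares like the ones described above usually mean the selected font has no glyph for a character. To narrow down which characters to check against a font, the non-ASCII code points in the file can be listed. The following Python sketch is an illustration only: `non_ascii_inventory` is a made-up helper, the sample string is hypothetical, and the code does not inspect any font itself.

```python
import unicodedata
from collections import Counter

def non_ascii_inventory(text: str) -> list:
    """List each distinct non-ASCII character as (code point, Unicode name, count)."""
    counts = Counter(ch for ch in text if ord(ch) > 127)
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"), n)
        for ch, n in sorted(counts.items())
    ]

# Hypothetical sample: a short Persian word inside a <Source> element.
inventory = non_ascii_inventory("<Source>\u06a9\u0627\u0641\u06cc</Source>")
for code_point, name, count in inventory:
    print(code_point, name, count)
```

The resulting list can then be compared with the character coverage of Liberation Mono and Arial in a font viewer.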
Comment 10 BogdanB 2023-05-29 08:29:58 UTC
Created attachment 187572 [details]
PDF showing how the text is rendered in LibreOffice 7.6

Zahra, could you try with a newer version of LibreOffice?
It works well for me with 7.6.

The attached PDF shows how your file looks when I open it with LibreOffice:
Version: 7.6.0.0.alpha1+ (X86_64) / LibreOffice Community
Build ID: b76a3bdc996f275f9d615b32d6ab89d533a7505c
CPU threads: 16; OS: Linux 5.19; UI render: default; VCL: gtk3
Locale: ro-RO (ro_RO.UTF-8); UI: en-US
Calc: threaded