Bug 98603 - Hebrew text in pptx is recognized as English and formatting issues ensue
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Impress
Version: 5.1.1.2 rc (earliest affected)
Hardware: x86-64 (AMD64) All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard: target:6.4.0
Keywords:
Duplicates: 121259
Depends on:
Blocks: RTL-CTL PPTX
 
Reported: 2016-03-11 14:13 UTC by eladhen2
Modified: 2019-10-21 14:37 UTC
CC List: 4 users

See Also:
Crash report or crash signature:


Attachments
This zip contains all of the files referred to in the bug report (999.22 KB, application/zip)
2016-03-11 14:13 UTC, eladhen2
PPTX left and right aligned (143.33 KB, image/png)
2016-03-12 07:24 UTC, Robinson Tryon (qubit)

Description eladhen2 2016-03-11 14:13:52 UTC
Created attachment 123501
This zip contains all of the files referred to in the bug report

Editing a pptx file created in MS Office (01.pptx) with LibreOffice 5.1.1.2 on Linux Mint Cinnamon causes Hebrew text to be recognized as English and breaks the Hebrew text when the pptx is reopened in MS Office.

In the attached 01.png, showing a screenshot of the original file opened in MS Office 2010 version 14.0.7166.5000 32bit, you can see that:
1. The boxed text is recognized as Hebrew.
2. The text is RTL and aligned to the right. 

In the attached 02.png the pptx file is opened in LO and you can see that:
1. LO recognizes the text as English, so every word is flagged as a typo (if I right-click a word, I get the option "word is Hebrew", which makes LO recognize the word as Hebrew).
2. For some reason, although the text appears to be RTL and aligned to the right, the Paragraph settings indicate it's LTR and aligned to the left.
* Another minor bug here: The fonts used in the original pptx are Times New Roman and Arial, both fonts I have installed on my Linux Mint system, yet LO assigned different fonts to the text...

In the attached 03.png the pptx is reopened in MS Office 2010 version 14.0.7166.5000 32bit after being saved (without further changes) in LO, and you can see that:
1. The language of the text is recognized as English.
2. The text is RTL and aligned to the right.
3. Because the text is recognized as English, the punctuation marks "stick to the left" as they would in English, rather than to the right as they should in Hebrew.
The pptx file that was saved in LO is attached as 02.pptx

This bug makes editing Hebrew pptx files on LO impossible.
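
For anyone who wants to compare 01.pptx and 02.pptx without MS Office, here is a minimal Python sketch that lists the language tag of each text run and the direction of each paragraph. It assumes the standard DrawingML layout: slides stored as ppt/slides/slideN.xml inside the zip, the run language in the `lang` attribute of `a:rPr`, and the paragraph direction and alignment in the `rtl` and `algn` attributes of `a:pPr`. The function names are made up for illustration, and the script only reads what is written explicitly on each run or paragraph; it does not resolve values inherited from layouts or masters.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# DrawingML main namespace (the "a:" prefix in slide XML).
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"


def _slide_roots(pptx_file):
    """Yield (part name, parsed XML root) for every slide part in the archive."""
    with zipfile.ZipFile(pptx_file) as archive:
        for name in sorted(archive.namelist()):
            if name.startswith("ppt/slides/slide") and name.endswith(".xml"):
                yield name, ET.fromstring(archive.read(name))


def run_languages(pptx_file):
    """Return (part name, lang or None, run text) for every text run."""
    results = []
    for name, root in _slide_roots(pptx_file):
        for run in root.iter(f"{{{A_NS}}}r"):
            props = run.find(f"{{{A_NS}}}rPr")
            # lang is None when the run carries no explicit language tag.
            lang = props.get("lang") if props is not None else None
            text = run.findtext(f"{{{A_NS}}}t", default="")
            results.append((name, lang, text))
    return results


def paragraph_directions(pptx_file):
    """Return (part name, rtl or None, algn or None) for every paragraph."""
    results = []
    for name, root in _slide_roots(pptx_file):
        for para in root.iter(f"{{{A_NS}}}p"):
            props = para.find(f"{{{A_NS}}}pPr")
            rtl = props.get("rtl") if props is not None else None
            algn = props.get("algn") if props is not None else None
            results.append((name, rtl, algn))
    return results


if __name__ == "__main__":
    # Usage: python check_pptx_lang.py 01.pptx 02.pptx
    for path in sys.argv[1:]:
        print(path)
        for entry in run_languages(path):
            print("  run:", entry)
        for entry in paragraph_directions(path):
            print("  paragraph:", entry)
```

If the Hebrew runs show a Hebrew tag such as he-IL in 01.pptx but an English tag or no tag at all in 02.pptx, that would match what the screenshots show.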
Comment 1 Robinson Tryon (qubit) 2016-03-12 07:24:11 UTC
TESTING with Ubuntu 14.04 +
LO 5.2.0.0.alpha0+ (2016-02-24_23:58:47)

(In reply to eladhen2 from comment #0)
> Editing a pptx file created in MS Office (01.pptx) using LibreOffice 5.1.1.2
> on Linux Mint Cinnamon makes Hebrew text be recognized as English and breaks
> Hebrew text when the pptx is reopened in MS Office.

It sounds like there are several different problems listed here, so I'm going to tackle just the first part.

> In the attached 01.png, showing a screenshot of the original file opened in
> MS Office 2010 version 14.0.7166.5000 32bit, you can see that:
> 1. The boxed text is recognized as Hebrew.
> 2. The text is RTL and aligned to the right.

That statement seems accurate.

> In the attached 02.png the pptx file is opened in LO and you can see that:
> 1. LO recognizes the text as English, so every word is seen as a typo (If I
> right click a word I get the option "word is Hebrew" which makes LO
> recognize the word as Hebrew).

CONFIRMED

> 2. For some reason, although the text appears to be RTL and aligned to the
> right, the Paragraph settings indicate it's LTR and aligned to the left.

In this 5.2 daily build, both the left *and* right align buttons appear to be selected (see the "PPTX left and right aligned" attachment). That looks like a bug.

Status -> NEW
Comment 2 Robinson Tryon (qubit) 2016-03-12 07:24:46 UTC
Created attachment 123517
PPTX left and right aligned
Comment 3 QA Administrators 2017-05-22 13:18:51 UTC Comment hidden (obsolete)
Comment 4 eladhen2 2018-03-12 14:46:50 UTC
This bug is still present in 6.0.2.1.
Comment 5 Eyal Rozenberg 2018-12-27 16:28:07 UTC
Elad, can you please, using PowerPoint, replace the Hebrew text with Arabic text (e.g. use Google Translate), without changing any of the formatting, and see whether you get the same behavior?
Comment 6 eladhen2 2018-12-31 08:44:12 UTC
(In reply to Eyal Rozenberg from comment #5)
> Elad, can you please, using PowerPoint, replace the Hebrew text with Arabic
> text (e.g. use Google Translate), without changing any of the formatting,
> and see whether you get the same behavior?

I don't have an MS Office installation available at the moment.
Comment 7 Commit Notification 2019-10-10 03:31:07 UTC
Mark Hung committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/f9569785dd513b9b2f1d7c8c687fed285b0ad280

tdf#98603 fix the missing char property (1/2).

It will be available in 6.4.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 8 Commit Notification 2019-10-10 11:26:52 UTC
Mark Hung committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/175ab303958809391bfd985729f177d26ba35cbb

tdf#98603 export runs with correct lang attribute (2/2).

It will be available in 6.4.0.

Comment 9 Mark Hung 2019-10-19 05:05:30 UTC
*** Bug 121259 has been marked as a duplicate of this bug. ***
Comment 10 Eyal Rozenberg 2019-10-21 14:32:07 UTC
(In reply to Mark Hung from comment #8)

First of all - kudos for the fix :-)

Second - based on your fix, it seems this bug is not Hebrew-specific, so I'm re-hanging it on RTL-CTL.

Finally, do you think this is likely to affect other outstanding RTL / MS Office-import-related bugs?