Bug 56169 - writer document saved as .doc, opened in MS Word 2010, Hebrew paste treated as Hindi
Status: RESOLVED WORKSFORME
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 3.4.5 release (earliest affected)
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: RTL-CTL DOC
 
Reported: 2012-10-19 08:12 UTC by Lionel Elie Mamane
Modified: 2017-12-04 15:07 UTC
CC List: 4 users

See Also:
Crash report or crash signature:


Attachments
Original document (12.49 KB, application/vnd.oasis.opendocument.text)
2012-10-19 08:12 UTC, Lionel Elie Mamane
MSWord export (13.00 KB, application/msword)
2012-10-19 08:13 UTC, Lionel Elie Mamane
Hindi (431.32 KB, image/png)
2013-01-08 21:56 UTC, Urmas
Hindi (8.85 KB, image/png)
2013-01-08 21:59 UTC, Urmas
"English (UK)" only (9.31 KB, image/png)
2013-01-20 07:53 UTC, Lionel Elie Mamane

Description Lionel Elie Mamane 2012-10-19 08:12:01 UTC
Created attachment 68787
Original document

In LibreOffice 3.4.5, Debian package version 1:3.4.5-2 on amd64 arch, I created a fresh new document, and typed some text in English. Saved as .doc (XP/2000/2003), emailed to another person.

That person opened this .doc in Microsoft Word 2010 Hebrew edition, then copied (from the same Word) Hebrew text from another document into this document. The Hebrew text appeared as empty rectangles instead of Hebrew characters. The status line (bottom of the window) indicates that MS Word treats it as text in Hindi.
When using "special paste" and choosing "RTF text" or "HTML text" or "plain text" or "unicode plain text", it gets pasted correctly as Hebrew characters.

There is probably a Microsoft Word bug there (pasting from Word to Word should just work), but if we can avoid triggering it, that would be better.
Comment 1 Lionel Elie Mamane 2012-10-19 08:13:31 UTC
Created attachment 68788
MSWord export
Comment 2 Lionel Elie Mamane 2012-10-19 08:18:23 UTC
Opened a fresh MSWord window, typed some Hebrew.

Copy/paste into the document exported from LibreOffice shows the same behaviour.

So the problem is not specific to the source document, but indeed specific to the destination document.
Comment 3 Urmas 2013-01-08 01:28:36 UTC
So where LO is coming into this? The attached document is marked as written in Hindi, so...
Comment 4 Lionel Elie Mamane 2013-01-08 06:54:56 UTC
(In reply to comment #3)
> So where LO is coming into this?

LibreOffice is coming into this because the .doc that has the problem is created by LibreOffice. It is a problem for LibreOffice because it breaks interoperability with MS Word and thus people cannot use LibreOffice to exchange documents WRITTEN IN ENGLISH with (Hebrew version) MS Word users.

> The attached document is marked as written in Hindi, so...

The attached ODT file is marked as written in "English (UK)". If the .doc appears as written in Hindi in MS Word, it is a problem in LibreOffice's .doc export.

The codebase has a history of confusing Hebrew and Hindi, see e.g. https://issues.apache.org/ooo/show_bug.cgi?id=86811
Comment 5 Urmas 2013-01-08 08:11:42 UTC
The text in FINAL.odt is marked as "English (GB)/Hindi/Simplified Chinese". So I see no bugs here, and it's not a helpdesk to teach you how to change text languages.
Comment 6 Lionel Elie Mamane 2013-01-08 08:51:39 UTC
(In reply to comment #5)
> The text in FINAL.odt is marked as "English (GB)/Hindi/Simplified Chinese".
> So I see no bugs here, and it's not a helpdesk to teach you how to change
> text languages.

Nope, it is marked as "English (GB)". For *all* text. Not a single paragraph / word / character marked as Hindi or Simplified Chinese.
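
For anyone who wants to check this outside the UI: ODF stores separate language attributes for Western, Asian and complex (CTL) text, which may be why the same file can be described both as "English (GB)" and as "English (GB)/Hindi/Simplified Chinese". The following is a minimal sketch, not a LibreOffice tool: it assumes the languages are recorded on `style:text-properties` elements in `styles.xml` and `content.xml`, and the function name `language_tags` is made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# ODF namespace URIs for the fo: and style: prefixes.
FO = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
STYLE = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"

# (language attribute, country attribute) for each script group.
ATTRS = {
    "western": (f"{{{FO}}}language", f"{{{FO}}}country"),
    "asian": (f"{{{STYLE}}}language-asian", f"{{{STYLE}}}country-asian"),
    "complex": (f"{{{STYLE}}}language-complex", f"{{{STYLE}}}country-complex"),
}


def language_tags(odt, members=("styles.xml", "content.xml")):
    """Collect the language tags found in an .odt, grouped by script type.

    `odt` may be a file path or a binary file-like object.
    `language_tags` is a hypothetical helper name, not an existing API.
    """
    found = {group: set() for group in ATTRS}
    with zipfile.ZipFile(odt) as archive:
        names = set(archive.namelist())
        for member in members:
            if member not in names:
                continue
            root = ET.fromstring(archive.read(member))
            for element in root.iter(f"{{{STYLE}}}text-properties"):
                for group, (lang_attr, country_attr) in ATTRS.items():
                    lang = element.get(lang_attr)
                    if lang:
                        country = element.get(country_attr)
                        found[group].add(f"{lang}-{country}" if country else lang)
    return found
```

Calling `language_tags("original.odt")` (the file name is a placeholder) returns one set of tags per script group, so a document with no CTL text at all can still report a value under "complex" if the default style carries one.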
Comment 7 Urmas 2013-01-08 21:56:56 UTC
Created attachment 72689
Hindi
Comment 8 Urmas 2013-01-08 21:59:35 UTC
Created attachment 72690
Hindi
Comment 9 Urmas 2013-01-08 22:01:39 UTC
I must have been dreaming, right? Now educate yourself instead of reopening this.
Comment 10 Lior Kaplan 2013-01-08 22:24:04 UTC
Please stop changing the status of this bug; it causes too much noise due to related bugs. If we can't agree, we need more opinions, not abuse of the bug system as part of the disagreement.
Comment 11 Lionel Elie Mamane 2013-01-20 07:53:19 UTC
Created attachment 73320
"English (UK)" only
Comment 12 Lionel Elie Mamane 2013-01-20 08:48:47 UTC
While I agree that the root cause looks like a Microsoft Word bug (copy/paste from and to Word should just work...), IMHO the fact that a .doc file created by LibreOffice triggers that bug, but a .doc file created by Microsoft Word does not, is a problem for LibreOffice on two levels:

1) Terrible user experience for LibreOffice users communicating
   with Microsoft Word users.

2) Competitive disadvantage.


With respect to 1), essentially Urmas is saying that each and every
LibreOffice user who reads/writes neither Hebrew, nor Hindi,
nor any other "complex text layout" language, should learn about the
details of how .odt/.doc formats handle these languages, and think
in advance that there is some hidden setting somewhere that looks
like it is completely irrelevant
(I'm writing a document in pure English, remember?)
and set it in some way that predicts what the person that one is
sending the document to is going to do with it.

E.g. when sending a document written in English to an Israeli and
an Indian, one should make *two* versions of the document, one that
has "CTL language: Hebrew" and one that has "CTL language: Hindi"

This is wrong on sooo many levels...

First, fundamentally, I *do* think that as far as possible, things
should work right out of the box, and not require extensive "education".
One is not wiring the electricity of a house or building a plane,
one is just writing a document.

Second, what do I do when I send it to an Indian immigrant in Israel?
He/she might like to paste some Hebrew inside (to further communicate
with locals: for example add a cover page in Hebrew to send it to a
local administration) or he/she might like to paste some Hindi inside
(to further communicate with family or authorities in India maybe?)



With respect to 2), compare:

With Microsoft Word:

 - create a document with an English or French Word, in an English or French
   OS environment, typing English or French in it.

 - save as .doc

 - email that .doc to a user of Microsoft Word Hebrew edition.

 - That user opens it, copy-pastes Hebrew from another document into
   that document, it just works.

With LibreOffice Writer:

 - create a document with an English or French LibreOffice Writer,
   in an English or French OS environment, typing English or French in it.

 - save as .doc

 - email that .doc to a user of Microsoft Word Hebrew edition.

 - That user opens it, copy-pastes Hebrew from another document into
   that document, does not work.

In a real world where users need/want to often/regularly/occasionally
co-edit documents with people using Microsoft Word (that for practical
purposes is installed by default on about every new computer one buys...
or installed "free of charge" by a nice friend/neighbour/...),
which of the above behaviours looks the most attractive to you?



IMO, if we can at all improve the situation here, we should.

The example document was created from a linguistically blank LibreOffice
as far as CTL languages are concerned. I just installed LibreOffice,
with support for English, French and possibly a few other ISO-8859-1
European languages (German, Dutch, ...) (so nothing that touches any
CTL issue, and certainly not Hindi). I ran LibreOffice, touched
absolutely no CTL language setting, typed some English, exported as .doc
and sent.

Why, oh why is anything here set to Hindi in the first place? Could we
just *not* set any CTL language in the document? That might work around
the MS Word bug. Or if that is not possible, maybe (more risky) set it
to a dummy/invalid value?
Comment 13 Urmas 2013-01-20 10:36:05 UTC
Yes, you can. Open the Options/Language Settings/Languages and select [no proof] in 'Default Document Languages'/'Complex Script'.

Also Bugzilla is not a method of getting free technical support.
Comment 14 Urmas 2013-01-20 10:38:40 UTC
The last 'Hebrew edition' of Microsoft Word was Word 97. So maybe you'll stop blame your own incompetence on imaginary bugs in the software?
Comment 15 Lionel Elie Mamane 2013-01-20 10:53:25 UTC
(In reply to comment #14)
> The last 'Hebrew edition' of Microsoft Word was Word 97. So maybe you'll
> stop blame your own incompetence on imaginary bugs in the software?

On my desk is a laptop computer, belonging to my life partner, with a Microsoft Word 2010 with Hebrew UI. That's what I meant by "Hebrew Edition".



(In reply to comment #13)
> Yes, you can. Open the Options/Language Settings/Languages and select [no
> proof] in 'Default Document Languages'/'Complex Script'.

I don't think we should expect our users (who do not necessarily have a clue about CTL languages, not using any themselves) to always do that. Maybe you are saying that fixing bug 39935 would also solve this issue?
Comment 16 Markus Mohrhard 2013-01-20 12:53:54 UTC
(In reply to comment #14)
> The last 'Hebrew edition' of Microsoft Word was Word 97. So maybe you'll
> stop blame your own incompetence on imaginary bugs in the software?

Stop with this destructive behavior or search another project that you can annoy. This is a valid bug report and Lionel has more competence in Libreoffice than you'll ever have.

It might be time for you to accept that you can't hide your incompetence by insulting and bullying users and developers in bugzilla.
Comment 17 Urmas 2013-01-21 04:49:20 UTC
If the desired resolution is automatic language detection applied to pasted text, then it is unacceptable, as it will give an advantage to Hebrew as having a unique script compared to the multiple languages using Arabic or Devanagari scripts.

Users of ME versions of Microsoft Word are aware that they have to choose languages and fonts separately for Latin and non-Latin text, because this has been the case since the days of Windows 3.1.

Or is it setting the alternative languages to none/no proof? Then it is a duplicate.
Comment 18 Joel Madero 2013-06-24 17:33:34 UTC
(In reply to comment #16)
> (In reply to comment #14)
> > The last 'Hebrew edition' of Microsoft Word was Word 97. So maybe you'll
> > stop blame your own incompetence on imaginary bugs in the software?
> 
> Stop with this destructive behavior or search another project that you can
> annoy. This is a valid bug report and Lionel has more competence in
> Libreoffice than you'll ever have.
> 


Given that the bug was reported by a known, experienced and respected LibreOffice dev, and that Markus (who also meets those requirements) says it's a valid bug report, I am marking this as NEW.

Lionel - if this problem has sorted itself out (I doubt it has ;) ) please feel free to close as WFM; else, hopefully we can find a taker for this one, as I agree the behavior seems like a nuisance for our users.
Comment 19 Cédric Bosdonnat 2014-01-20 08:57:25 UTC
Restricted my LibreOffice hacking area
Comment 22 Afief Halumi 2017-12-04 14:46:16 UTC
Wrote text in Word 2013, pasted it into LO Writer 5.2.
Works as expected.
Comment 23 Lior Kaplan 2017-12-04 15:07:05 UTC
Please add a screenshot of what you see. Also, could you verify with LibO 5.4?