Created attachment 97574 (incorrectly displayed file). Hi. The attached .DOC source file is not displayed correctly in LibreOffice. In MS Office 2003 it is displayed normally.
What do you mean by incorrect display? What did you get and what did you expect? BTW, it's a plain HTML file with a .doc extension; was that on purpose?
I was expecting to read a text file, as in v.4.2.3 (v.4.2.2). If I open the file in MS Office 2003, I see normal text. Sorry for my English.
Normal display of the .doc in Microsoft Word Viewer: http://floomby.ru/s1/8Wr6S7
@Julien: Ivan probably expects the file to open as an HTML document, not as a Writer document showing the HTML code. In that case I can confirm this bug with the 4.2 branch. Fortunately, it's fixed in master by the changes I've made to the HTML detection there. Unfortunately, it's a big change and unlikely to be backported to 4.2. If someone wants to work on a fix for 4.2: the problem is that this file begins with a UTF-8 BOM (http://en.wikipedia.org/wiki/Byte_order_mark#UTF-8), but HTMLParser::IsHTMLFormat (in svtools/source/svhtml/parhtml.cxx) doesn't respect that kind of BOM, only the UTF-16 one. We simply need to skip it, the same way we do for UTF-16.
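To illustrate the idea of skipping a UTF-8 BOM before sniffing for HTML, here is a minimal standalone sketch. It is not the code from parhtml.cxx: the function names (skipUtf8Bom, startsWithIgnoreCase, looksLikeHtml) are hypothetical, the HTML check is deliberately simplified to two markers, and UTF-16 handling is left out. It assumes a C++17 compiler.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical helper (not LibreOffice API): drop a leading UTF-8 BOM
// (bytes EF BB BF) if present, otherwise return the input unchanged.
std::string_view skipUtf8Bom(std::string_view data)
{
    constexpr std::string_view bom = "\xEF\xBB\xBF";
    if (data.substr(0, bom.size()) == bom)
        data.remove_prefix(bom.size());
    return data;
}

// Case-insensitive ASCII prefix comparison.
bool startsWithIgnoreCase(std::string_view text, std::string_view prefix)
{
    if (text.size() < prefix.size())
        return false;
    for (std::size_t i = 0; i < prefix.size(); ++i)
    {
        if (std::tolower(static_cast<unsigned char>(text[i]))
            != std::tolower(static_cast<unsigned char>(prefix[i])))
            return false;
    }
    return true;
}

// Simplified stand-in for HTML type detection: skip the BOM, skip
// leading whitespace, then look for a doctype or an <html tag.
bool looksLikeHtml(std::string_view data)
{
    data = skipUtf8Bom(data);
    while (!data.empty()
           && std::isspace(static_cast<unsigned char>(data.front())))
        data.remove_prefix(1);
    return startsWithIgnoreCase(data, "<!doctype html")
        || startsWithIgnoreCase(data, "<html");
}
```

Without the skipUtf8Bom call, the first three bytes of the attached file would fail both prefix checks, which would be consistent with the file falling through to plain-text handling in 4.2.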
Comment on attachment 97574 (incorrectly displayed file). To avoid confusion, I'll change the file extension and MIME type to HTML. This bug has nothing to do with the .doc extension (but once fixed, it should work even with that extension).
Thank you, Maxim, for your detailed feedback.
In general, the more recent tracker would be marked as a dup of the older one, but since the later one has already been fixed, I'm marking this one as the duplicate. I tested with master sources updated today, and it was OK. About the code, here's the chain: svtools/source/svhtml/parhtml.cxx uses include/svtools/parhtml.hxx, which uses include/svtools/svparser.hxx, which is implemented in svtools/source/svrtf/svparser.cxx. This last file contains SvParser::GetNextChar(), which has been fixed by: http://cgit.freedesktop.org/libreoffice/core/commit/?id=5eb408a3bb8df204452f0b931a254dad5f0cf35b David: would it be OK to cherry-pick http://cgit.freedesktop.org/libreoffice/core/commit/?id=5eb408a3bb8df204452f0b931a254dad5f0cf35b to the 4.3 branch (and perhaps 4.2)? (I can cherry-pick for both and put them up for review.) *** This bug has been marked as a duplicate of bug 81044 ***
(In reply to comment #7)
> David: would it be ok to cherry-pick http://cgit.freedesktop.org/libreoffice/core/commit/?id=5eb408a3bb8df204452f0b931a254dad5f0cf35b in 4.3 branch (and perhaps in 4.2)? (I can cherry-pick for both and put them to review)
Yes for 4.3, no for 4.2.
David: thank you for your feedback. I've submitted https://gerrit.libreoffice.org/#/c/10742/ for review (as you must have already seen :-) ).
@Julien: This is *not* a duplicate of bug 81044. Bug 81044 is about the filter, this one is about type detection.
Maxim: I'm not sure I understand. When I opened the file, it was OK. Does it fail when you try to open it?
(In reply to comment #11) Julien: This report was about 4.2; I know it's fine in 4.3/master.
OK, sorry then, I'm reopening this tracker.
4.2 is EOL.