Bug 51430 - FILEOPEN: HTML documents without META Content-Type are opened with a wrong codepage
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 3.5.4 release
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard: BSA
Keywords:
Duplicates: 67610
Depends on:
Blocks: HTML-Import
Reported: 2012-06-25 20:48 UTC by pcarmouze
Modified: 2020-02-14 02:56 UTC (History)
CC List: 6 users

See Also:
Crash report or crash signature:


Attachments
file which can't be opened correctly (68.38 KB, application/msword)
2012-06-25 22:14 UTC, pcarmouze
Bug 51430 - screenshot (107.06 KB, image/png)
2013-01-28 10:55 UTC, bfoman (inactive)
console logs + bt with symbols (20.91 KB, text/plain)
2014-02-22 12:44 UTC, Julien Nabet

Description pcarmouze 2012-06-25 20:48:14 UTC
Problem description: 

Steps to reproduce:
1. Export a file from the professional database "Navis Editions Francis Lefebvre".
2. The export is saved with a ".doc" extension, but its content is HTML.
3. Open the file with Writer.

Current behavior:
When the file is opened, the French letter "é" and the letter following it are replaced by "?".
Expected behavior:
The file opens correctly, with all letters displayed.
This issue does not occur with LibreOffice 3.4, OpenOffice, or Word 2003.
This issue also occurs with Word 14.

Platform (if different from the browser): 
              
Browser: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:13.0) Gecko/20100101 Firefox/13.0.1
Comment 1 pcarmouze 2012-06-25 22:14:04 UTC
Created attachment 63474 [details]
file  which can't be opened correctly
Comment 2 bfoman (inactive) 2013-01-28 10:55:33 UTC
Created attachment 73765 [details]
Bug 51430 - screenshot

(In reply to comment #0)
> Current behavior:
> when opening the french letter "é" and the following lettre are replaced by
> "?"

Confirmed with:
LO 4.0.0.2
Build ID: own W7 debug build
Windows 7 Professional SP1 64 bit

The French letter "é" is replaced by "?", some words are garbled, paragraph 2 is in bold, and fragments of HTML tags appear in the text (/STRONG>). Please see the attached screenshot.

All good in Word 2010.
Comment 3 Urmas 2013-08-01 10:05:22 UTC
*** Bug 67610 has been marked as a duplicate of this bug. ***
Comment 4 Julien Nabet 2014-02-22 12:44:49 UTC
Created attachment 94566 [details]
console logs + bt with symbols

On a Debian x86-64 PC with master sources updated today, I got a crash when opening the file. I attached the backtrace and console logs.
Comment 5 Julien Nabet 2014-02-22 13:44:36 UTC
Here is a patch for the crash:
diff --git a/sw/source/filter/html/swhtml.cxx b/sw/source/filter/html/swhtml.cxx
index 3cdcbf3..9f716cd 100644
--- a/sw/source/filter/html/swhtml.cxx
+++ b/sw/source/filter/html/swhtml.cxx
@@ -2417,6 +2417,8 @@ void SwHTMLParser::AddParSpace()
             //What I do here, is that I examine the attributes, and if
             //I find out, that it's CJK/CTL, then I set the paragraph space
             //to the value set in HTML_CJK_PARSPACE/HTML_CTL_PARSPACE.
+            if (!pTxtNode->HasHints())
+                return;
 
             sal_Bool bIsCJK = false;
             sal_Bool bIsCTL = false;

But is this a real fix, or does it just hide the real bug?
Comment 6 Alex Thurgood 2015-01-03 17:40:24 UTC Comment hidden (no-value)
Comment 7 Xisco Faulí 2015-09-09 15:31:26 UTC
This issue is still present in

Version: 5.0.1.2
Build ID: 81898c9f5c0d43f3473ba111d7b351050be20261
Locale: es-ES (es_ES)

on Windows 7 (64-bit)

@Julien, does your patch fix the problem? If so, could you please submit it to Gerrit?
Comment 8 Julien Nabet 2015-09-10 05:58:24 UTC
(In reply to Xisco Faulí from comment #7)
> This issue is still present in
> 
> Version: 5.0.1.2
> Build ID: 81898c9f5c0d43f3473ba111d7b351050be20261
> Locale: es-ES (es_ES)
> 
> on Windows 7 (64-bit)
> 
> @Julien, Does your patch fix the problem? if so, could you please summit it
> to Gerrit?

It only dealt with the crash I had. I'm building right now, so perhaps the crash no longer appears.
Anyway, the fix wasn't intended for the initial bug.
Comment 9 Julien Nabet 2015-09-11 16:33:36 UTC
Comment on attachment 94566 [details]
console logs + bt with symbols

On a Debian x86-64 PC with master sources updated today, I can no longer reproduce the crash.
However, I can still reproduce the problem from the initial description.

I noticed several types of logs:
:15: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0xE9 0x73 0x20 0x2D
		<title>Portail des abonn�s - Exporter</title>
		                        ^
warn:oox.storage:19117:1:oox/source/helper/zipstorage.cxx:66: ZipStorage::ZipStorage exception opening input storage: 
VisioDocument: version 0
Found xml parser severity error Input is not proper UTF-8, indicate encoding !
Bytes: 0xE9 0x73 0x20 0x2D

warn:legacy.osl:19117:1:xmloff/source/core/xmlerror.cxx:181: An error or a warning has occurred during XML import/export!
Error-Id: 0x10020002
    Flags: 1 WARNING
    Class: 2 FORMAT
    Number: 2
Parameters:
    0: style:font-name-asian
    1: HG Mincho Light J
Exception-Message: 

warn:unotools.config:19117:1:unotools/source/config/configitem.cxx:445: ignoring XHierarchicalNameAccess to /org.openoffice.Office.WriterWeb/Content/Display/ShowContentTips Exception: Display/ShowContentTips

warn:legacy.tools:19117:1:svtools/source/svrtf/svparser.cxx:308: there is a conversion error

warn:legacy.osl:19117:1:sw/source/core/access/acccontext.cxx:1154: child context should have a size
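
For reference, the four bytes quoted in the parser error above (0xE9 0x73 0x20 0x2D) can be checked outside LibreOffice. The sketch below is not LibreOffice code: the function names are hypothetical, and the fallback to ISO-8859-1 is only an assumption about the attached file (0xE9 is "é" in both ISO-8859-1 and Windows-1252). It shows that the sequence is not well-formed UTF-8, and that reading it as Latin-1 gives "és -", which matches the "abonnés - Exporter" title.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Structural UTF-8 check: validates lead bytes and continuation bytes.
// Simplified: it does not reject every overlong or surrogate encoding.
bool isValidUtf8(const std::string& bytes)
{
    std::size_t i = 0;
    while (i < bytes.size())
    {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;
        if (lead < 0x80)                       extra = 0; // ASCII
        else if (lead >= 0xC2 && lead <= 0xDF) extra = 1; // 2-byte sequence
        else if (lead >= 0xE0 && lead <= 0xEF) extra = 2; // 3-byte sequence
        else if (lead >= 0xF0 && lead <= 0xF4) extra = 3; // 4-byte sequence
        else return false; // stray continuation byte or invalid lead byte
        if (extra > bytes.size() - i - 1)
            return false; // sequence is truncated
        for (std::size_t k = 1; k <= extra; ++k)
        {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80)
                return false; // expected a continuation byte (10xxxxxx)
        }
        i += extra + 1;
    }
    return true;
}

// Re-encodes ISO-8859-1 bytes as UTF-8. Each Latin-1 byte maps directly to
// the code point with the same value. For Windows-1252 this is only an
// approximation: bytes 0x80..0x9F differ between the two code pages.
std::string latin1ToUtf8(const std::string& bytes)
{
    std::string out;
    for (const char ch : bytes)
    {
        const unsigned char c = static_cast<unsigned char>(ch);
        if (c < 0x80)
        {
            out.push_back(static_cast<char>(c));
        }
        else
        {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}

// Hypothetical fallback for input that declares no charset:
// keep the bytes if they are valid UTF-8, otherwise assume Latin-1.
std::string decodeWithFallback(const std::string& bytes)
{
    return isValidUtf8(bytes) ? bytes : latin1ToUtf8(bytes);
}

// Replays the four bytes quoted in the parser error.
void checkBytesFromLog()
{
    const std::string raw = "\xE9\x73\x20\x2D";
    assert(!isValidUtf8(raw)); // 0xE9 expects two continuation bytes, 0x73 is not one
    assert(decodeWithFallback(raw) == "\xC3\xA9\x73\x20\x2D"); // UTF-8 for "és -"
}
```

Whether the HTML import filter should apply this kind of fallback, or use the system locale's code page instead, is a separate design question that this sketch does not settle.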
Comment 10 QA Administrators 2016-09-20 10:32:08 UTC Comment hidden (obsolete)
Comment 11 QA Administrators 2019-12-03 14:03:35 UTC Comment hidden (obsolete)
Comment 12 Kevin Suo 2020-02-14 02:56:00 UTC
Removed the word "Chinese" from the summary, as I see no Chinese stuff anywhere.