Created attachment 121310 [details] test file Attached is an .xls file in HTML format. The file contains two columns: Bank Account Number and ID Card Number. We expect these two fields to be of "text" type. However, when the file is opened with LibreOffice Calc, the cells show floating-point numeric values. In Microsoft Office and WPS Office the cells are shown as "text" type, as expected. Steps to reproduce: 1. Open the attached xls file in Calc; 2. Observe the cell values. --> They are shown as numeric. We expect the cells to be "text" values. Version: 5.0.4.2 Build ID: 2b9802c1994aa0b7dc6079e128979269cf95bc78 Locale: zh-CN (zh_CN) Win10 x64 PS. This issue was initially reported by libreoffice_xf in the LibreOffice Chinese Forum: http://www.libreofficechina.org/thread-1390-1-1.html
Confirmed. Win 7 Pro 64-bit Version: 5.2.0.0.alpha0+ Build ID: 014633f83e44ae8ba33087b6f38e8e253e281969 CPU Threads: 4; OS Version: Windows 6.1; UI Render: default; TinderBox: Win-x86@62-merge-TDF, Branch:MASTER, Time: 2015-12-15_06:21:44 Locale: fi-FI (fi_FI)
** Please read this message in its entirety before responding ** To make sure we're focusing on the bugs that affect our users today, LibreOffice QA is asking bug reporters and confirmers to retest open, confirmed bugs which have not been touched for over a year. There have been thousands of bug fixes and commits since anyone checked on this bug report. During that time, it's possible that the bug has been fixed, or the details of the problem have changed. We'd really appreciate your help in getting confirmation that the bug is still present. If you have time, please do the following: Test to see if the bug is still present on a currently supported version of LibreOffice (5.2.5 or 5.3.0 https://www.libreoffice.org/download/ If the bug is present, please leave a comment that includes the version of LibreOffice and your operating system, and any changes you see in the bug behavior If the bug is NOT present, please set the bug's Status field to RESOLVED-WORKSFORME and leave a short comment that includes your version of LibreOffice and Operating System Please DO NOT Update the version field Reply via email (please reply directly on the bug tracker) Set the bug's Status field to RESOLVED - FIXED (this status has a particular meaning that is not appropriate in this case) If you want to do more to help you can test to see if your issue is a REGRESSION. To do so: 1. Download and install oldest version of LibreOffice (usually 3.3 unless your bug pertains to a feature added after 3.3) http://downloadarchive.documentfoundation.org/libreoffice/old/ 2. Test your bug 3. Leave a comment with your results. 4a. If the bug was present with 3.3 - set version to "inherited from OOo"; 4b. If the bug was not present in 3.3 - add "regression" to keyword Feel free to come ask questions or to say hello in our QA chat: http://webchat.freenode.net/?channels=libreoffice-qa Thank you for helping us make LibreOffice even better for everyone! Warm Regards, QA Team MassPing-UntouchedBug-20170306
Bug still exists in the latest master. --------------- MORE INFO: With a debug build, I get the following warnings when opening the attached test document in Calc: warn:legacy.tools:19891:1:editeng/source/editeng/eehtml.cxx:54: EditHTMLParser::EditHTMLParser: Where does the encoding come from? warn:svtools:19891:1:svtools/source/svhtml/parhtml.cxx:1427: GetOption: unknown HTML option warn:svtools:19891:1:svtools/source/svhtml/parhtml.cxx:1427: GetOption: unknown HTML option <same line repeated 20 times>
(In reply to QA Administrators from comment #4) Still reproducible in Version: 6.3.0.0.alpha1+ Build ID: d2fa9c0d657877c967e41fdd0091f81d1b7ca048 CPU threads: 4; OS: Linux 4.18; UI render: default; VCL: gtk3; Locale: zh-CN (zh_CN.UTF-8); UI-Language: zh-CN Calc: threaded Ubuntu 18.04 LTS X64
(In reply to QA Administrators from comment #6) The bug is still present in: Version: 7.1.4.0.0+ / LibreOffice Community Build ID: b2fc048cb2d5f5bd1095a8110fa4a16a305a8acc CPU threads: 4; OS: Linux 5.11; UI render: default; VCL: gtk3 Locale: zh-CN (zh_CN.UTF-8); UI: zh-CN Calc: threaded
The (html) file contains the attribute "x:str" in its <td> tag: <tr height="19" style='height:14.25pt;'> <td class="xl67" height="19" style='height:14.25pt;' x:str>4100025601074122197</td> <td class="xl68" x:str>350627197809253585</td> </tr> which indicates the cell should be of string value rather than a number.
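For context on why the numeric interpretation is harmful here: the two values quoted in the <td> cells above have 19 and 18 digits, which is more than a 64-bit IEEE 754 double can represent exactly. The following is only a small sketch in Python (the helper name survives_double is made up for illustration, and Python's float stands in for a double-valued numeric cell); it checks whether a digit string comes back unchanged after a round trip through a double.

```python
def survives_double(digits: str) -> bool:
    """Return True if the digit string round-trips through a 64-bit float unchanged."""
    return str(int(float(digits))) == digits


# The two cell values quoted from the test document.
account = "4100025601074122197"
id_card = "350627197809253585"

for value in (account, id_card, "1234567890", "0012"):
    print(value, "->", "kept" if survives_double(value) else "altered")
```

Both identifiers are altered by the round trip, and a value with leading zeros loses them as well, which is why such columns need to be imported as text rather than as numbers.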
I have submitted a patch on gerrit for review: https://gerrit.libreoffice.org/c/core/+/123620 Please help to test with this patch.
I abandoned the patch because it was based on a misunderstanding. I did not find any specification indicating that x:str means the cell should be in text format. Rather, the test document has the style class ".xl67" or ".xl68" applied to each of the cells that appear as numbers. Each of these two styles has the attribute 'mso-number-format:"\@"'. Thus LibreOffice should read this attribute to set the cell format to TEXT. However, it fails to do so because it seems the stylesheet is never parsed! In "ScHTMLTable::DataOn", "case HtmlOptionId::CLASS:", "rStyles = mpParser->GetStyles()" returns an empty style every time.
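The lookup described above (class name, then its mso-number-format, then the decision "this cell is text") can be sketched roughly as follows. This is an illustration in Python, not the LibreOffice or orcus code: the stylesheet is a simplified stand-in for the one in the attachment, the ".xl69" rule and the helper name text_classes are invented for the example, and splitting declarations on ';' is a shortcut that would not handle escaped semicolons inside a value.

```python
import re

# Hypothetical, simplified stylesheet in the shape Excel-generated HTML uses.
CSS = r'''
.xl67 {mso-number-format:"\@"; text-align:center;}
.xl68 {mso-number-format:"\@";}
.xl69 {mso-number-format:"0\.00";}
'''

# One ".class { declarations }" rule; good enough for this flat example only.
RULE_RE = re.compile(r'\.([\w-]+)\s*\{([^}]*)\}')


def text_classes(css: str) -> set:
    """Return the class names whose mso-number-format is "\\@" (text)."""
    result = set()
    for name, body in RULE_RE.findall(css):
        for decl in body.split(';'):
            prop, _, value = decl.partition(':')
            if prop.strip() == 'mso-number-format' and value.strip() == r'"\@"':
                result.add(name)
    return result


print(sorted(text_classes(CSS)))
```

A cell whose class attribute names one of the returned classes would then be imported as a string instead of being run through number detection.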
The failure to parse the CSS stylesheet can be observed if you add a debug line like this:

--- a/sc/source/filter/html/htmlpars.cxx
+++ b/sc/source/filter/html/htmlpars.cxx
@@ -3105,6 +3105,7 @@ void ScHTMLQueryParser::ParseStyle(std::u16string_view rStrm)
     }
     catch (const orcus::css::parse_error&)
     {
+        SAL_WARN("sc.htmlimport", "FIXME: Failed to parse CSS stylesheet!");
         // TODO: Parsing of CSS failed. Do nothing for now.
     }
 }
I am adding Kohei Yoshida to cc: could you please take a look? This seems to be related to orcus. There are several issues: 1. If a property value in the stylesheet contains Chinese characters that are not quoted, orcus::css_parser.parse() raises an error. For instance, the HTML document in attachment 121310 [details] contains the following: .style18 {mso-pattern:auto none; background:#FFCC99; color:#3F3F76; font-size:11.0pt; font-weight:400; font-style:normal; text-decoration:none; font-family:宋体; mso-generic-font-family:auto; mso-font-charset:0; border:.5pt solid #7F7F7F; mso-style-name:"输入";} in which the value of the entry "font-family:宋体;" contains non-ASCII characters and is not quoted, so it raises an error in the function css_parser<_Handler>::value() in css_parser.hpp: css::parse_error::throw_with("value:: illegal first character of a value '", c, "'"); 2. If I quote all of the unquoted Chinese-character values above (see the attachment in the next reply), no errors seem to be raised during parsing, but the returned CSSHandler is empty in ScHTMLQueryParser::ParseStyle:
Created attachment 175756 [details] test file 2 with Chinese-char values quoted
I submitted another patch: https://gerrit.libreoffice.org/c/core/+/123715 With this patch, the "test file 2" (attachment 175756 [details]) is now opened correctly with the MSO number formats applied, and the ID numbers are shown correctly as TEXT. However, attachment 121310 [details] (i.e. the one which has Chinese-character property values unquoted) still fails. Does anyone have an idea how to add quotes to those characters? The CSS string looks like this: .style0 {mso-number-format:"General"; text-align:general; ... font-family:宋体; mso-font-charset:134; ... mso-style-id:0;} .style16 {mso-number-format:"_ \0022\00A5\0022* \#\,\#\#0_ \;_ \0022\00A5\0022* \\-\#\,\#\#0_ \;_ \0022\00A5\0022* \0022-\0022_ \;_ \@_ "; mso-style-name:"货币[0]"; mso-style-id:7;} The unquoted "font-family:宋体;" should be changed to "font-family:'宋体';" before it is passed to orcus::css_parser in ScHTMLQueryParser::ParseStyle.
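As a rough illustration of the quoting step asked about above, the transformation could look like the sketch below. It is written in Python purely to show the idea; it is not the code of any gerrit patch, the function name quote_non_ascii_values is made up, and the regular expression is an assumption that only covers simple declarations. In particular, it leaves already-quoted values alone but does not try to handle escaped semicolons such as the "\;" sequences in the mso-number-format value.

```python
import re

# ": value" up to ';' or '}', where value is unquoted and contains at least
# one non-ASCII character. Quotes are excluded so quoted values never match.
_UNQUOTED_NON_ASCII = re.compile(
    r'''(:\s*)([^;{}"']*[^\x00-\x7f][^;{}"']*?)(\s*[;}])'''
)


def quote_non_ascii_values(css: str) -> str:
    """Wrap unquoted property values containing non-ASCII characters in single quotes."""
    return _UNQUOTED_NON_ASCII.sub(
        lambda m: f"{m.group(1)}'{m.group(2)}'{m.group(3)}", css
    )


sample = '.style0 {font-family:宋体; mso-font-charset:134; mso-style-name:"输入";}'
print(quote_non_ascii_values(sample))
```

Running the function a second time on its own output changes nothing, since the values it touched are then quoted.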
A follow-up patch addressing the unquoted property values has been submitted: https://gerrit.libreoffice.org/c/core/+/123727/4
Kevin Suo committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/core/commit/a0c23b40905d7b59caf46fc8887864ab35142522 tdf#96499 sc htmlimport: fix broken CSSHandler so that... It will be available in 7.3.0. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
As I stated in comment 12, there are two issues here. 1) RESOLVED: The first issue is that normal CSS styles are not parsed at all (due to regressions in our code base during orcus version updates). This is addressed in https://gerrit.libreoffice.org/c/core/+/123715/5 and is now merged as commit a0c23b40905d7b59caf46fc8887864ab35142522. 2) TODO: The second issue is that, even with the fix in 1 above, if the CSS style contains unquoted non-[a-zA-Z] characters (e.g. CJK or other characters) in property values, orcus fails to parse it. I had submitted a patch in https://gerrit.libreoffice.org/c/core/+/123727/5 but I abandoned it myself because I am not sure whether it is the right approach. Anyone who is interested in this is welcome to continue working on a fix. I am setting this back to NEW.
Kevin Suo committed a patch related to this issue. It has been pushed to "libreoffice-7-2": https://git.libreoffice.org/core/commit/a25f615a103f6ed3c2d4c35d2eacdd828b75854e tdf#96499 sc htmlimport: fix broken CSSHandler so that... It will be available in 7.2.3. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
The TODO issue described in comment 17 has been submitted to the orcus issue tracker: https://gitlab.com/orcus/orcus/-/issues/140
(In reply to Kevin Suo from comment #19) The TODO issue is now fixed on the orcus master branch. This bug can be resolved once the orcus used by LibreOffice is upgraded to a (pending) new version.
https://gerrit.libreoffice.org/c/core/+/124573
The TODO issue is now resolved in LibreOffice master via Kohei's upgrade of LibreOffice's orcus version to 0.17.0 in commit eb07a0e76. Marking as RESOLVED FIXED.
Xisco Fauli committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/core/commit/9d02b1edafd44b75a8996a97c329fdd4967e8f54 tdf#96499: sc: Add UItest It will be available in 7.4.0. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.