Created attachment 119818
The HTML file that is used as the "external data"

1. Open a new Calc spreadsheet
2. Go to Insert -> Link to External Data...
3. Browse to the text.html file (see attached)
4. Pick Automatic in the Import Options dialog box, then click OK

Persian text is not imported correctly.

The original URL for the data is http://www.tsetmc.com/Loader.aspx?ParTree=15. The issue was originally reported by Farid Tofighi on the Google+ LibreOffice Community channel.
Created attachment 119819
Resulting Calc ODS document with corrupted Persian characters.
Works OK here. Please attach a screenshot of the corruption. Setting to NEEDINFO; change back to UNCONFIRMED after you have provided the screenshot.

Win 7 Pro 64-bit, Version: 5.0.2.2 (x64)
Build ID: 37b43f919e4de5eeaca9b9755ed688758a8251fe
Locale: fi-FI (fi_FI)
Trying again - I can't seem to get LibreOffice to pull the data when I link to http://bug-attachments.documentfoundation.org/attachment.cgi?id=119818
Even more strange - the latest build of LibreOffice (from master!) is now missing this menu item.
Still occurring. My steps were a bit unclear.

1. Open a new Calc spreadsheet
2. Go to Insert -> Link to External Data...
3. Point to https://bugs.documentfoundation.org/attachment.cgi?id=119818
4. Pick Automatic in the Import Options dialog box, then choose the "all" table, then click OK

LibreOffice has problems importing. If you take the HTML file directly and import it as a file, there is no problem.

There are a lot of SAL_WARNs, though:

(pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed
(pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed
(pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed
(pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed
(pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed
(pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed
(pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed
(pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed
(pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed
(pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed
(pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed
(pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed
(pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed
warn:ucb.ucp.webdav:3764:1:ucb/source/ucp/webdav-neon/NeonSession.cxx:1703: Neon received http error: '200 OK'
warn:ucb.ucp.webdav:3764:1:ucb/source/ucp/webdav-neon/NeonSession.cxx:1703: Neon received http error: '200 OK'
warn:ucb.ucp.webdav:3764:1:ucb/source/ucp/webdav-neon/NeonSession.cxx:1703: Neon received http error: '200 OK'
:1: parser error : StartTag: invalid element name
<!doctype html>
 ^
warn:oox.storage:3764:1:oox/source/helper/zipstorage.cxx:66: ZipStorage::ZipStorage exception opening input storage:
:1: parser error : StartTag: invalid element name
<!doctype html>
 ^
:1: parser error : StartTag: invalid element name
<!doctype html>
 ^
warn:oox.storage:3764:1:oox/source/helper/zipstorage.cxx:66: ZipStorage::ZipStorage exception opening input storage:
warn:oox.storage:3764:1:oox/source/helper/zipstorage.cxx:66: ZipStorage::ZipStorage exception opening input storage:
warn:oox.storage:3764:1:oox/source/helper/zipstorage.cxx:66: ZipStorage::ZipStorage exception opening input storage:
warn:oox.storage:3764:1:oox/source/helper/zipstorage.cxx:66: ZipStorage::ZipStorage exception opening input storage:
warn:oox.storage:3764:1:oox/source/helper/zipstorage.cxx:66: ZipStorage::ZipStorage exception opening input storage:
warn:oox.storage:3764:1:oox/source/helper/zipstorage.cxx:66: ZipStorage::ZipStorage exception opening input storage:
warn:oox.storage:3764:1:oox/source/helper/zipstorage.cxx:66: ZipStorage::ZipStorage exception opening input storage:
warn:oox.storage:3764:1:oox/source/helper/zipstorage.cxx:66: ZipStorage::ZipStorage exception opening input storage:
warn:oox.storage:3764:1:oox/source/helper/zipstorage.cxx:66: ZipStorage::ZipStorage exception opening input storage:
warn:oox.storage:3764:1:oox/source/helper/zipstorage.cxx:66: ZipStorage::ZipStorage exception opening input storage:
warn:oox.storage:3764:1:oox/source/helper/zipstorage.cxx:66: ZipStorage::ZipStorage exception opening input storage:
warn:oox.storage:3764:1:oox/source/helper/zipstorage.cxx:66: ZipStorage::ZipStorage exception opening input storage:
warn:oox.storage:3764:1:oox/source/helper/zipstorage.cxx:66: ZipStorage::ZipStorage exception opening input storage:
warn:oox.storage:3764:1:oox/source/helper/zipstorage.cxx:66: ZipStorage::ZipStorage exception opening input storage:
warn:oox.storage:3764:1:oox/source/helper/zipstorage.cxx:66: ZipStorage::ZipStorage exception opening input storage:
warn:oox.storage:3764:1:oox/source/helper/zipstorage.cxx:66: ZipStorage::ZipStorage exception opening input storage:
warn:ucb.ucp.webdav:3764:1:ucb/source/ucp/webdav-neon/NeonSession.cxx:1703: Neon received http error: '200 OK'
warn:ucb.ucp.webdav:3764:1:ucb/source/ucp/webdav-neon/NeonSession.cxx:1703: Neon received http error: '200 OK'
warn:ucb.ucp.webdav:3764:1:ucb/source/ucp/webdav-neon/NeonSession.cxx:1703: Neon received http error: '200 OK'
warn:ucb.ucp.webdav:3764:1:ucb/source/ucp/webdav-neon/NeonSession.cxx:1703: Neon received http error: '200 OK'
warn:ucb.ucp.webdav:3764:1:ucb/source/ucp/webdav-neon/NeonSession.cxx:1703: Neon received http error: '200 OK'
warn:ucb.ucp.webdav:3764:1:ucb/source/ucp/webdav-neon/NeonSession.cxx:1703: Neon received http error: '200 OK'
warn:vcl:3764:1:vcl/source/window/winproc.cxx:862: ImplHandleKey: Keyboard-Input is sent to a frame without focus
warn:sfx.doc:3764:1:sfx2/source/doc/docfile.cxx:693: Physical name not convertible!
warn:ucb.ucp.webdav:3764:1:ucb/source/ucp/webdav-neon/NeonSession.cxx:1703: Neon received http error: '200 OK'
warn:ucb.ucp.webdav:3764:1:ucb/source/ucp/webdav-neon/NeonSession.cxx:1703: Neon received http error: '200 OK'
warn:ucb.ucp.webdav:3764:1:ucb/source/ucp/webdav-neon/NeonSession.cxx:1703: Neon received http error: '200 OK'
warn:legacy.tools:3764:1:editeng/source/editeng/eehtml.cxx:54: EditHTMLParser::EditHTMLParser: Where does the encoding come from?
warn:ucb.ucp.webdav:3764:1:ucb/source/ucp/webdav-neon/NeonSession.cxx:1703: Neon received http error: '200 OK'
warn:ucb.ucp.webdav:3764:1:ucb/source/ucp/webdav-neon/NeonSession.cxx:1703: Neon received http error: '200 OK'
warn:ucb.ucp.webdav:3764:1:ucb/source/ucp/webdav-neon/NeonSession.cxx:1703: Neon received http error: '200 OK'
warn:vcl:3764:1:vcl/source/window/winproc.cxx:862: ImplHandleKey: Keyboard-Input is sent to a frame without focus
warn:ucb.ucp.webdav:3764:1:ucb/source/ucp/webdav-neon/NeonSession.cxx:1703: Neon received http error: '200 OK'
warn:ucb.ucp.webdav:3764:1:ucb/source/ucp/webdav-neon/NeonSession.cxx:1703: Neon received http error: '200 OK'
warn:ucb.ucp.webdav:3764:1:ucb/source/ucp/webdav-neon/NeonSession.cxx:1703: Neon received http error: '200 OK'
warn:sfx.doc:3764:1:sfx2/source/doc/docfile.cxx:693: Physical name not convertible!
warn:ucb.ucp.webdav:3764:1:ucb/source/ucp/webdav-neon/NeonSession.cxx:1703: Neon received http error: '200 OK'
warn:ucb.ucp.webdav:3764:1:ucb/source/ucp/webdav-neon/NeonSession.cxx:1703: Neon received http error: '200 OK'
warn:ucb.ucp.webdav:3764:1:ucb/source/ucp/webdav-neon/NeonSession.cxx:1703: Neon received http error: '200 OK'
warn:legacy.tools:3764:1:editeng/source/editeng/eehtml.cxx:54: EditHTMLParser::EditHTMLParser: Where does the encoding come from?
warn:ucb.ucp.webdav:3764:1:ucb/source/ucp/webdav-neon/NeonSession.cxx:1703: Neon received http error: '200 OK'
warn:ucb.ucp.webdav:3764:1:ucb/source/ucp/webdav-neon/NeonSession.cxx:1703: Neon received http error: '200 OK'
warn:ucb.ucp.webdav:3764:1:ucb/source/ucp/webdav-neon/NeonSession.cxx:1703: Neon received http error: '200 OK'
warn:legacy.tools:3764:1:svl/source/items/poolitem.cxx:114: destroying item in use

In other words, the editeng HTML control:

1. Doesn't know valid HTML 5 syntax (it is complaining about <!doctype html>, which is valid)
2. Seems to have issues detecting the encoding.
Created attachment 121553
Corrupted on Linux
It looks like the encoding is detected by checking whether the file starts with a BOM. Unfortunately, that's not how web pages are sent. Instead, we should be looking at the headers returned by the web server:

HTTP/1.1 200 OK
Server: nginx/1.2.1
Date: Sat, 26 Dec 2015 01:41:30 GMT
Content-Type: text/html; name="text.html"; charset=UTF-8
Content-Length: 982
Connection: keep-alive
X-xss-protection: 1; mode=block
Content-disposition: inline; filename="text.html"
X-content-type-options: nosniff
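For illustration only, here is a minimal Python sketch of the idea of consulting the Content-Type header when the body carries no BOM. It is not the LibreOffice code: the function names, the BOM-before-header ordering, and the "windows-1252" default are all assumptions of this sketch.

```python
import codecs
from email.message import Message
from typing import Optional

# BOMs this sketch recognises (UTF-32 is deliberately left out for brevity).
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def charset_from_bom(body: bytes) -> Optional[str]:
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if body.startswith(bom):
            return name
    return None


def charset_from_content_type(value: Optional[str]) -> Optional[str]:
    """Extract the charset parameter from a Content-Type header value."""
    if not value:
        return None
    msg = Message()
    msg["Content-Type"] = value
    # get_content_charset() returns the charset lower-cased, or None if absent.
    return msg.get_content_charset()


def pick_charset(content_type: Optional[str], body: bytes,
                 default: str = "windows-1252") -> str:
    """Hypothetical helper: BOM first, then the HTTP header, then a default."""
    return (charset_from_bom(body)
            or charset_from_content_type(content_type)
            or default)
```

With the header shown above, `pick_charset('text/html; name="text.html"; charset=UTF-8', body)` yields `utf-8` for a BOM-less body, which is the information a BOM-only check misses.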
So this goes through WebDAV, and at this point I've got no idea how it works. But stepping through the code, it's very suspicious that WebDAV sees 200 responses as errors.
(In reply to Chris Sherlock from comment #5)
> Still occurring. My steps were a bit unclear.
>
> 1. Open a new Calc spreadsheet
> 2. Go into Insert -> Link to External Data..."
> 3. Point to https://bugs.documentfoundation.org/attachment.cgi?id=119818
> 4. Pick Automatic in the Import Options dialog box, then choose the all
> table, then click on OK

OK, now I could reproduce it and got garbled characters. For step 3, we have to press Enter after pasting the URL (the UX is a bit unclear there).

Win 7 Pro 64-bit
Version: 5.2.0.0.alpha0+
Build ID: a4764cfa80270f973da5861d0ddc28298bf16f4d
CPU Threads: 4; OS Version: Windows 6.1; UI Render: default;
TinderBox: Win-x86@62-merge-TDF, Branch:MASTER, Time: 2015-12-24_22:45:12
Locale: fi-FI (fi_FI)
(In reply to Chris Sherlock from comment #8)
> So this goes through WebDAV, and at this point I've got no idea how it
> works. But stepping through the code, it's very suspicious that WebDAV sees
> 200 responses as errors.

It means the neon library returned an error, but with an HTTP code of '200 OK'. Apparently on this web server that means 'PROPFIND method is not available'. I should probably have chosen a different wording for the message.

If you enable +INFO.ucb.ucp.webdav you'll see almost the whole protocol exchange.

The content-type property should be mapped to the UCB property MediaType. On a WebDAV server (or a web server with read-only WebDAV enabled), MediaType is mapped to the 'getcontenttype' DAV property, which gives you the correct value. I need to see what happens when a plain web link is processed.
@Chris: I found some time to analyze this bug. It seems that when a client of the WebDAV provider asks for the MediaType property and the target URL is on a simple web site, the fallback in this member function: <http://opengrok.libreoffice.org/xref/core/ucb/source/ucp/webdav-neon/webdavcontent.cxx#1228>, which should obtain the value from the Content-Type header of the HEAD response, doesn't work as it should. As Mark Hung pointed out on the dev list, there may also be a header-name character-case problem, possibly in the callback functions that analyze the response. I'll have a look into it.
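To make the character-case point concrete: HTTP header names are case insensitive, so "Content-Type", "content-type" and "CONTENT-TYPE" must all match. A minimal Python sketch of such a lookup follows; `get_header` and the list-of-pairs representation are hypothetical and not taken from the webdav-neon code.

```python
from typing import Iterable, Optional, Tuple


def get_header(headers: Iterable[Tuple[str, str]], name: str) -> Optional[str]:
    """Return the first header value whose name matches, ignoring case."""
    wanted = name.lower()  # header names are ASCII, so lower() is sufficient
    for key, value in headers:
        if key.lower() == wanted:
            return value
    return None


# A server is free to send any capitalisation of the header name.
response_headers = [
    ("content-type", 'text/html; name="text.html"; charset=UTF-8'),
    ("Content-Length", "982"),
]
content_type = get_header(response_headers, "Content-Type")
```

An exact string comparison against "Content-Type" would find nothing in `response_headers` here, which is the kind of failure a case-sensitive callback would produce.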
@Chris: I pushed to gerrit: <https://gerrit.libreoffice.org/#/c/21907/1> There are two patches that together fix the bug; both are needed. I split the fix into two patches to help bisecting. If you find the time, please test them.
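As a rough sketch of the fallback discussed in this thread (ask for the DAV 'getcontenttype' property via PROPFIND, and fall back to the Content-Type header of a HEAD response when PROPFIND is not available), the control flow might look like the Python below. Everything here is hypothetical: `PropfindUnavailable`, `resolve_media_type` and the injected `propfind`/`head` callables stand in for real network code and do not mirror the actual patches.

```python
from typing import Callable, Dict, List, Optional, Tuple


class PropfindUnavailable(Exception):
    """Hypothetical error: the server does not support PROPFIND for this URL."""


def resolve_media_type(
    url: str,
    propfind: Callable[[str], Dict[str, str]],
    head: Callable[[str], List[Tuple[str, str]]],
) -> Optional[str]:
    """Return the media type for url, preferring DAV properties over HEAD."""
    try:
        props = propfind(url)
    except PropfindUnavailable:
        props = {}  # plain web server: no DAV properties to consult

    media_type = props.get("getcontenttype")
    if media_type:
        return media_type

    # Fallback: issue a HEAD request and read Content-Type, ignoring name case.
    for name, value in head(url):
        if name.lower() == "content-type":
            return value
    return None
```

Passing the transport functions in as parameters keeps the sketch runnable without a server; a caller would supply real PROPFIND and HEAD implementations.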
Giuseppe Castagno committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=e973b342826e54f147251b132c3325d30749e312 Related tdf#95217: Http header names are case insensitive It will be available in 5.2.0. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Giuseppe Castagno committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=d61352f58a7f750d3b0b0a9c2d6498fbb7a6e10d Related tdf#95217: Force HEAD method in Web access if PROPFIND failed It will be available in 5.2.0.
Giuseppe - that's fantastic work! Sorry I took so long to respond - nice bit of troubleshooting, and nice to see such an elegant fix :-) I tip my hat to you. I'll build LO again and test this, then sign off on it.
Excellent - I can confirm this is now working as intended - the Persian text is now rendering correctly. Many thanks Giuseppe!
Giuseppe Castagno committed a patch related to this issue. It has been pushed to "libreoffice-5-1": http://cgit.freedesktop.org/libreoffice/core/commit/?id=3d03b2f51912e7ca49251befca3fa61021dc6154&h=libreoffice-5-1 Related tdf#95217: Http header names are case insensitive It will be available in 5.1.1.
Giuseppe Castagno committed a patch related to this issue. It has been pushed to "libreoffice-5-1": http://cgit.freedesktop.org/libreoffice/core/commit/?id=abec158e8b0a5c07380cd2bc7f7c5edbef878bed&h=libreoffice-5-1 Related tdf#95217: Force HEAD method in Web access if PROPFIND failed It will be available in 5.1.1.
Giuseppe Castagno committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/core/commit/a50dbf49906f4aab367b2556be99779b2b05866d ucb: webdav-curl: Related tdf#95217: Http header names are case insensitive It will be available in 7.3.0.
Giuseppe Castagno committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/core/commit/09730ea0ccfe63982cdb869d5eaa906982283bf1 ucb: webdav-curl: Related tdf#95217: Force HEAD method in Web access if PROPFIND failed It will be available in 7.3.0.
I do confirm this issue is fixed in Version: 7.3.0.0.beta1+ / LibreOffice Community Build ID: 86f539a23b08d0cc9e5e9566ac31380e373be13f CPU threads: 4; OS: Linux 5.10; UI render: default; VCL: gtk3 Locale: en-US (en_US.UTF-8); UI: en-US Calc: threaded @Michael Stahl, thanks for fixing it!!
Xisco Fauli committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/core/commit/d7fb6b22cf7e66eb2594001cc42c6bff8b5a49e2 tdf#95217, tdf#142600: sc: Add UItest It will be available in 7.4.0.
Actually the encoding problem was fixed by https://git.libreoffice.org/core/commit/3392f567be8d52804b187b0bced47204ef38fa3c for bug 146048
Xisco Fauli committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/core/commit/098da417618e09692a8f574e2c5cb7af582104e9 tdf#95217: sc: simplify test It will be available in 7.4.0.