Problem description: When saving hyperlinks of YouTube and Baidu search queries containing Chinese characters, LibreOffice modifies the hyperlinks and they become unusable.
Steps to reproduce:
1. Create a new Writer document.
2. Insert a hyperlink containing: "http://www.youtube.com/results?search_query=%E7%B2%B5%E8%AA%9Emv&sm=12"
3. Save the file in .html format.
4. Open the file in WordPad.
5. Observe that the hyperlink has been modified to: "http://www.youtube.com/results?search_query=粵語mv&sm=12"
Current behavior: The hyperlink is modified when the file is saved.
Expected behavior: The original hyperlink should be kept when the file is saved.
Operating System: Windows 7
Version: 4.2.1.1 release
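For illustration: the modified link is exactly the percent-decoded form of the original one. A minimal Python sketch using the standard library's urllib.parse.unquote (Python is only an illustrative choice here; this is not the code path LibreOffice uses):

```python
from urllib.parse import unquote

original = "http://www.youtube.com/results?search_query=%E7%B2%B5%E8%AA%9Emv&sm=12"
saved = "http://www.youtube.com/results?search_query=粵語mv&sm=12"

# unquote() replaces each %XX sequence with the byte it denotes and decodes
# the resulting bytes as UTF-8, turning %E7%B2%B5%E8%AA%9E back into 粵語.
decoded = unquote(original)
print(decoded == saved)  # True
```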
This is an Easy Hack.
See SwHTMLWriter::OutHyperlinkHRefValue() in sw/source/filter/html/wrthtml.cxx.
See the HTML URL encoding reference at: http://www.w3schools.com/tags/ref_urlencode.asp
See also http://tools.ietf.org/html/rfc3986
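For reference, the percent-encoding in RFC 3986 (section 2.1) writes every byte outside the unreserved set (section 2.3) as "%" followed by two uppercase hex digits. Below is a minimal Python sketch, assuming UTF-8 as the character encoding (RFC 3986 section 2.5 recommends UTF-8 for textual data). percent_encode is a hypothetical helper, not a LibreOffice function, and it is meant for a single component such as a query value, not for a whole URL (it would also escape ":" and "/").

```python
# Unreserved characters per RFC 3986 section 2.3; every other byte of the
# UTF-8 encoded component is percent-encoded.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def percent_encode(component: str) -> str:
    """Percent-encode one URI component (hypothetical helper)."""
    out = []
    for byte in component.encode("utf-8"):
        ch = chr(byte)
        if ch in UNRESERVED:
            out.append(ch)             # ASCII letters, digits, - . _ ~ stay as-is
        else:
            out.append(f"%{byte:02X}")  # everything else becomes %XX
    return "".join(out)


print(percent_encode("粵語mv"))  # %E7%B2%B5%E8%AA%9Emv
```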
I'm going to try to take on this one for my first hack. I'll drop it in the next few days if it turns out too much to handle.
adding LibreOffice developer list as CC to unresolved EasyHacks for better visibility. see e.g. http://nabble.documentfoundation.org/minutes-of-ESC-call-td4076214.html for details
(In reply to comment #0)
> 2. Insert a hyperlink containing:
> "http://www.youtube.com/results?search_query=%E7%B2%B5%E8%AA%9Emv&sm=12"

Just for info: on Fedora 20 with LibreOffice 4.3.0.2, when I insert a link with the following and save as HTML:
"http://www.youtube.com/results?search_query=%E7%B2%B5%E8%AA%9Emv&sm=12"
the link is changed to:
http://www.youtube.com/results?search_query=粵語mv&amp;sm=12
(i.e., only the character "&" is changed to "&amp;"; is it becoming better?)
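Escaping "&" as "&amp;" inside an attribute value is ordinary HTML serialization rather than corruption: an HTML parser decodes it back to "&". A small Python sketch with the standard html module (illustrative only; this is not LibreOffice's export code):

```python
import html

href = "http://www.youtube.com/results?search_query=%E7%B2%B5%E8%AA%9Emv&sm=12"

# quote=True also escapes " and ', which matters inside a quoted attribute.
attr = html.escape(href, quote=True)
print(f'<a href="{attr}">link</a>')

# Unescaping the attribute value yields the original URL again.
print(html.unescape(attr) == href)  # True
```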
adding LibreOffice developer list as CC to unresolved Writer EasyHacks for better visibility. see e.g. http://nabble.documentfoundation.org/minutes-of-ESC-call-td4076214.html for details
(In reply to Eric Wang from comment #2) > I'm going to try to take on this one for my first hack. I'll drop it in the > next few days if it turns out too much to handle. No activity on this task for a long time. I'll try to resolve it.
(In reply to Kevin Suo from comment #4) > The link changed to > http://www.youtube.com/results?search_query=粵語mv&sm=12 LO internally converts a URI entered into the Hyperlink dialog into an IRI (cf. <http://cgit.freedesktop.org/libreoffice/core/commit/?id=abb5e84c74b781f3615862695db4e5eaadc12cfe> "Do not corrupt URIs entered into the Hyperlink dialog"). It is written out as an IRI at least into ODT documents (and XLink allows for IRIs, so that appears to be OK). But at least the HTML 4.01 spec appears to only allow for URIs, not IRIs, in <a href="...". So it's probably good to explicitly convert from IRI to URI upon exporting at least to HTML (as proposed in <https://gerrit.libreoffice.org/#/c/14223>), short of revisiting the decision to store as IRI rather than URI internally in LO.
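The IRI-to-URI conversion proposed above can be sketched along the lines of the mapping in RFC 3987 section 3.1: each non-ASCII character is replaced by the percent-encoded octets of its UTF-8 form, while ASCII characters, including delimiters such as "?", "&" and "=", are left untouched. This is a minimal Python sketch; iri_to_uri is a hypothetical name, not a LibreOffice API, and it does not treat internationalized host names specially (those are often converted with IDNA/Punycode instead).

```python
def iri_to_uri(iri: str) -> str:
    """Map an IRI to a URI (hypothetical helper, in the spirit of RFC 3987 3.1)."""
    out = []
    for ch in iri:
        if ord(ch) < 0x80:
            # ASCII, including reserved delimiters, is kept unchanged.
            out.append(ch)
        else:
            # Non-ASCII: percent-encode each byte of the UTF-8 sequence.
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


iri = "http://www.youtube.com/results?search_query=粵語mv&sm=12"
print(iri_to_uri(iri))
# http://www.youtube.com/results?search_query=%E7%B2%B5%E8%AA%9Emv&sm=12
```

Unlike component-level percent-encoding, this mapping never touches ASCII, so an already valid URI passes through unchanged.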
Vasily Melenchuk committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=0706b5756e06b7773a78e3046a47efc2c81d92b1 tdf#76291 write encoded URL as href in html output It will be available in 4.5.0. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Vasily Melenchuk committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=675e1fe198298702ced8eab02a7df5164d66a8f0 tdf#76291 unit test for html export href encoding It will be available in 4.5.0.
Vasily Melenchuk committed a patch related to this issue. It has been pushed to "libreoffice-4-4": http://cgit.freedesktop.org/libreoffice/core/commit/?id=4327b7882c38005d89b07e76814705d1c53f3161&h=libreoffice-4-4 tdf#76291 write encoded URL as href in html output It will be available in 4.4.1.
Vasily Melenchuk committed a patch related to this issue. It has been pushed to "libreoffice-4-4": http://cgit.freedesktop.org/libreoffice/core/commit/?id=fafd6cd4f784e5b65548af699bc25502f10a4b8d&h=libreoffice-4-4 tdf#76291 unit test for html export href encoding It will be available in 4.4.1.
(In reply to Commit Notification from comment #10)
> Vasily Melenchuk committed a patch related to this issue.
> It has been pushed to "libreoffice-4-4":
>
> http://cgit.freedesktop.org/libreoffice/core/commit/?id=4327b7882c38005d89b07e76814705d1c53f3161&h=libreoffice-4-4
>
> tdf#76291 write encoded URL as href in html output

This commit seems to make the build of LO 4.4.1.0.0+ fail:

[build CUT] sw_htmlimport
File tested,Execution Time (ms)
HTML parser error : Opening and ending tag mismatch: a and font
ce="SimSun"><font size="2" style="font-size: 10pt"><span lang="zh-CN">粵語</a>
^
HTML parser error : Opening and ending tag mismatch: a and font
ce="SimSun"><font size="2" style="font-size: 10pt"><span lang="zh-CN">粵語</a>
^
HTML parser error : Unexpected end tag : span
Sun"><font size="2" style="font-size: 10pt"><span lang="zh-CN">粵語</a></span>
^
HTML parser error : Unexpected end tag : font
ont size="2" style="font-size: 10pt"><span lang="zh-CN">粵語</a></span></font>
^
HTML parser error : Unexpected end tag : font
e="2" style="font-size: 10pt"><span lang="zh-CN">粵語</a></span></font></font>
^
tdf76291.odt,/home/[..]/lo44/test/source/xmltesttools.cxx:53:testExportUrlEncoding::Import_Export
equality assertion failed
- Expected: 1
- Actual  : 2
- In <file:///home/[..]/tmp/luup6jlk.tmp>, XPath '/html/body/p/a' number of nodes is incorrect
File tested,Execution Time (ms)
checkbox-radiobutton.doc,346
File tested,Execution Time (ms)
HTMLImage.odt,199
File tested,Execution Time (ms)
skipimage-embedded-document.docx,315
File tested,Execution Time (ms)
skipimage-embedded.doc,382
File tested,Execution Time (ms)
textAndImage.docx,171
File tested,Execution Time (ms)
textAndImage.docx,86
File tested,Execution Time (ms)
charborder.odt,94
File tested,Execution Time (ms)
charborder.odt,232
File tested,Execution Time (ms)
fdo86857.html,43
File tested,Execution Time (ms)
fdo86857.html,134
File tested,Execution Time (ms)
fdo62336.docx,336
File tested,Execution Time (ms)
fdo81276.html,17
File tested,Execution Time (ms)
fdo81276.html,93
xmltesttools.cxx:53:Assertion
Test name: testExportUrlEncoding::Import_Export
equality assertion failed
- Expected: 1
- Actual  : 2
- In <file:///home/[..]/tmp/luup6jlk.tmp>, XPath '/html/body/p/a' number of nodes is incorrect

Failures !!!
Run: 15   Failure total: 1   Failures: 1   Errors: 0

Error: a unit test failed, please do one of:
export DEBUGCPPUNIT=TRUE # for exception catching
export CPPUNITTRACE="gdb --args" # for interactive debugging on Linux
export CPPUNITTRACE="\"[full path to devenv.exe]\" /debugexe" # for interactive debugging in Visual Studio
export VALGRIND=memcheck # for memory checking
and retry using: make CppunitTest_sw_htmlexport

/home/[..]/lo44/solenv/gbuild/CppunitTest.mk:81: recipe for target '/home/[..]/lo44/workdir/CppunitTest/sw_htmlexport.test' failed
make[1]: *** [/home/[..]/lo44/workdir/CppunitTest/sw_htmlexport.test] Error 1
make[1]: *** Waiting for unfinished jobs....
Makefile:237: recipe for target 'build' failed
make: *** [build] Error 2

It was an incremental build. Would it be useful to retry with a complete rebuild?
Best regards.
JBF
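The failing assertion expects exactly one <a> node under /html/body/p. A rough Python analogue of that kind of check, using the standard html.parser module on a hand-written sample of exported markup (the sample and the AnchorCollector class are illustrative assumptions, not the actual LibreOffice unit test). Note that html.parser does not build a tree or repair mismatched tags, so it counts <a> start tags rather than XPath nodes.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Records the href of every <a> start tag (hypothetical test helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Attribute values arrive entity-decoded, so &amp; is already "&".
        if tag == "a":
            self.hrefs.append(dict(attrs).get("href"))


exported = (
    '<html><body><p><a href="http://www.youtube.com/results?'
    'search_query=%E7%B2%B5%E8%AA%9Emv&amp;sm=12">粵語</a></p></body></html>'
)
collector = AnchorCollector()
collector.feed(exported)
collector.close()
print(collector.hrefs)
```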
Andras Timar committed a patch related to this issue. It has been pushed to "libreoffice-4-4": http://cgit.freedesktop.org/libreoffice/core/commit/?id=72812676b419b8d5fa32d06e5c45af73d08eac59&h=libreoffice-4-4 tdf#76291 adapt unit test to libreoffice-4-4 It will be available in 4.4.1.
(In reply to Commit Notification from comment #13)
> Andras Timar committed a patch related to this issue.
> It has been pushed to "libreoffice-4-4":
>
> http://cgit.freedesktop.org/libreoffice/core/commit/?id=72812676b419b8d5fa32d06e5c45af73d08eac59&h=libreoffice-4-4
>
> tdf#76291 adapt unit test to libreoffice-4-4

Thank you, that solved the build problem.
Best regards.
JBF
Migrating Whiteboard tags to Keywords: ( EasyHack SkillCpp ) [NinjaEdit]
Remove LibreOffice Dev List from CC on EasyHacks (curtailing excessive email to list) [NinjaEdit]