Bug 76291 - FILESAVE: Chinese hyperlinks modified upon Saving
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 4.2.1.1 release
Hardware: Other Windows (All)
Importance: medium normal
Assignee: Vasily Melenchuk (CIB)
URL:
Whiteboard: BSA target:4.5.0 target:4.4.1
Keywords: easyHack, skillCpp
Depends on:
Blocks: CJK
Reported: 2014-03-17 20:12 UTC by Marcus Lui
Modified: 2016-02-18 16:37 UTC

See Also:
Crash report or crash signature:


Description Marcus Lui 2014-03-17 20:12:09 UTC
Problem description: 

When a document containing hyperlinks to YouTube or BaiDu search queries with Chinese characters is saved, LibreOffice modifies the hyperlinks, and the modified hyperlinks are unusable.

Steps to reproduce:

1. Create a new Writer document
2. Insert a hyperlink containing: "http://www.youtube.com/results?search_query=%E7%B2%B5%E8%AA%9Emv&sm=12"
3. Save the file in .html format
4. Open the file in WordPad
5. Observe the hyperlink has been modified to: "http://www.youtube.com/results?search_query=粵語mv&sm=12"
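
The two forms of the link in steps 2 and 5 differ only in encoding. The following is a minimal Python sketch (an illustration only, not LibreOffice code) that assumes the escapes are UTF-8 and shows that percent-decoding the original URL gives the form found in the saved file:

```python
from urllib.parse import unquote

# URL as entered in step 2 (percent-encoded UTF-8).
original = "http://www.youtube.com/results?search_query=%E7%B2%B5%E8%AA%9Emv&sm=12"

# URL as observed in the saved file in step 5.
saved = "http://www.youtube.com/results?search_query=粵語mv&sm=12"

# Decode the %XX escapes, interpreting the resulting bytes as UTF-8.
decoded = unquote(original, encoding="utf-8")

# Prints whether the decoded original matches the saved form.
print(decoded == saved)
```

In other words, the saved file holds the decoded form of the link rather than the percent-encoded one that was entered.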

Current behavior:

Modifies the hyperlink upon saving the file.

Expected behavior:

Should keep the original hyperlink upon saving the file.
              
Operating System: Windows 7
Version: 4.2.1.1 release
Comment 1 Andras Timar 2014-03-21 08:07:18 UTC
This is an Easy Hack.

see SwHTMLWriter::OutHyperlinkHRefValue() in sw/source/filter/html/wrthtml.cxx
see HTML URL Encoding reference at: http://www.w3schools.com/tags/ref_urlencode.asp
see also http://tools.ietf.org/html/rfc3986
Comment 2 Eric Wang 2014-04-04 00:40:22 UTC
I'm going to try to take on this one for my first hack. I'll drop it in the next few days if it turns out too much to handle.
Comment 3 Björn Michaelsen 2014-05-23 12:13:21 UTC
adding LibreOffice developer list as CC to unresolved EasyHacks for better visibility.

see e.g. http://nabble.documentfoundation.org/minutes-of-ESC-call-td4076214.html for details
Comment 4 Kevin Suo 2014-07-06 11:00:12 UTC
(In reply to comment #0)

> 2. Insert a hyperlink containing:
> "http://www.youtube.com/results?search_query=%E7%B2%B5%E8%AA%9Emv&sm=12"

Just for info:
On Fedora 20 with LibreOffice 4.3.0.2, when I insert the following link and save as HTML:
"http://www.youtube.com/results?search_query=%E7%B2%B5%E8%AA%9Emv&sm=12"

The link changed to
http://www.youtube.com/results?search_query=粵語mv&sm=12

(i.e., only the character "&" is changed to "&amp;". Is it getting better?)
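
For context on the ampersand: inside an HTML attribute value, a literal "&" is normally written as the entity "&amp;", and a browser turns it back into "&" when it reads the href. A small Python sketch of that round trip follows; it uses only the standard library, and the variable names are illustrative rather than taken from LibreOffice:

```python
from html import escape, unescape

href = "http://www.youtube.com/results?search_query=%E7%B2%B5%E8%AA%9Emv&sm=12"

# Escape characters that are special inside a double-quoted attribute value.
attr_value = escape(href, quote=True)
anchor = f'<a href="{attr_value}">search</a>'
print(anchor)

# An HTML parser reverses the escaping, so the link target itself is unchanged.
print(unescape(attr_value) == href)
```

Escaping "&" this way does not alter the link target, unlike the change to the Chinese characters.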
Comment 5 Björn Michaelsen 2014-12-02 10:53:12 UTC
adding LibreOffice developer list as CC to unresolved Writer EasyHacks for better visibility.

see e.g. http://nabble.documentfoundation.org/minutes-of-ESC-call-td4076214.html for details
Comment 6 Vasily Melenchuk (CIB) 2015-01-27 16:23:36 UTC
(In reply to Eric Wang from comment #2)
> I'm going to try to take on this one for my first hack. I'll drop it in the
> next few days if it turns out too much to handle.

No activity on this task for a long time. I'll try to resolve it.
Comment 7 Stephan Bergmann 2015-01-29 10:34:47 UTC
(In reply to Kevin Suo from comment #4)
> The link changed to
> http://www.youtube.com/results?search_query=粵語mv&sm=12

LO internally converts a URI entered into the Hyperlink dialog into an IRI (cf. <http://cgit.freedesktop.org/libreoffice/core/commit/?id=abb5e84c74b781f3615862695db4e5eaadc12cfe> "Do not corrupt URIs entered into the Hyperlink dialog").  It is written out as an IRI at least into ODT documents (and XLink allows for IRIs, so that appears to be OK).  But at least the HTML 4.01 spec appears to only allow for URIs, not IRIs, in <a href="...">.  So it's probably good to explicitly convert from IRI to URI upon exporting at least to HTML (as proposed in <https://gerrit.libreoffice.org/#/c/14223>), short of revisiting the decision to store as IRI rather than URI internally in LO.
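
As a rough illustration of the IRI-to-URI conversion discussed above, here is a minimal Python sketch. It is not the code from the proposed patch; the function name iri_to_uri is hypothetical, and it assumes UTF-8 and does not handle internationalized host names:

```python
from urllib.parse import quote

# RFC 3986 reserved characters, plus "%" so that existing escapes
# are not encoded a second time. Unreserved ASCII characters are
# always left alone by quote().
_KEEP = ":/?#[]@!$&'()*+,;=%"

def iri_to_uri(iri: str) -> str:
    """Percent-encode every character that is not allowed in a URI.

    Hypothetical helper for illustration: non-ASCII characters are
    encoded as UTF-8 bytes and written as %XX escapes.
    """
    return quote(iri, safe=_KEEP, encoding="utf-8")

iri = "http://www.youtube.com/results?search_query=粵語mv&sm=12"
print(iri_to_uri(iri))
```

Because "%" is kept as-is, applying the function to an already encoded URI leaves it unchanged, which matters if some links are stored encoded and others are not.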
Comment 8 Commit Notification 2015-02-02 09:58:07 UTC
Vasily Melenchuk committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=0706b5756e06b7773a78e3046a47efc2c81d92b1

tdf#76291 write encoded URL as href in html output

It will be available in 4.5.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 9 Commit Notification 2015-02-06 21:12:20 UTC
Vasily Melenchuk committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=675e1fe198298702ced8eab02a7df5164d66a8f0

tdf#76291 unit test for html export href encoding

It will be available in 4.5.0.

Comment 10 Commit Notification 2015-02-06 23:46:51 UTC
Vasily Melenchuk committed a patch related to this issue.
It has been pushed to "libreoffice-4-4":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=4327b7882c38005d89b07e76814705d1c53f3161&h=libreoffice-4-4

tdf#76291 write encoded URL as href in html output

It will be available in 4.4.1.

Comment 11 Commit Notification 2015-02-06 23:46:59 UTC
Vasily Melenchuk committed a patch related to this issue.
It has been pushed to "libreoffice-4-4":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=fafd6cd4f784e5b65548af699bc25502f10a4b8d&h=libreoffice-4-4

tdf#76291 unit test for html export href encoding

It will be available in 4.4.1.

Comment 12 Jean-Baptiste Faure 2015-02-07 08:00:16 UTC
(In reply to Commit Notification from comment #10)
> Vasily Melenchuk committed a patch related to this issue.
> It has been pushed to "libreoffice-4-4":
> 
> http://cgit.freedesktop.org/libreoffice/core/commit/
> ?id=4327b7882c38005d89b07e76814705d1c53f3161&h=libreoffice-4-4
> 
> tdf#76291 write encoded URL as href in html output

This commit seems to break the build of LO 4.4.1.0.0+:

[build CUT] sw_htmlimport
File tested,Execution Time (ms)
HTML parser error : Opening and ending tag mismatch: a and font
ce="SimSun"><font size="2" style="font-size: 10pt"><span lang="zh-CN">粵語</a>
                                                                               ^
HTML parser error : Opening and ending tag mismatch: a and font
ce="SimSun"><font size="2" style="font-size: 10pt"><span lang="zh-CN">粵語</a>
                                                                               ^
HTML parser error : Unexpected end tag : span
Sun"><font size="2" style="font-size: 10pt"><span lang="zh-CN">粵語</a></span>
                                                                               ^
HTML parser error : Unexpected end tag : font
ont size="2" style="font-size: 10pt"><span lang="zh-CN">粵語</a></span></font>
                                                                               ^
HTML parser error : Unexpected end tag : font
e="2" style="font-size: 10pt"><span lang="zh-CN">粵語</a></span></font></font>
                                                                               ^
tdf76291.odt,/home/[..]/lo44/test/source/xmltesttools.cxx:53:testExportUrlEncoding::Import_Export
equality assertion failed
- Expected: 1
- Actual  : 2
- In <file:///home/[..]/tmp/luup6jlk.tmp>, XPath '/html/body/p/a' number of nodes is incorrect

File tested,Execution Time (ms)
checkbox-radiobutton.doc,346
File tested,Execution Time (ms)
HTMLImage.odt,199
File tested,Execution Time (ms)
skipimage-embedded-document.docx,315
File tested,Execution Time (ms)
skipimage-embedded.doc,382
File tested,Execution Time (ms)
textAndImage.docx,171
File tested,Execution Time (ms)
textAndImage.docx,86
File tested,Execution Time (ms)
charborder.odt,94
File tested,Execution Time (ms)
charborder.odt,232
File tested,Execution Time (ms)
fdo86857.html,43
File tested,Execution Time (ms)
fdo86857.html,134
File tested,Execution Time (ms)
fdo62336.docx,336
File tested,Execution Time (ms)
fdo81276.html,17
File tested,Execution Time (ms)
fdo81276.html,93
xmltesttools.cxx:53:Assertion
Test name: testExportUrlEncoding::Import_Export
equality assertion failed
- Expected: 1
- Actual  : 2
- In <file:///home/[..]/tmp/luup6jlk.tmp>, XPath '/html/body/p/a' number of nodes is incorrect

Failures !!!
Run: 15   Failure total: 1   Failures: 1   Errors: 0

Error: a unit test failed, please do one of:

export DEBUGCPPUNIT=TRUE            # for exception catching
export CPPUNITTRACE="gdb --args"    # for interactive debugging on Linux
export CPPUNITTRACE="\"[full path to devenv.exe]\" /debugexe" # for interactive debugging in Visual Studio
export VALGRIND=memcheck            # for memory checking

and retry using: make CppunitTest_sw_htmlexport

/home/[..]/lo44/solenv/gbuild/CppunitTest.mk:81: recipe for target '/home/[..]/lo44/workdir/CppunitTest/sw_htmlexport.test' failed
make[1]: *** [/home/[..]/lo44/workdir/CppunitTest/sw_htmlexport.test] Error 1
make[1]: *** Waiting for unfinished jobs....
Makefile:237: recipe for target 'build' failed
make: *** [build] Error 2

It was an incremental build. Would it be useful to retry with a complete rebuild?

Best regards. JBF
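
The failing assertion in the log above compares the number of nodes matching the XPath '/html/body/p/a' with an expected count of 1. A minimal Python sketch of that kind of check is shown below; it is not the LibreOffice unit test, the helper name count_anchors and both sample documents are hypothetical, and it assumes the markup is well-formed XML:

```python
import xml.etree.ElementTree as ET

def count_anchors(markup: str) -> int:
    """Count <a> elements that are direct children of /html/body/p."""
    root = ET.fromstring(markup)  # root element is <html>
    return len(root.findall("./body/p/a"))

# Hypothetical export result with a single hyperlink.
one_link = (
    '<html><body><p>'
    '<a href="http://example.org/?q=%E7%B2%B5%E8%AA%9E">link</a>'
    '</p></body></html>'
)

# Hypothetical export result where the paragraph holds two <a> elements.
two_links = (
    '<html><body><p>'
    '<a href="#first">one</a><a href="#second">two</a>'
    '</p></body></html>'
)

print(count_anchors(one_link), count_anchors(two_links))
```

A check written this way fails with "Expected: 1, Actual: 2" whenever the exported paragraph contains a second <a> element, whatever the reason for it.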
Comment 13 Commit Notification 2015-02-07 10:34:58 UTC
Andras Timar committed a patch related to this issue.
It has been pushed to "libreoffice-4-4":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=72812676b419b8d5fa32d06e5c45af73d08eac59&h=libreoffice-4-4

tdf#76291 adapt unit test to libreoffice-4-4

It will be available in 4.4.1.

Comment 14 Jean-Baptiste Faure 2015-02-07 12:46:21 UTC
(In reply to Commit Notification from comment #13)
> Andras Timar committed a patch related to this issue.
> It has been pushed to "libreoffice-4-4":
> 
> http://cgit.freedesktop.org/libreoffice/core/commit/
> ?id=72812676b419b8d5fa32d06e5c45af73d08eac59&h=libreoffice-4-4
> 
> tdf#76291 adapt unit test to libreoffice-4-4

Thank you, that solved the build problem.

Best regards. JBF
Comment 15 Robinson Tryon (qubit) 2015-12-15 22:51:38 UTC
Migrating Whiteboard tags to Keywords: ( EasyHack SkillCpp )
[NinjaEdit]
Comment 16 Robinson Tryon (qubit) 2016-02-18 16:37:19 UTC
Remove LibreOffice Dev List from CC on EasyHacks
(curtailing excessive email to list)
[NinjaEdit]