Description:
When exporting a document to XHTML format, almost all tags, including <!DOCTYPE>, <html>, <head>, <meta>, <body>, etc., have no newline \n (and/or carriage return \r) after them, so most of the content ends up on one single line. This does not affect how the result renders in a browser, but it makes the XHTML file much harder to maintain by hand.

Steps to Reproduce:
1. Save a document (or export it) as an XHTML file.

Actual Results:
Most of the content is on one single line.

Expected Results:
The output should be split into multiple lines at each tag, which makes it easier to maintain manually when necessary.

Reproducible: Always

User Profile Reset: No

Additional Info:
In 3.6.7.2 (Version 3.6.7.2 (Build ID: e183d5b)) the tags and contents are split across multiple lines. As of 4.0.6.2 (Version 4.0.6.2 (Build ID: 2e2573268451a50806fcd60ae2d9fe01dd0ce24), the second-oldest version installed on my system), the output is all on one single line.
Created attachment 185591 [details] Exported HTML file by 3.6.7.2. The contents are split into multiple lines.
Created attachment 185592 [details]
Exported HTML file by 4.0.6.2

Most of the content is on one single line. There is no newline after most of the HTML tags.
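To compare two exports such as the attachments above without reading them by eye, a small check on line count and longest line can help. The following is only a sketch: `line_stats` is a hypothetical helper, and the sample strings are made-up stand-ins, not the real attachment contents.

```python
def line_stats(markup: str) -> tuple[int, int]:
    """Return (number of lines, length of the longest line)."""
    lines = markup.splitlines()
    longest = max((len(line) for line in lines), default=0)
    return len(lines), longest


# Made-up stand-ins for a multi-line and a single-line export.
multi_line = (
    "<html>\n<head>\n<title>T</title>\n</head>\n"
    "<body>\n<p>Hi</p>\n</body>\n</html>\n"
)
single_line = multi_line.replace("\n", "")

print(line_stats(multi_line))
print(line_stats(single_line))
```

A single-line export shows up as a line count of 1 with a longest line roughly equal to the file size.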
Thanks Franklin.
Confirmed. This has been bugging me for a while, and I thought I had reported it, but I could not find the report. Tagging as a regression.

Version: 7.6.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 6d9b9d1228cdee69e767833202442a1fed6174a6
CPU threads: 8; OS: Linux 5.15; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
Calc: threaded
After digging in, I found that the XHTML export is defined in filter/source/xslt/odf2xhtml/export/xhtml. I added <xsl:text>
</xsl:text> here and there in opendoc2xhtml.xsl, body.xsl and header.xsl, and can now produce XHTML files in which the <head> section elements and each paragraph are on separate lines. But I think we need someone who is an expert in, or at least familiar with, XSLT syntax to review these xsl files and decide how to fix this properly.
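An <xsl:text> element that contains only a line break emits a text node after the preceding tag. As a rough illustration of that idea outside XSLT, the sketch below does the analogous thing with Python's standard `xml.etree.ElementTree`, where the text following an element's closing tag is its `tail`. This is not the patch itself: the `BREAK_AFTER` set and the sample markup are assumptions made for the example, and the XHTML namespace is left out to keep the output short.

```python
import xml.etree.ElementTree as ET

# Hypothetical choice of tags to break after; the real XSL files decide
# this per template, not from a list like this.
BREAK_AFTER = {"head", "title", "meta", "p", "h1", "table"}


def add_newlines(xhtml: str) -> str:
    """Insert a newline text node after the closing tag of selected elements."""
    root = ET.fromstring(xhtml)
    for elem in root.iter():
        if elem.tag in BREAK_AFTER:
            # tail is the text that follows the element's closing tag.
            # Attribute values and element content are left untouched.
            elem.tail = "\n" + (elem.tail or "")
    return ET.tostring(root, encoding="unicode")


sample = (
    '<html><head><title>T</title><meta name="a" content="b"/></head>'
    "<body><p>One</p><p>Two</p></body></html>"
)
result = add_newlines(sample)
print(result)
```

Because the newline is added as a separate text node after the element, it cannot end up inside an attribute value.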
Franklin Weng committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/d2e8705c9cc503afdaed366b1f71ed012b0c568f

tdf#153839: add newline after certain tags

It will be available in 7.6.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Franklin Weng committed a patch related to this issue.
It has been pushed to "libreoffice-7-5":

https://git.libreoffice.org/core/commit/5ee2f4ee7838401afdae5eef5669881601fb4ee6

tdf#153839: add newline after certain tags

It will be available in 7.5.3.
Created attachment 186006 [details]
test ODT to export as XHTML

Thanks for this work, Franklin.
I just tested and I think this is pretty close to being resolved; the HTML source is a lot more readable now.
I am attaching the example file that I used, to list a few extra tags that could be improved, if you feel like submitting follow-ups:

- Comments of the type <!--Next 'div' was a 'text:p'.--> are either kept inline or broken across multiple lines in a weird way.
- Table markup could be broken up better, as it overflows heavily. But this might also have to do with the filter creating unnecessarily complicated table markup, I'm not sure.
- Note also the closing </table> tag directly followed by <h1> without a line break.

What do you think?
Created attachment 186007 [details]
XHTML export of text document following first commit

Exported with:

Version: 7.6.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 44837a12d12be3e525fa48b37c3dd2553cc97d94
CPU threads: 8; OS: Linux 5.15; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
Calc: threaded
(In reply to Stéphane Guillou (stragu) from comment #7)
> Created attachment 186006 [details]
> test ODT to export as XHTML
>
> Thanks for this work, Franklin.
> I just tested and I think this is pretty close to being resolved; the HTML
> source is a lot more readable now.
> I am attaching the example file that I used, to list a few extra tags
> that could be improved, if you feel like submitting follow-ups:
>
> - Comments of the type <!--Next 'div' was a 'text:p'.--> are either kept
> inline or broken across multiple lines in a weird way.
> - Table markup could be broken up better, as it overflows heavily. But
> this might also have to do with the filter creating unnecessarily complicated
> table markup, I'm not sure.
> - Note also the closing </table> tag directly followed by <h1> without
> a line break.
>
> What do you think?

This looks a lot more complicated, but I think I can spend some time figuring it out and see whether it can pass the unit tests.

However, on the commit Miklos commented:

> in general the XSL-based XHTML export is horrible, you should never use it. Instead, you can use the XHTML mode of the C++-based HTML export, like:
> soffice --convert-to "xhtml:HTML (StarWriter):XHTML" ...
> I just note this because this change is easy enough to review, but if you would want nontrivial changes in this XSL mess, I won't be able to review.

I guess that for now we still need to stick with the XSLT solution if the C++-based HTML export can only be used from the command line, which doesn't make much sense for normal users. I can test it as well, though.
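The filter string in the command Miklos quotes contains spaces and parentheses, so it has to reach soffice as a single argument. The sketch below builds such a command as an argument list in Python. It does not reconstruct the elided part of the quoted command: `build_convert_command`, the file name `test.odt`, the output directory, and the use of `--headless` and `--outdir` are assumptions made for illustration.

```python
import shlex

# Filter string copied from the quoted comment; it must stay one argument.
XHTML_FILTER = "xhtml:HTML (StarWriter):XHTML"


def build_convert_command(input_path: str, outdir: str = ".") -> list[str]:
    """Build an argument list for a headless conversion (illustrative only)."""
    return [
        "soffice",
        "--headless",          # assumption: run without opening the UI
        "--convert-to", XHTML_FILTER,
        "--outdir", outdir,    # assumption: where the result is written
        input_path,
    ]


cmd = build_convert_command("test.odt", "out")
# shlex.join adds the quoting needed to paste the command into a POSIX shell.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would avoid shell quoting altogether, provided soffice is on the PATH.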
I tested https://gerrit.libreoffice.org/c/core/+/149280 and it looks good to me. The table code still spreads horizontally too much but that might have to do with the filter unnecessarily repeating tags, a different issue. Overall, this is a big improvement over the previous situation. Happy to have this marked as fixed once the second patch is merged. Thanks Franklin!
Franklin Weng committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/ce4272c25426f0084e53735e80870b9339239078

tdf#153839 : Further handling for adding newlines

It will be available in 7.6.0.
(In reply to Stéphane Guillou (stragu) from comment #10)
> I tested https://gerrit.libreoffice.org/c/core/+/149280 and it looks good to
> me.
> The table code still spreads horizontally too much but that might have to do
> with the filter unnecessarily repeating tags, a different issue.
> Overall, this is a big improvement over the previous situation. Happy to
> have this marked as fixed once the second patch is merged.
> Thanks Franklin!

Some places could not be fixed: whenever I tried to insert a line break there (for example, before <h1>), it caused a unit test failure. But let's live with this for now.
Verified with own build. Thanks again!
*** Bug 154268 has been marked as a duplicate of this bug. ***
Michael Stahl committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/63ac36893ad7f3b1c73cb46667fbfd5384a747dc

tdf#153839 XHTML export: fix syntax error in table.xsl

It will be available in 7.6.0.
Michael Stahl committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/ab85fd73a52256da6feb4fabd1b188f4f0fb7ce4

tdf#153839 XHTML export: do not add newlines to attribute values

It will be available in 7.6.0.
Franklin Weng committed a patch related to this issue.
It has been pushed to "libreoffice-7-5":

https://git.libreoffice.org/core/commit/c910a1320c7247c111d4f7e2a61540fc646938ff

tdf#153839 : Further handling for adding newlines

It will be available in 7.5.4.
Michael Stahl committed a patch related to this issue.
It has been pushed to "libreoffice-7-5":

https://git.libreoffice.org/core/commit/fc4b4d007e41192c21d2979e45ac73541935c00e

tdf#153839 XHTML export: fix syntax error in table.xsl

It will be available in 7.5.4.
Michael Stahl committed a patch related to this issue.
It has been pushed to "libreoffice-7-5":

https://git.libreoffice.org/core/commit/35fe68188e984d32d3f21db81e633743ca06f67c

tdf#153839 XHTML export: do not add newlines to attribute values

It will be available in 7.5.4.