Description:
I have tried exporting book-length material to EPUB from Writer and then editing it with Calibre. Every place where I have edited my text in LO (and I edit a lot, because that is what writers do) ends up in a separate span. For example, if I change a word in a paragraph after I have written it, that word will be in its own span. That makes for very messy HTML that is hard to edit afterward.

Thank you for your attention to this. I apologize for an amateur bug report. I think the fix should be fairly simple.

Steps to Reproduce:
1. Create a document.
2. Go back and change text within the document.
3. Export to EPUB.

Actual Results:
Text within a paragraph, sentence, or even a single word is separated into multiple spans. A new span is created whenever even a single letter is changed.

Expected Results:
I would like every paragraph to be kept in one single span, as long as no styles are changed, that is, as long as it would be in its own single span if it had not been edited.

Reproducible: Always

User Profile Reset: No

Additional Info:
[Information automatically included from LibreOffice]
Locale: en-US
Module: TextDocument
[Information guessed from browser]
OS: Linux (All)
OS is 64bit: yes
I guess what I am asking for, to simplify, is that adjacent identical tags would be merged. And that text that is of the same WYSIWYG style would also have the same tag style, and therefore be merged. This is asking LO to behave like an HTML cleaner, but it is also asking for the app to not create new spans where none are needed. At the risk of being annoying, this seems related to the observed behavior that LO likes to default to a basic text style that may or may not be the user-defined style. If I have changed my style (e.g. to 10 pt.) in Options, and perform an Undo operation, sometimes the changed text reverts to the basic out-of-the-box style (e.g. 12 pt.) instead of the user-defined style. It seems to me there should only be one place that styles are defined, to avoid confusion. Not asking for a separate bugfix, just trying to add helpful information that may help locate the source of this behavior.
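To illustrate the "merge adjacent identical tags" request, here is a minimal Python sketch that post-processes an exported XHTML chapter. It is not how LibreOffice's exporter works and not a proposed patch: merge_adjacent_spans is a hypothetical helper name, and the sample markup only imitates the span2/para1 class naming seen in the exported files. It assumes two spans may be merged only when nothing (not even whitespace) sits between them and their attributes are identical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
SPAN = f"{{{XHTML_NS}}}span"
ET.register_namespace("", XHTML_NS)  # serialize without an "ns0:" prefix


def merge_adjacent_spans(parent):
    """Merge sibling <span>s that touch each other and have identical attributes."""
    previous = None
    for child in list(parent):
        # "Touching" means no text at all between the previous element and this one.
        touching = previous is not None and not previous.tail
        if (touching and previous.tag == SPAN and child.tag == SPAN
                and previous.attrib == child.attrib):
            # Append the second span's leading text to the end of the first span.
            if len(previous):
                last = previous[-1]
                last.tail = (last.tail or "") + (child.text or "")
            else:
                previous.text = (previous.text or "") + (child.text or "")
            previous.extend(list(child))  # move any nested elements across
            previous.tail = child.tail    # keep the text that followed the pair
            parent.remove(child)
        else:
            previous = child
    for child in parent:
        merge_adjacent_spans(child)       # repeat one level down


# Hypothetical sample modeled on the exported markup.
sample = (
    '<p xmlns="http://www.w3.org/1999/xhtml" class="para1">'
    '<span class="span2">lying in </span>'
    '<span class="span2">my</span>'
    '<span class="span2"> the middle of the sidewalk.</span></p>'
)
root = ET.fromstring(sample)
merge_adjacent_spans(root)
print(ET.tostring(root, encoding="unicode"))
```

Spans with different classes, or spans separated by any text, are left alone, so the visible formatting should not change.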
I added "my" in the middle of a sentence, and the export creates a new span for it.

----
in </span><span class="span2">my</span><span class="span2"> the
----

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml"><head><link href="../styles/stylesheet.css" rel="stylesheet" type="text/css"/></head><body class="body0" xmlns:epub="http://www.idpf.org/2007/ops">
<p class="para2"><span class="span1">Chapter 1</span></p>
<p class="para1"><span class="span2">He heard quiet steps behind him lying in </span><span class="span2">my</span><span class="span2"> the middle of the sidewalk. Would this door save his hide?</span></p>
<p class="para1"><span class="span2">Another paragraf</span></p>
<p class="para1"> </p>
</body></html>

Confirmed with
Version: 7.1.5.2 / LibreOffice Community
Build ID: 85f04e9f809797b8199d13c421bd8a2b025d52b5
CPU threads: 4; OS: Linux 5.8; UI render: default; VCL: gtk3
Locale: ro-RO (ro_RO.UTF-8); UI: en-US
Calc: threaded
Created attachment 173984
ODT document for testing this bug
*** Bug 147392 has been marked as a duplicate of this bug. ***
Dear Coburn Ingram,

To make sure we're focusing on the bugs that affect our users today, LibreOffice QA is asking bug reporters and confirmers to retest open, confirmed bugs which have not been touched for over a year.

There have been thousands of bug fixes and commits since anyone checked on this bug report. During that time, it's possible that the bug has been fixed, or the details of the problem have changed. We'd really appreciate your help in getting confirmation that the bug is still present.

If you have time, please do the following:

Test to see if the bug is still present with the latest version of LibreOffice from https://www.libreoffice.org/download/

If the bug is present, please leave a comment that includes the information from Help - About LibreOffice.

If the bug is NOT present, please set the bug's Status field to RESOLVED-WORKSFORME and leave a comment that includes the information from Help - About LibreOffice.

Please DO NOT
- Update the version field
- Reply via email (please reply directly on the bug tracker)
- Set the bug's Status field to RESOLVED - FIXED (this status has a particular meaning that is not appropriate in this case)

If you want to do more to help you can test to see if your issue is a REGRESSION. To do so:
1. Download and install oldest version of LibreOffice (usually 3.3 unless your bug pertains to a feature added after 3.3) from https://downloadarchive.documentfoundation.org/libreoffice/old/
2. Test your bug
3. Leave a comment with your results.
4a. If the bug was present with 3.3 - set version to 'inherited from OOo';
4b. If the bug was not present in 3.3 - add 'regression' to keyword

Feel free to come ask questions or to say hello in our QA chat: https://web.libera.chat/?settings=#libreoffice-qa

Thank you for helping us make LibreOffice even better for everyone!

Warm Regards,
QA Team

MassPing-UntouchedBug
Yes, this is still an issue in:

Version: 24.2.1.2 (X86_64) / LibreOffice Community
Build ID: db4def46b0453cc22e2d0305797cf981b68ef5ac
CPU threads: 8; OS: Windows 10.0 Build 22631; UI render: Skia/Raster; VCL: win
Locale: en-US (en_US); UI: en-US
Calc: CL threaded

- - -

I tested using BogdanB's attachment 173984 in comment 3.

0. Open file.
1. Add a random word or two inside the text.
2. File > Export As > Export as EPUB.
3. Press OK.
4. Unzip the EPUB and look inside the HTML.
    - Or use an EPUB editing program like Sigil or Calibre.

You'll see extra <span>s in the EPUB:

> Like lightning he darted off to the left and disappeared between the two warehouses almost falling over the trash can lying în </span><span class="span2">my</span><span class="span2"> the middle of the sidewalk. He tried to nervously tap his way along in </span><span class="span2">my </span><span class="span2">the inky darkness and suddenly stiffened:

with this blank class in the EPUB's CSS:

> .span2 {
> }

= = = = = = = = = = = =

I believe part of the root cause is spurious:

- officeooo:rsid

inside the ODT file, which get carried over into the HTML/EPUB export.

(I believe these RSIDs are "Random Session IDs", used to know when a certain text was edited, for Comparison / Tracked Changes reasons.)

- - -

If you take the ODT and:

- File > Save As
- Dropdown for "Save as Type:"
- Choose "Flat XML ODF Text Document"

You can open the FODT up in a text editor and see code along these lines:

> <text:p text:style-name="P1">He heard quiet steps behind him. [...] almost falling over the trash can lying în <text:span text:style-name="T1">my</text:span> the middle of the sidewalk. He tried to nervously tap his way along in <text:span text:style-name="T2">my </text:span>the inky darkness and suddenly stiffened: it was a dead-end, [...]

where extra <text:span>s appear around everything you insert/edit.
Higher in the FODT document, you can see what "T1" and "T2" were equivalent to:

> <style:style style:name="T1" style:family="text">
>   <style:text-properties officeooo:rsid="00019890"/>
> </style:style>
> <style:style style:name="T2" style:family="text">
>   <style:text-properties officeooo:rsid="0003570a"/>
> </style:style>

The only thing these <text:span>s were there for was:

- officeooo:rsid

They didn't supply any other info.

- - -

There was a similar issue with "single URLs" getting split into "multiple identical ones" here:

- Bug #112429 : "officeooo:rsid multiplies the links"
- Bug #148198 : "Editing single hyperlink breaks it into smaller ones"
    - Which got fixed in 7.5.0 and 7.4.0.2.

Mike Kaganski then came up with a patch to "merge identical hyperlinks of adjacent text ranges on ODF export":

- https://bugs.documentfoundation.org/show_bug.cgi?id=148198#c19

= = = = = = = = = = = =

So, on EPUB Export, I would probably do some logic along these lines:

Case 1: Before

- If ODT's "text:span text:style-name" only has "officeooo:rsid":
    - Do not export this <span> to EPUB at all.
- If 2 "text:spans" are right next to each other and the only difference is "officeooo:rsid":
    - Merge them together before HTML/EPUB export.
    - Similar to Bug 148198 above!

Case 2: After

You could have a pass that says:

- If the CSS class is empty/blank on the other end:
    - Delete that <span> out of the HTML/CSS/EPUB export completely.

= = = = = = = = = = = =

Note 1:

Calibre's EPUB Editor has a fantastic feature called:

- "Remove Unused CSS"
    - https://manual.calibre-ebook.com/edit.html#removing-unused-css-rules

which can do this type of thing in one button push:

- Tools > "Remove unused CSS"

It:

- Finds and purges all CSS and related HTML tags that are blank / not in use

making the leftover HTML *much* easier to work with.

- - -

Note 2:

I've also written many topics about this type of HTML+CSS cleanup over the years.
Most recently:

- 2023: "Nested span, clean"
    - https://www.mobileread.com/forums/showthread.php?p=4342160#post4342160
- 2023: "removing excessive <class> and other formatting horrors on epub"
    - https://www.mobileread.com/forums/showthread.php?p=4312194#post4312194
- 2022: "Convert text formating from CSS to HTML"
    - https://www.mobileread.com/forums/showthread.php?p=4188132#post4188132
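For anyone who wants to experiment with the Case 1 idea outside LibreOffice, here is a rough Python sketch that works on a Flat XML (FODT) file: it collects the automatic text styles whose only content is officeooo:rsid and unwraps the <text:span>s that use them. This is a post-processing illustration under stated assumptions, not the exporter's code: rsid_only_styles and unwrap_spans are hypothetical helper names, the bold style "T3" in the sample is made up to show a span that must survive, and the sample document is trimmed to the bare minimum.

```python
import xml.etree.ElementTree as ET

NS = {
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "officeooo": "http://openoffice.org/2009/office",
}
STYLE = f"{{{NS['style']}}}style"
STYLE_NAME = f"{{{NS['style']}}}name"
STYLE_FAMILY = f"{{{NS['style']}}}family"
TEXT_PROPS = f"{{{NS['style']}}}text-properties"
RSID = f"{{{NS['officeooo']}}}rsid"
TEXT_SPAN = f"{{{NS['text']}}}span"
TEXT_P = f"{{{NS['text']}}}p"
TEXT_STYLE_NAME = f"{{{NS['text']}}}style-name"


def rsid_only_styles(root):
    """Return names of text styles that carry nothing except officeooo:rsid."""
    names = set()
    for style in root.iter(STYLE):
        children = list(style)
        if (set(style.attrib) == {STYLE_NAME, STYLE_FAMILY}
                and len(children) == 1
                and children[0].tag == TEXT_PROPS
                and set(children[0].attrib) == {RSID}):
            names.add(style.get(STYLE_NAME))
    return names


def _append_text(parent, index, text):
    """Attach text just before child position `index` of parent."""
    if not text:
        return
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        before = parent[index - 1]
        before.tail = (before.tail or "") + text


def unwrap_spans(parent, names):
    """Replace every <text:span> using one of `names` with its own content."""
    for child in list(parent):
        unwrap_spans(child, names)  # handle nested spans first
        if child.tag != TEXT_SPAN or child.get(TEXT_STYLE_NAME) not in names:
            continue
        index = list(parent).index(child)
        kids = list(child)
        parent.remove(child)
        _append_text(parent, index, child.text)
        for offset, kid in enumerate(kids):
            parent.insert(index + offset, kid)
        _append_text(parent, index + len(kids), child.tail)


# Trimmed, hypothetical FODT; "T3" (bold) is invented for the demonstration.
sample = """<office:document
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
    xmlns:officeooo="http://openoffice.org/2009/office">
  <office:automatic-styles>
    <style:style style:name="T1" style:family="text">
      <style:text-properties officeooo:rsid="00019890"/>
    </style:style>
    <style:style style:name="T3" style:family="text">
      <style:text-properties fo:font-weight="bold" officeooo:rsid="0003570a"/>
    </style:style>
  </office:automatic-styles>
  <office:body><office:text>
    <text:p text:style-name="P1">lying in <text:span text:style-name="T1">my</text:span> the <text:span text:style-name="T3">middle</text:span> of the sidewalk.</text:p>
  </office:text></office:body>
</office:document>"""

root = ET.fromstring(sample)
names = rsid_only_styles(root)
unwrap_spans(root, names)
paragraph = root.find(f".//{TEXT_P}")
print(sorted(names), "".join(paragraph.itertext()))
```

The sketch leaves the now-unused style definitions in place and does not touch spans that carry real formatting, so the bold "T3" span in the sample is kept.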
*** Bug 161367 has been marked as a duplicate of this bug. ***
<span class="span0">My </span><span class="span0">NEW ADDED TEXT </span><span class="span0">book</span>

The same using
Version: 25.2.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 0e955c4b236bcf9e66e7b49cc3ae285f1a4a9e32
CPU threads: 16; OS: Linux 6.8; UI render: default; VCL: gtk3
Locale: ro-RO (ro_RO.UTF-8); UI: en-US
Calc: threaded