Description:
When pasting external plain text encoded as UTF-8 into Calc, if the text uses Windows-style CR+LF line breaks (0x0D 0x0A), the text is incorrectly detected as UTF-16. If the character encoding is then manually set to UTF-8, Calc inserts an empty line after every line.

Steps to Reproduce:
1. Generate some text in any text application where you can be sure that the newlines are Windows-style CRLF and that the encoding is UTF-8, e.g. Notepad++.
2. Copy that text, ensuring it has several lines.
3. Paste it over any Calc cell to open the text import window.
4. Notice that the character encoding detected by Calc is UTF-16, not UTF-8.
5. Change the encoding to the correct UTF-8.
6. See how every even row in the preview is empty.

Actual Results:
If you don't touch anything and the text doesn't include any UTF-8 character combinations which translate to a UTF-16 character, nothing happens. However, until I noticed this problem, I sometimes had strange characters in my text imports, which might have been caused by this problem. I can't provide any specific text string that triggers it, but I am pretty sure it exists.

Expected Results:
The text should be correctly detected as UTF-8 to avoid potential problems, and when it is, the CRLF combination shouldn't be interpreted as two consecutive newlines.

Reproducible: Always
User Profile Reset: No
Additional Info:
This bug is not triggered when opening a file with the same text, UTF-8 with CRLF.
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:51.0) Gecko/20100101 Firefox/51.0
Thank you for reporting the bug. Please attach a sample document, as this makes it easier for us to verify the bug. On the other hand, it seems you're using an old version of LibreOffice. Could you please try to reproduce it with the latest version of LibreOffice from https://www.libreoffice.org/download/libreoffice-fresh/ ? I have set the bug's status to 'NEEDINFO'. Please change it back to 'UNCONFIRMED' if the bug is still present in the latest version.
Created attachment 131367
Sample file containing UTF-8 text with CRLF newlines

To trigger the bug correctly, open this file with a text editor capable of maintaining both the UTF-8 encoding and the CRLF newline style. Copy the contents and paste them into any Calc cell to open the text import window. You can also see the difference in encoding detection if you open the file from Calc instead: the text import window appears, but with the data coming from a file rather than from the clipboard.
Yes, I noticed I was a bit behind on updates while filing the bug report. I updated to the latest stable release, 5.2.5.1 (I've had a few problems with fresh releases, so I stick to stable ones), reproduced the bug again, and added a sample text. I changed the version number and set the status to unconfirmed until someone else can reproduce the bug. Note that the bug only triggers when pasting the text from the clipboard, not when importing from a file, which I find weird.
Reproduced. I also tried with v. 3.6, but it does not distinguish between the UTF variants; it only says "Unicode".

Arch Linux 64-bit, KDE Plasma 5
Version: 5.4.0.0.alpha0+
Build ID: ed0e8f970ff552e75222dc92ed2879aa3b3e5851
CPU threads: 8; OS: Linux 4.9; UI render: default; VCL: kde4; Locale: fi-FI (fi_FI.UTF-8); Calc: group
Built on March 4th 2016
** Please read this message in its entirety before responding **

To make sure we're focusing on the bugs that affect our users today, LibreOffice QA is asking bug reporters and confirmers to retest open, confirmed bugs which have not been touched for over a year.

There have been thousands of bug fixes and commits since anyone checked on this bug report. During that time, it's possible that the bug has been fixed, or the details of the problem have changed. We'd really appreciate your help in getting confirmation that the bug is still present.

If you have time, please do the following:

Test to see if the bug is still present with the latest version of LibreOffice from https://www.libreoffice.org/download/

If the bug is present, please leave a comment that includes the information from Help - About LibreOffice.

If the bug is NOT present, please set the bug's Status field to RESOLVED-WORKSFORME and leave a comment that includes the information from Help - About LibreOffice.

Please DO NOT
Update the version field
Reply via email (please reply directly on the bug tracker)
Set the bug's Status field to RESOLVED - FIXED (this status has a particular meaning that is not appropriate in this case)

If you want to do more to help you can test to see if your issue is a REGRESSION. To do so:
1. Download and install oldest version of LibreOffice (usually 3.3 unless your bug pertains to a feature added after 3.3) from http://downloadarchive.documentfoundation.org/libreoffice/old/
2. Test your bug
3. Leave a comment with your results.
4a. If the bug was present with 3.3 - set version to 'inherited from OOo';
4b. If the bug was not present in 3.3 - add 'regression' to keyword

Feel free to come ask questions or to say hello in our QA chat: https://kiwiirc.com/nextclient/irc.freenode.net/#libreoffice-qa

Thank you for helping us make LibreOffice even better for everyone!

Warm Regards,
QA Team

MassPing-UntouchedBug
I tried with 5.4.7_Win_x64 and the attachment I made a year ago, ANSI encoding with Windows CRLF line endings. The results are the same; the bug is still there: the encoding is detected incorrectly, even if there are no characters that can be interpreted as UTF-16. Note that the extra empty lines are shown only in the import panel; if the import is applied, the text is imported correctly. The problem lies only in the import panel. However, as I said in my initial bug report, working with characters beyond 7-bit ASCII gave me problems when importing and rendered strange characters, so it's not only a cosmetic bug. There's something wrong in the detection of the encoding, and in how LibreOffice re-encodes such text for internal use.
Are you really sure that the detected encoding is something other than UTF-16 when the text has no CRLF and uses only CR or only LF? https://opengrok.libreoffice.org/xref/core/sc/source/ui/dbgui/scuiasciiopt.cxx?r=a5c04cbf#380 https://opengrok.libreoffice.org/xref/core/sc/source/ui/view/viewfun5.cxx?r=18a8cac5#346 It looks to me as though this code means that, regardless of which encoding and line separator we use, pasting will open the dialog with UTF-16 set as the encoding. It also looks to me as though the encoding you use in your text editor doesn't affect the encoding in which Windows stores string data on its clipboard.
I have tried the attached text with NotePad++, NotePad, PSPad and Wordpad. All of them put more than one type into the clipboard. The type "Unicode Text Format" is provided by all of these apps. That is UTF-16. Then I changed the line ends to LF. Copying again results in a type "Unicode Text Format" in the clipboard. If LibreOffice takes this clipboard flavor, the selection of UTF-16 in the dialog is correct. I used "Free Clipboard Viewer 3.0" to examine the clipboard. Do you have an application that does not put "Unicode Text Format" into the clipboard?
Hmm, it's true that there is something weird about the attachment. Linux command "file" says:
sample utf-8 crlf text.txt: ASCII text, with CRLF line terminators
However, Kate editor opens it as UTF-8.
If I use the program enca like this:
enca -L none sample.txt
It says:
7bit ASCII characters
CRLF line terminators
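For readers without "file" or enca at hand, the two properties reported above can be checked roughly with a few lines of Python. This is only a sketch: the function name describe_bytes is made up for illustration, and it does not reproduce how either tool actually classifies files. It merely tests whether every byte is below 0x80 and counts line terminators.

```python
def describe_bytes(data: bytes) -> dict:
    """Rough check of a byte string: 7-bit ASCII or not, and which line ends occur.

    Hypothetical helper for illustration; not the algorithm used by `file` or enca.
    """
    crlf = data.count(b"\r\n")
    return {
        # True when no byte has the high bit set.
        "seven_bit_ascii": all(b < 0x80 for b in data),
        # CR immediately followed by LF.
        "crlf": crlf,
        # LF bytes that are not part of a CRLF pair.
        "lone_lf": data.count(b"\n") - crlf,
        # CR bytes that are not part of a CRLF pair.
        "lone_cr": data.count(b"\r") - crlf,
    }


if __name__ == "__main__":
    sample = b"line one\r\nline two\r\n"
    print(describe_bytes(sample))
```

To check a real file, read it in binary mode (open(path, "rb").read()) so that Python does not translate the line endings before they are counted.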
> sample utf-8 crlf text.txt: ASCII text, with CRLF line terminators
> However, Kate editor opens it as UTF-8.

If all the characters are in the range U+0000 to U+007F, US-ASCII and UTF-8 are byte-for-byte identical, so there is nothing strange in this behavior.
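The point about ASCII and UTF-8 can be verified with a small Python sketch. The helper name encodes_identically is an assumption made for this example; the encode calls are standard Python. It shows that text limited to U+0000..U+007F yields the same bytes in both encodings, while a character above that range does not.

```python
def encodes_identically(text: str) -> bool:
    """Return True if `text` has the same byte sequence in US-ASCII and UTF-8.

    Hypothetical helper for illustration only.
    """
    try:
        return text.encode("ascii") == text.encode("utf-8")
    except UnicodeEncodeError:
        # The text contains a character outside U+0000..U+007F,
        # so it has no US-ASCII representation at all.
        return False


# Pure 7-bit text, including CRLF line ends: both encodings give the same bytes.
assert encodes_identically("first line\r\nsecond line\r\n")

# U+00E9 is outside the ASCII range: one byte in Latin-1, two bytes in UTF-8.
assert not encodes_identically("caf\u00e9")
assert "\u00e9".encode("latin-1") == b"\xe9"
assert "\u00e9".encode("utf-8") == b"\xc3\xa9"
```

This is why a tool that only inspects bytes cannot tell the two apart for such a file: both "ASCII" and "UTF-8" are valid answers.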
I suggest closing this as NOTABUG. This is a matter of wrong expectations. The OP expects that, when pasting, the text has the same encoding it had when it was copied in the original program. But that's incorrect, as himajin100000 and Regina rightly note in comment 7 and comment 9. The expectation that changing UTF-16 to UTF-8 on paste (step 5 in comment 0) would result in "correct" behaviour is also wrong, and step 6 shows that. The bottom line in the description was:

> If you don't touch anything and the text doesn't include any UTF-8 character
> combinations which translate to an UTF-16 character, nothing happens.
> However until I noticed this problem, sometimes I had strange characters in
> my text imports, which might have been caused by this problem. I can't provide
> any specific text string to trigger that problem but I am pretty sure it exists.

... which mixes two completely unrelated things: the OP has some unspecified problem, and suspects that it has something to do with the observed inconsistencies between the OP's expectations and the correct behaviour. The real problem is completely unrelated.
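A small Python sketch may help show why step 5 cannot give a sensible result. It assumes, purely for illustration, that the pasted data is UTF-16LE and that choosing UTF-8 means the same bytes are reinterpreted as UTF-8; it makes no claim about how Calc's import dialog actually processes the data. The function name is made up for this example.

```python
def reinterpret_utf16le_as_utf8(text: str) -> str:
    """Encode `text` as UTF-16LE, then (wrongly) decode those bytes as UTF-8.

    Hypothetical illustration of an encoding mismatch, not Calc's code path.
    """
    raw = text.encode("utf-16-le")
    # errors="replace" keeps the demo from raising on byte sequences
    # that are not valid UTF-8.
    return raw.decode("utf-8", errors="replace")


original = "a\r\nb"
garbled = reinterpret_utf16le_as_utf8(original)

# Every ASCII character is now followed by a NUL, so CR and LF are no longer adjacent.
assert garbled == "a\x00\r\x00\n\x00b\x00"
assert "\r\n" not in garbled

# A reader that treats a lone CR and a lone LF each as a line break
# (as str.splitlines does) now sees an extra, visually empty line.
assert len(original.splitlines()) == 2
assert len(garbled.splitlines()) == 3

# Characters beyond ASCII fare worse: their UTF-16 bytes are not valid UTF-8.
assert "\ufffd" in reinterpret_utf16le_as_utf8("\u00e9")
```

Under these assumptions the middle "line" contains only a NUL character, which would be consistent with the empty even rows described in step 6, and non-ASCII text turns into replacement characters.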
Thanks, let's close