Bug 159018 - Special characters changed with new version
Summary: Special characters changed with new version
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version (earliest affected): 7.6.4.1 release
Hardware: Other Windows (All)
Importance: medium normal
Assignee: Mike Kaganski
URL:
Whiteboard: target:24.8.0 target:7.6.5 target:24....
Keywords: bibisected, regression
Depends on:
Blocks: Special-Character
 
Reported: 2024-01-04 12:29 UTC by Angelo
Modified: 2024-01-09 14:10 UTC (History)

See Also:
Crash report or crash signature:


Attachments
it's a Calc with the checkmarks characters created with a version prior 7.5 so you can use to check the bug (19.81 KB, application/vnd.oasis.opendocument.spreadsheet)
2024-01-04 12:34 UTC, Angelo

Description Angelo 2024-01-04 12:29:40 UTC
Description:
I upgraded first to 7.5.9 and then to 7.6.4, but in both cases, on FILEOPEN of files previously created with Calc, the special characters in the cells were changed and converted to the main font of the cell.
For example, the check mark (entered with Webdings) is converted to an ì (Arial, which is the font of the row/cell).
Is there a solution so that I do not have to re-enter all the ticks again?

Steps to Reproduce:
1. Create a file with a version of Calc prior to 7.6 or 7.5.
2. Add a checkmark from Webdings and write the rest of the text in the same cell with Arial font.
3. Save it and reopen it with Calc 7.5 or 7.6.4, and you will see the changed character.

Actual Results:
The checkmarks were all converted to the Arial character "ì".

Expected Results:
Checkmark from Webdings


Reproducible: Always


User Profile Reset: Yes

Additional Info:
I also reset the user profile
Comment 1 Angelo 2024-01-04 12:34:37 UTC
Created attachment 191759
it's a Calc with the checkmarks characters created with a version prior 7.5 so you can use to check the bug

It's a Calc file with the checkmark characters, which I created with a version prior to 7.5, so you can use it to check the bug I submitted. If you open it with 7.5 or a higher version, you will see all the checkmarks converted to "ì".
Comment 2 Mike Kaganski 2024-01-04 13:44:18 UTC
(In reply to Angelo from comment #0)
> For example the check mark (entered with webdings) ...
> 
> Steps to Reproduce:
> ...
> 2. Add a checkmark from webdings and the rest of the text write in the same
> cell with arial font. ...

How exactly do you enter/add "checkmark from webdings" (in fact - U+F0FC from Wingdings)?
The problem in your file is that the checkmarks are *not* marked with a specific font. The red characters use Liberation Serif font, and that indeed requires a substitution - i.e., looking for *some* font on the system which could provide a glyph for the codepoint. This process is *not* guaranteed to be stable: it will *definitely* depend on the installed font set (e.g., it will give different glyphs on macOS or Android - though that's not *this* case, since I see the same as you: on my Windows 11, LO 7.4 shows checkmarks, and 7.6 shows ì). It is also OK for this algorithm to change in newer versions, which could result in a different font being chosen for substitution.

But if you explicitly select the characters in cells, and mark them as Wingdings, they save and open correctly.
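
To illustrate the difference between an explicitly formatted character and one left to fallback, here is a minimal Python sketch. It is a toy model, not LibreOffice code: the `pick_font` helper, the second font set, and all coverage sets are assumptions invented for the illustration.

```python
# Toy model of glyph fallback. Font names and coverage sets are illustrative
# assumptions, not data read from real font files.
CHECKMARK = 0xF0FC  # the Private Use Area codepoint found in the attached file


def pick_font(codepoint, requested, installed):
    """Return the requested font if it covers the codepoint,
    otherwise the first installed font (in iteration order) that does."""
    if codepoint in installed.get(requested, set()):
        return requested
    for name, coverage in installed.items():
        if codepoint in coverage:
            return name
    return None  # nothing covers it: a "no glyph" placeholder would be shown


# Hypothetical font set resembling a Windows system.
windows_fonts = {
    "Liberation Serif": {0x41, 0xEC},
    "Wingdings": {CHECKMARK},
    "Symbol": {CHECKMARK},
}
# Hypothetical font set on some other system without Wingdings.
other_fonts = {
    "Liberation Serif": {0x41, 0xEC},
    "SomeSymbolFont": {CHECKMARK},
}

# Character with no usable font information: the result depends on the system.
unformatted_windows = pick_font(CHECKMARK, "Liberation Serif", windows_fonts)
unformatted_other = pick_font(CHECKMARK, "Liberation Serif", other_fonts)
# Character explicitly marked as Wingdings: no fallback is needed.
explicit = pick_font(CHECKMARK, "Wingdings", windows_fonts)

print(unformatted_windows, unformatted_other, explicit)
```

In this model the explicitly formatted character always resolves to its own font, while the unformatted one resolves to whatever candidate happens to come first on the given system.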

I am inclined to say that this is not a bug. A file with "I don't care which font this symbol is taken from" is subject to changes. It would still be interesting to bisect which specific change in the 7.5 cycle caused this (there is a chance that this is still an unexpected side effect, which could possibly be fixed)...
Comment 3 Angelo 2024-01-04 14:48:21 UTC
Hi Mike,

Thanks for replying to my question. I'm not a developer, so I'm not used to knowing what happens behind the UI.
Anyhow,
1- I inserted the checkmark using the Omega button in the UI for inserting special characters, searching among the available fonts and selecting it from Webdings, not via code.
2- Then I copied this sign whenever needed and added it to the row where I needed it. That's the way I used to insert the checkmark. Sometimes I inserted it some time after I compiled the Calc file, and it was working fine.
3- I suppose it's a bug because the checkmark is not detected and is substituted by another character. The text in my rows was Arial and not Open Serif.

4- I know that if I re-add the mark and save, it opens normally, but this is a suggestion for Barbie toys people. I cannot lose my time replacing all the marks whenever I open an old file.

5- If this behaviour is OK for you, then you can close the case and amen. But for me it is still a bug: in this or a previous version, something wasn't working correctly.
Comment 4 Mike Kaganski 2024-01-04 15:32:05 UTC
(In reply to Angelo from comment #3)
> 1-I inserted the Checkmark using the Omega button in the UI for inserting
> special characters searching among available font type, selecting from
> Webdings and not via code.

This would be the bug. Our Special Character dialog should put the font along with the inserted character. However, I can't confirm this, too: inserting the character using the "Omega" button puts the font name to the inserted character OK - not only in 7.6, but also in 7.4.

Maybe you cleared direct formatting of the cells? Then it would explain the situation - it would clear all directly applied font information.
Comment 5 Mike Kaganski 2024-01-04 15:38:06 UTC
(In reply to Mike Kaganski from comment #4)
> However, I can't confirm this, too:

I mean, I can't confirm the bug of not setting font correctly for the inserted character. As I write next, I see it working as expected, setting font OK.
Comment 6 Angelo 2024-01-04 15:54:28 UTC
Hi Mike,
I'm sorry if you misunderstood my comment.

I have used such checkmarks in this way for about three years with different versions prior to 5.x.

1- I changed the default font from Open Serif to Arial and then I wrote inside the cell.
2- After completing this kind of planner, I inserted the mark the first time with the Omega button, then copied the single mark and pasted it wherever it was needed.

This was my method. It worked fine until I changed to the new version.
Maybe something is wrong in this way of working, but it was easy and simple to do.
Comment 7 ady 2024-01-04 18:58:30 UTC
FWIW, for me, on MS Windows opening attachment 191759 from comment 1 with:

* LO 7.5.3.2 shows the check mark
* LO 7.6.3.2 shows a different character
* AOO 4.1.14 shows a different character (which looks the same as in LO 7.6.3.2).

I have not tested LO versions between 7.5.3.2 and 7.6.3.2. This range might give a clue for bibisecting.

Using Calc's UNICODE() function in order to identify the character seems useless in this case. Anyway, using UNICODE() on the character, it shows decimal 61692, hex F0FC (part of the "Private use" range). In the Insert Special Characters dialogue for Arial, and for Liberation Serif fonts, such Unicode (61692) is not included.
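
The decimal/hex correspondence and the Private Use classification can be double-checked outside Calc. The following is a small sketch using only Python's standard `unicodedata` module; the variable names are arbitrary.

```python
import unicodedata

code = 61692                 # value reported by Calc's UNICODE() function
ch = chr(code)

hex_form = f"U+{code:04X}"   # hexadecimal notation of the same codepoint
category = unicodedata.category(ch)   # general category; "Co" means private use
in_bmp_pua = 0xE000 <= code <= 0xF8FF  # Private Use Area of the Basic Multilingual Plane
name = unicodedata.name(ch, None)      # PUA codepoints have no standard name

print(hex_form, category, in_bmp_pua, name)
```

Because a private-use codepoint has no standard meaning, the glyph shown for it depends entirely on which font ends up rendering it.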

Using LO 7.6.3.2, I copied the character to another workbook. On the cell where the character was pasted, changing the font to Wingdings shows the check mark again (instead of the other character).

Whether the specific font is actually set within the (part of the) cells for the specific inserted character, I have no idea.

With LO 7.6.3.2, I can save a new file containing the Wingdings character mixed with some other characters using a different font in the same cell. Opening it shows the same Wingdings character. This suggests that the problem shows up when opening a file that was saved with an older version.

It looks like some kind of font replacement change, at some point after LO 7.5.3.2.

Considering prior comments, I am unsure which exactly is the bug, or whether there is one. If it wasn't for those comments, I would be setting this report to NEW.

At any rate, from the POV of the user, the file (as-is) seems not future-proof. Even re-inserting the check mark character anew, is there any way to avoid this problem happening again (perhaps at some additional change in fonts replacement)?
Comment 8 ady 2024-01-04 19:42:51 UTC
(In reply to ady from comment #7)
> FWIW, for me, on MS Windows opening attachment 191759 from comment
> 1 with:
> 
> * LO 7.5.3.2 shows the check mark
> * LO 7.6.3.2 shows a different character
> * AOO 4.1.14 shows a different character (which looks the same as in LO
> 7.6.3.2).
> 
> I have not tested LO versions between 7.5.3.2 and 7.6.3.2. This range might
> give a clue for bibisecting.

* With LO 3.3 and up to 5.0 (included), the check mark is not shown; there is some other character instead. The precise (replacement) character is not always the same (it may vary depending on version).

* With LO 5.1 and up to 7.5.3.2, the check mark is shown.

* With LO 7.6.3.2 and up to a recent 24.8 alpha, a different character is shown.

Regression?

Should this report be set as NEW?
Comment 9 Mike Kaganski 2024-01-05 04:09:20 UTC
(In reply to Angelo from comment #6)
> I'm sorry if you misunderstood my comment.

I understood your comment.
Please try this procedure yourself:

1. In a new clean document, change a cell's font to Arial. Alternatively, change default cell style's font to Arial (for the test, it doesn't really matter);
2. Open the Special Character dialog ("Omega" toolbar button, or Insert->Special Character);
3. Select Wingdings font, scroll down till the checkmark, double-click it;
4. Now, when the cell contains the checkmark character, select the character in the cell, and look at the displayed font name.

Step #4 shows Wingdings, which illustrates my point: our tool not only inserts the character, but also formats it explicitly, as using the specific chosen font.

5. Change another document's cell to Arial;
6. In initial document, select the checkmark (either in cell's edit mode, or the whole cell), and copy to clipboard (Ctrl+C);
7. Paste into the second document (either in cell's edit mode, or over the whole cell).

After #7, all the pasted instances of the checkmark will have the explicit Wingdings font applied, making this character survive save/reload, version upgrade, etc.

My point is: this worked this way (explicitly marking the special character's font) all the way down, even in LO 3.3.

Now open your attachment 191759, and inspect your checkmark characters - select them individually, and see the reported font. See that they have a *different* font - namely, Liberation Serif.

This shows that your procedure, or your workflow, included not only the steps you mentioned in comment 3 and comment 6, but also *something* which broke the explicit formatting of this character with the font. Multiple things may do that: clearing cells' direct formatting; saving to file formats that don't support partial cell formatting (like XLS(X)); using special paste (text only); pasting to e.g. Notepad as an intermediate place, then copying from there ... only you can (possibly) tell what that was. But all that is not the program's fault. As soon as the user's actions removed the explicit "this is Wingdings!" mark from the characters, they become subject to the pure luck of accidentally choosing the expected font (or not).

As I said, the real bug could be, if Calc would *not* itself mark the inserted character using correct font initially. Also, a bug would be, if normal paste of a copied correctly formatted character would drop the formatting. Another possible bug could be, if saving to ODS would drop the formatting.

But I don't see any of the three in the versions I tested so far. The only phenomenon I observe is "expected" (accidental) choice of *fallback* Wingdings font for your (unformatted!) checkmarks in 7.4 and some earlier, and (normal, not a bug) choice of a different fallback font in 7.5.0+.

(In reply to ady from comment #8)

> Regression?

This is already answered in comment 2. This doesn't look like a regression. I marked it "needing bisection", to learn which exact commit created the fallback font choice difference - not because it looks like a bug, but because it *could* be an unintended side effect, in which case there could be a potential fix.

> Should this report be set as NEW?

Not until either:
1. A bisection proves that the change was an unwanted side effect;
2. Angelo provides steps to show how LibreOffice loses the font information.
Comment 10 Angelo 2024-01-05 07:08:04 UTC
I know that every one of you seems to be an engineer or some experienced developer, but I'm a common user and that is what I want to be.

My procedure was what I wrote earlier and I don't like to be called a liar or stupid.

If it is not a bug, close my report and amen; otherwise don't keep on telling me new procedures or whatever else you dream in the night.

1- I changed the default font to Arial because I hate that other font.
2- I wrote whatever I wanted.
3- I saved in the automatic format, ODS I suppose.
4- Later on I did the work and decided to add a red bold checkmark to know the work had been done.
5- When I did some other work, I copied the full row of cells and added it to the plot.
6- If I needed more checkmarks, I copied (CTRL+C) from one cell and added it into another.
That's all, no other copy and paste, neither Notepad++ nor any other bla bla.
Comment 11 Mike Kaganski 2024-01-05 08:35:55 UTC
(In reply to Angelo from comment #10)

Thank you for your bug report.

No one tries to "name you a liar or stupid". As a user, you are not required to remember every step you did with your document throughout its history. When I write that you definitely did something else, it is natural, and not an insult. It just says: there is something additional, that *we* need to find out, to be able to proceed. As you are the reporter, you might be the only person able to help us; but if it's not possible, it's OK when you say "I don't know / can't be bothered".

The report is not *yours* after you filed it. Now we try to find out if it's actionable. Even if you stop paying attention to it, we still look at it, and think how to go forward. I asked for a procedure called bisection, because it *might* shed new light on it. Also, if later someone else comes with the same problem, and adds more details about how it happened, we might also be able to move it forward.

So please don't act as if you are offended. The bug report is an important contribution, and we appreciate it. If you have taken something written here as a personal insult, then you simply had wrong expectations / developed emotions based on a misunderstanding.
Comment 12 Angelo 2024-01-05 09:03:37 UTC
I understand it.
Thanks a lot and I wish my report can be useful in future.
Comment 13 V Stuart Foote 2024-01-05 14:47:42 UTC
(In reply to Angelo from comment #12)
> I understand it.
> Thanks a lot and I wish my report can be useful in future.

Another alternative is to avoid using "symbol" fonts like Wingdings that draw from the Private Use Area, where the PUA code point can get misinterpreted as to font.

So, keep handy these Unicode values "2713", "2714" or even "1F5F8".

   U+2713 CHECK MARK
   U+2714 HEAVY CHECK MARK
   U+1F5F8 LIGHT CHECK MARK

Those are the Unicode mappings for the common checkmarks, and on Windows they are found in multiple fonts, notably the Segoe UI Symbol font for all three. So they are always available to mark as a favorite (the split button listing on the "Omega" key).

You can also enter the Unicode directly anywhere in the LibreOffice UI by typing the code as "U+2714" and then toggling with <Alt>+x (or <Alt>+c for some locales). You can also use <Alt>+x in most places to look up the Unicode for a character by reversing the toggle.
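
As a quick cross-check of the three code points listed above, this small Python sketch looks up their standard names and general categories with the standard `unicodedata` module. The `CHECKMARKS` and `info` names are arbitrary.

```python
import unicodedata

# The three standard checkmark code points mentioned above.
CHECKMARKS = (0x2713, 0x2714, 0x1F5F8)

# Map "U+XXXX" notation to (standard name, general category).
info = {
    f"U+{cp:04X}": (unicodedata.name(chr(cp)), unicodedata.category(chr(cp)))
    for cp in CHECKMARKS
}

for notation, (name, category) in info.items():
    print(notation, name, category)
```

Unlike a Private Use Area value, each of these has a fixed name and the category "So" (other symbol), so its meaning does not depend on which font was chosen when the character was inserted.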

Otherwise, I'd kind of agree that a shifting font when a character was picked from a PUA area is not a bug, just avoid the practice.
Comment 14 Angelo 2024-01-05 17:38:05 UTC
Thanks a lot, I'll keep it in mind.
Comment 15 ady 2024-01-05 22:40:07 UTC
(In reply to V Stuart Foote from comment #13)
> Otherwise, I'd kind of agree that a shifting font when a character was
> picked from a PUA area is not a bug, just avoid the practice.

It might or might not be a bug, strictly speaking, but common users do this kind of thing anyway.

Considering that it (just happened to have) worked (from the POV of users) for so long (LO 5.1 until at least recently, in 7.5.3.2 included), perhaps it would be worth trying to "solve" this issue, independently of whether this is really a bug or not.

As for the bibisectRequest in the Keywords field, I guess it would be more attractive if the report were to be set as NEW (given that at least the resulting behavior _is_ reproduced). By leaving the report as UNCONFIRMED, the chances it will be bibisected are lower.
Comment 16 Mike Kaganski 2024-01-06 04:26:36 UTC
Regression after db04b3e154a1fb8f222232ef969bb3617e051329 (return 64-bit hash for O[U]String, 2022-08-22). Given that the change is 100% unrelated to fallback font choice, the change of behavior is unintended. Marking it NEW.
Comment 17 ady 2024-01-06 06:22:38 UTC
(In reply to Mike Kaganski from comment #16)
> Regression after db04b3e154a1fb8f222232ef969bb3617e051329 (return 64-bit
> hash for O[U]String, 2022-08-22).

CC'ing Noel.
Comment 18 Mike Kaganski 2024-01-06 07:53:46 UTC
WinGlyphFallbackSubstititution::FindFontSubstitute now finds Symbol. Indeed, it has the same character repertoire as Wingdings; and indeed, it's its glyph which gets shown now.

It is curious, if the function's "are the missing characters symbols?" check can be improved - it returns OpenSymbol, which doesn't include the character in question.

However ... why would Wingdings be preferred over Symbol in any case? Both are symbol fonts; both have glyphs for the wanted character ... And Symbol is lexicographically first. It seems like with previous hash implementation, Wingdings accidentally came in front of Symbol - but it's just luck.
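
The "just luck" effect can be shown with a toy hash table in Python. This is only a sketch: `old_hash` and `new_hash` are deliberately trivial stand-ins invented for the illustration, not the 32-bit and 64-bit O[U]String hashes, and `iteration_order` is a hypothetical helper, not a LibreOffice container.

```python
def iteration_order(names, hash_fn, buckets=8):
    """Return the names in the order a simple bucketed hash table would visit them."""
    table = [[] for _ in range(buckets)]
    for name in names:
        table[hash_fn(name) % buckets].append(name)
    return [name for bucket in table for name in bucket]


def old_hash(name):
    """Trivial stand-in for the earlier hash (assumption: string length)."""
    return len(name)


def new_hash(name):
    """Trivial stand-in for the later hash (assumption: code of the first character)."""
    return ord(name[0])


# Both candidates are assumed to provide a glyph for the wanted codepoint,
# so a "take the first match" search returns whichever is visited first.
candidates = ["Symbol", "Wingdings"]

first_old = iteration_order(candidates, old_hash)[0]
first_new = iteration_order(candidates, new_hash)[0]

print(first_old, first_new)
```

With the same candidates and the same search, only the hash function differs, yet the first match flips from one font to the other. Any code that takes the first hit from a hash-ordered container behaves this way.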

My current idea is: why do we use fallbacks at all? Can we instead use "no glyph" placeholder unconditionally?... Sigh, not realistic, but that would prevent creation of the problem in the first place.
Comment 19 Mike Kaganski 2024-01-06 09:53:41 UTC
https://gerrit.libreoffice.org/c/core/+/161706

And indeed, that was just pure luck. No idea if reverting this fragile thing makes any sense. As the discussion here shows (comment 7), this wasn't the first time this changed; and now, the proposed change would likely cause a *similar* breakage of documents created in 7.5 - 7.6. Will it be merged? No idea, and no preference; the proposed patch is not better than the current status, TBH.
Comment 20 Commit Notification 2024-01-07 13:50:53 UTC
Mike Kaganski committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/4eeb6178fb9fb499bc417a42f8d6d0bdde9acb8e

tdf#159018: make 64-bit hash algorithm similar to 32-bit one

It will be available in 24.8.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 21 Commit Notification 2024-01-09 14:10:30 UTC
Mike Kaganski committed a patch related to this issue.
It has been pushed to "libreoffice-7-6":

https://git.libreoffice.org/core/commit/02ffa01b5e2e7697d6b386419e88a9b8910ad31f

tdf#159018: make 64-bit hash algorithm similar to 32-bit one

It will be available in 7.6.5.
Comment 22 Commit Notification 2024-01-09 14:10:33 UTC
Mike Kaganski committed a patch related to this issue.
It has been pushed to "libreoffice-24-2":

https://git.libreoffice.org/core/commit/bc7ea997d5f6bb5a185fed76927175a53b87f7fc

tdf#159018: make 64-bit hash algorithm similar to 32-bit one

It will be available in 24.2.0.2.