Bug 142487 - REPLACE: Wrong replacement in long text
Summary: REPLACE: Wrong replacement in long text
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: BASIC
Version (earliest affected): 7.0.0.3 release
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard: target:7.3.0 target:7.2.0.2
Keywords: bibisected, bisected, regression
Depends on:
Blocks: Macro-StarBasic
Reported: 2021-05-25 18:58 UTC by Robert Großkopf
Modified: 2021-07-13 15:09 UTC

Attachments
Extract the *.zip file. Contains a database and an example text. (12.10 KB, application/zip)
2021-05-25 18:58 UTC, Robert Großkopf

Description Robert Großkopf 2021-05-25 18:58:31 UTC
Created attachment 172338
Extract the *.zip file. Contains a database and an example text.

Download the attached package. It contains a database file and a text file.
Allow macros to be executed for this database.
Open the database.
Open the form and have a look at the text.
There are two positions marked in English with "→ have a look …"
The same text appears above.
The macro should replace the "German" quotation marks.

Now press the button.
The quotation marks at the top of the text are removed correctly.
The quotation marks at the bottom are handled incorrectly: the characters following them are removed instead of the quotation marks themselves.

This bug appears here with LO 7.0.5.2 and also with LO 7.1.3.2. It does not appear with LO 6.4.7.2 on the same machine.
Tested with OpenSUSE 15.2 64bit rpm Linux.
Comment 1 Michael 2021-05-26 05:48:34 UTC Comment hidden (obsolete)
Comment 2 Michael 2021-07-08 06:40:47 UTC
This bug is connected with version 7: It occurs with version 7.0.5.2, as I described above. It does NOT occur with
Version: 6.4.7.2
Build ID: 639b8ac485750d5696d7590a72ef1b496725cfb5
CPU threads: 16; OS: Linux 5.10; UI render: default; VCL: kf5;
Locale: de-DE (de_DE.UTF-8); UI language: de-DE
Calc: threaded

It is important to fix this, otherwise BASE users will not be able to use version 7.x or face data loss.
Comment 3 Aron Budea 2021-07-11 05:48:39 UTC
Bibisected to the following commit using repo bibisect-linux-64-7.0. Adding CC: to Andreas Heinisch.

https://cgit.freedesktop.org/libreoffice/core/commit/?id=3ff159d35770ac3454ee909b348cb4f4ca8b0b9b
author		Andreas Heinisch <andreas.heinisch@yahoo.de>	2020-05-20 15:49:08 +0200
committer	Noel Grandin <noel.grandin@collabora.co.uk>	2020-05-21 08:50:23 +0200

tdf#132389 - case-insensitive operation for non-ASCII characters
Comment 4 Andreas Heinisch 2021-07-11 09:17:31 UTC
The bibisected bug was fixed in: https://bugs.documentfoundation.org/show_bug.cgi?id=141045
Comment 5 Robert Großkopf 2021-07-11 09:39:23 UTC
(In reply to Andreas Heinisch from comment #4)
> The bibisected bug was fixed in:
> https://bugs.documentfoundation.org/show_bug.cgi?id=141045

But this buggy behavior still exists in LO 7.1.5.1 on OpenSUSE 15.2.

Bug 141045 has been fixed for LO 7.1.2.
Comment 6 Andreas Heinisch 2021-07-11 09:48:33 UTC
However, the error lies in this fix.

Sorry for the noise, Mike, but could this be some kind of Unicode problem? aExpStr and aSrcStr have the same length at the beginning, but after the toUpper conversion their lengths differ, hence this error.

Is there a way to ensure they have the same length and characters even after the toUpper?
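
The length mismatch described here can be illustrated with a small model. This is a Python sketch for illustration only, not the LibreOffice C++ code: the function name naive_replace is hypothetical, it handles only the first match, and it assumes the faulty logic searches in an uppercased copy and then applies the found index to the original string.

```python
# Hypothetical model of the suspected failure; not the LibreOffice source.
def naive_replace(source: str, find: str, repl: str) -> str:
    upper_source = source.upper()   # "ß" becomes "SS", so the copy can be longer
    upper_find = find.upper()
    pos = upper_source.find(upper_find)
    if pos < 0:
        return source
    # Bug: pos is an index into upper_source, but it is applied to source.
    return source[:pos] + repl + source[pos + len(find):]


text = "Replace something after the 'ß': x y"
print(len(text), len(text.upper()))        # lengths differ by one
print(text.find("x"), text.upper().find("X"))  # match position is shifted
print(naive_replace(text, "x", "y"))       # the character after "x" is replaced
```

In this model every match located after a 'ß' is shifted by one position, so the character following the intended match is replaced instead.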
Comment 7 Mike Kaganski 2021-07-11 11:30:25 UTC
So basically, the minimal reproducer is:

Sub ReplaceGermanQuotes
	Dim s$
	s = "Replace something after the 'ß': x y"
	s = Replace(s, "x", "y")
	MsgBox s
End Sub

Indeed, 'ß' is uppercased to a two-character 'SS'.
The correct way is shown in TextSearch::searchForward in i18npool/source/search/textsearch.cxx. It creates an array of mappings from original character positions to the transliterated character positions.

Possibly we could reuse utl::TextSearch::SearchForward from unotools/source/i18n/textsearch.cxx?
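
The idea of keeping a position map alongside the transliterated string can be sketched as follows. This is a Python illustration under stated assumptions, not the code in textsearch.cxx: the names upper_with_offsets and replace_ignore_case are hypothetical, uppercasing stands in for the real transliteration, and skipping matches that begin or end inside an expanded character is a design choice of this sketch.

```python
# Hypothetical sketch of offset-mapped, case-insensitive replacement.
def upper_with_offsets(source: str):
    """Uppercase source and record, for every uppercased character,
    the index of the original character it was produced from."""
    upper_chars = []
    offsets = []
    for i, ch in enumerate(source):
        for up in ch.upper():          # one original char may yield several
            upper_chars.append(up)
            offsets.append(i)
    offsets.append(len(source))        # sentinel for matches ending at the end
    return "".join(upper_chars), offsets


def replace_ignore_case(source: str, find: str, repl: str) -> str:
    if not find:
        return source
    upper_source, offsets = upper_with_offsets(source)
    upper_find, _ = upper_with_offsets(find)
    out = []
    src_pos = 0       # next index in source that has not been copied yet
    search_pos = 0    # next index in upper_source to search from
    while True:
        hit = upper_source.find(upper_find, search_pos)
        if hit < 0:
            break
        hit_end = hit + len(upper_find)
        # Assumption of this sketch: ignore matches that split an expanded char.
        starts_on_boundary = hit == 0 or offsets[hit] != offsets[hit - 1]
        ends_on_boundary = offsets[hit_end] != offsets[hit_end - 1]
        if not (starts_on_boundary and ends_on_boundary):
            search_pos = hit + 1
            continue
        out.append(source[src_pos:offsets[hit]])   # map back to original index
        out.append(repl)
        src_pos = offsets[hit_end]
        search_pos = hit_end
    out.append(source[src_pos:])
    return "".join(out)


print(replace_ignore_case("Replace something after the 'ß': x y", "x", "y"))
```

Because every match position is translated back through the offsets list before the original string is cut, a 'ß' earlier in the text no longer shifts the replacement.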
Comment 8 Andreas Heinisch 2021-07-11 13:24:35 UTC
I will test whether the InStr function is affected as well (in Bug 139840).
Comment 9 Robert Großkopf 2021-07-11 13:38:40 UTC
(In reply to Mike Kaganski from comment #7)
> So basically, the minimal reproducer is:
> 
> Sub ReplaceGermanQuotes
> 	Dim s$
> 	s = "Replace something after the 'ß': x y"
> 	s = Replace(s, "x", "y")
> 	MsgBox s
> End Sub
> 
> Indeed, 'ß' is uppercased to a two-character 'SS'.

… but there is no 'ß' in the long text of the example. The effect indeed is the same.
Comment 10 Aron Budea 2021-07-11 14:02:35 UTC Comment hidden (obsolete)
Comment 11 Robert Großkopf 2021-07-11 14:14:51 UTC
(In reply to Aron Budea from comment #10)
> (In reply to Robert Großkopf from comment #9)
> > … but there is no 'ß' in the long text of the example. The effect indeed is
> > the same.
> There is: "eines großen Sprachozeans"

You are right! When I change this to 'ss' instead of 'ß', it works.
Comment 12 Commit Notification 2021-07-12 18:31:10 UTC
Andreas Heinisch committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/7e5c9220ef5d51ac23e618c5c9eeda9cf4339c88

tdf#142487 - use utl::TextSearch in order to implement the replace algorithm

It will be available in 7.3.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 13 Commit Notification 2021-07-13 10:46:07 UTC
Andreas Heinisch committed a patch related to this issue.
It has been pushed to "libreoffice-7-2":

https://git.libreoffice.org/core/commit/15e4d775f7c2edbafca04ade1d293ce71045e9e7

tdf#142487 - use utl::TextSearch in order to implement the replace algorithm

It will be available in 7.2.0.2.