Description:
see steps to reproduce

Steps to Reproduce:
1. open the attached file in Excel
2. open the attached file in LibreOffice
3. compare the result

Actual Results:
see the attached screenshots. There is a letter 'あ' (U+3042) in Excel, but not in Calc

Expected Results:
Calc shows 'あ'

Reproducible: Always

User Profile Reset: No

Additional Info:
I personally applied the following patch on my local build to avoid this issue.

diff --git a/oox/source/vml/vmlinputstream.cxx b/oox/source/vml/vmlinputstream.cxx
index 93204ac50710..b41e697ab5c0 100644
--- a/oox/source/vml/vmlinputstream.cxx
+++ b/oox/source/vml/vmlinputstream.cxx
@@ -42,7 +42,7 @@ const char* lclFindCharacter( const char* pcBeg, const char* pcEnd, char cChar )
 
 bool lclIsWhiteSpace( char cChar )
 {
-    return cChar <= 32;
+    return 0 <= cChar && cChar <= 32;
 }
 
 const char* lclFindWhiteSpace( const char* pcBeg, const char* pcEnd )
@@ -268,7 +268,7 @@ constexpr OStringLiteral gaClosingCData( "]]>" );
 
 InputStream::InputStream( const Reference< XComponentContext >& rxContext, const Reference< XInputStream >& rxInStrm ) :
     // use single-byte ISO-8859-1 encoding which maps all byte characters to the first 256 Unicode characters
-    mxTextStrm( TextInputStream::createXTextInputStream( rxContext, rxInStrm, RTL_TEXTENCODING_ISO_8859_1 ) ),
+    mxTextStrm( TextInputStream::createXTextInputStream( rxContext, rxInStrm, RTL_TEXTENCODING_UTF8 ) ),
     maOpeningBracket{ '<' },
     maClosingBracket{ '>' },
     mnBufferPos( 0 )
@@ -378,12 +378,12 @@ void InputStream::updateBuffer()
 
 OString InputStream::readToElementBegin()
 {
-    return OUStringToOString( mxTextStrm->readString( maOpeningBracket, false ), RTL_TEXTENCODING_ISO_8859_1 );
+    return OUStringToOString( mxTextStrm->readString( maOpeningBracket, false ), RTL_TEXTENCODING_UTF8 );
 }
 
 OString InputStream::readToElementEnd()
 {
-    OString aText = OUStringToOString( mxTextStrm->readString( maClosingBracket, false ), RTL_TEXTENCODING_ISO_8859_1 );
+    OString aText = OUStringToOString( mxTextStrm->readString( maClosingBracket, false ), RTL_TEXTENCODING_UTF8 );
     OSL_ENSURE( aText.endsWith(">"), "InputStream::readToElementEnd - missing closing bracket of XML element" );
     return aText;
 }
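The first hunk of the patch can be looked at in isolation. Below is a minimal sketch, assuming a platform where plain char is signed (modelled here with an explicit signed char so the behaviour does not depend on the compiler). The function names are hypothetical and are not part of vmlinputstream.cxx. The UTF-8 encoding of 'あ' (U+3042) is the byte sequence E3 81 82; as signed char each of those bytes is negative, so the old `cChar <= 32` test would treat them as whitespace.

```cpp
#include <cassert>
#include <cstddef>

// Models the original check `cChar <= 32` for a signed char:
// bytes >= 0x80 become negative and therefore compare as <= 32.
bool isWhiteSpaceSignedChar(unsigned char byte)
{
    const signed char c = static_cast<signed char>(byte); // 0xE3 becomes -29
    return c <= 32;
}

// Models the patched check `0 <= cChar && cChar <= 32`.
bool isWhiteSpacePatched(unsigned char byte)
{
    const signed char c = static_cast<signed char>(byte);
    return 0 <= c && c <= 32;
}

// Hypothetical helper: counts the bytes a predicate classifies as whitespace.
std::size_t countWhiteSpace(const unsigned char* data, std::size_t size,
                            bool (*isWhiteSpace)(unsigned char))
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < size; ++i)
        if (isWhiteSpace(data[i]))
            ++count;
    return count;
}

// Usage: the three UTF-8 bytes of U+3042 under both checks.
void selfCheck()
{
    const unsigned char a[] = { 0xE3, 0x81, 0x82 };
    assert(countWhiteSpace(a, 3, isWhiteSpaceSignedChar) == 3); // all misclassified
    assert(countWhiteSpace(a, 3, isWhiteSpacePatched) == 0);    // none misclassified
}
```

Casting to unsigned char before comparing would be another way to make the check independent of char signedness; the patch above uses the explicit lower bound instead.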
Created attachment 177020 [details] the document to be used for STR
Created attachment 177021 [details] Excel screenshot
Created attachment 177022 [details] Calc screenshot
REPRODUCIBLE with reporter's sample document

Installation of Version 7.2.4.1 (x64) / LibreOffice
Build 27d75539669ac387bb498e35313b970b7fe9c4f9
CPU threads: 12; OS: Windows 10.0 Build 19042; UI render: Skia/Raster; VCL: win
Locale: de-DE (de_DE); UI: de-DE
Calc: threaded
Elementary Theme; My normal User Profile:
Opened document does not show a character in the button.

Additional information:
a) SoftMaker PlanMaker does show the character
b) I can't tell whether there might be a DUPlicate.
Already broken in:

Version: 6.0.0.0.alpha1+
Build ID: 6eeac3539ea4cac32d126c5e24141f262eb5a4d9
CPU threads: 8; OS: Linux 5.14; UI render: default; VCL: gtk3
Locale: zh-CN (zh_CN.UTF-8); Calc: group threaded
IMHO we could apply the fixes for the input stream, since the XML is saved with UTF-8 encoding (<?xml version="1.0" encoding="UTF-8" standalone="yes"?>). So we should also read it as UTF-8, as proposed in your patch. However, if I change the text of the button from あ to ああ and save it, the XML changes (<a:t>あああああ</a:t>), but the button is missing after reopening the file.
The last time I encountered something like this I also assumed it was the encoding, but it wasn't, and the problem was fixed with https://cgit.freedesktop.org/libreoffice/core/commit/?id=b320ef30977144c52de9b39bc4db0db540727c79

So, does this problem persist after that fix?
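For context on the encoding question, here is a minimal sketch of why reading and writing with the same single-byte ISO-8859-1 encoding, as the pre-patch code does, should not by itself lose UTF-8 bytes. The decoder and encoder below are hypothetical stand-ins written for illustration; they are not the rtl text conversion routines. They rely only on the property stated in the source comment, that ISO-8859-1 maps every byte to one of the first 256 Unicode code points. Whether this is the reason the encoding turned out not to be the cause in the earlier case is an assumption.

```cpp
#include <cassert>
#include <string>

// ISO-8859-1 decode: each byte becomes the code point with the same value.
std::u16string decodeLatin1(const std::string& bytes)
{
    std::u16string out;
    for (unsigned char b : bytes)
        out.push_back(static_cast<char16_t>(b));
    return out;
}

// ISO-8859-1 encode: code points below 0x100 map back to one byte each.
// Anything else is replaced by '?' (cannot occur after decodeLatin1).
std::string encodeLatin1(const std::u16string& text)
{
    std::string out;
    for (char16_t c : text)
        out.push_back(c < 0x100 ? static_cast<char>(c) : '?');
    return out;
}

// Usage: the UTF-8 bytes of U+3042 survive the round trip unchanged.
void roundTripCheck()
{
    const std::string utf8 = "\xE3\x81\x82";
    assert(decodeLatin1(utf8).size() == 3);           // three code units, not one
    assert(encodeLatin1(decodeLatin1(utf8)) == utf8); // same bytes come back
}
```

The intermediate string is not meaningful text (it holds U+00E3 U+0081 U+0082 rather than U+3042), but the bytes handed on to the parser are identical to the input.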
It opens correctly now.