Bug 169591 - FILEOPEN Content import from CSV does not work when file is encoded UTF-16BE
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version (earliest affected): 7.0 all versions
Hardware: All Windows (All)
Importance: medium minor
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: CSV-Import
 
Reported: 2025-11-21 12:58 UTC by Hubert Englmaier
Modified: 2026-01-22 17:39 UTC (History)
CC List: 2 users

See Also:
Crash report or crash signature:


Attachments
Input file in UTF-16BE - cannot be imported (122 bytes, text/csv)
2025-11-21 13:05 UTC, Hubert Englmaier
Same input file in UTF-8 - import works fine (80 bytes, text/csv)
2025-11-21 13:06 UTC, Hubert Englmaier
Same input file in UTF-16BE, but with the BOM FE FF - import works fine (124 bytes, text/csv)
2025-11-21 13:19 UTC, Hubert Englmaier

Description Hubert Englmaier 2025-11-21 12:58:50 UTC
Description:
Opening a CSV (comma-separated values) file encoded in UTF-16BE in Calc results in a single-row table containing Asian characters. The format was chosen because the /huge competitor/'s spreadsheet only works with that encoding.

At first, Calc handles the file well: it offers the conversion via the "Text Import" dialog, where it properly identifies the "Unicode (UTF-16)" encoding, resolves the columns, and displays the content. However, after clicking OK, the only populated cell is A1, and it contains Asian characters.

Steps to Reproduce:
1. Set up an arbitrary CSV file.
2. Make sure it is UTF-16BE encoded. I did this by converting a UTF-8 version with the iconv tool: iconv -t UTF-16BE source.csv > target.csv
3. Open it with Calc
   You are offered text import. Columns and content are decoded properly
4. Confirm the text import.
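
For anyone without iconv at hand, step 2 can probably be approximated with a short Python sketch. The sample content below is hypothetical (it is not the content of the attached file), and the function name is only illustrative. The sketch relies on Python's "utf-16-be" codec, which writes no byte order mark.

```python
from pathlib import Path


def encode_utf16be_without_bom(text: str) -> bytes:
    """Encode text as UTF-16BE; the "utf-16-be" codec does not emit a BOM."""
    return text.encode("utf-16-be")


# Hypothetical sample content, not the data from the attachment.
SAMPLE_CSV = "name,value\nalpha,1\n"

if __name__ == "__main__":
    data = encode_utf16be_without_bom(SAMPLE_CSV)
    # Write the test file into the current directory.
    Path("target.csv").write_bytes(data)
    # Every ASCII character becomes a 0x00 byte followed by its ASCII code.
    print(data[:4].hex(" "))  # prints "00 6e 00 61"
```

Opening the resulting target.csv in Calc should then follow the same path as steps 3 and 4.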


Actual Results:
Only A1 contains anything. It is mostly Asian characters.

Expected Results:
Cells are populated as previewed in the text import dialog.


Reproducible: Always


User Profile Reset: No

Additional Info:
It looks as if the 0x00 bytes of the UTF-16 encoding get lost somewhere between the text import dialog and cell population.
Comment 1 Hubert Englmaier 2025-11-21 13:05:02 UTC
Created attachment 204164
Input file in UTF-16BE - cannot be imported
Comment 2 Hubert Englmaier 2025-11-21 13:06:17 UTC
Created attachment 204165
Same input file in UTF-8 - import works fine
Comment 3 Hubert Englmaier 2025-11-21 13:10:09 UTC
Correction: The UTF-16 encoding is not being detected automatically; it was my manual choice, which had been carried over from a previous attempt.
Comment 4 Hubert Englmaier 2025-11-21 13:18:36 UTC
When adding the BOM FE FF at the very beginning, the import works properly.
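
A minimal sketch of that workaround in Python, assuming the file is already valid UTF-16BE. The function name is hypothetical; codecs.BOM_UTF16_BE is the standard library constant for the two bytes FE FF.

```python
import codecs


def add_utf16be_bom(data: bytes) -> bytes:
    """Prepend the UTF-16BE byte order mark (FE FF) unless it is already present."""
    if data.startswith(codecs.BOM_UTF16_BE):
        return data
    return codecs.BOM_UTF16_BE + data


if __name__ == "__main__":
    # Hypothetical sample content encoded without a BOM.
    raw = "a,b\n".encode("utf-16-be")
    fixed = add_utf16be_bom(raw)
    # The file grows by exactly two bytes.
    print(len(raw), len(fixed))
    # The generic "utf-16" codec reads the BOM and picks the byte order from it.
    print(repr(fixed.decode("utf-16")))
```

This matches the size difference between the attachments: 122 bytes without the BOM, 124 bytes with it.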
Comment 5 Hubert Englmaier 2025-11-21 13:19:28 UTC
Created attachment 204166
Same input file in UTF-16BE, but with the BOM FE FF - import works fine
Comment 6 Hubert Englmaier 2025-11-21 13:21:49 UTC
Importance reduced, since it might not be a bug at all if Calc assumes by default that UTF-16 is little endian. Making the byte order selectable would be a change request rather than a bug.
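
To illustrate that hypothesis only (this is not a confirmed diagnosis of what Calc does internally), the following Python sketch decodes big-endian bytes as little endian. The function name and sample text are hypothetical.

```python
def decode_with_wrong_byte_order(text: str) -> str:
    """Encode text as UTF-16BE, then decode the bytes as UTF-16LE."""
    return text.encode("utf-16-be").decode("utf-16-le")


if __name__ == "__main__":
    wrong = decode_with_wrong_byte_order("a,b\n")
    # Each 16-bit unit has its bytes swapped: 0x0061 is read as 0x6100, and so on.
    print([f"U+{ord(c):04X}" for c in wrong])
    # The comma (U+002C) and line feed (U+000A) no longer exist as such,
    # so nothing is left to split the text into columns or rows.
    print("," in wrong, "\n" in wrong)
```

Under this assumption, 'a' (U+0061) turns into U+6100, which lies in the CJK Unified Ideographs block, and the separators disappear. That would be consistent with a single cell A1 filled mostly with Asian characters.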
Comment 7 Regina Henschel 2025-11-21 20:24:33 UTC
This is a bug. The actually imported data should be the same as shown in the preview in the "Text Import" dialog.

The enhancement request to determine big-endian vs little-endian of UTF-16 on import, if the file has no byte-order-mark, should go into a new bug report.

The import is faulty as described in Version: 26.2.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 620(Build:0)
CPU threads: 32; OS: Windows 11 X86_64 (build 26100); UI render: Skia/Vulkan; VCL: win
Locale: de-DE (de_DE); UI: en-US
Calc: threaded