Bug 160289 - Converting ANSI encoded CSV file to PDF via command line results in replacing special characters (trademark:™,®) with the special question mark symbol:�
Status: RESOLVED NOTABUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Printing and PDF export
Version (earliest affected): 7.5.0.1 rc
Hardware: All Windows (All)
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-03-20 18:06 UTC by Yaroslav Moiko
Modified: 2024-04-03 10:59 UTC

See Also:
Crash report or crash signature:


Attachments
CSV File in ANSI encoding that contains a special characters. Use it as a source (736 bytes, text/csv)
2024-03-20 18:08 UTC, Yaroslav Moiko
Result PDF file that shows the problem with replaced special symbols (37.90 KB, application/pdf)
2024-03-20 18:08 UTC, Yaroslav Moiko

Description Yaroslav Moiko 2024-03-20 18:06:22 UTC
Description:
When converting an ANSI-encoded CSV file to PDF from the command line with LibreOffice, special characters such as ™ and ® are replaced with the replacement character � in the resulting PDF file. The command used for conversion is:
'soffice.exe --headless --convert-to pdf "D:\MyCsvFile.csv" --outdir "D:\conversionResults"'
This issue does not occur when opening the same file using the LibreOffice GUI and subsequently exporting it to PDF via Calc. Furthermore, saving the CSV file in UTF-8 encoding using Notepad++ allows for successful conversion via the command line, preserving all original content.
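
The Notepad++ re-save described above can also be scripted. This is a minimal sketch, assuming that "ANSI" here means Windows-1252 (cp1252), the usual ANSI code page on en-US Windows; the function name and the file names are hypothetical.

```python
from pathlib import Path


def reencode_to_utf8(src: Path, dst: Path, source_encoding: str = "cp1252") -> None:
    """Rewrite a CSV file as UTF-8.

    source_encoding is an assumption: "ANSI" depends on the Windows
    system code page, so adjust it if the file came from another locale.
    """
    # Work on raw bytes so that line endings are left untouched.
    text = src.read_bytes().decode(source_encoding)
    dst.write_bytes(text.encode("utf-8"))


if __name__ == "__main__":
    # Hypothetical demo file containing the two characters from this report.
    demo_src = Path("demo_ansi.csv")
    demo_dst = Path("demo_utf8.csv")
    demo_src.write_bytes("Name\r\nBrand™,Mark®\r\n".encode("cp1252"))
    reencode_to_utf8(demo_src, demo_dst)
    print(demo_dst.read_bytes())
```

The UTF-8 copy can then be passed to the same soffice command in place of the original file.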

To work around the issue, specifying the ANSI encoding with the flag --infilter="CSV:44,34,ANSI" in the command line enables successful conversion of ANSI-encoded files to PDF.
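
For scripted conversions, the command with the import filter can be assembled as below. This is only a sketch: the helper name is hypothetical, the paths are the ones from this report, and the filter string is the one quoted above (44 is the ASCII code of the comma field separator, 34 the double-quote text delimiter, and the third token the character set).

```python
import subprocess


def build_convert_command(csv_path, out_dir,
                          infilter="CSV:44,34,ANSI",
                          soffice="soffice.exe"):
    """Return the argument list for a headless CSV-to-PDF conversion."""
    return [
        soffice,
        "--headless",
        f"--infilter={infilter}",  # state the source encoding explicitly
        "--convert-to", "pdf",
        csv_path,
        "--outdir", out_dir,
    ]


def convert(csv_path, out_dir):
    """Run the conversion; assumes soffice.exe is on PATH."""
    subprocess.run(build_convert_command(csv_path, out_dir), check=True)


if __name__ == "__main__":
    # Only print the command here; call convert() to actually run it.
    print(build_convert_command(r"D:\MyCsvFile.csv", r"D:\conversionResults"))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.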

From these observations, it appears that there may be a problem with identifying the encoding of the source file when it is loaded for conversion using the command line interface.

Additionally, it's worth noting that this issue is reproducible in all stable releases following version 7.4.7.2.

Steps to Reproduce:
1. Create a .csv file that contains special characters such as ™ and ® and save it using ANSI encoding. Or take the .csv file from the attachments.
2. Install any of the affected versions of LibreOffice (any from 7.5.0.1 through 7.6.6).
3. Convert the csv file to PDF using the following command line, replacing paths as necessary: 'soffice.exe --headless --convert-to pdf "D:\MyCsvFile.csv" --outdir "D:\conversionResults"'.
4. Inspect the resulting PDF document.

Actual Results:
Special characters such as ™ and ® from the source file are replaced with the replacement character � in the resulting PDF file.

Expected Results:
All the content from the original file is preserved in the resulting PDF document without any unwanted replacements. All special characters should be kept.


Reproducible: Always


User Profile Reset: No

Additional Info:
Version: 7.6.5.2 (X86_64) / LibreOffice Community
Build ID: 38d5f62f85355c192ef5f1dd47c5c0c0c6d6598b
CPU threads: 8; OS: Windows 10.0 Build 19045; UI render: Skia/Raster; VCL: win
Locale: en-US (en_US); UI: en-US
Calc: threaded
Comment 1 Yaroslav Moiko 2024-03-20 18:08:11 UTC
Created attachment 193215
CSV File in ANSI encoding that contains a special characters. Use it as a source
Comment 2 Yaroslav Moiko 2024-03-20 18:08:57 UTC
Created attachment 193216
Result PDF file that shows the problem with replaced special symbols
Comment 3 m_a_riosv 2024-03-21 01:18:32 UTC
Maybe related to
https://bugs.documentfoundation.org/show_bug.cgi?id=150714
The default encoding is UTF-8,
so the encoding needs to be specified if it is different. CSV files are plain text with no encoding declaration.
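
The symptom can be reproduced outside LibreOffice. The sketch below assumes the file's "ANSI" encoding is Windows-1252 (cp1252) and that the reader decodes it as UTF-8; it illustrates the mismatch only and says nothing about how LibreOffice decodes files internally.

```python
# Sample text with the two characters from this report, stored as cp1252 bytes.
raw = "Brand™ Mark®".encode("cp1252")

# In cp1252, ™ is the single byte 0x99 and ® is 0xAE. Neither is a valid
# UTF-8 sequence on its own, so a UTF-8 decoder substitutes U+FFFD for each.
as_utf8 = raw.decode("utf-8", errors="replace")

# Decoding with the encoding the file was actually written in keeps them.
as_ansi = raw.decode("cp1252")

print(as_utf8)
print(as_ansi)
```

Because plain ASCII bytes are identical in both encodings, only the non-ASCII characters are affected, which matches the attached PDF.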

>>To work around the issue, specifying the ANSI encoding with the flag --infilter="CSV:44,34,ANSI" in the command line enables successful conversion of ANSI-encoded files to PDF.
It is not a workaround; it is one of the command-line options.

I think this is not a bug.
Comment 4 Yaroslav Moiko 2024-03-22 09:28:06 UTC
(In reply to m_a_riosv from comment #3)
> Maybe related to
> https://bugs.documentfoundation.org/show_bug.cgi?id=150714
> The default encoding is UTF-8,
> so the encoding needs to be specified if it is different. CSV files are plain text
> with no encoding declaration.
> 
> >>To work around the issue, specifying the ANSI encoding with the flag --infilter="CSV:44,34,ANSI" in the command line enables successful conversion of ANSI-encoded files to PDF.
> It is not a workaround; it is one of the command-line options.
> 
> I think this is not a bug.

Thanks for your reply. Why does it work when I load my .csv file via the GUI, then?
I thought some logic automatically determined the encoding before loading the content (and that this logic could be broken).

Maybe it's a deluxe request, but I think it would be extremely useful if converting from CSV to PDF via the command line could identify the encoding automatically (similar to what is done when opening via the UI).
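
Until something like that exists, a wrapper script could guess the encoding before calling soffice. The following is a rough heuristic sketch, not anything LibreOffice does: the function name is hypothetical, and the cp1252 fallback is an assumption that only holds for files written on a Western-European Windows system.

```python
def guess_csv_encoding(data: bytes, fallback: str = "cp1252") -> str:
    """Guess whether CSV bytes are UTF-8 or a legacy single-byte encoding."""
    # A UTF-8 byte order mark is an explicit hint.
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8-sig"
    try:
        # Non-ASCII text in a legacy code page is rarely valid UTF-8.
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Cannot be UTF-8; assume the caller-supplied legacy encoding.
        return fallback


if __name__ == "__main__":
    print(guess_csv_encoding("Brand™".encode("cp1252")))
    print(guess_csv_encoding("Brand™".encode("utf-8")))
```

The guess can be wrong for short or unusual files, so an explicit --infilter setting remains the reliable option.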
Comment 5 Werner Tietz 2024-03-25 08:12:46 UTC
@https://bugs.documentfoundation.org/show_bug.cgi?id=160289#c4

There is no detection logic on import via the GUI, except that the import __Dialog__ »remembers« the last settings chosen by $USER (and $USER has the option to change them!).

IMHO it would be a bad idea to implicitly apply such a rule to command-line conversions.

my vote: RESOLVED ⇒ NOT A BUG