Extra hex bytes are being inserted into text files saved
from LibreOffice database queries. To show this, do the following:
1) Open up a simple database and run a query
2) Open a new text (.odt) document
3) Drag the query by the upper-left corner onto the text document
[ A window titled "Insert Database Columns" will open ]
4) Choose "Insert data as: text" on the top line
5) pick a database column or two, and then click OK
[ The data will be inserted into the text document ]
6) Save the document as ".txt", i.e., plain ASCII text
7) View the document with the linux "less" command (or with
any program that will show the hex-byte content of the file)
8) Note that the ASCII data from the database is preceded by three
extra bytes, "0xefbbbf", which "less" shows as "U+FEFF".
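Steps 7 and 8 can also be checked without a pager. The following is a minimal Python sketch that reports whether a file begins with the bytes 0xEF 0xBB 0xBF. The names `starts_with_utf8_bom` and `file_has_utf8_bom` are illustrative only and are not part of LibreOffice.

```python
import sys

UTF8_BOM = b"\xef\xbb\xbf"  # U+FEFF encoded as UTF-8


def starts_with_utf8_bom(data: bytes) -> bool:
    """True if `data` begins with the UTF-8 byte-order mark."""
    return data.startswith(UTF8_BOM)


def file_has_utf8_bom(path: str) -> bool:
    # Only the first three bytes are needed to decide.
    with open(path, "rb") as f:
        return starts_with_utf8_bom(f.read(len(UTF8_BOM)))


if __name__ == "__main__":
    # Report on each file named on the command line.
    for name in sys.argv[1:]:
        status = "BOM present" if file_has_utf8_bom(name) else "no BOM"
        print(f"{name}: {status}")
```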
These three extra bytes cause me grief when I use this general
scheme to create address labels. I didn't ask for them and they
don't belong at the beginning of the output file. It works this
way on all versions of LObase, up through 3.5.
Thanks for listening...
Further experimentation revealed that this problem is not related to "base" but shows up simply when saving a "writer" file as "plain text". So I am changing the component from base to writer. To reproduce it, one need only start with a short ".odt" file and follow steps 6-8 in the original bug report.
Thanks for the bug report.
An explanation of these 3 bytes is here:
Please tell us: which program has a problem with it?
Thanks for the reference. I have read the Wikipedia article. It appears to relate entirely to Unicode encoding. In relation to UTF-8 it says, "The Unicode Standard does permit the BOM in UTF-8, but does not require or recommend its use." It further states, "the need for a BOM arises in the context of text interchange, rather than in normal text processing within a closed environment."
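The reason "less" displays 0xefbbbf as U+FEFF is that those three bytes are the UTF-8 encoding of the code point U+FEFF. A short worked sketch in Python, hand-rolling the standard 3-byte UTF-8 layout for code points U+0800 to U+FFFF (for illustration only; it does not reject surrogate code points):

```python
def utf8_three_byte(cp: int) -> bytes:
    """Encode a code point in U+0800..U+FFFF using the 3-byte UTF-8 form.

    Layout: 1110xxxx 10xxxxxx 10xxxxxx (4 + 6 + 6 payload bits).
    Illustrative only: surrogate code points are not rejected.
    """
    if not 0x0800 <= cp <= 0xFFFF:
        raise ValueError("code point outside the 3-byte range")
    return bytes([
        0xE0 | (cp >> 12),          # top 4 bits   (0xEF for U+FEFF)
        0x80 | ((cp >> 6) & 0x3F),  # middle 6 bits (0xBB for U+FEFF)
        0x80 | (cp & 0x3F),         # low 6 bits   (0xBF for U+FEFF)
    ])


print(utf8_three_byte(0xFEFF).hex())                        # efbbbf
print(utf8_three_byte(0xFEFF) == "\ufeff".encode("utf-8"))  # True
```

Because UTF-8 is read one byte at a time, these bytes carry no byte-order information; in a UTF-8 file they act only as a signature identifying the encoding.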
In any case, I don't want my data saved in UTF-8 for this particular application, but rather in plain ASCII. I tried setting Tools > Options > Load/Save > HTML Compatibility > Character set to Western Europe (ASCII/US), but the BOM is still there. I can appreciate the utility of the BOM for information interchange, but not for local work with PostScript programs and shell scripts. Perhaps the appropriate fix is to have an option in "load/save" that says, "I really want plain ASCII."
I wish I were knowledgeable enough to send you a patch, but the LibreOffice code is a bit formidable! Thanks for your interest and help.
> Perhaps the appropriate fix is to have an option in "load/save" that says, "I
> really want plain ASCII."
I agree with this. But currently we have very few developers, so this may take several years. Sorry about that.
> but not for local work with Postscript programs and shell scripts.
But it may be faster to add a step to your script that removes this BOM, and to ask the authors of the PostScript programs to fix their programs.
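A minimal Python sketch of that script-side workaround: it removes a single BOM only when it appears at the very start of each file named on the command line, and writes the result to standard output. The function and script names are hypothetical.

```python
import sys

UTF8_BOM = b"\xef\xbb\xbf"  # U+FEFF encoded as UTF-8


def strip_leading_bom(data: bytes) -> bytes:
    """Drop one UTF-8 BOM at the very start of `data`; leave all other bytes alone."""
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):]
    return data


if __name__ == "__main__":
    # Copy each file named on the command line to stdout without its leading BOM.
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            sys.stdout.buffer.write(strip_leading_bom(f.read()))
```

For example: `python3 strip_bom.py labels.txt > labels-plain.txt`. Matching only the leading three bytes is safer than deleting those byte values everywhere (for instance with a byte-wise `tr -d '\357\273\277'`), because 0xBB and 0xBF also occur inside other UTF-8 characters; "»", for instance, is encoded as C2 BB.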
It's easy enough to stop the BOM being written but I presume we want to preserve it in existing documents.
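One way to reconcile both goals is to record at load time whether the file had a BOM, reproduce that state on save by default, and let the user override it. The Python sketch below illustrates the idea. `TextDocument`, `load` and `save` are hypothetical names, not LibreOffice's actual (C++) filter code.

```python
from dataclasses import dataclass
from typing import Optional

BOM = "\ufeff"  # U+FEFF; encodes to EF BB BF in UTF-8


@dataclass
class TextDocument:
    text: str
    had_bom: bool  # remembered at load time so a plain re-save is faithful


def load(data: bytes) -> TextDocument:
    """Decode UTF-8 bytes, stripping and recording a leading BOM."""
    text = data.decode("utf-8")
    if text.startswith(BOM):
        return TextDocument(text=text[1:], had_bom=True)
    return TextDocument(text=text, had_bom=False)


def save(doc: TextDocument, include_bom: Optional[bool] = None) -> bytes:
    """Encode as UTF-8.

    include_bom=None keeps the BOM state found at load time;
    True or False overrides it (e.g. from a user-facing option).
    """
    write_bom = doc.had_bom if include_bom is None else include_bom
    return ((BOM if write_bom else "") + doc.text).encode("utf-8")
```

With this split, existing files round-trip unchanged, and only an explicit choice removes or adds the mark.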
Adding self to CC if not already on
Glad to see that this bug is still alive. I fixed my immediate problem with a simple "tr" command in my shell script, but I am still not happy with extraneous stuff being inserted in my text data. The easy fix would seem to be to have "Save Text as UTF-8" and "Save Text as ASCII" options available as a preference I can set. Thanks for your continued interest.
Created attachment 140162
Video demo of the patch.
I created a patch for review. With this patch if you do:
1) File --> Save As...
2) Choose Type = "Text (Choose Encoding)"
3) Click "Use Text - ..."
4) In the final dialog will be a checkbox "Include byte-order-mark". If you un-check this, then the BOM will not be included in the output.
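With the checkbox unchecked, a document containing only ASCII characters produces a file that is byte-for-byte plain ASCII, because ASCII is a subset of UTF-8. A small Python sketch of that check, using made-up label text and the standard-library `utf-8-sig` codec to stand in for the "with BOM" case:

```python
labels = "Jane Smith\n12 Main St\n"  # sample data, ASCII characters only

with_bom = labels.encode("utf-8-sig")  # analogous to "Include byte-order-mark" checked
without_bom = labels.encode("utf-8")   # analogous to the checkbox unchecked

print(with_bom[:3].hex())                     # efbbbf
print(with_bom.isascii())                     # False
print(without_bom.isascii())                  # True
print(without_bom == labels.encode("ascii"))  # True
```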
Video demo attached.
Thanks to Martin van Zijl, this is fixed in 6.2.
Martin van Zijl committed a patch related to this issue.
It has been pushed to "master":
Fix tdf#44291. Allow saving text without byte-order mark.
It will be available in 6.2.0.
The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
Affected users are encouraged to test the fix and report feedback.