Bug 74299 - IMPORT DXF: non-English text is garbled
Summary: IMPORT DXF: non-English text is garbled
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Draw
Version (earliest affected): Inherited From OOo
Hardware: All (OS: All)
Importance: medium normal
Assignee: Mike Kaganski
URL:
Whiteboard: target:5.0.0 target:5.3.0
Keywords:
Depends on:
Blocks:
 
Reported: 2014-02-01 02:17 UTC by Mike Kaganski
Modified: 2020-05-07 07:08 UTC
CC List: 1 user

See Also:
Crash report or crash signature:


Attachments
Test files (421.87 KB, application/zip)
2014-02-01 02:17 UTC, Mike Kaganski

Description Mike Kaganski 2014-02-01 02:17:57 UTC
Created attachment 93153
Test files

If text in a DXF file contains non-English characters, those characters are imported as garbage.

1. If the DXF was generated by AutoCAD 2006 or older (version tag lower than AC1021), its text is not Unicode: it is encoded in whatever codepage was in use on the creator's system, and that codepage is not specified in the DXF (a known problem of the pre-2007 format). The import filter should then honour the LO default language setting (or perhaps let the user specify the codepage explicitly), but it does not.

2. If the format version is 2007 or later (tag AC1021 or greater), the text in the DXF is UTF-8. However, the import filter does not treat it as such and imports it as a single-byte encoding.

3. If a text contains a special code (such as \U+03B5), the importer should interpret it and substitute the corresponding character. These codes are listed, e.g., at:
http://docs.autodesk.com/ACD/2013/ENU/index.html?url=files/GUID-7D8BB40F-5C4E-4AE5-BD75-9ED7112E5967.htm,topicNumber=d30e87614
http://docs.autodesk.com/ACD/2010/ENU/AutoCAD%202010%20User%20Documentation/index.html?url=WS1a9193826455f5ffa23ce210c4a30acaf-52eb.htm,topicNumber=d0e322626
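The three cases above could be handled roughly as in the following Python sketch. It is an illustration only, not the code of the actual LibreOffice fix: all function names are hypothetical, `DEFAULT_LEGACY_CODEPAGE` (cp1251) is an assumed stand-in for the user's language setting in case 1, the version tag is assumed to come from the `$ACADVER` header variable, and only the four-hex-digit \U+XXXX form of the special codes is handled.

```python
import re

# ASSUMPTION: fallback codepage for pre-2007 files. The DXF does not say which
# codepage was used, so in a real importer this would come from the user's
# language setting; cp1251 (Russian Windows) is only a hypothetical default.
DEFAULT_LEGACY_CODEPAGE = "cp1251"

# Matches a literal backslash, "U+", and exactly four hex digits.
_UNICODE_ESCAPE = re.compile(r"\\U\+([0-9A-Fa-f]{4})")


def acad_version_number(acadver: str) -> int:
    """Turn a version tag such as 'AC1021' into the integer 1021."""
    if not acadver.startswith("AC") or not acadver[2:].isdigit():
        raise ValueError(f"unexpected version tag: {acadver!r}")
    return int(acadver[2:])


def text_encoding(acadver: str, legacy_codepage: str = DEFAULT_LEGACY_CODEPAGE) -> str:
    """AC1021 (AutoCAD 2007) and later store text as UTF-8; older files use a
    system codepage that has to be supplied from outside the file."""
    return "utf-8" if acad_version_number(acadver) >= 1021 else legacy_codepage


def decode_unicode_escapes(text: str) -> str:
    """Replace each \\U+XXXX sequence with the corresponding character."""
    return _UNICODE_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


def decode_dxf_text(raw: bytes, acadver: str,
                    legacy_codepage: str = DEFAULT_LEGACY_CODEPAGE) -> str:
    """Decode one raw text value using the file's encoding, then expand escapes."""
    return decode_unicode_escapes(raw.decode(text_encoding(acadver, legacy_codepage)))


if __name__ == "__main__":
    word = "\u0423\u0433\u043e\u043b"  # Cyrillic sample word
    print(decode_dxf_text(word.encode("utf-8"), "AC1021"))   # 2007+ file: UTF-8
    print(decode_dxf_text(word.encode("cp1251"), "AC1018"))  # 2004 file: codepage
    print(decode_dxf_text(b"\\U+03B5 = 1", "AC1018"))        # special code
```

The encoding is chosen once per file from the version tag, and the special codes are expanded afterwards on the already-decoded string, so the same escape handling works for both old and new formats.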

This problem makes imported DXF files containing non-English text almost unusable.

Attached are DXF files containing Cyrillic and Greek characters, as well as a degree sign. They were exported to DXF versions 2004 and 2007 using AutoCAD 2014 Russian.
Also attached are the PDF generated by AutoCAD (for reference) and PDFs generated by LO 4.2.0.4 under Win7x64 using the Russian locale (problem spots are marked with red ellipses).

This bug is already present in OOo 3.3.0. It is still reproducible with LO 4.2.0.4 under Win7x64 and 4.1.4.2 under Ubuntu 13.10 x64 (the Linux version additionally distorts text sizes).
Comment 1 tommy27 2014-07-31 13:58:21 UTC
I confirm the bug under Win7x64 using LibO 4.2.5.2.
I haven't tried 4.3.0 or 4.4.x master yet.
Comment 2 Mike Kaganski 2015-05-04 15:17:27 UTC
Submitted patch to gerrit: https://gerrit.libreoffice.org/15627
Comment 3 Commit Notification 2015-05-07 14:16:43 UTC
Mike Kaganski committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=28a2f0d6d803569952e7b3efb0269001af8e9c7e

tdf#74299: improve DXF import

It will be available in 5.0.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 4 Commit Notification 2016-09-01 13:10:05 UTC
Mike Kaganski committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=1a9a77f84cac68bd5374df3e9ee4df88dc87a0ac

Related: tdf#74299: use OEM encoding for ancient DXF

It will be available in 5.3.0.
