Bug 92161 - GBK encoded Chinese text not auto-detected
Summary: GBK encoded Chinese text not auto-detected
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
(earliest affected) release
Hardware: Other All
Importance: low enhancement
Assignee: Not Assigned
Depends on:
Blocks: CJK-Chinese-Simplified
Reported: 2015-06-18 16:00 UTC by ni shengyue
Modified: 2019-06-04 04:59 UTC (History)

See Also:
Crash report or crash signature:

GBK encoded file (90.45 KB, text/plain)
2015-06-18 16:00 UTC, ni shengyue
Screenshot of LibreOffice and Kate (328.36 KB, image/png)
2015-06-18 16:01 UTC, ni shengyue
1040044624.DOC - test document (19.58 KB, application/msword)
2016-02-24 16:28 UTC, Stéphane Aulery
Test document rendered with LO under Windows 7 x86 (134.03 KB, image/png)
2016-02-24 16:29 UTC, Stéphane Aulery
Test document rendered with MS Word 2010 under Win7 x86 (198.17 KB, image/png)
2016-02-24 16:30 UTC, Stéphane Aulery

Description ni shengyue 2015-06-18 16:00:37 UTC
Created attachment 116630 [details]
GBK encoded file

GBK-encoded Chinese text can't be read in LibreOffice, while other applications such as Kate in KDE decode it correctly.
Comment 1 ni shengyue 2015-06-18 16:01:26 UTC
Created attachment 116631 [details]
Screenshot of LibreOffice and Kate
Comment 2 Julien Nabet 2015-06-20 07:08:50 UTC
On a Debian x86-64 PC with master sources updated yesterday, I could reproduce this.

I noticed this on console:
warn:legacy.osl:3197:1:sw/source/filter/ascii/parasc.cxx:265: Autodetect of text import without nag dialog must have failed
warn:vcl:3197:1:vcl/generic/fontmanager/fontconfig.cxx:863: In glyph fallback throwing away the language property of hi because the detected script for '0xc7e' is Telugu and that language doesn't make sense. Autodetecting instead.

Caolan: one for you? (vcl + language/font detection)
Comment 3 Caolán McNamara 2015-07-08 09:05:48 UTC
It can be read; LibreOffice just can't auto-detect the encoding. You need to use File -> Open, select the "text - choose encoding" filter, then press OK, and then select "Chinese simplified (GB-18030)" as the encoding.
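
For anyone who would rather sidestep the dialog, the same manual choice of encoding can be applied outside LibreOffice by re-encoding the file before opening it. The following is only a minimal sketch: the function name `reencode_to_utf8` and its parameters are made up for illustration, and it assumes the input really is GBK/GB18030 (GB18030 is a superset of GBK, so one codec covers both).

```python
from pathlib import Path


def reencode_to_utf8(src: str, dst: str, source_encoding: str = "gb18030") -> None:
    """Decode `src` with a caller-chosen encoding and write it back as UTF-8.

    Hypothetical helper: the encoding is supplied by the caller, mirroring the
    manual selection in the "text - choose encoding" dialog. Nothing is detected.
    """
    raw = Path(src).read_bytes()
    # Raises UnicodeDecodeError if the bytes are not valid in source_encoding.
    text = raw.decode(source_encoding)
    # Write bytes directly so existing line endings are preserved as-is.
    Path(dst).write_bytes(text.encode("utf-8"))
```

Because the decode is strict, a wrong guess about the source encoding fails loudly instead of silently producing garbled text.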
Comment 4 ni shengyue 2015-07-09 15:55:04 UTC
Yes, I can read the file using Caolán McNamara's method, but ordinary users won't find the "text - choose encoding" option. LibreOffice should support automatic encoding detection, as MS Office does, so I suggest REOPENing this bug to track that requirement.
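
To make the request concrete, one simple form of auto-detection is to try strict decoders in a fixed order and keep the first that succeeds. This is only an illustrative sketch of that idea, not how LibreOffice or MS Office work; `guess_encoding` and the candidate list are assumptions chosen for this example.

```python
from typing import Optional, Sequence


def guess_encoding(
    data: bytes, candidates: Sequence[str] = ("utf-8", "gb18030")
) -> Optional[str]:
    """Return the first candidate encoding that decodes `data` without error.

    Hypothetical heuristic: UTF-8 is tried first because its byte structure is
    strict enough that GBK-encoded Chinese text rarely passes as valid UTF-8.
    Returns None when no candidate decodes the data cleanly.
    """
    for name in candidates:
        try:
            data.decode(name)  # strict mode: any invalid byte raises
        except UnicodeDecodeError:
            continue
        return name
    return None


if __name__ == "__main__":
    sample = "中文测试".encode("gbk")
    print(guess_encoding(sample))
```

A trial-decode approach like this can only separate encodings whose sets of valid byte sequences differ; telling apart several legacy multi-byte encodings would need statistical detection, which is outside what this sketch attempts.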
Comment 5 Caolán McNamara 2015-07-10 09:12:14 UTC
IMO non utf-8 text is just archaic at this point
Comment 6 Buovjaga 2015-10-10 12:12:46 UTC
Set to NEW, lowered priority and adjusted summary.
Comment 7 Stéphane Aulery 2016-02-24 16:28:35 UTC Comment hidden (obsolete)
Comment 8 Stéphane Aulery 2016-02-24 16:29:28 UTC Comment hidden (obsolete)
Comment 9 Stéphane Aulery 2016-02-24 16:30:00 UTC Comment hidden (obsolete)
Comment 10 Maxim Monastirsky 2016-05-16 13:18:48 UTC
Removing unrelated debian bug from 'See Also', and changing to 'enhancement', as charset auto-detection isn't implemented.
Comment 11 Cosimo Cecchi 2016-10-07 00:43:22 UTC
(In reply to Caolán McNamara from comment #5)
> IMO non utf-8 text is just archaic at this point

Caolán, unfortunately while this may be true for Europe and the US, it's definitely not true for China. GB18030 is the standard in China and a requirement for software that is distributed there.