Created attachment 116630
GBK-encoded file
GBK-encoded Chinese text can't be read, while other applications such as Kate in KDE decode it correctly.
Created attachment 116631
Screenshot of LibreOffice and Kate
On a Debian x86-64 PC with master sources updated yesterday, I could reproduce this.
I noticed this on console:
warn:legacy.osl:3197:1:sw/source/filter/ascii/parasc.cxx:265: Autodetect of text import without nag dialog must have failed
warn:vcl:3197:1:vcl/generic/fontmanager/fontconfig.cxx:863: In glyph fallback throwing away the language property of hi because the detected script for '0xc7e' is Telugu and that language doesn't make sense. Autodetecting instead.
Caolan: one for you? (vcl + language/font detection)
It can be read; LibreOffice just can't auto-detect the encoding. Use File > Open, select the "Text - Choose Encoding" filter, press OK, and then select "Chinese simplified (GB-18030)" as the encoding.
Yes, I can read it using Caolán McNamara's method, but ordinary users won't find the 'Text - Choose Encoding' filter. We suggest LibreOffice support automatic encoding detection, as MS Office does, so I suggest reopening this bug to track this requirement.
IMO non utf-8 text is just archaic at this point
Set to NEW, lowered priority and adjusted summary.
Created attachment 122949
1040044624.DOC - test document
Created attachment 122950
Test document rendered with LO 126.96.36.199 under Windows 7 x86
Created attachment 122951
Test document rendered with MS Word 2010 under Win7 x86
Removing unrelated Debian bug from 'See Also', and changing to 'enhancement', as charset auto-detection isn't implemented.
(In reply to Caolán McNamara from comment #5)
> IMO non utf-8 text is just archaic at this point
Caolán, unfortunately while this may be true for Europe and the US, it's definitely not true for China. GB18030 is the standard in China and a requirement for software that is distributed there.
Note that after bug 60145 is fixed, it's actually easy to add auto-detection of any encoding recognized by ICU's charset detector by amending SwIoSystem::IsDetectableText.
Setting this as easyhack.
Have created https://gerrit.libreoffice.org/c/core/+/127347 to fix this. Though now I'm wondering whether we could modify that code to support all of the encodings in LO?
(In reply to Daniel Thomas from comment #16)
> Have created https://gerrit.libreoffice.org/c/core/+/127347 to fix this.
Thanks - merged! :)
> Though now I'm wondering whether we could modify that code to support all of
> the encodings in LO?
It should be relatively easy. We already have rtl_getTextEncodingFromMimeCharset, which seems to accept the same kind of charset names that ucsdet_getName returns. The only concern here would be false detections, and we could use ucsdet_getConfidence to filter out unreliable detections.
Feel free to submit a new enhancement, and then fix it - that would be a nice hack!
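The confidence filter suggested above could look roughly like the following C++ sketch. To keep it self-contained and runnable without ICU, detector output is passed in as plain structs: in real code `name` would come from `ucsdet_getName` and `confidence` (0-100) from `ucsdet_getConfidence`. The `Encoding` enum, the `encodingFromMimeCharset` lookup (a hypothetical stand-in for `rtl_getTextEncodingFromMimeCharset`), `pickEncoding`, and any threshold value are assumptions, not LibreOffice code.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for LibreOffice's rtl_TextEncoding values.
enum class Encoding { Utf8, Gb18030, ShiftJis, Unknown };

// One detector result. Assumption: in real code `name` comes from
// ucsdet_getName() and `confidence` (0-100) from ucsdet_getConfidence().
struct CharsetMatch {
    std::string name;
    int confidence;
};

// Hypothetical lookup standing in for rtl_getTextEncodingFromMimeCharset():
// compares the charset name case-insensitively against a few known names.
Encoding encodingFromMimeCharset(std::string name) {
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    if (name == "UTF-8") return Encoding::Utf8;
    if (name == "GB18030") return Encoding::Gb18030;
    if (name == "SHIFT_JIS") return Encoding::ShiftJis;
    return Encoding::Unknown;
}

// Pick the highest-confidence match that clears the threshold and maps to a
// known encoding; return std::nullopt when no detection is reliable enough.
std::optional<Encoding> pickEncoding(const std::vector<CharsetMatch>& matches,
                                     int minConfidence) {
    assert(minConfidence >= 0 && minConfidence <= 100);  // ICU confidences are 0-100
    const CharsetMatch* best = nullptr;
    for (const auto& m : matches) {
        if (m.confidence < minConfidence) continue;  // too unreliable
        if (encodingFromMimeCharset(m.name) == Encoding::Unknown) continue;  // unmapped
        if (!best || m.confidence > best->confidence) best = &m;
    }
    if (!best) return std::nullopt;
    return encodingFromMimeCharset(best->name);
}
```

For example, `pickEncoding({{"GB18030", 80}, {"UTF-8", 10}}, 50)` yields `Encoding::Gb18030`, while a lone match at confidence 30 yields `std::nullopt`, in which case the import could fall back to asking the user. Where to set the threshold is a judgment call: too low and short files get misdetected, too high and genuine legacy files are missed.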
dtm committed a patch related to this issue.
It has been pushed to "master":
tdf#92161 add GB18030 encoding to iodetect
It will be available in 7.4.0.
The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
Affected users are encouraged to test the fix and report feedback.
https://bugs.documentfoundation.org/show_bug.cgi?id=146429 created for the aforementioned addition
I confirm the original bug behaviour is now fixed on 7.4 and trunk. Commit 763c2a436baa1814d2bf95477b9d79fa3934d5e5 added GB18030 detection; since GB18030 is largely a superset of GBK, it can still decode most GBK-encoded characters.
I think we should leave this open for now, so that anyone interested can still work on improvements (e.g. adding detection of other encodings).
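To illustrate why decoding as GB18030 covers most GBK text, here is a small byte-structure check in C++. It is a sketch, not LibreOffice code and not ICU's detector (which scores text statistically); the function name `isWellFormedGb18030` is hypothetical, and it only tests whether each byte sequence has a legal GB18030 shape. GBK's two-byte sequences (lead 0x81-0xFE, trail 0x40-0xFE except 0x7F) all fall inside GB18030's two-byte range, which is why GBK text generally passes. One known exception is Microsoft's code page 936 variant of GBK, which uses the single byte 0x80 for the euro sign; that byte is not valid GB18030.

```cpp
#include <cstddef>
#include <string_view>

// Returns true if `bytes` consists entirely of well-formed GB18030 sequences:
//   1 byte : 0x00-0x7F
//   2 bytes: 0x81-0xFE, then 0x40-0x7E or 0x80-0xFE (same ranges as GBK)
//   4 bytes: 0x81-0xFE, 0x30-0x39, 0x81-0xFE, 0x30-0x39
// Only the structure is checked; it does not verify a 4-byte code is assigned.
bool isWellFormedGb18030(std::string_view bytes) {
    auto in = [](unsigned char b, unsigned char lo, unsigned char hi) {
        return b >= lo && b <= hi;
    };
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char b0 = static_cast<unsigned char>(bytes[i]);
        if (b0 <= 0x7F) { ++i; continue; }        // single byte (ASCII)
        if (!in(b0, 0x81, 0xFE)) return false;    // 0x80 and 0xFF are never valid
        if (i + 1 >= bytes.size()) return false;  // truncated sequence
        unsigned char b1 = static_cast<unsigned char>(bytes[i + 1]);
        if (in(b1, 0x40, 0x7E) || in(b1, 0x80, 0xFE)) {  // two-byte (GBK range)
            i += 2;
            continue;
        }
        if (!in(b1, 0x30, 0x39)) return false;    // not the start of a 4-byte code
        if (i + 3 >= bytes.size()) return false;  // truncated 4-byte sequence
        unsigned char b2 = static_cast<unsigned char>(bytes[i + 2]);
        unsigned char b3 = static_cast<unsigned char>(bytes[i + 3]);
        if (!in(b2, 0x81, 0xFE) || !in(b3, 0x30, 0x39)) return false;
        i += 4;
    }
    return true;
}
```

For example, the GBK bytes for 中文 (D6 D0 CE C4) pass, while a lone 0x80 or a dangling lead byte fails. A structural check like this can cheaply rule an encoding out, but it cannot reliably tell GB18030 apart from other double-byte encodings, which is where a confidence score such as ucsdet_getConfidence comes in.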