Bug 38836 - de-bloat internal ICU
Summary: de-bloat internal ICU
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Localization
Hardware: Other All
Importance: medium normal
Assignee: Not Assigned
QA Contact:
Keywords: difficultyBeginner, easyHack, skillScript, topicCleanup
Depends on:
Reported: 2011-06-30 09:21 CEST by Björn Michaelsen
Modified: 2015-12-16 00:19 CET
CC List: 4 users

Description Björn Michaelsen 2011-06-30 09:21:05 CEST
de-bloat internal ICU

Background: We re-use ICU internally. However, we use only a fraction of its functionality, yet we build and ship it all. These files are big: the icudata alone is 5.5 MB compressed and 13 MB on disk, and the redundant code chews up a chunk of run-time memory. If we are building the internal ICU, we should disable everything we do not need. Unfortunately ICU has no easy way of doing this, so we need to do some manual work on the build to hack out the pieces we do not need.

First, some API auditing is needed: e.g. we do not use any ucnv_, ures_, unorm_, utrans_ or u_shapeArabic prefixed code at all, so none of that should be compiled in. We need to study which ICU headers are included (g grep 'include.*unicode').

Beyond that, we need to kill some of the big data files, e.g. ~4 MB of charset conversion tables that are (apparently) unused; we already have charset conversion code in sal/ (based on ICU). To do that we most likely need to tweak the makefiles in icu/unxlngi6.pro/misc/build/icu/source/, though this has to be done by updating the patch we apply in icu/ to the top-level pristine project. There are some links you can read on how to shrink the ICU data library here: [4]

Skills: gnu make, simple C, diff/patch
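
The API audit described above could be sketched as a small script, shown here in Python. It counts which unicode/ headers are included and which of the supposedly unused prefixes actually appear in the sources. The function names audit_text and audit_tree, and the list of file suffixes, are assumptions made for illustration; this is not an existing LibreOffice tool.

```python
import re
from collections import Counter
from pathlib import Path

# Matches '#include <unicode/foo.h>' and '#include "unicode/foo.h"'.
INCLUDE_RE = re.compile(
    r'^\s*#\s*include\s*[<"](unicode/[^>"]+)[>"]', re.MULTILINE
)

# Prefixes the description claims are unused.
PREFIXES = ("ucnv_", "ures_", "unorm_", "utrans_", "u_shapeArabic")
CALL_RE = re.compile(
    r"\b(" + "|".join(re.escape(p) for p in PREFIXES) + r")\w*"
)


def audit_text(text):
    """Return (included ICU headers, prefix hits) for one source text."""
    headers = Counter(INCLUDE_RE.findall(text))
    prefixes = Counter(m.group(1) for m in CALL_RE.finditer(text))
    return headers, prefixes


def audit_tree(root, suffixes=(".c", ".cxx", ".cpp", ".h", ".hxx")):
    """Accumulate audit_text results over all source files under root."""
    headers, prefixes = Counter(), Counter()
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            h, p = audit_text(path.read_text(errors="replace"))
            headers.update(h)
            prefixes.update(p)
    return headers, prefixes
```

Running audit_tree(".") from the top of a checkout and printing both counters gives a first list of headers to keep; any non-zero prefix count means the corresponding ICU module cannot simply be dropped. Note that this is a plain text match, so hits inside comments or strings are counted too.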
Comment 1 Michael Meeks 2011-12-07 09:03:46 CET
ICU is compiled and unpacked by some dmake magic; hopefully the makefile.mk shows how patches can be applied in there.

The code itself tends to be unpacked to eg. icu/unxlngi6.pro/misc/build/icu-*

and as you re-run 'build ; deliver' in the top-level it can be re-unpacked over that so take care ;-)
Comment 2 Michael Meeks 2011-12-08 13:55:08 CET
You can read more about customising ICU's data library to remove un-needed pieces here:


see eg.

"Reducing the Size of ICU's Data: Locale Data"
"Reducing the Size of ICU's Data: Conversion Tables"

etc. Hopefully there are some easy wins there from just reading the manual and creating some new patches to add to icu/makefile.mk to configure that lot out.
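
Before writing such patches, it may help to estimate how much each kind of data item contributes. The following Python sketch assumes you already have a list of (item name, size in bytes) pairs for the data package, obtained by whatever means. The function names are hypothetical, and the only ICU-specific assumption is that compiled conversion tables carry the .cnv extension.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def size_by_extension(items):
    """Sum item sizes per file extension.

    items: iterable of (name, size_in_bytes) pairs.
    """
    totals = defaultdict(int)
    for name, size in items:
        ext = PurePosixPath(name).suffix or "(none)"
        totals[ext] += size
    return dict(totals)


def savings_if_dropped(items, extensions=(".cnv",)):
    """Bytes saved if every item with one of the extensions is removed."""
    return sum(
        size for name, size in items
        if PurePosixPath(name).suffix in extensions
    )
```

Feeding both functions the same listing shows whether conversion tables really account for most of the data size before any makefile.mk changes are attempted.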
Comment 3 enrico.weigelt 2012-03-17 08:43:35 CET
Why have a bundled ICU at all?
Comment 4 Björn Michaelsen 2012-04-04 03:44:27 CEST
Because it is not available by default on all platforms
CC'ing Michael, who was the original mentor for this IIRC?
Comment 5 Florian Reisinger 2012-05-18 08:59:02 CEST
Deleted "Easyhack" from summary
Comment 6 Björn Michaelsen 2012-07-05 14:31:21 CEST
@eike: is this still open? I vaguely remember you doing something in this area.
Comment 7 Eike Rathke 2012-07-06 02:09:04 CEST
There was someone working on it early this year or so who had some luck with stripping down the data libraries a bit, but never came up with the final patch (which would be just some makefile.mk hackery to pull a different tarball from ext_sources) nor with a verification of whether the stripped-down data actually worked. Anyway, we'd have to redo things because in the meantime we upgraded to ICU 49, and data packages have to be assembled individually for each version.
Comment 8 Björn Michaelsen 2013-03-28 16:13:12 CET
17:04 <@Sweetshark> erAck, mmeeks: is this still an easyhack: https://bugs.freedesktop.org/show_bug.cgi?id=38836 -- or should we better remove the whitespace keyword? from the last comment it seems not directly actionable to me ....
17:06 <@erAck> Sweetshark: close that, we can't remove anything anymore from ICU as external libs now depend on it.
Comment 9 Robinson Tryon (qubit) 2015-12-16 00:19:14 CET
Migrating Whiteboard tags to Keywords: (EasyHack DifficultyBeginner SkillScript TopicCleanup)