Bug 56131 - Download only the needed 3rd party source tarballs
Summary: Download only the needed 3rd party source tarballs
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: framework
Version (earliest affected): unspecified
Hardware: Other All
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: difficultyBeginner, easyHack, skillScript, topicCleanup
Depends on:
Blocks:
 
Reported: 2012-10-18 11:10 UTC by Petr Mladek
Modified: 2015-12-16 00:19 UTC
Description Petr Mladek 2012-10-18 11:10:12 UTC
We currently download almost all 3rd party source tarballs even when they are not used.

One solution is to define tarball names as configure variables and define them only when needed. This approach is already used for some extensions.

Another solution would be to improve the "download" script and support some simple syntax in the "ooo.lst.in" file, e.g.:

  7376930b0d3f3d77a685d94c4a3acda8-STLport-4.5-0119.tar.gz if @BUILD_STLPORT@ == "yes"


I prefer the second solution because it is handy to have a list of all 3rd party modules in one place, the "ooo.lst.in" file. In addition, it is easier to update the version in this short file. Finally, we should touch configure.in only when we want to change the logic of the checks. This helps with cherry-picking fixes for different LO releases.
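
The conditional syntax proposed above could be handled roughly as follows. This is only a sketch in Python, not the actual "download" script. It assumes that configure has already replaced the @VAR@ placeholders with their values and that a string equality test is the only supported condition. The function name needed_tarballs is made up for illustration.

```python
import re

# Assumed line format, modelled on the example in this report, after configure
# has replaced @BUILD_STLPORT@ with its value:
#   <tarball> if <value> == "<expected>"
# A line without an "if" part is treated as unconditional.
_CONDITION = re.compile(
    r'^(?P<tarball>\S+)\s+if\s+(?P<value>\S*)\s*==\s*"(?P<expected>[^"]*)"\s*$'
)


def needed_tarballs(lines):
    """Return the tarball names whose condition holds (or that have none)."""
    needed = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = _CONDITION.match(line)
        if match is None:
            needed.append(line)  # unconditional entry
        elif match.group("value").strip('"') == match.group("expected"):
            needed.append(match.group("tarball"))
    return needed
```

With this approach the list of all tarballs stays in one file, and only the values substituted by configure decide what gets downloaded.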
Comment 1 David Tardon 2012-10-19 07:28:08 UTC
(In reply to comment #0)
> We currently download almost all 3rd party source tarballs even when they
> are not used.
> 
> One solution is to define tarball names as configure variables and define
> them only when needed. This approach is already used for some extensions.

It is also used for all gbuildified external modules.

$ grep @ ooo.lst.in | wc -l
68

> 
> Another solution would be to improve the "download" script and support some
> simple syntax in the "ooo.lst.in" file, e.g.:
> 
>   7376930b0d3f3d77a685d94c4a3acda8-STLport-4.5-0119.tar.gz if
> @BUILD_STLPORT@ == "yes"

I do not think this is a good idea.

> 
> 
> I prefer the second solution because it is handy to have list of all 3rd
> party modules in one place "ooo.lst.in" file. In addition, it is easier to
> update the version in this short file. Finally, we should touch configure.in
> only when we want to change the logic of the checks. It helps with
> cherry-picking fixes for different LO releases.

I agree that the "a variable set in configure.in" approach is not ideal, but at least it means the tarball name only needs to be changed in one place. In the previous state of affairs the name was hardcoded in ooo.lst.in _and_ in the module's makefile.mk (split into two parts--the upstream tarball name and the md5 hash).

There is a third possible solution: to have all the download information directly in the external modules. My preferred format for that is 2 files for every tarball in a module:
* <tarball>.md5 for the name and md5 (in fact, output of md5sum <tarball>)
* <tarball>.url for source URL. This file should be optional for the default URL (http://dev-www.libreoffice.org/src).

This would need a change to the tooling, of course, but I think it would be easier to use in the end.
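
To make the two-file idea concrete, here is a rough Python sketch of how tooling might read such files. It is not existing LibreOffice code. It assumes the .md5 file holds one line of md5sum output, and that the optional .url file sits next to it and contains a base URL (a directory) rather than the full tarball URL. The function name read_tarball_info is hypothetical.

```python
from pathlib import Path

DEFAULT_URL = "http://dev-www.libreoffice.org/src"  # default location named above


def read_tarball_info(md5_file):
    """Return (tarball name, md5 hash, download URL) for one <tarball>.md5 file."""
    md5_path = Path(md5_file)
    # md5sum prints "<hash>  <name>" (or "<hash> *<name>" in binary mode).
    digest, name = md5_path.read_text().strip().split(maxsplit=1)
    name = name.lstrip("*")
    # Assumption: <tarball>.url is optional and holds a base URL, not a full one.
    url_path = md5_path.with_suffix(".url")
    base = url_path.read_text().strip() if url_path.exists() else DEFAULT_URL
    return name, digest, base.rstrip("/") + "/" + name
```

Keeping the URL file optional means most modules would need only the single .md5 file.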
Comment 2 Matúš Kukan 2012-11-21 23:09:19 UTC
(In reply to comment #1)
> There is a third possible solution: to have all the download information
> directly in the external modules. My preferred format for that is 2 files
> for every tarball in a module:
> * <tarball>.md5 for the name and md5 (in fact, output of md5sum <tarball>)
> * <tarball>.url for source URL. This file should be optional for the default
> URL (http://dev-www.libreoffice.org/src).

So, 'download' ('make fetch') would first do ls */*.md5 and then wget them?
And gb_UnpackedTarball_UnpackedTarball,foo would also read the tarball name from the foo.md5 file?
Sounds cool.

But how would 'make fetch' know which tarballs we really need?
We would need to put that information into <tarball>.md5, I guess (or a hint on how to find it out).
Another way could be:
for foo.md5, search for 'foo' in a new 'TARBALLS' variable (something like BUILD_TYPE).

Does this make sense?
Am I missing something?
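
The 'TARBALLS' lookup could look something like the Python sketch below. It assumes that TARBALLS is a whitespace-separated list of names, in the same spirit as BUILD_TYPE, and that each name is compared with the .md5 file name minus its extension. The function name and the module names in the usage line are invented for illustration.

```python
from pathlib import Path


def select_md5_files(md5_files, tarballs_var):
    """Keep only the .md5 files whose base name is listed in TARBALLS."""
    # Assumption: TARBALLS is a whitespace-separated list, like BUILD_TYPE.
    wanted = set(tarballs_var.split())
    selected = []
    for md5_file in md5_files:
        name = Path(md5_file).stem  # "foo/foo.md5" -> "foo"
        if name in wanted:
            selected.append(md5_file)
    return selected
```

For example, select_md5_files(["boost/boost.md5", "stlport/stlport.md5"], "boost cairo") keeps only the boost entry.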
Comment 3 Petr Mladek 2012-11-22 10:33:32 UTC
I see that Matúš has already made nice improvements. Thanks for it.

I think that we should not overengineer it :-) If we have <tarball>.md5, we could omit the md5 from the tarball name. So <tarball> will be the tarball name actually used, and it will be the same as in the upstream project.

Heh, I have never understood why they added the md5 sum into the tarball names in the first place. IMHO, it was a pretty non-standard solution and was not a big win.


My view is that it might work the following way:

+ configure.ac will have something like:

   BLABLA_TARBALL=
   if <we want a 3rd party tarball because of a feature> ; then
       BLABLA_TARBALL="blabla-<version>.tar.gz"
   fi
   AC_SUBST(BLABLA_TARBALL)


+ ooo.lst.in would include:
    @BLABLA_TARBALL@

+ download would read ooo.lst and download only the defined tarballs
     get $BLABLA_TARBALL from a given URL (see below my idea about the URL)
     get $BLABLA_TARBALL.md5 from the same URL
     check md5 to make sure that the tarball is valid

+ finally, $BLABLA_TARBALL can be used also in the makefile that is responsible for building the 3rd-party module

This way, we could use the original tarballs and need to define tarball names only once in configure.ac as suggested by David.
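
The "check md5" step in the list above might be sketched in Python as follows. This is an illustration only, not the real download script. It assumes that the .md5 companion file contains md5sum output, so the hash is the first whitespace-separated field. The function name md5_matches is hypothetical.

```python
import hashlib


def md5_matches(tarball_path, md5_path):
    """Check a downloaded tarball against its <tarball>.md5 companion file."""
    # Assumption: the .md5 file holds md5sum output, so the hash comes first.
    with open(md5_path) as f:
        expected = f.read().split()[0].lower()
    digest = hashlib.md5()
    with open(tarball_path, "rb") as f:
        # Read in 1 MiB chunks so large tarballs are not loaded into memory at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected
```

A download step would fetch both files from the same URL and reject the tarball when this check returns False.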


Regarding the URL: I do not see much advantage in $BLABLA_TARBALL.url. We would need to have this file at the default location, http://dev-www.libreoffice.org/src. But I think that the main reason to use another location is that people do not have access to the default location. For example, KAMI has his extensions on the OxygenOffice site. If people are able to upload $BLABLA_TARBALL.url to the default location, they could put the tarball there as well. I think that we should have all tarballs under our control at the default location anyway.

So, we either need to move all tarballs to the default location or keep the URLs in ooo.lst.

BTW: It would be nice to rename the file ooo.lst  to download.lst or so.
Comment 4 Matúš Kukan 2012-11-22 18:43:59 UTC
(In reply to comment #3)
> My view is that it might work the following way:
> 
> + configure.ac will have something like:
> 
>    BLABLA_TARBALL=
>    if <we want a 3rd party tarball because of a feature> ; then
>        BLABLA_TARBALL="blabla-<version>.tar.gz"
>    fi
>    AC_SUBST(BLABLA_TARBALL)
> 
> 
> + ooo.lst.in would include:
>     @BLABLA_TARBALL@

So, this is the current state, but there is demand for --with-all-tarballs.
See also https://gerrit.libreoffice.org/#/c/1088/
Supporting that with this approach would be more complicated.

To be more flexible, I've moved some parts of download to configure and Makefile.fetch.
And also ooo.lst ~> download.lst where the names are defined.
The result is in feature/download branch.
I am not sure what others will think about it, let's see.
Quite possibly it's not perfect.

> + download would read ooo.lst and download only the defined tarballs
>      get $BLABLA_TARBALL from a given URL (see below my idea about the URL)
>      get $BLABLA_TARBALL.md5 from the same URL
>      check md5 to make sure that the tarball is valid
>

If anyone thinks it's going to be better this way, I am willing to hack also on this but I think first we need $BLABLA_TARBALL.md5 files uploaded in http://dev-www.libreoffice.org/src/

> This way, we could use the original tarballs and need to define tarball
> names only once in configure.ac as suggested by David.

yep, modulo the names are in download.lst in feature/download
Comment 5 Matúš Kukan 2012-11-23 12:06:49 UTC
(In reply to comment #4)
> > + download would read ooo.lst and download only the defined tarballs
> >      get $BLABLA_TARBALL from a given URL (see below my idea about the URL)
> >      get $BLABLA_TARBALL.md5 from the same URL
> >      check md5 to make sure that the tarball is valid
> >
> 
> If anyone thinks it's going to be better this way, I am willing to hack also
> on this but I think first we need $BLABLA_TARBALL.md5 files uploaded in
> http://dev-www.libreoffice.org/src/

Ah, I am probably talking nonsense.
Maybe you had in mind something like
http://cgit.freedesktop.org/libreoffice/core/commit/?h=feature/download&id=cb15193e9fe67e4055dcb160a708c2de40f57a2f

cdr, mspub and visio are downloaded 'directly'.
But it's not possible for more, because they are in there only in the form <md5sum>-<name>.<suffix> AFAICS
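
For illustration, the <md5sum>-<name>.<suffix> form could be split back into its parts with something like the Python sketch below. It assumes the prefix is always 32 lowercase hex digits followed by a dash. The function name split_prefixed_name is made up.

```python
import re

# Assumption: the stored name is a 32-digit lowercase hex md5, a dash, then
# the upstream tarball name.
_PREFIXED = re.compile(r"^(?P<md5>[0-9a-f]{32})-(?P<name>.+)$")


def split_prefixed_name(filename):
    """Return (md5, upstream name); md5 is None for names without a prefix."""
    match = _PREFIXED.match(filename)
    if match is None:
        return None, filename
    return match.group("md5"), match.group("name")
```

A tarball stored without the prefix comes back unchanged, with None in place of the hash.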

Anyway, I am going to send mail to the list to get feedback.
Comment 6 Petr Mladek 2012-11-23 16:36:14 UTC
I basically like what I see in http://cgit.freedesktop.org/libreoffice/core/commit/?h=feature/download&id=cb15193e9fe67e4055dcb160a708c2de40f57a2f

It hardcodes each tarball name in a single place only, allows downloading either all tarballs or only those really needed, ...

I am not 100% sure if we want to remove the md5 sum from the tarball names. IMHO, it makes sense, but it would mean some work for distro package maintainers. Also, I am not sure why they were introduced this way in the first place. I would like to hear from David, Rene, and Tomas before we rename them. Maybe we could discuss it at the ESC meeting the following week.
Comment 7 David Tardon 2012-12-19 08:33:34 UTC
this has been done
Comment 8 Robinson Tryon (qubit) 2015-12-16 00:19:39 UTC
Migrating Whiteboard tags to Keywords: (EasyHack DifficultyBeginner SkillScript TopicCleanup)
[NinjaEdit]