Bug 50916 - Allow more than 1024 columns in calc
Summary: Allow more than 1024 columns in calc
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
Inherited From OOo
Hardware: All All
: high enhancement
Assignee: Not Assigned
QA Contact:
URL:
Whiteboard: target:5.2.0
Keywords:
: 78255 (view as bug list)
Depends on:
Blocks:
 
Reported: 2012-06-09 09:30 UTC by Gerry
Modified: 2016-08-04 04:55 UTC (History)
25 users (show)

See Also:
Crash report or crash signature:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Gerry 2012-06-09 09:30:46 UTC
Currently Calc is limited to 1024 columns.

This limitation causes problems for certain use cases:
* import of .xlsx files with more than 1024 columns (Excel 2007 allows 16384 columns)
* generally calculations with big matrices/tables

Please enable Calc to handle up to 16384 columns.
Comment 1 Robinson Tryon (qubit) 2012-07-15 13:24:56 UTC
See discussion on bug 43911 for more information.

The tl;dr is: "increasing the column limit will increase the the memory needed for every sheet extremelly" unless we "change the column container to a dynamic container," a change that "might take much more time" than a month.
Comment 2 Steffen Moeller 2012-11-02 09:32:35 UTC
I had a colleague of mine send her data twice, in different formats, assuming a file corruption. This was because LibreOffice Calc does not even warn when it encounters a file with more columns than it can deal with. This is a bit embarassing, not?

For your internal priorisation, please be aware that today, with many research lab are equipped with machines to perform high throughput analyses, files with >1500 columns (which typically host attributes in statistics) are no longer exceptional. Excel once was sucking badly with 256 columns, but they got that one fixed for a reason.

Together with this functionality of thousands of columns there will be new interfaces developing alongside for the filtering of such. Time will bring it. Just now, please fix this behaviour. I offer 100 Euros from my very private pocket for this feature. The missing notice of losing data I tend to see as a bug.

Concerning the "dynamic container" comment below I suggest to think about sparse formats. There is tons of sparse data out there in scientific data and to have a regular desktop tool prepared to handle that efficiently would be exciting.
Comment 3 Julien Nabet 2013-02-08 22:51:33 UTC
Kohei/Markus/Eike: I suppose there must be some reasons we still got this limit but could you tell/remind them? I think I must have read about this on IRC or dev mailing list but don't remember.
Comment 4 Robinson Tryon (qubit) 2013-02-08 23:26:55 UTC
(In reply to comment #3)
> Kohei/Markus/Eike: I suppose there must be some reasons we still got this
> limit but could you tell/remind them? I think I must have read about this on
> IRC or dev mailing list but don't remember.

Markus' comments on bug 43911 provide more details:

---
Markus Mohrhard 2012-01-05 14:21:03 UTC

No. We won't increase the column limit soon. First we would need to rework all internal cell datastructures like I did for sheets in 3.5. We still use fixed size containers for columns. That means that increasing the column limit will increase the the memory needed for every sheet extremelly. The column limit of 1024 is a design decision by us.

If you are really in need for LibO with a bigger column limit you can easily build one yourself and I tell you the line you need to change to increase the limit.
---

---
Markus Mohrhard 2012-01-05 15:30:04 UTC 
> So, our hopes is to convince developers to increase the "Max Column" limit, or
> that they switch to a dynamic size column container approach in order to use
> necessary memory.

This is not an easy switch. This is more like half a heart transplant and therefore is nothing that will be done just from today to tomorrow. I needed around one month for the change for the sheet change and I think that changing the column container might take much more time.

> Is it the final decision that the distributed official LibO version will have a
> "Max Column" limit of 1024 in the near future releases? And people like me will
> have to modify the "Max Column" limit and recompile/recreate a special LibO
> version for our needs?

It is not a final decision but I doubt that we will increase the column limit if we did not change the column container to a dynamic container. In my dev builds a empty sheet already needs around 150 KB. For the change we need someone who has enough time to work on it.

Feel free to step in here and help to improve the situation. You'll get as much help as possible by us and we will assist as much as we can.
---
Comment 5 Phil 2013-03-11 19:01:01 UTC
@Steffen Moeller
@Gerry

Consider http://www.r-project.org/
Comment 6 Gerry 2013-03-11 19:23:11 UTC
@Phil: Thanks for the nice suggestion. However, spreadsheets, databases and statistical software like R, Stata, ... have very different goals. Having said this, there is a clear need and usage for spreadsheets handling more than 1024 columns. People use it and I use it, too. 

Given the dynamic idea of cells in spreadsheets (dependencies on other cells, formulas in cells, fill-out features), they are very good at data processing and data generation. These are features which are not as good to handle in statistical software. Data analyis is, of course.

Independently from this, when receiving .xlsx files with more than 1024 columns, LibreOffice cannot be used. Gnumeric can serve as an alternative then.
Comment 7 hemmelig 2013-11-17 09:38:30 UTC
The question is then how to kick start/fundraise money to make the column container to a dynamic container? What will be needed in funds? 

The 1024 column limit is the simplest way to shoot down any suggestion for using Libreoffice in any organization. It MUST be fixed, and all I can offer is €.
Comment 8 retired 2013-11-17 12:52:22 UTC
The last column showing for me is column "AMJ". I don't see numbers so not sure what that is, guess you can figure that out by creating a crazy formula with the number of alphabet letters. Maybe the result is 1024 - not sure.

If that is not sufficient for your usage and would like to see this bug fixed and want to put in cash for that specific enhancement please create an account on http://freedomsponsors.org and setup a bounty. I'm aware of several LO bugs that got fixed that way.
Comment 9 Kalin KOZHUHAROV 2014-03-03 04:20:16 UTC
I recently hit that bug, i.e. RFE and came here...

I usually avoid using Calc for large datasets (CSV/Perl/gnuplot), but sometimes I use/make tools that output in columns instead of rows (e.g. few spectra in 4K columns). Since all machines around here have at least 16GB RAM, I don't think increased memory footprint is a problem for me. I also use Gentoo, so recompiling LibreOffice is the norm.

Has anyone identified what needs to be patched to increase max cols?
---
Markus Mohrhard 2012-01-05 14:21:03 UTC

If you are really in need for LibO with a bigger column limit you can easily build one yourself and I tell you the line you need to change to increase the limit.
---

I will try if I have some better pointer at where to look in the source (file/line, etc.) since I don't want to dive headfirst in the huge codebase.

Once a patch is available, distros (Gentoo at least) can easily choose to decide whether it cares for power (ab)users or not.
Comment 10 Gerry 2014-04-20 09:19:20 UTC
How likely will it be that the limitation of Calc to 1024 columns will be increased in the next LO release(s), 4.3 or 4.4? I have seen that lots of performance-related parts of Calc have been rewritten. Are these changes also related to this enhancement request?
Comment 11 Markus Mohrhard 2014-04-20 14:20:55 UTC
(In reply to comment #10)
> How likely will it be that the limitation of Calc to 1024 columns will be
> increased in the next LO release(s), 4.3 or 4.4? I have seen that lots of
> performance-related parts of Calc have been rewritten. Are these changes
> also related to this enhancement request?

No. Unless someone steps up and starts working on it the chances are not too big that it will be implemented for the next releases. This bug requires another not so small refactoring of calc core and therefore requires someone to invest the time in working on it.

If you want to see this enhancement request implemented the best way forward is to start working on it. You will get any help you need from the community but right now I think there is no Calc developer who has the time to implement it.
Comment 12 bfoman (inactive) 2014-06-04 22:49:51 UTC
*** Bug 78255 has been marked as a duplicate of this bug. ***
Comment 13 Miguel 2014-09-27 14:22:16 UTC
I suggest an importance increase.

I have also suggest adding a new tag - msoffice or compatibility. The goal of this tags is to group all bugs that, in my view, critically affect Libreoffice - compatibility with MS Office files.

I run a company that has, from day 1, used Libreoffice (and Openoffice before) exclusively. But as much as we care for this project and what it means, it is vital that the community understands that a lack of compatibility with what the majority of users (and companies) use it a hindrance in the adoption of Libreoffice. And a headache to those who use it but have to interact with people who use MS Office.
Comment 14 Gerry 2014-09-27 14:28:09 UTC
@Miguel: Please see the comments to Kohei's blog: http://kohei.us/2014/09/09/slides-for-my-talk-at-libreoffice-conference-in-bern/

I quote Kohei: "The only thing that’s holding us back is nothing technical i.e. we have a pretty good idea how we want to get it done. It’s just a matter of getting the effort funded."

I also wished to use >1024 columns in Calc (even 2048 would already help), but I understand the problem of funding this improvement, as it requires some deeper changes in Calc.
Comment 15 Miguel 2014-09-27 14:38:20 UTC
@Gerry - something like https://www.bountysource.com/ ?

Perhaps it would be a good idea to add a link there then.

I remember a (still open) Evolution Mail bug where, within the comments alone, people were pledging to add the ability to change email fonts used.


If the technical effort has been calculated, then the hard part is done and a funding process can be added?
Comment 16 Gerry 2014-09-27 14:43:26 UTC
@Miguel: I think for this, you need to ask the Calc developers directly: Kohei, Eike or Markus, for example. I think they are CCed in this bug.
Comment 17 Alex Thurgood 2015-01-03 17:39:13 UTC Comment hidden (no-value)
Comment 18 -- removed -- 2015-02-19 10:08:31 UTC
Just hit the limit myself and I'd like to see this implemented soon. How can I contribute to bounties, fundraising, etc.? Is there anything set up yet?
Comment 19 Eike Rathke 2015-02-19 11:09:55 UTC
Throwing money at it alone won't help. Dedicated developer resources to come up with a viable concept to change column allocation to dynamic containers that at the same time would not make sheet operations suffer from performance bottlenecks AND IMPLEMENT it would help.
Comment 20 Rob 2015-03-22 07:39:53 UTC
I used freedomsponsors.org for a small bugfix in Thunderbird recently and had a very quick result, I had added a realistic reward.  This feature sounds like a biggie!   Another I have seen is www.bountysource.com.  I haven't got clue if doing these is considered good, bad, or annoying but it does target what you want.
Comment 21 mszeliga 2015-04-15 05:30:14 UTC Comment hidden (me-too)
Comment 22 Gerry 2015-04-20 08:15:55 UTC
If 16384 columns in Calc require quite some refactoring, would this be also the case if the number of columns would be increased to 2048? 

I think 2048 columns would already solve many problems, but - of course - in the long run >= 16384 are required for Excel compatability.

Thanks!
Comment 23 pier andre 2015-04-24 07:50:39 UTC
in this case yes, I exported a database, about 2kx2k in csv file to be eaten by a python program and I can open it with excel but I cannot with libreoffice, and this is bad..    :-)  :-) 
please enable calc to handle up to 16364 columns..
Comment 24 Dennis Francis 2015-04-27 21:08:50 UTC
Hi devs,

If no one is yet working on this, I would like to work on this. My motivation is to learn the Calc code-base in depth from the experts. Right now I will be able to spend one full day per week, but may increase depending on my work priorities. 
I am aware that this change will take quite some time at this rate, but I still want to work on it.
Comment 25 Robinson Tryon (qubit) 2015-04-29 22:05:41 UTC
(In reply to Dennis Francis (ldcs.co.in) from comment #24)
> If no one is yet working on this, I would like to work on this. My
> motivation is to learn the Calc code-base in depth from the experts.

Hi Dennis,
It's great to hear that you're going to join our merry band of LibreOffice developers! Please start by building LibreOffice from source and completing your first Easy Hack:
https://wiki.documentfoundation.org/Development/Easy_Hacks

Good luck!
--R
Comment 26 Markus Mohrhard 2015-04-29 22:20:11 UTC
(In reply to Dennis Francis (ldcs.co.in) from comment #24)
> Hi devs,
> 
> If no one is yet working on this, I would like to work on this. My
> motivation is to learn the Calc code-base in depth from the experts. Right
> now I will be able to spend one full day per week, but may increase
> depending on my work priorities. 
> I am aware that this change will take quite some time at this rate, but I
> still want to work on it.

Hey,

great to see you looking into that.

So the approach is to first understand the problem with increasing the number of columns in the current design.
For that you should read a bit in the table*.[ch]xx files and check where we use the maximum column number.

ONe of the biggest problems is replacing the static array of the number of columns with something dynamic. Finding a good data structure for that will be crucial for finding a design that does not suck performance wise.

If you need more information feel free to ask here or on the dev IRC channel. Kohei (kohei), Eike (erack) and I (moggi) -- (IRC nicks) -- can answer your questions if you need some input.
Comment 27 Dennis Francis 2015-04-30 03:07:20 UTC
From table.hxx, and column.[hc]xx, it looks like each column is set as a fixed sized mdds mtv of MAXROWCOUNT.

http://opengrok.libreoffice.org/xref/core/sc/source/core/data/column.cxx#83

hence every single column in the table costs us memory space proportional to MAXROWCOUNT irrespective of the number of non empty data cells in that column.

Can't we set the initial number of elements of mtv's like maCells, maBroadcasters etc.. to a smaller value say INITROWCOUNT=100 in the ScColumn constructor and call mtv resize() only when we get a request to set a new data at a logical location (say 120) which is beyond the current capacity of the mtv (100) ?

If calling mtv resize(old_size + additional_space_needed) for every new data addition is suboptimal, can we do resize(old_size + k*additional_space_needed) where k is some heuristic measure of "expected growth" in space for that column from the saved history of previous calls to resize() ?
Comment 28 Markus Mohrhard 2015-04-30 03:10:12 UTC
(In reply to Dennis Francis (ldcs.co.in) from comment #27)
> From table.hxx, and column.[hc]xx, it looks like each column is set as a
> fixed sized mdds mtv of MAXROWCOUNT.
> 
> http://opengrok.libreoffice.org/xref/core/sc/source/core/data/column.cxx#83
> 
> hence every single column in the table costs us memory space proportional to
> MAXROWCOUNT irrespective of the number of non empty data cells in that
> column.
> 
> Can't we set the initial number of elements of mtv's like maCells,
> maBroadcasters etc.. to a smaller value say INITROWCOUNT=100 in the ScColumn
> constructor and call mtv resize() only when we get a request to set a new
> data at a logical location (say 120) which is beyond the current capacity of
> the mtv (100) ?
> 
> If calling mtv resize(old_size + additional_space_needed) for every new data
> addition is suboptimal, can we do resize(old_size +
> k*additional_space_needed) where k is some heuristic measure of "expected
> growth" in space for that column from the saved history of previous calls to
> resize() ?

mdds is not the problem as mdds internally does not more or less what you mentioned. Actually increasing the row limit is not a problem.

The problem is all the places where we iterate through each column.
Comment 29 Jouni Järvinen 2015-08-03 18:31:34 UTC
(In reply to Markus Mohrhard (retired) from comment #28)
> The problem is all the places where we iterate through each column.

Start from something. If it has to be a §for§ loop, then make it a §for§ loop. Improve when you get better ideas. Something as slow as §for§ is better than nothing at all. Someone like you should know it.
Comment 30 Gerry 2015-08-03 18:45:57 UTC
(In reply to Jouni Järvinen from comment #29)
> Start from something. If it has to be a §for§ loop, then make it a §for§
> loop. Improve when you get better ideas. Something as slow as §for§ is
> better than nothing at all. Someone like you should know it.

The developer team has a clear understanding of the problems and potential solutions for the 1024 columns limit. So, I don't understand your point. The problem is that the solution entails a refactoring of (old, pre-LibreOffice) code. Lots of work.

@Dennis Francis: Did you have time to look into the code related to the refactoring towards >1024 columns?
Comment 31 ERIC 2015-09-13 15:04:39 UTC
excel files import have >1700 columns
Comment 32 Eike Rathke 2015-09-14 11:40:25 UTC
We know. And please don't mess around with the Version field.
Comment 33 Yogesh Desai 2016-01-18 13:01:55 UTC
I would love to work on this Bug too.
Comment 34 Dennis Francis 2016-01-20 08:10:23 UTC
Hi All

Me and my colleagues(Arul, Yogesh, Kumar and Sahas) are working on this bug
as per the discussion we had with LO devs at http://nabble.documentfoundation.org/tdf-50916-Calc-Dynamic-column-container-td4162663.html

The first of the many incremental patches to come is at https://gerrit.libreoffice.org/21620

--
Thanks.

www.ldcs.co.in
team@ldcs.co.in
Comment 35 Commit Notification 2016-02-04 19:41:19 UTC
Dennis Francis committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=f14d271d31e75de09821cf1766c7ab2a9c6e0461

Patch#1 : Dynamic column container in the pursuit of tdf#50916

It will be available in 5.2.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 36 Commit Notification 2016-02-10 23:10:10 UTC
Dennis Francis committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=bc20c6d0f397c0c1aef6ef7d6f750c2f81af8db6

Refactor ScMarkData for tdf#50916

It will be available in 5.2.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 37 Commit Notification 2016-03-19 15:16:30 UTC
Dennis Francis committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=187669de1096ebbf4347ec1126eb6b693f6c5a96

tdf#50916 : Unit tests for the refactored ScMarkData...

It will be available in 5.2.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 38 Dennis Francis 2016-08-04 04:55:33 UTC
Submitted a patch for refactoring ScAttrArray at https://gerrit.libreoffice.org/27828