This is roughly similar to bug 68903, just another variant.
The document in question has an .xls suffix, but it actually contains HTML content (the root element is a <div>, which contains a <table>) rather than Excel binary data. Other .xls files that contain real Excel data open correctly.
However, I've been downloading these xls files from this same site (it's the data export from an event registration site) for a long time, and in LibreOffice 4.0.x it would open them in Calc. I had a bunch of the older files sitting around, and they have always been in this format (I just upgraded to 4.1.x from 4.0.x).
In this case, selecting one of the Excel filetypes from the selector results in an error trying to open the file, but if I create a new Calc document, then do File > Open from there, it correctly opens it like it used to.
I've also managed to get it to open correctly if I rename the file to have an .html suffix, and then choose "Web Page Query (Calc)" as the filetype when opening it. Leaving it with the .xls suffix it had when it was downloaded won't allow me to choose this filetype though.
As mentioned, this is a regression: it opened correctly in Calc in 4.0.x.
The content of the file looks like this when opened in a raw text editor:
<table cellspacing="0" rules="all" border="1" style="border-collapse:collapse;">
<th scope="col">Column1Name</th><th scope="col">Column2Name</th><th scope="col">Column3Name</th>
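For anyone triaging similar files, here is a minimal Python sketch of how to tell the two cases apart from the first bytes of the file. It is only a heuristic for checking a download by hand, not what LibreOffice's type detection does. The function name `classify_xls_payload` and the list of HTML start tags are assumptions made up for this example; the OLE2 signature is the standard header of the compound-document container that binary .xls files use.

```python
# Signature at the start of every OLE2 compound document (binary .xls lives in one).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")

# Hypothetical list of openings that suggest HTML; chosen for illustration only.
HTML_STARTS = (b"<!doctype html", b"<html", b"<div", b"<table")


def classify_xls_payload(data: bytes) -> str:
    """Return 'ole2', 'html', or 'unknown' based on the leading bytes."""
    if data.startswith(OLE2_MAGIC):
        return "ole2"
    head = data[:1024]
    # Drop a UTF-8 byte order mark if present, then leading whitespace.
    if head.startswith(b"\xef\xbb\xbf"):
        head = head[3:]
    head = head.lstrip().lower()
    if head.startswith(HTML_STARTS):
        return "html"
    return "unknown"
```

To check a downloaded file, pass it the first kilobyte or so, for example `classify_xls_payload(open("export.xls", "rb").read(1024))`, where `export.xls` stands in for the real file name.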
Already confirmed with 18.104.22.168 at Bug 68903 comment 2.
(In reply to comment #0)
> In this case, selecting one of the Excel filetypes from the selector results
> in an error trying to open the file
Of course, since it's not an Excel file, just renamed HTML.
Dave, please attach a test file to this bug if possible.
(In reply to comment #2)
> Dave, please attach a test file to this bug if possible.
Should be reproducible with *any* HTML renamed to .xls
Dave, so true, but having a test file unifies the testing case and makes sure we receive identical results. Otherwise this bug can become very confusing very quickly. I think we can agree it's a good idea to avoid that, no?
Created attachment 92211
For the record, I included a test case in the initial description. You just had to copy/paste it into a file and put an .xls extension on the filename. :)
In fact, that appears to be exactly what Maxim did to create the file he attached. :)
So, this is unfortunately by design. HTML is a generic file format, not specific to a single application type. If you want an HTML file to be opened in Calc, you can open it from Calc's file open menu.
Having said that, since we do something similar for CSV, we could try to handle this as well. The right way to handle it is to have the HTML detection service inspect the file extension and decide which application to open the file in, the same way we do for the CSV file format.
The caveat is that the HTML detection code is currently split between Writer and Calc. The necessary first step would be to combine the split HTML detection code into one central place, then put in a fix there. In fact, the CSV detection code used to be split between the Writer and Calc detection code as well; I moved it to where it is now. We should do the same for HTML.
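To make the proposed behaviour concrete, here is a rough Python sketch of the decision a central HTML detection step could make. It is not the LibreOffice code (which is C++) and not the eventual patch. The function name `choose_application`, the set of spreadsheet extensions, and the fallback to Writer are all assumptions for illustration.

```python
from pathlib import PurePath

# Hypothetical set of extensions that should route HTML content to Calc.
SPREADSHEET_EXTENSIONS = {".xls", ".xlsx", ".ods", ".csv"}


def choose_application(filename: str, content_is_html: bool) -> str:
    """Pick a target application for a file already sniffed as HTML or not."""
    if not content_is_html:
        # Not HTML: leave the decision to the other detection services.
        return "undetected"
    extension = PurePath(filename).suffix.lower()
    if extension in SPREADSHEET_EXTENSIONS:
        return "calc"
    # Assumed default for generic HTML in this sketch.
    return "writer"
```

With this, `choose_application("export.xls", True)` returns `"calc"`, while the same content saved as `page.html` returns `"writer"`. The point of doing it in one place is that the extension check only has to exist once, rather than in separate Writer and Calc detectors.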
Maxim Monastirsky committed a patch related to this issue.
It has been pushed to "master":
related: fdo#73682 Introduce HTML detection service
The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
Affected users are encouraged to test the fix and report feedback.
This is now fixed on master. Nice work Maxim!
*** Bug 76234 has been marked as a duplicate of this bug. ***
*** Bug 76114 has been marked as a duplicate of this bug. ***
Maxim, the problem is still present in the 4.3.2 build as well (Windows). I thought this was fixed for 4.3?
If I change the extension to .html and use "Open with" Calc, it opens fine.
There are no patches out there for 4.3.2.
As a general rule NEVER reopen old bugs. Instead:
- Open a new bug.
- Attach a test document, that shows the bug. (The bug is mostly fixed, but there could be a problem with a specific file. Therefore I can't say anything without seeing the file you're using.)
- Add me to the CC list of that bug.