Bug 98110 - Convert-Script from XHP to MediaWiki modifications
Summary: Convert-Script from XHP to MediaWiki modifications
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Documentation
Version: unspecified (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Akash Deshpande
URL:
Whiteboard:
Keywords: difficultyBeginner, easyHack, skillPython
Depends on:
Blocks: 62292
Reported: 2016-02-23 13:08 UTC by Dennis Roczek
Modified: 2017-02-14 08:57 UTC (History)
CC List: 5 users

See Also:
Crash report or crash signature:


Description Dennis Roczek 2016-02-23 13:08:47 UTC
Regarding Bug 62292, our Python script that transforms our XML help files (XHP) into MediaWiki content has to be modified.

At the moment, the content of
<item type="keycode"><switchinline select="sys"><caseinline select="MAC">Command</caseinline><defaultinline>Ctrl</defaultinline></switchinline>+Tab</item> 

gets converted to 
{{KeyCode|CommandCtrl+Tab}}

which is simply wrong.

The best result would be something like
{{KeyCode|{{System|default=Ctrl|mac=Command}}+Tab}}

The Python script can be found at
http://opengrok.libreoffice.org/xref/help/to-wiki/wikiconv2.py
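The mapping requested in the description can be sketched in a few lines of Python. This is only an illustration built on the standard library's xml.etree.ElementTree, not the code of wikiconv2.py; the function names to_wiki and inner_wiki are hypothetical.

```python
import xml.etree.ElementTree as ET


def inner_wiki(elem):
    """Render the content of elem: its text, then each child plus its tail text."""
    out = elem.text or ""
    for child in elem:
        out += to_wiki(child) + (child.tail or "")
    return out


def to_wiki(elem):
    """Render one element as wiki markup (hypothetical helper, illustration only)."""
    if elem.tag == "switchinline" and elem.get("select") == "sys":
        parts = []
        default = elem.find("defaultinline")
        if default is not None:
            # The default branch is emitted first, as in the desired output.
            parts.append("default=" + inner_wiki(default))
        for case in elem.findall("caseinline"):
            # select="MAC" becomes the template parameter "mac".
            parts.append(case.get("select", "").lower() + "=" + inner_wiki(case))
        return "{{System|" + "|".join(parts) + "}}"
    if elem.tag == "item" and elem.get("type") == "keycode":
        return "{{KeyCode|" + inner_wiki(elem) + "}}"
    # Any other element is rendered as its content only.
    return inner_wiki(elem)


SAMPLE = (
    '<item type="keycode"><switchinline select="sys">'
    '<caseinline select="MAC">Command</caseinline>'
    '<defaultinline>Ctrl</defaultinline></switchinline>+Tab</item>'
)

print(to_wiki(ET.fromstring(SAMPLE)))
```

The key point is that the switchinline branches are rendered separately instead of having their text concatenated, which is what produces CommandCtrl.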
Comment 1 Akash Deshpande 2016-05-04 20:27:36 UTC
Hi, I will start looking at this.  I have been learning Python for about a year and this will be my first task in the LibreOffice project, if I can finish it.  This showed up as one of the tasks for easyHack/Python.
Comment 2 Akash Deshpande 2016-05-07 01:58:03 UTC
Even though the described problem shows up at https://help.libreoffice.org/Draw/Shortcut_Keys_for_Drawing_Objects, when I run help/help-to-wiki.py and inspect wiki/Draw/Shortcut_Keys_for_Drawing_Objects/MAIN, the problem is not there.  Please let me know if I am not looking in the right place.

Also, it took me a long time to find out why I was not finding these files when I built core as per the instructions.  Much later, I realized that help is separate and I may have to download it.

I will try to keep studying these programs, if there is something else I can get into.  It seems one way to quickly replicate the problem is to keep just one file in help/alltitles.csv and comment out running to-wiki/getalltitles.py.  Please let me know if there is a better way, if I have to continue to work on this.

If I am right, should I mark it as RESOLVED?  If the problem is still there, please do let me know how I can continue to investigate.

Thank You
Comment 3 Dennis Roczek 2016-05-07 02:10:28 UTC
Oh. I'm really sorry. I should have mentioned that. You have to build the code + help files.

So you need to add to your autogen
--with-help=de

or the like. Best ask on the IRC channel #libreoffice-dev on freenode (I'm on my way to bed).

The problem is still there: everything you see on the page help.libo.org is generated from the help files (xhp) converted by the named script.


> It seems one way to quickly replicate the problem is to keep just one file in
> help/alltitles.csv and comment out running to-wiki/getalltitles.py.  Please let me 
> know if there is a better way, if I have to continue to work on this.
Without checking the files (sorry, I'm tired): no, alltitles is simply a list of the files that are uploaded.

Please try to generate the xhp files and then debug with a Python debugger to find where we do the conversion wrong.
Comment 4 Akash Deshpande 2016-05-07 02:50:58 UTC
Thank you for your answer.  I will keep investigating.  I hope it's not urgent.  I am traveling and will continue from Sunday evening.  I will spend 1-2 hours per day, more in the weekend and hoping to take more tasks in the summer.  If it is urgent, please feel free to assign to someone else and I can look for another assignment.

I am still learning git, git-review etc.
Comment 5 Dennis Roczek 2016-05-09 10:46:32 UTC
Take your time in fixing this ticket. If you realize that you cannot solve that ticket, or you simply have not enough time, please remove yourself from the assignment field.
Comment 6 Akash Deshpande 2016-05-10 01:46:17 UTC
Thanks for your help and comments.  I rebuilt the code, I now have the helpcontent2 folder, and I am now ABLE to reproduce the problem.

The confusion I had/have is that even though the wiki html page has the problem, the wiki page generated by the driver program help-to-wiki.py (which is not html, I guess the conversion happens somewhere else) does not show the problem.

For example, I am looking at the following page (one of three mentioned in the ticket 62292):
source/text/scalc/01/06130000.xhp;Calc/AutoInput;AutoInput

The following input:
<switchinline select="sys"><caseinline select="MAC"><item type="keycode">Command</item></caseinline><defaultinline><item type="keycode">Ctrl</item></defaultinline></switchinline><item type="keycode">+Tab</item>
produces:
{{System|default={{KeyCode|Ctrl}}|mac={{KeyCode|Command}}}}{{KeyCode|+Tab}}

which I guess looks ok.  But, as I said, the wiki WEB page is bad.

I then edited the input file and pasted the exact content you mentioned as not being handled properly, and I get the problem output {{KeyCode|CommandCtrl+Tab}}

I will focus on fixing this and study the code in detail, over the next few days.
Comment 7 Akash Deshpande 2016-05-14 21:56:58 UTC
I have submitted a patch for this: https://gerrit.libreoffice.org/#/c/25000/

I made a custom input file with the content as mentioned in the comment by Dennis to replicate the problem and made the changes.

Input:
<item type="keycode"><switchinline select="sys"><caseinline select="MAC">Command</caseinline><defaultinline>Ctrl</defaultinline></switchinline>+Tab</item> 

Now generates:
{{KeyCode|{{System|default=Ctrl|mac=Command}}+Tab}}

Since the Paragraph class handles switchinline and other objects properly, I subclassed Item from Paragraph to reuse its start_element method.  If this is not appropriate, please let me know.  I will be happy to rework as necessary.
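The idea of subclassing Item from Paragraph can be illustrated with a small, self-contained sketch. The classes below are hypothetical stand-ins and not the actual classes of wikiconv2.py; only the names Paragraph, Item and start_element are taken from the comment above. The assumption is an expat-driven design in which a handler whose start_element returns None does not model nested elements, so their text is simply appended to the handler's own text.

```python
import xml.parsers.expat


class Element:
    """Hypothetical base class: nested elements are not modelled, only text is kept."""

    def __init__(self, attrs=None):
        self.attrs = attrs or {}
        self.parts = []  # strings and child objects, in document order

    def start_element(self, name, attrs):
        return None  # text of nested elements falls through to this handler

    def char_data(self, data):
        self.parts.append(data)

    def inner(self):
        return "".join(p if isinstance(p, str) else p.render() for p in self.parts)

    def render(self):
        return self.inner()


class Paragraph(Element):
    """Builds child objects for the nested inline elements it knows about."""

    def start_element(self, name, attrs):
        classes = {"switchinline": Switch, "caseinline": Case,
                   "defaultinline": Case, "item": Item}
        cls = classes.get(name)
        if cls is None:
            return None
        child = cls(attrs)
        self.parts.append(child)
        return child


class Case(Paragraph):
    @property
    def key(self):
        # defaultinline has no select attribute, so it becomes "default"
        return self.attrs.get("select", "default").lower()


class Switch(Paragraph):
    def render(self):
        cases = [p for p in self.parts if isinstance(p, Case)]
        cases.sort(key=lambda c: c.key != "default")  # default first, stable sort
        return "{{System|" + "|".join(c.key + "=" + c.inner() for c in cases) + "}}"


class FlatItem(Element):
    """Item without nested handling: reproduces the reported output."""

    def render(self):
        return "{{KeyCode|" + self.inner() + "}}"


class Item(Paragraph):
    """Item that reuses Paragraph.start_element, as described in the comment."""

    def render(self):
        return "{{KeyCode|" + self.inner() + "}}"


def convert(xml_text, root_cls):
    """Parse xml_text with expat, using root_cls as the handler for the root element."""
    stack = []

    def start(name, attrs):
        if not stack:
            stack.append(root_cls(attrs))
            return
        child = stack[-1].start_element(name, attrs)
        # None means "not modelled": keep feeding text to the current handler
        stack.append(child if child is not None else stack[-1])

    def end(name):
        if len(stack) > 1:
            stack.pop()

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = lambda data: stack[-1].char_data(data)
    parser.Parse(xml_text, True)
    return stack[0].render()


SAMPLE = (
    '<item type="keycode"><switchinline select="sys">'
    '<caseinline select="MAC">Command</caseinline>'
    '<defaultinline>Ctrl</defaultinline></switchinline>+Tab</item>'
)

print(convert(SAMPLE, FlatItem))
print(convert(SAMPLE, Item))
```

In this sketch, FlatItem flattens the text of both branches into one string, while Item inherits the nested-element handling and keeps the branches apart.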
Comment 8 jani 2016-06-14 05:56:12 UTC
A polite ping, still working on this issue ?
Comment 9 Akash Deshpande 2016-06-16 01:56:04 UTC
Hi Jan,

Thank you for the ping.  I am waiting for the review.  It's my first commit and it would be awesome to get it approved!  In case there is any feedback that needs to be addressed, I am busy with some exams for the next few days but will certainly be available after 29th or so.  I was also looking at 'making some unittests more pythonic' and studying git in more detail.. Akash
Comment 10 jani 2016-06-17 12:38:15 UTC
It seems this bug was solved, but the issue was not closed.

Your work, however, was good, and I look forward to seeing more work from you after your exams. One of my "duties" is to help contributors, so feel free to contact me.

Apart from the unit tests, we have other Python tasks, and even a little project, where you need to do a bit of research and not only write predefined code.

Fingers crossed for your exam.
rgds
jan i
Comment 11 Dennis Roczek 2016-06-18 11:54:43 UTC
@jani why wontfix? the wanted wikimarkup is still desired as the output on the helpwiki is horrible and confuses the normal user.
Comment 12 Andras Timar 2016-06-18 19:26:08 UTC
(In reply to Dennis Roczek from comment #11)
> @jani why wontfix? the wanted wikimarkup is still desired as the output on
> the helpwiki is horrible and confuses the normal user.

@Dennis, the output of the conversion script is correct, as I mentioned on the mailing list. help.libreoffice.org should be updated (which I'm doing right now). See for example https://help.libreoffice.org/Draw/Shortcut_Keys_for_Drawing_Objects. The CommandCtrl is not there. In fact the bug was solved by Christian Lohmaier, who fixed the invalid xhp files, instead of changing the conversion script. 

commit f9cb7f4660c039f7bf18bf60f6fb9c77f7e4f54b
Author: Christian Lohmaier <lohmaier+LibreOffice@googlemail.com>
Date:   Mon Jan 25 15:26:12 2016 +0100

    switchinline is no valid child of "item" either
    
    sed command used:
    's#\(<item[^>]*>\)\(<switchinline[^>]*><caseinline[^>]*>\)\([^>]*\)\(</caseinline><defaultinline>\)\([^<]*\)\(</defaultinline></switchinline>\)\([^<]*</item>\)#\2\1\3</item>\4\1\5</item>\6\1\7#g'
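To make the effect of the sed command above easier to follow, here is a rough Python equivalent using the standard re module. It is a sketch for illustration, not part of the commit; the names PATTERN, REPLACEMENT and fix_item are hypothetical. The seven groups and the replacement string mirror the sed expression one to one.

```python
import re

# Same seven groups as the sed expression, written in Python regex syntax.
PATTERN = re.compile(
    r'(<item[^>]*>)'                          # 1: opening <item ...> tag
    r'(<switchinline[^>]*><caseinline[^>]*>)'  # 2: opening switch + case tags
    r'([^>]*)'                                # 3: text of the case branch
    r'(</caseinline><defaultinline>)'         # 4: close case, open default
    r'([^<]*)'                                # 5: text of the default branch
    r'(</defaultinline></switchinline>)'      # 6: close default + switch
    r'([^<]*</item>)'                         # 7: trailing text and closing </item>
)

# Move <item> inside each branch and wrap the trailing text in its own <item>.
REPLACEMENT = r'\2\1\3</item>\4\1\5</item>\6\1\7'


def fix_item(xhp_text):
    """Rewrite item-wrapped switchinline constructs so that item is nested inside."""
    return PATTERN.sub(REPLACEMENT, xhp_text)


before = (
    '<item type="keycode"><switchinline select="sys">'
    '<caseinline select="MAC">Command</caseinline>'
    '<defaultinline>Ctrl</defaultinline></switchinline>+Tab</item>'
)

print(fix_item(before))
```

Applied to the input from the description, this yields the valid form quoted in comment 6, where each branch carries its own item element.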