Download it now!
Bug 123936 - Find files to be formatted with clang-format
Summary: Find files to be formatted with clang-format
Status: ASSIGNED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice (show other bugs)
Version:
(earliest affected)
6.3.0.0.alpha0+
Hardware: All All
: medium normal
Assignee: philipp.hofer
URL:
Whiteboard: target:7.0.0 target:7.1.0
Keywords: difficultyBeginner, easyHack, skillPython, skillScript, topicCleanup
Depends on:
Blocks:
 
Reported: 2019-03-08 07:30 UTC by Samuel Mehrbrodt (allotropia)
Modified: 2020-11-23 08:12 UTC (History)
5 users (show)

See Also:
Crash report or crash signature:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Samuel Mehrbrodt (allotropia) 2019-03-08 07:30:57 UTC
We use clang-format to have consistent formatting on a number of files already. Although most files are currently blacklisted for formatting by 'solenv/clang-format/blacklist'.

The task is to find files where enabling clang-format would only cause minor reformatting (maybe less than 5% of the lines on a file).

Then this file should be removed from 'solenv/clang-format/blacklist' and be formatted with clang-format.

Ideally a (Python/Shell/...) script will be written which does this check for all files. But manually checking is also ok.

Please submit suitable files one by one to Gerrit to allow individual review.
Comment 1 Raghu Ram 2019-03-08 12:59:54 UTC Comment hidden (obsolete)
Comment 2 Raghu Ram 2019-03-08 13:04:29 UTC Comment hidden (obsolete)
Comment 3 Samuel Mehrbrodt (allotropia) 2019-03-08 13:52:26 UTC Comment hidden (obsolete)
Comment 4 Anuj Agrawal 2019-03-18 16:37:05 UTC Comment hidden (obsolete)
Comment 5 Samuel Mehrbrodt (allotropia) 2019-03-19 06:34:11 UTC Comment hidden (obsolete)
Comment 6 Samuel Mehrbrodt (allotropia) 2019-03-19 07:01:21 UTC Comment hidden (obsolete)
Comment 7 Samuel Mehrbrodt (allotropia) 2019-05-23 12:43:28 UTC Comment hidden (obsolete)
Comment 8 Stephan Bergmann 2019-12-17 13:25:47 UTC
But note that unconditional application of clang-format will, in general, break the meaning of comments.  Consider code like

  LooooooongType1 looooooongVarName1;
      // lengthy comment about some detail
  LooooooongType2 looooooongVarName2;

where the comment pertains to the preceding line, but needs to go on a new line due to line length limitations, and is indented to make that clear.  It will get reformatted as

  LooooooongType1 looooooongVarName1;
  // lengthy comment about some detail
  LooooooongType2 looooooongVarName2;

making it less obvious whether the comment pertains to the preceding or to the following line.
Comment 9 Commit Notification 2020-01-27 18:10:41 UTC
Batuhan Taskaya committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/09680b09b2f49e7a37d8b941822f053b5179bf6d

tdf#123936: bin/find-clang-format.py for finding files to be formatted

It will be available in 6.5.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 10 Commit Notification 2020-01-28 13:38:22 UTC Comment hidden (noise)
Comment 11 Xisco Faulí 2020-07-22 15:22:17 UTC Comment hidden (obsolete)
Comment 12 philipp.hofer 2020-11-05 17:16:36 UTC Comment hidden (obsolete)
Comment 13 philipp.hofer 2020-11-05 20:41:53 UTC
I wrote a python script which calculates a similarity score between the original file and the clang-formatted files. 
During this process I found ~250 files which are listed in 'solenv/clang-format/excludelist' but do not exist (anymore).  Furthermore, ~2.500 (out of 18.434) files have a similarity score of >0.99; mostly missing spaces and really small modifications.

I think we should start by removing the files from 'excludelist' which don't exist anymore and update the ~2.500 files which do not change much after formatting it with clang.

In the first post you mentioned "Please submit suitable files one by one to Gerrit to allow individual review.". Although I've worked extensively with git in the past, it's my first time with Gerrit -- how do you expect my results?
Comment 14 Buovjaga 2020-11-06 05:24:21 UTC
Careful with needinfo status: https://wiki.documentfoundation.org/QA/Bugzilla/Fields/Status/NEEDINFO
Comment 15 Samuel Mehrbrodt (allotropia) 2020-11-06 09:29:58 UTC
(In reply to philipp.hofer from comment #13)
> I wrote a python script which calculates a similarity score between the
> original file and the clang-formatted files. 
> During this process I found ~250 files which are listed in
> 'solenv/clang-format/excludelist' but do not exist (anymore).  Furthermore,
> ~2.500 (out of 18.434) files have a similarity score of >0.99; mostly
> missing spaces and really small modifications.
> 
> I think we should start by removing the files from 'excludelist' which don't
> exist anymore and update the ~2.500 files which do not change much after
> formatting it with clang.

Sounds good.

> 
> In the first post you mentioned "Please submit suitable files one by one to
> Gerrit to allow individual review.". Although I've worked extensively with
> git in the past, it's my first time with Gerrit -- how do you expect my
> results?

Please submit 
* one patch (commit) for removing non-existant files from excludelist
* one patch per module for formatting the similiar files with clang-format (module == top-level directory like dbaccess, sw, sc, ...)
Comment 16 philipp.hofer 2020-11-06 15:27:14 UTC
I submitted two patches so far:
1) one patch (commit) for removing non-existant files from excludelist (https://gerrit.libreoffice.org/c/core/+/105407/1)
2) one patch for module accessibility for formatting the similar files with clang-format (https://gerrit.libreoffice.org/c/core/+/105411)

I'm waiting for feedback for 2) before I create 103 more patches (for the other modules).
Comment 17 Commit Notification 2020-11-06 20:06:13 UTC Comment hidden (noise)
Comment 18 Commit Notification 2020-11-10 11:45:06 UTC Comment hidden (noise)
Comment 19 Commit Notification 2020-11-13 07:11:23 UTC Comment hidden (noise)
Comment 20 Commit Notification 2020-11-13 07:16:36 UTC Comment hidden (noise)
Comment 21 Commit Notification 2020-11-13 07:17:49 UTC Comment hidden (noise)
Comment 22 Commit Notification 2020-11-13 07:19:01 UTC Comment hidden (noise)
Comment 23 Commit Notification 2020-11-13 07:20:14 UTC Comment hidden (noise)
Comment 24 Commit Notification 2020-11-13 07:21:25 UTC Comment hidden (noise)
Comment 25 Commit Notification 2020-11-13 07:21:36 UTC Comment hidden (noise)
Comment 26 Commit Notification 2020-11-13 07:22:48 UTC Comment hidden (noise)
Comment 27 Commit Notification 2020-11-13 07:24:00 UTC Comment hidden (noise)
Comment 28 Commit Notification 2020-11-13 07:24:11 UTC Comment hidden (noise)
Comment 29 Mike Kaganski 2020-11-13 07:45:43 UTC
(In reply to philipp.hofer from comment #16)
> I'm waiting for feedback for 2) before I create 103 more patches (for the
> other modules).

Please don't submit that many patches in large batches. It fills up the CI queue, and makes others' work impossible for days.

Sending say 3 patches, and waiting them to be finished before sending the next set, is OK, but not 103.

(In reply to philipp.hofer from comment #13)
> I wrote a python script which calculates a similarity score between the
> original file and the clang-formatted files. 

Please describe how the similarity is measured in the script (or maybe create a gerrit change with the script). That would help answer e.g. the question Stephan asked in https://gerrit.libreoffice.org/c/core/+/105655:

> How do you calculate the threshold which files to modify?  For example in
> configmgr/source/defaultprovider.hxx, affecting 5 out of 35 existing lines
> is clearly beyond a 5% line limit?
Comment 30 Mike Kaganski 2020-11-13 08:07:00 UTC
And we maybe should allow changes that may affect more than 5%, but allow using our .git-blame-ignore-revs to reduce the blame noise. (Then the task would be to amend the list there.)
Comment 31 Tor Lillqvist 2020-11-13 08:19:05 UTC
> Please submit suitable files one by one to Gerrit to allow individual review

But please don't have any more than around ten outstanding changes for this bug in Gerrit at any time. Don't submit a hundred such minimal changes. Sadly the Gerrit and Jenkins (never fully understood what to call it) infrastructure apparently has no mechanism to prioritise work based on "importance". I think we can all agree that this "bug" that is about cosmetic changes is less important than bugs that are about actual erroneous behaviour in the software, or functionality enhancements, right?
Comment 32 philipp.hofer 2020-11-13 08:22:23 UTC
(In reply to Mike Kaganski from comment #29)
> (In reply to philipp.hofer from comment #16)
> > I'm waiting for feedback for 2) before I create 103 more patches (for the
> > other modules).
> 
> Please don't submit that many patches in large batches. It fills up the CI
> queue, and makes others' work impossible for days.
> 
> Sending say 3 patches, and waiting them to be finished before sending the
> next set, is OK, but not 103.
> 
> (In reply to philipp.hofer from comment #13)
> > I wrote a python script which calculates a similarity score between the
> > original file and the clang-formatted files. 
> 
> Please describe how the similarity is measured in the script (or maybe
> create a gerrit change with the script). That would help answer e.g. the
> question Stephan asked in https://gerrit.libreoffice.org/c/core/+/105655:
> 
> > How do you calculate the threshold which files to modify?  For example in
> > configmgr/source/defaultprovider.hxx, affecting 5 out of 35 existing lines
> > is clearly beyond a 5% line limit?

Sorry for causing inconvenience, was definitely not my plan. I will only submit a few patches from now on. How do we continue in this particular situation with having so many patches? 


I used Python's difflib.SequenceMatcher and changed all files with > 99% similarity.
Comment 33 Mike Kaganski 2020-11-13 08:34:46 UTC
(In reply to philipp.hofer from comment #32)
> Sorry for causing inconvenience, was definitely not my plan. I will only
> submit a few patches from now on. How do we continue in this particular
> situation with having so many patches? 

Np, here was no warnings/precautions here to let know how to avoid that, so it's not your fault. For the changes that were aborted in Jenkins, please wait next week, and then *rebase* 2 or three of them to be built.
Comment 34 Christian Lohmaier 2020-11-13 12:12:06 UTC
A simple solution to avoiding "spamming" the bots with too many build verification results, is to upload your individual changes as drafts/WIP, those are ignored by the gerrit-trigger in jenkins.

And for build-testing, create a *single* commit where you have multiple of the easy-to-handle smaller-chunks combined.

e.g. cherry pick 10 or so for verification, and git rebase -i them and use squash option to combine them into a single commit.
Comment 35 Buovjaga 2020-11-13 12:40:40 UTC
(In reply to Christian Lohmaier from comment #34)
> A simple solution to avoiding "spamming" the bots with too many build
> verification results, is to upload your individual changes as drafts/WIP,
> those are ignored by the gerrit-trigger in jenkins.

https://wiki.documentfoundation.org/Development/gerrit/SubmitPatch#Submitting_patches_as_private_or_work-in-progress

./logerrit submit-wip master
Comment 36 Commit Notification 2020-11-13 13:27:45 UTC Comment hidden (noise)
Comment 37 Commit Notification 2020-11-13 13:29:58 UTC Comment hidden (noise)
Comment 38 Commit Notification 2020-11-13 13:32:11 UTC Comment hidden (noise)
Comment 39 Commit Notification 2020-11-13 14:12:12 UTC Comment hidden (noise)
Comment 40 Commit Notification 2020-11-13 14:12:23 UTC Comment hidden (noise)
Comment 41 Commit Notification 2020-11-13 14:13:35 UTC Comment hidden (noise)
Comment 42 Commit Notification 2020-11-13 14:13:47 UTC Comment hidden (noise)
Comment 43 Commit Notification 2020-11-13 14:14:59 UTC Comment hidden (noise)
Comment 44 Commit Notification 2020-11-13 14:15:10 UTC Comment hidden (noise)
Comment 45 Commit Notification 2020-11-13 14:15:21 UTC Comment hidden (noise)
Comment 46 Commit Notification 2020-11-13 14:15:32 UTC Comment hidden (noise)
Comment 47 Commit Notification 2020-11-13 14:15:43 UTC Comment hidden (noise)
Comment 48 Commit Notification 2020-11-13 14:16:55 UTC Comment hidden (noise)
Comment 49 Commit Notification 2020-11-13 14:17:06 UTC Comment hidden (noise)
Comment 50 Commit Notification 2020-11-13 14:17:17 UTC Comment hidden (noise)
Comment 51 Commit Notification 2020-11-13 14:18:28 UTC Comment hidden (noise)
Comment 52 Commit Notification 2020-11-13 14:19:40 UTC Comment hidden (noise)
Comment 53 Commit Notification 2020-11-13 14:19:50 UTC Comment hidden (noise)
Comment 54 Commit Notification 2020-11-13 14:20:02 UTC Comment hidden (noise)
Comment 55 Commit Notification 2020-11-13 14:20:13 UTC Comment hidden (noise)
Comment 56 Commit Notification 2020-11-13 14:21:25 UTC Comment hidden (noise)
Comment 57 Commit Notification 2020-11-13 14:21:36 UTC Comment hidden (noise)
Comment 58 Stephan Bergmann 2020-11-16 08:16:23 UTC
(In reply to philipp.hofer from comment #32)
> I used Python's difflib.SequenceMatcher and changed all files with > 99%
> similarity.

...which is IMO not necessarily a useful metric here, where tooling is generally line-based.  For example, the "less than 5% of the lines on a file" from comment 0 would ensure that (line-based) `git blame` would have a low chance of hitting random-reformat noise.  (And `git blame --ignore-revs-file` is not a generally useful countermeasure against the induced noise, given the issues caused by reformatting of comments, see comment 8, which you might not want `git blame` to paper over.)
Comment 59 Commit Notification 2020-11-16 10:43:27 UTC Comment hidden (noise)
Comment 60 Commit Notification 2020-11-16 10:45:38 UTC Comment hidden (noise)
Comment 61 Commit Notification 2020-11-16 10:46:50 UTC Comment hidden (noise)
Comment 62 Commit Notification 2020-11-16 11:26:20 UTC Comment hidden (noise)
Comment 63 Commit Notification 2020-11-16 11:26:31 UTC Comment hidden (noise)
Comment 64 Commit Notification 2020-11-16 11:30:44 UTC Comment hidden (noise)
Comment 65 Commit Notification 2020-11-18 11:05:28 UTC Comment hidden (noise)
Comment 66 Commit Notification 2020-11-18 11:05:39 UTC Comment hidden (noise)
Comment 67 Commit Notification 2020-11-18 11:15:54 UTC Comment hidden (noise)
Comment 68 Commit Notification 2020-11-18 11:29:10 UTC Comment hidden (noise)
Comment 69 Commit Notification 2020-11-18 11:29:21 UTC Comment hidden (noise)
Comment 70 Commit Notification 2020-11-18 11:29:33 UTC Comment hidden (noise)
Comment 71 Commit Notification 2020-11-18 11:30:45 UTC Comment hidden (noise)
Comment 72 Commit Notification 2020-11-18 11:30:56 UTC Comment hidden (noise)
Comment 73 Commit Notification 2020-11-18 11:31:06 UTC Comment hidden (noise)
Comment 74 Commit Notification 2020-11-18 11:31:17 UTC Comment hidden (noise)
Comment 75 Commit Notification 2020-11-18 11:32:28 UTC Comment hidden (noise)
Comment 76 Commit Notification 2020-11-18 11:55:51 UTC Comment hidden (noise)
Comment 77 Commit Notification 2020-11-18 11:57:01 UTC Comment hidden (noise)
Comment 78 Commit Notification 2020-11-19 00:33:57 UTC Comment hidden (noise)
Comment 79 Commit Notification 2020-11-19 00:38:11 UTC Comment hidden (noise)
Comment 80 Commit Notification 2020-11-19 00:38:23 UTC Comment hidden (noise)
Comment 81 Commit Notification 2020-11-19 09:44:56 UTC Comment hidden (noise)
Comment 82 Commit Notification 2020-11-19 09:45:08 UTC Comment hidden (noise)
Comment 83 Commit Notification 2020-11-19 09:45:19 UTC Comment hidden (noise)
Comment 84 Commit Notification 2020-11-19 09:45:29 UTC Comment hidden (noise)
Comment 85 Commit Notification 2020-11-19 13:46:56 UTC Comment hidden (noise)
Comment 86 Commit Notification 2020-11-19 13:48:08 UTC Comment hidden (noise)
Comment 87 Commit Notification 2020-11-19 13:48:19 UTC Comment hidden (noise)
Comment 88 Commit Notification 2020-11-19 18:22:29 UTC Comment hidden (noise)
Comment 89 Commit Notification 2020-11-19 18:23:40 UTC Comment hidden (noise)
Comment 90 Commit Notification 2020-11-19 18:23:51 UTC Comment hidden (noise)
Comment 91 Commit Notification 2020-11-19 18:24:01 UTC Comment hidden (noise)
Comment 92 Commit Notification 2020-11-19 18:25:15 UTC Comment hidden (noise)
Comment 93 Commit Notification 2020-11-19 18:25:35 UTC Comment hidden (noise)
Comment 94 Commit Notification 2020-11-19 18:25:50 UTC Comment hidden (noise)
Comment 95 Commit Notification 2020-11-19 18:26:03 UTC Comment hidden (noise)
Comment 96 Commit Notification 2020-11-19 18:27:18 UTC Comment hidden (noise)
Comment 97 Commit Notification 2020-11-19 18:27:29 UTC Comment hidden (noise)
Comment 98 Commit Notification 2020-11-19 18:29:40 UTC Comment hidden (noise)
Comment 99 Commit Notification 2020-11-19 18:29:51 UTC Comment hidden (noise)
Comment 100 Commit Notification 2020-11-21 12:16:36 UTC Comment hidden (noise)
Comment 101 Commit Notification 2020-11-21 12:16:47 UTC Comment hidden (noise)
Comment 102 Commit Notification 2020-11-21 12:17:59 UTC Comment hidden (noise)
Comment 103 Commit Notification 2020-11-21 12:19:11 UTC Comment hidden (noise)
Comment 104 Commit Notification 2020-11-21 12:19:21 UTC Comment hidden (noise)
Comment 105 Commit Notification 2020-11-21 12:20:33 UTC Comment hidden (noise)
Comment 106 Commit Notification 2020-11-21 12:20:43 UTC Comment hidden (noise)
Comment 107 Commit Notification 2020-11-22 00:56:23 UTC Comment hidden (noise)
Comment 108 Commit Notification 2020-11-22 00:56:34 UTC Comment hidden (noise)
Comment 109 Commit Notification 2020-11-22 00:57:45 UTC Comment hidden (noise)
Comment 110 Commit Notification 2020-11-22 00:57:56 UTC Comment hidden (noise)
Comment 111 Commit Notification 2020-11-22 00:58:07 UTC Comment hidden (noise)
Comment 112 Commit Notification 2020-11-22 00:58:18 UTC Comment hidden (noise)
Comment 113 Commit Notification 2020-11-22 00:59:28 UTC Comment hidden (noise)
Comment 114 Commit Notification 2020-11-22 00:59:39 UTC Comment hidden (noise)
Comment 115 Stephan Bergmann 2020-11-23 08:12:26 UTC
(In reply to Stephan Bergmann from comment #58)
> (In reply to philipp.hofer from comment #32)
> > I used Python's difflib.SequenceMatcher and changed all files with > 99%
> > similarity.
> 
> ...which is IMO not necessarily a useful metric here, where tooling is
> generally line-based.  For example, the "less than 5% of the lines on a
> file" from comment 0 would ensure that (line-based) `git blame` would have a
> low chance of hitting random-reformat noise.  (And `git blame
> --ignore-revs-file` is not a generally useful countermeasure against the
> induced noise, given the issues caused by reformatting of comments, see
> comment 8, which you might not want `git blame` to paper over.)

So this has nevertheless carried on with that IMO unuseful metric?  So be it.

This issue seems to be done now?  Should we close it?

Will somebody update .git-blame-ignore-revs?