PRs and granularity

(Jay Pfaffman) #1

Say, hypothetically, after several days of ugly hacking, I’ve made a bunch of elegant changes to mbox.rb . Things like

  • BUG: make topic post be the first message in thread rather than the last
  • FEATURE: replace email addresses in posts with @username
  • UX: made it clearer where the script looks for files
  • UX: improved error checking and reporting
  • FEATURE: added gsubs to “correct” bogus email addresses (to probably wrong, but legal addresses)
  • FEATURE: add optional code to import user bios from external database
  • FIX: remove only [LISTNAME] from Subjects, not all [bracketed text]
  • FEATURE: set default trust_level on imported users

(Whether comments and top-of-file constants are UX is debatable.)

These are all in a separate branch with a bunch of commits, some of which actually have descriptive commit strings and changed something small. I still need to pull back out stuff that doesn’t belong in core (e.g., gsubs that fixed address problems in only my data set). All of the changes are in a single file (plus maybe one or two tobase.rb) and this is just an import script, not affecting installed instances.

What do I do now? Return to master and retrace my steps to add in pieces to individual branches as if I knew what I was doing when I started? Submit the whole thing and let someone else sort it out? Something in between? Is there some git magic that would make the process easier?

(Robin Ward) #2

In this case all the commits are against one file, so I’d suggest squashing them all into one commit and making a PR with bullet points that explains what changed. It needn’t go through every line but the general gist would be a good thing for historical purposes.

Also since it’s an importer and those are largely untested it’s a lot safer - if you break it the next person who does an import will notice but it’s not going to break any discourse sites in production.

(Jay Pfaffman) #3

Thanks. That’s music to my ears! That’s what I thought, but I’m still new to Git and the project, and after hacking on this script for a week, I almost know Ruby.

I’ll clean it up and submit it Real Soon Now.

(Matt Palmer) #4

On the “git magic” side of things, if you have the same sort of OCD I have and wanted to split out the changes into separate commits, as long as each logical change was in a separate “hunk” (sections of code delimited by lines starting with @@), you can do:

  1. Get a second branch with your code on it (so you’ve still got the original commits easily available if you make a mess):

     git checkout -b mbox-haxx-take-2
  2. “Uncommit” all your changes in all your commits:

     git reset master

    (Specify something other than master if your branch was made from another base commit, or if you’ve pulled into master since making your branch; if you know how many commits you’ve made, you can use HEAD^^^^^ where each ^ is one commit)

  3. Now if you run git status, you should see that mbox.rb is shown as having “Changes not staged for commit”. Excellent…

  4. Run git add -p. This will present you with a single “hunk”, and ask if you wish to stage it for commit. When you say y or n, it’ll go on to ask you about the next hunk, and the next, and so on. The plan here is to say y to all hunks that are logically associated with each other, and say n to all other hunks. If you make a mistake, you can q out of the whole process, run git reset HEAD path/to/mbox.rb to “unstage” all the hunks you said y to, and re-run git add -p.

  5. Once you’ve said y to a bunch o’ hunks, you can review the set of hunks you’re about to commit with git diff --cached. If they look good, then make a commit with git commit (if you’re used to putting a -a on your git commit, now is the time to unlearn that habit).

  6. Repeat the git add -p; git diff --cached; git commit dance several times until you’ve committed all the hunks in separate, well-described commits.

You can also use git rebase -i to reorder and squash commits into fewer, larger commits, but that’s a fairly ninja process with a reasonably high probability of getting yourself very, very confused. I only reach for the git rebase -i when I’m in dire straits, myself.

(Jay Pfaffman) #5


That’s the kind of waste-of-time-with-computers nonsense I was talking about! Since almost all I know about Ruby I have learned from reading code, my hack-wildly-until-something-good-happens approach will likely land me in this situation again, and perhaps on code that’s not all in one file.

I do want to try to figure out how to be a better Git citizen, so I’ll see if I can make some of those incantations work. I poked a bit at the Google and Stackoverflow, but this looks like exactly what I wanted.

:thumbsup: :thumbsup:

(Matt Palmer) #6

I have a Ph.D in wasting-time-with-computers from manpage University…

(Jay Pfaffman) #7

I should arguably post this at Stack Exchange, but this community’s practices are those that I’m interested in following and I already started this thread. . .

OK, so I spent several hours today trying to do something like the above on 30 minutes of code changes in discourse-setup. I figured out that I can get Emacs to do what you’re doing with git add -p and managed to submit let discourse-setup install docker and make memory tests work.

If I were to do that again, would the recommended course be to do this?

  1. git checkout master
  • do the changes for the First Thing
  • git checkout -b first-thing
  • git push master first-thing
  • check checkout master
  • do changes for second thing
  • git checkout -b second-thing
  • git push master second-thing

And I suppose if I were clever, I could then merge first-thing and second-thing in my own fork if I wanted.

(Matt Palmer) #8

Yes, putting separate changes on separate “feature branches” (as they are called) is generally considered to be The Way, with an “integration branch” to merge all your various things together if you want to run a local fork while waiting for mainline to catch up and merge all your PRs.