2020 November CodeRefinery HackMD, day 3
- Course page: https://coderefinery.github.io/2020-11-17-online/
- Schedule: https://coderefinery.github.io/2020-11-17-online/#schedule
Icebreaker, Day 3
What operating system and what display setup do you have?
- 2 monitors + laptop screen
- laptop + 27” monitor (1440p)
- laptop + extra screen
- desktop(!) + 2 screens
- laptop + extra screen
- laptop + extra screen
- laptop + 2 extra screens
Feedback/suggestions from yesterday
A few questions arrived after the workshop, and we have (hopefully) answered all of them here: https://coderefinery.github.io/2020-11-17-online/hackmd-day2/
Any other questions that you would like to have clarified?
- When should a repository be private and when public? If it is public, can part of the code that is being developed still be kept private to a group of collaborators?
- I assume this is about a repository hosted on a public website (e.g. GitLab, GitHub). Usually a private repository means that the code is not public: it contains information that should not be shared “on the internet”, or it is not yet ready to be open (e.g. work in progress). A repository can be made public once the work has reached a status where it can be shared. However, if there is ever a plan to make the private repo public, be sure to clean any private information out of the Git history first, or keep that information separate from the start.
- I could think of another two reasons for private repos
- For companies with commercial interest, e.g. there is a public version and Google’s version of Android
- for this, companies usually have internal Git repositories to keep the code private
- The License agreement is not ready yet
- I am not sure there is any functionality to make only part of a GitHub or GitLab repository private
- Either the entire repo is public or the entire repo is private. But what can be useful is to use two repos (one public, one private; we will learn how today): put the unpublished stuff on the private repo, and share the master branch, which only contains published things, on the public repo, to give others the chance to base their work on the public version.
- In companies (at least where I worked) we had two repositories: one public for public releases and one private for internal development. We populated the public from the private whenever we wanted and with only the features we wanted.
Git-bisect (10 minutes before we start with collaborative Git)
- As a Windows user, which tasks should one do in Git Bash vs. the Anaconda Prompt, in general? I’m a little confused as to why he had to switch over to the Anaconda Prompt.
- It is possible to set up your Git Bash to do everything in there but often, depending on installation, Git Bash cannot see the Python installation. If you use Python in your “daily work” and would like to get this configured, we can do that together. If you don’t use Python, then what you can do is to have two terminals open (Anaconda Prompt and Git Bash) and do all the git stuff in the latter and run the very few commands which will require Python in the Anaconda one. This is how one can make Python visible in Git Bash: https://coderefinery.github.io/installation/troubleshooting/
- there is also information here: https://docs.anaconda.com/anaconda/user-guide/faq/#installing-anaconda (“Should I add Anaconda to the Windows PATH?”)
- I do use Python in my “daily work”, and just run scripts from Spyder. I currently use git from the command line. Is this something that I should configure differently then?
- depending on what your default “daily work” Python is (e.g. 2.7, which is obsolete but still the default on some systems) and whether you can change it to another Python version (e.g. 3.8) without affecting your work, the recommendation is to follow the links above.
- we can also debug this later. also, Spyder has git integration so you may prefer to do Git directly from there. but if you like to work in the command line (I personally prefer that), then we can fix this together later.
- Git bisect clarification- you are running the code and checking manually at each step to see if it works the way you want?
- yes, but this can be scripted (we also provide an example). you can write a script (any language) which decides whether the code is working (by returning 0/non-0 return code) and then git bisect can also use that.
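The scripted variant mentioned above could look roughly like the following Python sketch. It relies on the exit-code convention of `git bisect run` (0 means good, 125 means "cannot test, skip", other codes from 1 to 127 mean bad). The expected value `3.14`, the tolerance, and the assumption that `get_pi.py` prints a single number are guesses about the exercise, not something taken from the course material, and the function names are made up for this sketch.

```python
import subprocess
import sys

EXPECTED = 3.14    # assumption: the value a "good" version prints
TOLERANCE = 0.01   # assumption: how far off still counts as "good"


def verdict(output: str) -> int:
    """Translate program output into an exit code for `git bisect run`."""
    try:
        value = float(output.strip().split()[-1])
    except (ValueError, IndexError):
        return 125  # output could not be parsed: ask bisect to skip this commit
    return 0 if abs(value - EXPECTED) <= TOLERANCE else 1


def main() -> int:
    """Run the script under test and judge what it printed."""
    result = subprocess.run(
        [sys.executable, "get_pi.py"], capture_output=True, text=True
    )
    if result.returncode != 0:
        return 125  # the script itself crashed: skip this commit
    return verdict(result.stdout)

# In a real check script, finish with: sys.exit(main())
```

Saved under a hypothetical name such as `check_pi.py` (with the final `sys.exit(main())` line enabled), it could be handed to Git with `git bisect run python check_pi.py` after `git bisect start` and marking one good and one bad commit.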
- If you have found the commit where something broke. How do you then repair that so that it ends up working again at the end in the master?
- once we know the commit, and assuming the commit is not too huge, often just by looking at the commit we can see why something broke, which helps debugging. the recovery could then be either reverting that problematic commit or applying a new change at the tip of the main/master branch that undoes the mistake, possibly using a “manual” commit.
- Did I miss something? At the end of the bisect, `git show` shows changes which do not seem to cause any change of the actual value. It just adds an innocent comment.
- which commit did you locate?
- Commit 136 which just adds a comment at the end of t0. It resembles the output shown by Matus.
- the bad commit is `git show 326f68a5585`, so the one right after. I guess maybe you stopped one step too early.
- Ok, resolved. I understand. At the end of `git bisect` the bisect ends at the last good commit (i.e. 136) in order to check what went wrong… (This statement is wrong. Check comment below.)
- no it ends at the first bad commit. but just before arriving there, it shows “0 steps left” and I think it can trick the person/presenter to not do one more step. after seeing that we should have done a “git bisect good” and git bisect would have shown us the bad commit (sorry I wasn’t following the video then so I am not 100% sure but I think this is what happened)
- I checked the workflow twice. Even after doing `git bisect good` after it shows “0 steps left”, the commit is still at 136 and `git show` still shows changes with commit 135 and not 137 (which is the bad one). Can you please confirm this? I actually discovered `git bisect` just now and I think it’s very interesting. I just want to know exactly how to use it.
- checked (see below, I go until it shows me the bad one)
- That’s correct. I was talking about `git show` at this point.
- Aha! I was assuming everybody does `git show 326f68a55` :-) Should we clarify the material for this? :+1:
- :-) Thanks for clarifying. I understand how `git bisect` works now!
- great! it’s super valuable. not something we need every day but something I need 2x a year and then it saves me always a day or two of work.
- [name=staff] In my demo, the bisect indeed stopped at the first bad commit (137), but HEAD was at the last good one (136). So `git show` showed the last good commit (136).
- ok, sorry, i will modify my answers above :-)
~/tmp/git-bisect-exercise on dac3c42
$ git bisect good
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[41af86cd42a966e4375b68bbad4af1384a0558cf] commit number 136
~/tmp/git-bisect-exercise on 41af86c
$ python get_pi.py
~/tmp/git-bisect-exercise on 41af86c
$ git bisect good
326f68a558501a6f44d7685c2c1795794bac09b5 is the first bad commit
Author: staff email@example.com
Date: Fri Mar 29 16:02:52 2019 +0100
get_pi.py | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
- Why do we not stop after the first git bisect good, I mean why check all the history?
- because we wanted to find the first commit which changed the behavior from “good” to “bad”. if we had stopped at the first “good”, we would have found a working version, but not the commit that broke it. knowing the commit that broke it is useful if 1) you already published the paper and want to know whether this was before or after the problem, and 2) it can simplify debugging if you see the bad change, often it’s just one line that changes the behavior.
- but in the example, it kept alternating, it seemed as if the issue happened multiple times in history then been reverted? Maybe I am missing something here.
- we did not traverse the history “following the time arrow”, we gave git bisect two endpoints, then we halved the history, then we halved it again, then again. so we were jumping around the problematic commit. this was to minimize the number of steps.
- ah ok! Got it, thanks
- A good way to understand it better is to try the exercise yourself. In every step you’ll see at which commit in history we are, because for this exercise the commits are “named” with a sequential number (1, 2, 3, …)
- indeed if we went one commit after another, we should have stopped at the first “good” (but then we would have used 136 or 364 steps (depending on which direction we’d go), whereas with bisect we used only 9 steps of checkout+test to find it)
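The halving described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Git implements bisect: it assumes a linear history of 500 commits (inferred from the 136 and 364 steps mentioned above) in which commit 137 is the first bad one, and `first_bad` is a name made up for this sketch.

```python
def first_bad(n_commits, is_bad):
    """Find the first bad commit, given that commit 1 is good and
    commit n_commits is bad. Returns (commit, number_of_tests)."""
    good, bad = 1, n_commits
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2   # jump to the middle of the remaining range
        tests += 1                # one checkout + test, like one bisect step
        if is_bad(mid):
            bad = mid             # the breakage happened at mid or earlier
        else:
            good = mid            # the breakage happened after mid
    return bad, tests


commit, tests = first_bad(500, lambda c: c >= 137)
print(commit, tests)  # prints: 137 9
```

The tested commits jump back and forth around the problematic one (250, 125, 187, ...), which is why the results in the demo alternated between good and bad.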
Collaborative distributed version control
Break: (until xx:25)
Resuming at step 9: discuss and accept pull request
- What happens in the local side, if a pull request is rejected?
- rejecting can mean either not merging a pull request or closing it. either way, nothing changes locally: closing the pull request/merge request does not delete the source branch (the branch from which the pull request was sent) and does not delete any commits.
- What happens when the merge to master happens in the local repository, and then the developer pushes the changes to GitHub, and then the pull request is rejected? I imagine that the remote and local repositories will have a somehow inconsistent state, correct? What happens in the next `git fetch` or
- for simplicity let’s first assume the PR (pull request) was not rejected but is still open: if you merge locally and push the merge commit, the pull request will also change to “merged” and it will look as if we had merged it via the web. I don’t know what happens if the PR has been rejected and you push the merge anyway. Let me try this quickly. This is how it looks: (link redacted) (although I closed/“rejected” it earlier, it looks like it has been merged)
- if a PR is rejected and closed, nothing will happen. if it is only rejected, changes might be required; I am not sure whether changes can still be pushed while a rejection is in place or whether GitHub blocks the merge.
- github does not block me from pushing the local merge, see discussion/example above
- if not set, yes, that is why one needs to set protected branches so that you don’t push rejected Pull requests :)
- also, that example does not reject it. maybe the terminology is important here: “reject” to me means changes were requested, while “closed” means the PR is closed. changes can still be pushed to the branch associated with that pull request; even if the branch is deleted, changes can probably still be pushed, although that is not ideal
- see https://docs.github.com/en/free-pro-team@latest/github/collaborating-with-issues-and-pull-requests/reviewing-proposed-changes-in-a-pull-request vs https://docs.github.com/en/free-pro-team@latest/github/collaborating-with-issues-and-pull-requests/closing-a-pull-request
- probably an example will help to illustrate what you mean; it is not 100% clear
- but also I don’t think the above situation happens often in practice. I would write-protect the master/main branch and then this situation “cannot” happen and all merges then have to go via pull requests.
- A comment: I think it would be useful to clarify that GitHub offers services on top of Git, e.g. the pull request is not part of Git, but of GitHub
- very good point. pull requests/ merge requests are built on top of git branching and merging. git itself “does not know” about pull requests or issues.
- Neither about forks! :) They are simply clones stored in the “cloud”
- indeed. forks are copies from one userspace/organization to another. under the hood, they are clones, not from cloud to computer but from cloud to cloud.
- Is there an upper limit (a maximum) for the storage space or for the number of repositories you can be a part of?
- there is an excellent answer on Stack Overflow to this question: https://stackoverflow.com/questions/38768454/repository-size-limits-for-github-com#answer-59479166
- there are hard and soft limits. It is recommended to keep repositories small (1-5 GB). The push size limit is 2GB. Files are limited to 100MB in size.
- this is another good reason for code review: somebody else can check whether I am accidentally adding a 200 MB file to the history. technically we can remove commits from the history but organizationally this can be non-trivial.
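A check for oversized files can also be automated before committing. The following is a minimal Python sketch, assuming the 100 MB per-file figure quoted above (interpreted here as 100 × 1024 × 1024 bytes, which is an assumption); `files_over_limit` is a hypothetical helper written for these notes, not part of Git or GitHub.

```python
import os
import tempfile


def files_over_limit(root, limit_bytes=100 * 1024 * 1024):
    """Return sorted (path, size) pairs for files under root that exceed limit_bytes."""
    too_big = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into Git's own object store.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            if size > limit_bytes:
                too_big.append((path, size))
    return sorted(too_big)


# Small self-check with a tiny limit, so that no large file is needed.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "small.txt"), "wb") as handle:
        handle.write(b"x" * 10)
    with open(os.path.join(tmp, "big.bin"), "wb") as handle:
        handle.write(b"x" * 2000)
    demo = [(os.path.basename(p), s) for p, s in files_over_limit(tmp, limit_bytes=1000)]

print(demo)
```

Running `files_over_limit(".")` in a working directory lists candidates that are better kept out of the history.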
- Does anyone know about the push/pull options for Git with Jupyter Notebook/Lab?
- There is the nbstripout plugin, which removes the output of cells before you commit. It simplifies the workflow with Jupyter. You can `pip install nbstripout`, and in your `$HOME/.gitconfig` you can add:
[filter "nbstripout"]
clean = python3 -m nbstripout
smudge = cat
[diff "ipynb"]
textconv = python3 -m nbstripout -t
- See also https://pre-commit.com/ for a collection of many tools to quality control and validate commits. Many are useful to validate syntax, code style and prevent you from adding files or content that you shouldn’t (e.g. a password).
- What is gitverse? is there any such thing as gitverse or related one? or universe ?
- This word might be used to refer to the ecosystem of tools built around `git`. I wouldn’t worry about these terms.
- where did this term show up?
- Can we close multiple issues with one PR? For example, “closes #N #M”?
- “I did this and that, closes #1, closes #2, closes #13”
- or a multiline commit message and you can list all the closes line by line
- note that if you forget to close issues in the commit, you can also close them in the pull request/ merge request
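GitHub expects a closing keyword in front of each issue reference, which is why the answer above repeats “closes” for every number. The rule can be sketched roughly in Python; the regular expression is a simplified approximation written for these notes, not GitHub's actual parser, and `closing_issues` is a hypothetical name.

```python
import re

# Approximation of GitHub's closing keywords (close/fix/resolve and their
# variants), each directly followed by one "#<number>" reference.
CLOSING = re.compile(
    r"\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?):?\s+#(\d+)",
    re.IGNORECASE,
)


def closing_issues(message):
    """Return the issue numbers that the message would close under this rule."""
    return [int(number) for number in CLOSING.findall(message)]


print(closing_issues("I did this and that, closes #1, closes #2, closes #13"))
print(closing_issues("closes #5 #6"))  # only the reference after the keyword counts
```

Under this rule the second message picks up only issue 5, because `#6` has no keyword of its own.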
- How do I sync my fork with the central repo now?
- we will demonstrate this: the short route is https://coderefinery.github.io/git-collaborative/03-distributed/#shorter-route, there is also a longer way, just above
- important to realize that forks do not automatically update themselves, we update them via our local clones by pulling from one remote (central) and pushing to another remote (the fork)
- also good to know that instead of using shortcuts like “origin” you can refer to the full URLs instead
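The short route mentioned above (pull from the central repo, then push to the fork) can be written down as a sequence of Git commands. The Python sketch below only builds and prints the commands rather than running them. The remote names `upstream` and `origin` and the branch name `master` are assumptions, so adjust them to whatever `git remote -v` shows; `--ff-only` is an extra safety choice made for this sketch, not something from the lesson.

```python
def sync_fork_commands(central="upstream", fork="origin", branch="master"):
    """Return the Git commands that bring the fork's branch up to date
    via the local clone: pull from the central repo, push to the fork."""
    return [
        ["git", "checkout", branch],
        # --ff-only refuses to create a merge commit if the local branch has diverged
        ["git", "pull", "--ff-only", central, branch],
        ["git", "push", fork, branch],
    ]


for command in sync_fork_commands():
    print(" ".join(command))
```

As noted above, full URLs can be passed instead of the remote shortcuts.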
- Is it also possible to just keep my forked repo out-of-date in the long term, and instead only pull from the original repo? Or could that cause issues?
- it is ok to keep it out of sync, but I would update it before I start working on something new, to make sure that the new feature branch is not based too far in the past and to avoid conflicts when I send the pull request a few weeks later. I often don’t even bother updating the main branch on the fork since I don’t work on it anyway.
- Yes. I meant pulling master from the original repo, but pushing only my feature branch to the fork
- Right. I think this is fine and this is how I often use it: you do the work locally anyway, and then the fork is only a “parking lot” for open pull requests.
- there was no option to create issues on forks “forked” by the helper. However, when a collaborator used the main repo as a template and “generated” a repo from it, issues could be created.
- a fork will by default not carry its own issues, but you can enable issues on a fork (settings). but it is often less confusing to have all issues only in one place, in the central repository.
- for the generated repo it worked because there the goal is different: this is often used to generate repositories where there is no aim at bringing changes back to the template and then it makes sense to track their own issues
- I created a new branch and wanted to upload it to my repo and then make a pull request, but the new branch does not appear, while git in the shell says “Everything up to date” after a push. Now the branch is missing.
- check with `git status` or `git graph` whether the branch you want to push contains any new changes. But still the branch should show up on GitHub. It doesn’t?
- maybe we can look at it after via screenshare?
git push -u origin <name_of_branch>
- it should still show up, also without
- I tried that just now, it still does not show up.
- before I tried ‘git push origin -u branch_name’
- let’s look at it in a breakout room?
- please let me know if this is still an issue
- it kind of did something now, but I still don’t understand.
- I can explain here, perhaps that will be more clear.
- I managed to push the new branch and file to my repo. And then made a pull request, which was accepted. But once the PR was accepted, the file directly appeared in the upstream repo, my branch vanished. This is unexpected. Is this normal behaviour for github ?
- the vanished branch is surprising. can you paste a link to the pull request? I will remove the link again before we publish these notes.
- the link is: (link redacted)
- the branch is still there, it stays on your fork until you delete the branch. the merged pull request neither removes the branch, nor does it actually modify your fork
- my branch is still in my repo yes, but why does the pull request not conserve the branch in the upstream repo ?
- the branch was created in your fork, not in the upstream repo, so it never existed in the upstream repo. (strictly speaking this is not 100% correct, because of a technical detail of how GitHub stores things, but I think it is useful to imagine that the branch never existed there)
- does this mean that only the owner or a collaborator can create a branch in that repo ?
- right, this is correct. otherwise anybody could “vandalize” my own repos without asking me.
- It would still be a pull request …
- and this way I have the choice to accept what goes in
- Yes but can a pull request create a branch ? That is my question.
- the PR does not create a branch, it originates from a branch and once merged, it creates a commit, but does not create a branch
- I see, is this mentioned in the tutorial ? I might have missed it, but if it is not there, it would be good to point this out, to avoid confusion.
- thanks: https://github.com/coderefinery/git-collaborative/issues/162
- I have a related question. The first issue was due to the fact that the Git setup by default uses HTTP login, which never worked in Git Bash, although sometimes it said “Everything up to date”.
- Is there a way to tell Git to set up every new repo to use SSH?
- I think it remembers the last choice and will default to that
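On the SSH question above: the protocol is determined by the URL form of the remote, so an existing clone can be switched by giving `git remote set-url` the SSH form of the same address. A minimal Python sketch of that URL rewrite, assuming GitHub-style HTTPS URLs and the usual `git@host:path` SSH form; `https_to_ssh` is a hypothetical helper written for these notes, not part of Git.

```python
from urllib.parse import urlparse


def https_to_ssh(url):
    """Rewrite https://host/user/repo.git as git@host:user/repo.git."""
    parts = urlparse(url)
    if parts.scheme != "https" or not parts.hostname:
        raise ValueError(f"not an HTTPS remote URL: {url}")
    return f"git@{parts.hostname}:{parts.path.lstrip('/')}"


print(https_to_ssh("https://github.com/coderefinery/git-collaborative.git"))
```

The result could then be applied with `git remote set-url origin <ssh-url>`, where `origin` is the assumed remote name.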
Feedback for day 3
Please write down one thing you liked and one thing we should improve.
- Having multiple upstreams around from the start is nice! Thanks for teaching the GitHub workflow
- The pictures describing what happens locally/remote, makes it easier to understand!
- It was very helpful to first learn how things work conceptually, then connect it to code, and then practice with the code (and concepts!) on our own. This order is important because it helps to actually understand what you are doing while coding (in contrast to coding as fast as possible to keep up and then not really knowing what you actually did…).
- Great exercises on collaborative workflow. They really get discussions going, and learners learn a lot from each other.
- It might be nice to directly see what we are supposed to do from the tasks: like some tag for `do-yourself` etc. Was always a bit confusing with “helper does A-C, you do D-E, F and G will be done in zoom”
- good point, we need to make this clearer
- Very simple code that is expected to be known, might not be known to everyone. So things like ‘step out of your directory’ are total gibberish to people who don’t know git bash. It would be very helpful if such simple commands (cd ..) are included in the mark down after such statements (between parentheses or something), to help out the people that have no clue. This also helps people not to get lost, get behind, or feel like total noobs.
- thanks! good suggestion. one slight complication is that some of these “simple” commands are sometimes different on different shells (which can increase confusion) but still we should make this clearer, perhaps offering different tabs. but it is super important that we avoid giving learners the feeling that they are “noobs” so we need to do something about this.