I can't get the link "Very detailed 2-page git cheatsheet" to work on this page: https://coderefinery.github.io/git-intro/reference/ I just get "404 File not found". Thanks for your help!
Could you clarify the concepts of the branches combined with having your local repository, your forked repo and the central repo? For example, all of them have a "master" branch, so if I talk about the master branch, which one am I referring to? And where should merges happen? All these things weren't properly discussed last week.
`git pull`: you get those changes and merge them. This merging happens on your computer.
`git push` to get your own changes to your online repository. This merge happens online (so it only works if the computer can do it. No conflicts allowed. You must pull first.)
`git push` them to the online repository. This merge happens online (so no conflicts allowed, you must pull first).
`git pull` (technically `git fetch` and `git merge`, actually). This merge happens on the main repository. Again, no conflicts allowed.
https://coderefinery.github.io/reproducible-research/
Could you please elaborate on this "reproducibility crisis"? How did it start?
Man I wish people were less selfish, I totally believe that science should be more collaborative and "for the good of all". Should we maybe redesign the funding and publishing systems? :D (+1)
At least in bioinformatics it seems to me that people always believe they cannot use the programs/codes created by others, their own version would be better (at least 2-3 guys/girls I met), is that a field-specific attitude or is it in reality that many times the analyses for a specific biological question are just too "specific" (for the lack of a better word :D)?
Slightly off topic; do you have any environments for Python you recommend working in? I currently use Jupyter Lab on Mac. (+1)
This professor still knows more than most. :) Many others would just assume everything is available and never suspect there may be issues. (+2)
Should there be a different `.git` repository in each sub-folder? Or one `.git` repository for the entire project?
Are there standard files that should be in `.gitignore`?
`.gitignore`: object files and executables that are generated by the compiler. For Fortran, for example, one would like to have `*.mod` and `*.o` in the `.gitignore`.
Do you know of any version control for hardware development (e.g. CAD or PCB designs)?
Are there other resources one can use to share data that is too big to be shared on GitHub?
Should we put the files that we do not want to track in the ./gitignore folder?
`.gitignore` is a text file. There you can specify the types of files you do not want to be tracked by git (e.g. `*.zip`).
How do I use Overleaf with git? I don't use it, although I use Overleaf a lot. It would be nice to know about this.
This is actually a question about last week's exercise. For the forking exercise, it said to wait for someone to accept my merge before starting the second part. However, my merge request is yet to be approved.
I'm very happy to see someone else use cooking analogies! I always do this when running Python undergrad courses (coding language = robot chef, data = ingredients, code/script = instructions)
If you use code snippets posted online by others, say on Stack Overflow etc., do you need to cite them? I always feel slightly guilty just using them, but then again my coding journey has just begun and nothing has been published yet (not even close).
Would like to hear a bit about where things "live" when using Conda. Like what happens under the hood, how do channels work?
`conda`.
Probably you're going to treat this, but what is the difference between `Conda` and `Anaconda`?
`conda` is a packaging program. You use it for installing packages and creating program environments. Anaconda is the name of a set of packages. There is also Miniconda, which is the name of another set of packages. Anaconda is huge because it contains a lot of packages. Miniconda is the bare minimum of packages needed for having a Python environment.
In R I am using the `renv` package (https://rstudio.github.io/renv/articles/renv.html) to deal with package versions. Is this a good approach?
If one wants to publish one's work when coding in Python, PyPI is a common choice. How does one accomplish this with conda? Is it possible, or are conda-forge and other channels the only option?
Is Spyder another package that can be used?
Is there a good way for me not to have to remember which environment I made for which project? Something like `conda activate .`?
`conda create --prefix`.
What is `.yml` and why are environments written down in this format?
R. said "I prefer having isolated environments instead of having it installed on my computer." I don't get the exact difference. Aren't the things installed even if you use an environment, and the isolated environment just helps "list" exactly what you need and where they are?
It has been commented that it is recommended to have one environment per project, but when doing this it seemed to me that it takes a lot of space on my hard disk (a few GB). Is this a known effect, or did I not perceive it properly?
`conda clean --all` will remove unused packages and other data.
I am not sure I understood what an environment is. Would you mind explaining again?
Does anyone have experience with Julia? I have been considering using it, as it is supposedly excellent for analysis of large quantities of data.
When is the best time to generate this `yml` file? At the end of the project? Or regularly at each major release?
`environment.yml`. If you work in a Python virtual environment, the default is `requirements.txt`.
`conda` and `pip` are both package managers for Python. Both can be used with environments (conda creates its own; pip is typically paired with `venv` or `virtualenv`) in order to isolate your code from the rest of the system and keep track of dependencies.
What is the best approach to data sharing/version control if you are the only one in the team using it and writing scripts?
How was the exercise?
Do we need to do the installation of dependencies for the exercise?
`conda env export` will create a file similar to the one we see in Dependencies-1.
What is a `channel` in relation to Conda, envs, etc.?
`channels`: which one to use, and where to get info about available ones?
I am not sure how to start the exercise. What exactly do I need to do?
When in a project is it appropriate to create an `environment.yml`? At the start? At version 1? And when should it be updated?
How and when does one decide to create a new environment? It seems a bit tricky to me, since people work on different projects and sometimes it is even hard to have a well-defined "project". What would be good advice for that?
In the "Dependencies-1" example, should we not specify the python version (e.g. 3.5 or 3.7)?
`environment.yml` (however, `requirements.txt` has no mechanism for that as far as I know).
What is the "CodeRefinery conda environment"?
G:\>conda activate
'conda' is not recognized as an internal or external command, operable program or batch file.
conda activate coderefinery
(coderefinery) $
Would the conda environment be operating system agnostic?
`environment.yml` can be made with "build numbers", the `=hcf16a7b_0_cpython` you see in a question below. These identify specific builds and aren't portable across different operating systems. So `conda env export --no-builds` excludes them and makes the file a bit more portable.
When I try the first command of exercise 2, I receive this message:
out-file : Access to the path 'C:\environment.yml' is denied.
At line:1 char:1
+ conda env export > environment.yml
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : OpenError: (:) [Out-File], UnauthorizedAccessException
+ FullyQualifiedErrorId : FileOpenFailure,Microsoft.PowerShell.Commands.OutFileCommand
I already have the coderefinery conda environment.
I checked the files from last week. At the end of the directory name there is info like (older code), (master), (just-before). I guess it is git that writes this, but it was very unexpected. What kind of help is that and how do I use it?
What does the text after the second `=` sign mean: `python=3.11.0=hcf16a7b_0_cpython`?
Hash (`h`) and build. This is a more specific, exact identifier than the version. Note that it differs between Windows, macOS and Linux, so it makes the environment more reproducible on one computer but not portable to others.
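To make the three `=`-separated fields concrete, here is a minimal sketch that splits the spec from the question using plain shell parameter expansion. The variable names are made up for illustration, and the "portable" form is simply the spec with the build field dropped, which is roughly what `--no-builds` leaves in an export.

```shell
spec="python=3.11.0=hcf16a7b_0_cpython"

name=${spec%%=*}       # text before the first "="
rest=${spec#*=}        # everything after the first "="
version=${rest%%=*}    # text between the first and second "="
build=${rest#*=}       # text after the second "=" (the build identifier)

# Dropping the build field gives a spec that is not tied to one platform's build.
portable="$name=$version"

echo "name=$name version=$version build=$build"
echo "portable spec: $portable"
```

The version alone is usually enough to share; the build field is mostly useful when you need to recreate the exact same environment on the same kind of machine.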
I am receiving a `conda: command not found` error. I tried adding the conda path to PATH but nothing changed. I even initialized conda.exe in the shell and restarted.
`export PATH=$PATH:...`: that's only for the current terminal. If you restart the terminal, you need to do it again.
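A small sketch of why an `export PATH=...` does not survive a restart: a child shell stands in for "the current terminal", and the parent shell stands in for a freshly opened one. The directory below is a made-up placeholder, not a real conda location.

```shell
# Hypothetical directory standing in for the real conda "bin" folder.
conda_bin="/opt/hypothetical-conda/bin"

# The child shell plays the role of the current terminal: the export is visible inside it.
inside=$(sh -c 'export PATH="$PATH:$1"; printf "%s" "$PATH"' _ "$conda_bin")
last_entry=${inside##*:}

# Back in the parent shell (think: a new terminal) the change is gone.
case ":$PATH:" in
  *":$conda_bin:"*) persisted=yes ;;
  *)                persisted=no ;;
esac

echo "last PATH entry inside the child shell: $last_entry"
echo "still set in the parent shell: $persisted"
```

To keep the change across terminals, the `export` line usually goes into a startup file such as `~/.bashrc`, which bash reads each time an interactive shell starts.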
`apt-get update; apt-get install wget`
`sh ./Miniconda3-py39_4.12.0-Linux-x86_64.sh` should install Miniconda.
How does Conda know where to get info to perform `conda activate <env_name>`? Is info about these environments stored somewhere?
`conda env export > environment.yml`. But where does the `coderefinery` env come from, which is activated magically by `conda activate coderefinery`?
`environment.yml` file, for instance `name: coderefinery`.
So I can see I have the coderefinery environment under my Miniconda/envs folder, and I see it when I do `conda info -e`. I have put my current environment info into a `myownenvironment.yml` file. But what has to happen to `myownenvironment.yml` so that it behaves like the coderefinery environment I can see in my envs folder, and so that it is listed as an available environment I can load?
It is difficult to understand which terminal a person needs to run from. There is `cmd`, the `anaconda` one, `git bash`, etc. Should I start with the Anaconda terminal?
$ conda activate
bash: conda: command not found
As you can see, with Git Bash it is not working.
The "Setting path to Conda from your terminal shell" step is not working. I run the command `echo ". '${PWD}'/conda.sh" >> ~/.bashrc`, and when I try `conda --version` it does not find anything. I remember there was also a problem during the install sessions. Please let me know what to do on Windows.
`echo $PATH` and paste the results?
$ echo $PATH
/z//bin:/mingw64/bin:/usr/local/bin:/usr/bin:/bin:/mingw64/bin:/usr/bin:/z/bin:/c/Program Files/Python311/Scripts:/c/Program Files/Python311:/c/WINDOWS/system32:/c/WINDOWS:/c/WINDOWS/System32/Wbem:/c/WINDOWS/System32/WindowsPowerShell/v1.0:/c/WINDOWS/System32/OpenSSH:/c/Program Files/dotnet:/cmd:/c/Program Files/MATLAB/R2022b/runtime/win64:/c/Program Files/MATLAB/R2022b/bin:/c/Users/user/AppData/Local/Microsoft/WindowsApps:/usr/bin/vendor_perl:/usr/bin/core_perl
`tail ~/.bashrc`)?
$ tail ~/.bashrc
. '/z/'/conda.sh
. '/c/Hyapp/Anaconda3-2022.05/etc/profile.d'/conda.sh
. '/c/LocalData/bortolus/coderefinery'/conda.sh
. '/c/Hyapp/Anaconda3-2022.05/etc/profile.d'/conda.sh
. '/c/Hyapp/Anaconda3-2022.05/etc/profile.d'/conda.sh
. '/c/Hyapp/Anaconda3-2022.05/etc/profile.d'/conda.sh
. '/c/Hyapp/Anaconda3-2022.05/etc/profile.d'/conda.sh
. '/c/Hyapp/Anaconda3-2022.05/etc'/conda.sh
`pwd`. That should print the path to your home folder, where the .bashrc file is.
`editor.exe .bashrc`). Or you can open the folder using `explorer .` and go from there.
. '/c/u/Anaconda3-2022.05/etc/profile.d'/conda.sh
When uploading my `.yml` file to a public repository, would it be better to not include the hash and build for portability to other operating systems? (+1)
If I run the command `pip freeze` while in the Miniconda prompt, I see text that looks like `certifi @ file:///C:/b/abs_85o_6fm0se/croot/certifi_1671487778835/work/certifi`. What is the meaning of the `@ file` part and the rest, given that it does not specify the versions?
In the Anaconda and Miniconda prompts I am at C:>, but I do not know how to move to a directory as someone said before. Can you clarify? I use Anaconda on Windows.
`cd Users`, and then `ls` (or `dir`) to see the subfolders there. There should be one which is your username.
"(coderefinery) C:>conda env export > environment.yml Access is denied." What does this mean? What should I do?
`C:\` `cd`.
Is there some way to see the previous version of the environments exercise? I liked it better before. :/ I remember there was one awesome exercise where you could try out a bunch of different commands with conda: you put stuff into a yml file, created a new environment from that, then changed it slightly, and so forth.
How do I get the current environment? According to this: https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#determining-your-current-environment it should show an asterisk, but if I run `conda env list` in Git Bash I don't get an asterisk.
`conda info`: it tells you various details on the currently activated environment (or if nothing is activated).
I have made the script.sh and put it in the repository. I run it using `bash script.sh`, but there is no output. Shouldn't I see a plot? Should I put it in the data folder?
Is Snakemake similar to C makefiles?
I'm getting an error with conda: "CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'. To initialize your shell, run $ conda init <SHELL_NAME>"
`conda init` in zsh: it might work if you restart the shell. Or go back to zsh; if you know it, it will work (you can still `bash script.sh` from zsh).
`conda init bash` outputs `No action taken`, so I will work through zsh instead. Thanks for the help!
When I try running `bash script.sh` I get a "No such file or directory" error. I have cloned the repository and changed to the word-count folder.
`script.sh` from the webpage. Copy and paste into `script.sh` using an editor.
`#!/usr/bin/env bash`. This should be copied to `script.sh`.
I did the first exercise easily, but the snakemake exercise doesn't work, or probably I am doing something wrong. I tried to run the lines of code written in step 3 while using Git Bash inside the "word-count" repository on my PC. It says: "bash: snakemake: command not found".
`snakemake` command.
`source [PATH TO CONDA]/bin/activate`.
`source [PATH_TO_CONDA]/bin/activate`. Then try `snakemake --version` just to check that it finds it.
I saved the script bash file, but when trying to run it I get "script.sh: line 5: statistics/abyss.data: No such file or directory", followed by other "No such file or directory" errors regarding paths ending with plot.py.
`ls` or `dir`: there should be a `data` and a `statistics` directory there.
To create the script.sh file we need to do it from the terminal on Binder, correct? If so, I couldn't use nano, vim or vi there. So how do I do it?
I created and ran the script.sh. Now I do `git status` and I only see the script.sh, which is untracked. Why are the plots that I created with the script not there as untracked? Have they not been changed?
Am I correct in that Snakemake is completely language independent, and works as long as you can run every step from the command line?
In my current project, I use a mix of Fortran and Python codes, which for example does not run on notebooks and requires a specific environment to be executed. Is it possible to save an environment in git and let the user install it in a one-line procedure? This would be useful even for myself when running from other machines.
The `environment.yml` thing we discussed in theory does this. Make an `environment.yml` file that has the requirements. In theory a user can `conda env create -f environment.yml`. Or similar ideas for other tools.
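As a rough illustration of the answer above, the sketch below writes a small `environment.yml` that could be committed to git next to the code. The environment name, channel and package list are placeholders, not the requirements of any real project; the `conda` commands are left as comments because they need conda installed.

```shell
# Work in a scratch directory so nothing in a real project is overwritten.
project_dir=$(mktemp -d)
cd "$project_dir"

# Hypothetical environment: name, channel and packages are placeholders.
cat > environment.yml <<'EOF'
name: my-project
channels:
  - conda-forge
dependencies:
  - python=3.11
  - numpy
EOF

cat environment.yml

# With conda installed, a collaborator could then recreate the environment with:
#   conda env create -f environment.yml
#   conda activate my-project
```

Because the file is plain text, it diffs and versions well in git, which is what makes the one-line install practical on other machines.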
After trying all the installation processes, `conda --version` on Git Bash gives me `command not found`. Not sure what to do now.
The Snakefile and how it works look cool, but I am wondering how to write one. It looks complicated (I mean, what are these? Shell commands?)
I'm a bit lost. I cloned the repository and after that I ran `snakemake --delete-all-output -j 1`, but I get an error like this: Error: no Snakefile found, tried Snakefile, snakefile, workflow/Snakefile, workflow/snakefile. (coderefinery). It may be that I just need to follow along today and do these exercises later, but if there is some help so that I could catch up now, that would be great.
Did you `cd` into the cloned directory? Is the Snakefile in the same directory where you run the command?
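One way to check this before running snakemake is sketched below. The helper name `find_snakefile` is made up for this example; the list of locations is copied from the error message in the question, and the scratch directory only stands in for the cloned repository.

```shell
# Print the first Snakefile location that exists under the given directory
# (default: the current directory); return non-zero if none is found.
find_snakefile() {
  dir=${1:-.}
  for candidate in Snakefile snakefile workflow/Snakefile workflow/snakefile; do
    if [ -f "$dir/$candidate" ]; then
      echo "$dir/$candidate"
      return 0
    fi
  done
  echo "no Snakefile under $dir: cd into the cloned repository first" >&2
  return 1
}

# Demo in a scratch directory (a stand-in for the cloned repository).
demo_dir=$(mktemp -d)
touch "$demo_dir/Snakefile"
found=$(find_snakefile "$demo_dir")
echo "$found"
```

Running `find_snakefile` with no argument checks the directory you are currently in, which is the one snakemake itself would search.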
It is still solving the environment for coderefinery env, haven't been able to do anything more than that.
Can I open the generated images in the terminal?
Why did the Snakefile create empty txt files in the plot folder, named after the book titles?
How do we create a good "README" for a snakemake file?
IG_question! I do `git status` --> `snakemake --delete-all-output` --> `git status` --> `snakemake -j 1` --> `git status`. Why do the first and the last calls of `git status` say there are no changes in the repo? Files were at least updated (actually, removed and created again, and `git` noticed this). Does `git` not track the time evolution of a file in the repo if there are no changes in the file size / content?
`.gitignore` file shows `*.log` ignored.
Question 56 is still unsolved.
How do I edit the files inside Binder? `nano`, `vim` and `vi` don't work. Can you give the names of the tools that we can use to edit the `*.py` files inside Binder?
From the Anaconda terminal I activated the coderefinery environment, then I tried `conda env export > environment.yml` and it tells me that it cannot be done. So I started trying with Git Bash, and I have tried to make it work in bash, but it still tells me command not found!
conda activate coderefinery
If I try to visualize the DAG I get an error:
$ snakemake -j 1 --dag | dot -Tpng > dag.png
bash: dot: command not found
Building DAG of jobs...
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='cp1252'>
OSError: [Errno 22] Invalid argument
`dot -Tpng`. What does the command do? Where did you find it? Anyway, the `dot` command is not installed on your computer; that is why it is complaining.
`conda install graphviz`). It’s not a must to run this step yourself, but you may if you install `graphviz`.
`graphviz` but it still gives the same error.
https://coderefinery.github.io/reproducible-research/environments/
It sounds like the instructor was talking about what is on his screen without sharing? nvm video was stuck on my end somehow
I was wondering: if the workflow is using some tool like Binder and/or depends on some integration with GitHub, it can become inaccessible in the future, right? Is there some way to use containers (e.g. Docker) that can be relied on in the long run?
`repo2docker`, which can also be run locally. Binder isn't actually that fancy; it uses standard reproducibility things like `environment.yml` that you want to do anyway. So it's still reproducible even if Binder disappeared.
What's the difference between a container and an environment? The container is a next-level up?
https://coderefinery.github.io/reproducible-research/sharing/
Are we supposed to follow along with the exercise, or just watch?
Is the full git history shown on Zenodo when you link the GitHub repository to Zenodo?
Does Zenodo only work with GitHub? Or other repository platforms are also supported?
Can I use `git` to control basically everything between my local machine and a remote one, e.g. a supercomputer where I not only run simulations but also store data?
`--include`, `--exclude`
`git remote add origin URL`. Also, instead of the URL you could use a path to some other local repo to be synced with.
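A minimal sketch of the last point, assuming git is installed: the "remote" here is just a bare repository at a local path. The directory names are invented for this example.

```shell
scratch=$(mktemp -d)

# A bare repository plays the role of the "remote"; here it is just a local path.
git init -q --bare "$scratch/central.git"

# A working repository that will sync with it.
git init -q "$scratch/work"
cd "$scratch/work"
git remote add origin "$scratch/central.git"

# Show where "origin" points.
remote_url=$(git remote get-url origin)
echo "origin -> $remote_url"
```

After this, `git push` and `git pull` against `origin` work the same way as with a hosted URL; only the location differs.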
These "Digital Object Identifiers" feel a little bit like the hashes that we've seen previously. The advantage of the DOI is that it also links to where the object is, correct?
What is the difference between Zenodo and GitHub in terms of data sharing? A reference to GitHub could also be shared.
https://coderefinery.github.io/social-coding/
Choose many. Vote by adding an `o` character:
A: Easier to find and reproduce (scientific reproducibility)
B: More trustworthy: others can verify correctness and find and report bugs
C: Enables others to build on top of your code (derivative work, provided the license allows it)
D: Others can submit features/improvements
E: Others can help fixing bugs
F: Many tools and apps are free for open source, so no financial cost for this (GitHub, GitLab, Appveyor, Read the Docs)
G: Good for your CV: you can show what you have built
H: Discourages competitors. If others can't build on your work, they will make competing work
I: When publicly shared, usually we time-stamp or set a version, so it is easier to refer to a specific version
J: You can reuse your own code later after change of job or affiliation
K: It encourages me to code properly from the start
L: To show how good I am
M: The journal insists! I had this experience with Bioinformatics journal, they required I had my C/C++ code to be publicly available on, e.g. GitHub.
Choose many.
A: It will be scooped (stolen) by someone else
B: It will expose my "ugly code"
C: Others may find bugs and mistakes. What if the algorithm is wrong?
D: I will get too many questions, I do not have time for that
E: Losing control over the direction of the project
F: Low quality copies will appear
G: I won't be able to sell this later. Someone else will make money from it
H: It is too early, I am just prototyping, I will write version to distribute later
I: Worried about licensing and legal matters, as they are very complicated
J: Do I share it in the correct way? Maybe I'm sharing it in an ugly manner which will discourage others from looking into the content?
Free-form answers:
current generation of senior people didn't get where they are by making good software (+4)
Harder to document contribution and work (+3)
Possible to make big money with software (+1)
Not everybody can understand your code, is not so wide audience. The most important contribution is the results from the code
Code is harder for most people to read than a written text (+4)
Code easier to falsify / find problems with.
Professors think code is good when it works once, spending time on commenting etc is a waste of time (+3)
Many simply copy and paste and don't acknowledge the code others developed. I guess this is because for scientific papers authorship recognition is much better understood than that of some random code someone finds on a forum. (+4)
Maybe because software citation is still not on par with paper citations for an academic career in many fields (+2)
I took parts of codes from others and feel like it is not mine. I just combine it. (+3)
Not formally "peer-reviewed" in the same way and aren't "novel results" so hard to convince admin it should be valued as academic currency (similar with methods papers being less valued even though long-term contribution may be huge) (+1)
Code is a tool rather than a scientific finding. It has no scientific value by itself; rather, it belongs in the Methods section of a paper.
People don't like to read code, are not interested in the implementation, or assume that if it runs it is correct.
What about the ChatGPT concern? Already heard a handful of issues from developers that codes were supposedly private but still "somehow" captured by it. (+2)
Free-form answers:
https://coderefinery.github.io/social-coding/licensing/
Choose many. Vote by adding an `o` character:
A. Download some code from a website and add on to it
B. Download some code and use one of the functions in your code
C. Changing code you got from somewhere
D. Extending code you got from somewhere
E. Completely rewriting code you got from somewhere
F. Rewriting code to a different programming language
G. Linking to libraries (static or dynamic), plug-ins, and drivers
H. Clean room design (somebody explains the code to you but you have never seen it)
I. You read a paper, understand algorithm, write own code
J. What was the meaning of derivative works?
What about a script that I made from pieces of code that I found randomly online? Can I license it?
Which entity sets the legal rules for licensing? We must be able to trust that this is more or less globally adhered to.
Speaking of license and copyright and AI image generators. Please realise that AI image generators violate the copyrights of artists. Many artists have not agreed with the AIs training upon their work to the point that AI users can emulate the work of the original artist quite accurately. That is a big problem.
Are there licenses that claim that linking to libraries constitutes derivative work of the linked code? Is it generally safe to assume that linking to libraries is safe for your code's license?
What does it mean, if I write Copyright 2023? How important is it to include a year. Under which conditions should I update the year/year range?
So if I use the algorithm from a paper and I write code for it, can I have problems with copyright? (+1)
What is copyleft?
What is the difference between proprietary licenses and open source licenses?
How do you deal with projects that say they are released under "the MIT license" but don't include an actual license text?
When you say "you can do", do you mean what a person who is not the author can do using code that has that specific licence?
Suppose I create code with the MIT licence. Hypothetically speaking, someone uses it and creates software out of it, but then the results of the software are wrong for some reason. Basically my initial code might have been a work in progress or something like that. Can they use the fact that the code does not work properly against me?
How about using generative AI in stuff?
What would a fully attributed code deposit look like? If the programmer referred to 10 tutorials, the documentation, and 25 answers on Stack Overflow...it doesn't seem feasible to keep track of exactly which snippets came from where. Attribution could be longer than the script. Am I missing something?
So do you think that the repository on GitHub should be private while you are working on it, and made public only when it is ready for the public?
https://coderefinery.github.io/social-coding/software-citation/
Today was:
One good thing about today:
One thing to improve for next time:
Any other feedback?
On behalf of the CodeRefinery team: We are really happy to read all this feedback! 🥳