Project arrangement
Objectives
Project arrangement is the way you organize files into directories and projects.
Project directories: Choose good names and divide up files well
Keep types of files organized within directories
Project arrangement not usually discussed (beyond “make directories”), but setting this up will can be a good start to everything that comes later. A good arrangement makes syncing, backup, and reproducibility much easier.
Why project arrangement matters
Story time: the missing file
One of the authors has some file they would like to find, from around 2007. They probably have it backed up somewhere, but it was in a directory, that got compressed, and then the directory containing that got compressed/backed up again. Maybe one other round?
It probably could be found but it’s not worth the instructor’s time anymore.
If arrangement isn’t good, then you will eventually forget how things are organized.
If you arrange projects well, then moving them around is easier
Keeping similar types of data together makes it easier to manage.
Group work
Naming projects and keeping them organized
Decide what goes together, call it a project
Give a unique name for each project
That name always refers to the same thing, on every computer.
Each directory should be managed the same way - git, rsync, etc. We’ll take about this later.
Every project should have at least a minimal README explaining to you what the project is about.
Examples:
projA
(main project directory)projA-data
(this project has too much data or the data is reused among multiple projects, so it’s split out)projA-paper1
(the articles are tracked separately)
My recommendation is that these directories are as flat as possible, not sub-directories of each other. This makes it easy to manage.
Ways of working together:
Everyone has their own local clone of the code (good for the code directories and other workspaces)
One shared folder that everyone edits (good for data, etc.)
Directories within projects
Keep different types of files organized, probably within directories for example:
source code, original data, parameters, temporary data, final data, etc.
Example:
PROJECT/code/
- backed up and tracked in a version control system.PROJECT/original/
- original and irreplaceable data. Backed up at the same time it is placed here.PROJECT/scratch/
- bulk data, can be regenerated from code+originalPROJECT/doc/
- final outputs, which should be kept for a very long term.PROJECT/paper1/
- different papers/reports, if not stored in a different project directory.PROJECT/paper2/
PROJECT/opendata/
- things you will share
Directories for teams
Usually you get one directory per project allocated on the cluster.
This has to be divided further
Then often subdirectories for each user, for their own workspaces.
Common data folders referred to by each individual’s workspaces.
Example:
SCRATCH/project-2569456905/
- the main directory allocated by the cluster adminsSCRATCH/project-2569456905/user1/
SCRATCH/project-2569456905/user1/code-projA/
- tracked with gitSCRATCH/project-2569456905/user1/code-projB/
SCRATCH/project-2569456905/user2/
SCRATCH/project-2569456905/user2/code-projA/
SCRATCH/project-2569456905/user3/
SCRATCH/project-2569456905/user4/
SCRATCH/project-2569456905/projA-common-data/
- manually managed / git-annex
Code arrangement
An installable software package can be reused easily in different projects, via the project requirements.
This can be a good alternative to keeping mature code/algorithms reusable and not copied from project to project.
For development, you can (python)
pip install -e ../path/to/project/
to make an “editable” versionPython for SciComp contains an example of Python packaging.
Group work
See also
Summary
Don’t be too fancy in your project arrangements.
Plan for a future where you don’t remember how it was arranged.
Keep different types of data together (either in project directories, or sub-directories).