Questions and notes from workshop day 6

Feedback, day 6

Today was:

One thing good about today:

One thing to be improved for next time:

Any other comments:

Icebreakers, day 6

Which tools are you likely to keep using?:

(*) = not yet taught, teaching today

What tools do you use for testing code?

Tell us something about this course:

How did you attend (alone home/work, small group home/work/online, organized group, watching videos later...):

How do you celebrate after the course?


  1. Hi, a bit early perhaps. But how do I email my homework for the certificate to this email?: scip _at_ aalto.fi

    • you mean what should be included in that email?
    • I mean it's not an email address, so how can I send an email to it?
    • Replace _at_ with @
    • haha, of course, thanks! (stupid of me)
      • There are no stupid questions! There are probably others wondering who did not ask yet :smile:
    • :smile:
  2. The screen could be a little narrower; a bit of text is cut off at the edges. Thanks!

    • Is it better?
    • Yes, better, thanks!
  3. Is it possible to test for error handling? What about outputting specific error/warning messages?

    • Many testing frameworks can have tests that expect a specific error or an incorrect output.
    • Or you can test that the output is an error code, for example in C.
    • Testing output is quite a bit more complicated. You would need to capture the output from the system. I would either capture the error before it's printed (if possible), or have the function also return an error/warning code.
  4. The example of 'automated-testing' is a Python function which starts with test_; there are also other possibilities through unit-test libraries etc. I tried to implement this but it seemed like too much work. How to balance this test complexity vs functionality?

    • Do you mean writing a unit test for every function is too much work?
    • You can start by testing the most important low-level functions, and by testing larger functions that call many smaller ones so that all of them are covered.
      • So like a hierarchy of tests? Sub-level, sub-sub-level, and top-level test functions?
    • In general, having some tests is better than none.
  5. How do I know which lines of code I have tested, especially in tricky workflows with a lot of if/else branches?

    • There are tools for this, but it depends on the programming language and testing framework. Search for "test coverage" + your unit testing tool.
    • For pytest, there is a tool called "coverage" (often used via the pytest-cov plugin)
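Questions 3 and 5 come up a lot in practice. As a minimal sketch (the function and its error message are made up for illustration): pytest can assert that a specific exception is raised, and a coverage tool can then report which branches your tests actually reached.

```python
import pytest


def safe_divide(a, b):
    """Toy function: raises a ValueError instead of failing later."""
    if b == 0:
        raise ValueError("b must be non-zero")
    return a / b


def test_safe_divide():
    assert safe_divide(6, 3) == 2


def test_safe_divide_rejects_zero():
    # The test passes only if the expected exception is raised;
    # `match` additionally checks the message against a regular expression.
    with pytest.raises(ValueError, match="non-zero"):
        safe_divide(1, 0)
```

Running the tests with the pytest-cov plugin installed, e.g. `pytest --cov=mymodule`, then shows which lines (including the `if` branch) the tests executed.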

Testing

https://coderefinery.github.io/testing/motivation/

Is not testing ok?

Discussion: What’s easy and hard to test?

  1. What is the difference between regression tests and "end to end tests"?
    • Yes, the concept is somewhat similar. And in the end any "test" is a regression test, since if a code change breaks the test you notice.
    • Are all tests "regression tests" then?
    • For me it is the same thing in practice.
    • Essentially: regression tests keep your functionality intact relative to previous results, while end-to-end tests check a whole workflow from input to final output (possibly against previously known results)
    • The focus might be on having a correct result or the focus might be on not introducing changes in the results. In practice we want both but sometimes we only have the preserving part.

:::info

Exercise until xx:40

https://coderefinery.github.io/testing/pytest/#exercise-pytest

Goals:

  1. One of the key points is: "pytest collects and runs all test functions starting with test_. If run on a directory, it collects all files matching test_*.py" My question: would you often write test functions within the same file/directory/package or would you make a separate file with all the test functions?

    • good point, you can run pytest on a directory, a package, or a file. Personally I like to keep test functions close to the code in the same file because it helps me "understand" them. I think most people and most projects I see prefer to have test functions in a separate file but then I need to jump between files if I want to change things. I see tests as documentation, not only as safeguard.
    • Depends a lot on whether this is a small script or a package. For a package I would separate them, as it's then a lot easier to ship a smaller bundle. For some small scripts putting them in the same file can be easier.
    • For me: very small tests might go in main module. Once it gets big enough I usually separate them out.
  2. Say I have multiple files, e.g. main and utils1 and utils2. In main I import utils1, utils2. I use some things from utils1, but I don't use anything from utils2. Does pytest in that case still run the test_.. functions from both utils1 and utils2?

    • In the "auto-detect" mode, it would only find tests from files named test_*.py - so would do neither
    • If you run pytest utils1.py it would collect tests from utils1 only - or whatever exactly is given on the command line
    • But this is configurable. I'd say test and see what happens, I need to refresh myself every time anyway.
  3. About Java, what are the testing tools available?

    • I've heard of JUnit, for example, but a web search will probably get you the best answers. There may be better ones.
    • good! thank you!

Automated testing

https://coderefinery.github.io/testing/continuous-integration/ (Demo, you can try this yourself later!)

  1. Can I also do this automated testing using a similar script when committing locally?

    • Via git hooks, you can tell it to run commands before committing. That would be how to do it.
    • General Git hooks (pre-commit is probably the one): https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks
    • blog post: https://medium.com/@yagizcemberci/automate-unit-tests-with-git-hooks-e25e8b564c92
  2. What are some testing tools for R?

    • testthat. We have an example on https://coderefinery.github.io/testing/continuous-integration/ but I admit that it is a bit complex, because R often assumes that you first implement your code as a package before you add testing. If you have a simpler example, it would be great to see it.
    • ah ok, thank you, good to know!
  3. I am maintaining a small package (an integrated extension) to a much larger software package; both of them are available on GitHub. Can I create a workflow where I automatically run tests with my own small package when a new version of the larger package is pushed to github/master?

    • One option is to run the workflow manually from GitHub, if you set up the test to install the latest version of the larger software.
    • I'm not sure how one repo can run when another is updated (it would be a nice idea though!). You can schedule a run so that, for example, it runs every week.
      • this is how we do it right now, but it is quite large/expensive to run, so I am searching for a better solution.
    • How about running a script locally to check for updates? So not on GitHub, but on your own system.
      • If there is a new version, it could push a small change to GitHub (version number, for example), to trigger the test there
        • smart, thanks
  4. Can one do automated testing and install a conda environment from a .yml file in the workflow? In the example case we only needed to install pytest in Python 3.10

    • With GitHub Actions, yes, you can - you can do almost anything; it gives you a whole virtual system to work with. It might take a while to resolve and install...
      • The basic Python application action already takes a requirements file and installs those packages, but there are conda actions as well.
  5. If one wants to do automated testing AND sphinx documentation, would they both go in the same file?

    • I'd usually have two separate files that can run on different triggers. (though documentation testing (does it build without errors?) can be part of normal testing, too!)
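For questions 4 and 5: on GitHub, a workflow file describes the whole job. A minimal sketch (file name, job names, and Python version are illustrative) that installs dependencies and runs pytest on every push and pull request:

```yaml
# .github/workflows/test.yml (path is required, the rest is illustrative)
name: tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      # A conda environment.yml could be installed here instead,
      # e.g. via a setup-miniconda action.
      - run: pip install pytest
      - run: pytest
```

A Sphinx documentation build would typically live in a second workflow file with its own triggers, as suggested in the answer above.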
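Regarding question 1 (running the tests when committing locally): a pre-commit hook can run the test suite before every commit. A minimal sketch, assuming pytest is your test runner (any command works); you would save it as .git/hooks/pre-commit and make it executable:

```python
#!/usr/bin/env python3
"""Hypothetical .git/hooks/pre-commit hook: abort the commit if tests fail."""
import subprocess
import sys


def run_checks(command=("pytest", "-q")):
    # Run the test suite; the exit code is 0 when all tests pass.
    return subprocess.run(list(command)).returncode


if __name__ == "__main__":
    # git aborts the commit when the hook exits with a non-zero code.
    sys.exit(run_checks())
```

Note that hooks are per-clone and not pushed with the repository, which is one reason server-side CI is still worth having.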

Test design

https://coderefinery.github.io/testing/test-design/

:::info

Exercise until xx:45

How's it going:

  1. I have seen terms called "snapshot" and "mock" testing. What is the difference between them ?

    • Snapshot testing is essentially a type of end-to-end testing: you run the code, take the result, and use it to verify that later changes didn't alter it. Mocks (at least in the JavaScript context) are for situations where you want to test a smaller interactive portion of a larger setup by "simulating" (mocking) the surrounding environment, and just testing whether the correct functions are called with the expected arguments. E.g. you have a function A that updates an object B by calling that object's update(data) method; you would "mock" B.update() and just check that the value given to it is what you expect.
    • there could be a mock for a database that the test depends on, right? is that "mocking" in testing?
      • yes that could be an example of mock testing.
  2. I have a question unrelated to testing, but I was hoping I could ask about it anyway. I have a project in Python where I am making a map, and the file becomes too large to push to GitHub. Is there a way to work around this? I guess the best way is to make the code (or the map file, yes) smaller

    • Can the map be generated automatically, instead of including it in the repository?
      • Maybe..
    • It sounds like the map is more like "data" than "code". If you need to keep these data under version control, there are better tools like git-annex, which can integrate with external data storage systems (e.g. Google Drive) but not with GitHub due to size limitations. If the map does not change much and you don't need to track its changes, then just treat it as a separate data file outside the repository.
    • the github repo could include a small example map and for the real run it could fetch it from places like Zenodo. is the map "static" or does it also change very frequently?
      • this sounds like a good solution, and I would actually like to make it dynamic, but it is not at the moment. In a way I would also like to add it to a web page to be able to share it, but maybe I can do that using git-annex also?
  3. I have very limited coding knowledge, so this was quite tricky, but it was interesting to go through the solutions and understand them step by step.

    • thanks! I know it can be a bit too much
  4. I saw in the documentation of pytest that you could specify fixtures with a decorator. When would you recommend having a "fixture" instead of making the file dependency itself?

    • Fixtures are things that can run some code to set up something for a test: for example, make a temporary directory with input files used by different tests. I'd say they are different things - you could use a fixture for a file dependency, but it is most useful if there is other setup to do.
  5. Is there a way to test multiple test files? The tutorial only has one test file.

    • yes! Once you get to that point you'd make test_xxx.py, test_yyy.py, etc. pytest . or pytest tests/ would detect everything from that directory. This is what big projects do.

:::info

Lunch until xx:00 (12:00 CEST, 13:00 EEST)

  1. I happen to have a Java code that outputs some results in a txt file, and then a Python code that does some machine learning analysis. For the purpose of testing (and code maintenance etc.), is it good practice to keep them as separate codebases? Or should I combine them into just one piece of code?

    • If the two codes are clearly connected and you would not use them separately, it makes sense to have them in the same repository.
  2. Sorry if this is obvious, but in the example where you use monkey-patching, shouldn't 'monkeypatch' be defined first in order to call monkeypatch.setattr()?

    • thanks! I will look later (now preparing for next lesson)
      • *defined or imported from somewhere? Thanks :)
    • So pytest has some magic behind the scenes. When it runs a test_ function, some arguments (fixtures) are automatically added if it detects something of that name; monkeypatch is one of them. It basically does work as it is, but you won't know what monkeypatch does without the reference: https://docs.pytest.org/en/6.2.x/monkeypatch.html
      • thanks :)
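To make the monkeypatch magic above concrete, a small sketch (the environment variable and function are made up): pytest injects the built-in monkeypatch fixture purely by argument name, no import needed, and reverts the patch when the test finishes.

```python
import os


def data_directory():
    """Toy function: read the data location from an environment variable."""
    return os.environ.get("DATA_DIR", "/default/data")


def test_data_directory(monkeypatch):
    # pytest passes the built-in `monkeypatch` fixture automatically;
    # setenv is undone after the test, so other tests are unaffected.
    monkeypatch.setenv("DATA_DIR", "/tmp/test-data")
    assert data_directory() == "/tmp/test-data"
```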

Modular code development

https://coderefinery.github.io/modular-type-along/

:::info

A. What does "modular code development" mean for you?

B. What best practices can you recommend to arrive at well structured, modular code in your favourite programming language?

C. What do you know now about programming that you wish somebody told you earlier?

D. Do you design a new code project on paper before coding? Discuss pros and cons.

E. Do you build your code top-down (starting from the big picture) or bottom-up (starting from components)? Discuss pros and cons.

F. Would you prefer your code to be 2x slower if it was easier to read and understand?

  1. Could you show the structure of temperatures.csv? For thinking of ideas for various implementations or additions

    • there, shown. I'll make a copy here:
    • Github can preview csv nicely: https://github.com/coderefinery/modular-type-along/blob/main/data/temperatures.csv
    • Year,m,d,Time,Time zone,Air temperature (degC)
    • 2022,1,1,00:00,UTC,-2.1
    • Thanks!
  2. Suggestions:

    • Plot the temperatures of each hour in a day for each day in a month. Each day may have a different color or a different opacity
    • Write a table that includes the mean temperature value of one year for each year in the dataset
      • good idea. here we recompute known data a few times.
      • Yes! This actually requires what we've been doing so far since you need a flexible plot function
    • Add x axis label and y axis label.
    • Write a function to visualize and save the image, with a flag 'save_plot'; if True, save to a uniform filename and path.
      • This does not work, because plt.show() must be executed after savefig.
      • Moreover, it is better to write a vis function as follows: fig = plt.figure(); ax = fig.add_subplot(111); ax.plot(); ax.set_xlabel(); ...; fig.savefig(); fig.show()
    • Store all plots under ./images directory.
      • good idea! we might postpone this for now
    • Good practice in a Jupyter notebook is to keep all imports in a separate cell, to not import over and over again. (By default it is of course cached, but for heavier packages re-importing can be annoying.)
    • read the data before the for loop; no need to read it for every temperature
    • Specify the type of data which function take and return. "def add(a: int, b: int) -> int:"
      • This can be very powerful when using linting extensions in IDEs that already show errors in types before run time ("static type checking")
    • maybe you could make your read_data function aware of different headers when reading with pandas. You could add something like def read_data(nrows, file_name='temperatures.csv', header='Air temperature (degC)') to default to the temperature measurement, but make it more adaptable for later use.
    • Better practice is indeed to pass mean as an argument. Otherwise, you would let plot_temperatures depend on compute_statistics, introducing coupling that is (in my opinion) not necessary.
    • should read_data_from_file still return temperatures?
    • When working with scripts, good practice is to use the if __name__ == "__main__": guard. All functions, including main(), should be defined before it; under the guard, only argument parsing and the call to main() should be performed.
      • thanks! we might, but it could be confusing for those who don't write Python. For Python scripts this is good practice
    • Create test functions
      • that checks whether the mean is calculated correctly (specify an array and check the known mean value?)
        • yes we will do, thanks
    • Add the "requirements.txt" file to improve reproducibility? +1
      • thanks! yes. in a moment
    • I think that it is included in the pandas code already, but you might want to add a custom try/except for the existence of the file and/or header. (Probably redundant in this case as it is captured in pandas already, but this might be good practice for other code parts.)
    • This example is too small, but when should one split the functions and the execution part of the code into different files?
  3. JupyterNotebook tips:

    • 'Escape + a' creates a cell above
    • 'Escape + b' creates a cell below
  4. Could you also make a function that takes num_measurements as input?

    • thanks! good idea. working on it.
  5. could savefig() come after show()? Not sure...

    • +1 show may not be needed
  6. yes, .clf() probably clears the figure to prepare for the next plot.

    • thanks
  7. Is it possible to add all files ending in ".png" to gitignore?

    • thanks
      • was this now done for all "remaining files" or in general based on the file ending? e.g. if I would run it again with a different num input, I'd like it to automatically be ignored
        • I don't remember but ideally the file should contain *.png to ignore all PNG files which in this case we consider generated

:::info

Break until xx:05

Suggestions from after the break - focusing on making it reusable from outside of this one script: as a module, command line, etc:

  1. I didn't know there was a statistics standard lib!

    • I also either forgot or never knew.
  2. Comments on Python environments and VS Code? Like how to specify the run environment / how it detects the interpreter?

  3. In Python: if __name__ == "__main__": - why?

    • If you import plot_temperatures.py, it'll define the functions in it but also run the script that's in there - that's probably not wanted when you import something!
    • __name__ is the name of the current module, and it is "__main__" when you run python that_script.py. So you can detect whether it is imported or run as a script
    • So usually you'd do something like:
    • if __name__ == "__main__": main()
    • and define the script function in main(). Now you can both import this and run it as a script
    • THANKS!
  4. How does click compare to argparse?

    • click is sort of nicer since you can define things straight on functions like this. It's less work and easier to read
      • those arguments are only defined for the function right below the click.command() or can they be used for other functions as well?
    • But argparse is built-in, so doesn't require any extra dependency management
    • So take your pick.
    • both are basically equivalent. If you want to compose libraries that have a CLI into something bigger, then click is nice
  5. Could you share the code that you wrote now?

    • We'll probably post it after the workshop - stay tuned.
    • (next time we should push to GitHub so that you all can see each time we commit+push)
    • Can see a model from the instructor guide (we didn't exactly follow this): https://coderefinery.github.io/modular-type-along/instructor-guide/
      • Great thanks!
  6. How often do you use git from the command line and from VSCode?

    • most of the time from vscode these days
  7. Great session after lunch! Really interactive and helpful to see how to approach writing software.

    • thanks!
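For comparison with the click decorators discussed in question 4, the same kind of interface with the built-in argparse might look like this (the option names are made up to match the type-along example):

```python
import argparse


def parse_args(argv=None):
    """Build and parse the command line; argv=None means use sys.argv."""
    parser = argparse.ArgumentParser(description="Plot temperatures.")
    parser.add_argument("--num-measurements", type=int, default=25,
                        help="number of rows to read from the data file")
    parser.add_argument("--input", default="temperatures.csv",
                        help="input CSV file")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(args.num_measurements, args.input)
```

Taking argv as a parameter keeps the parser testable: a test can call parse_args(["--num-measurements", "10"]) without touching the real command line.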

Example result

https://github.com/rantahar/plot_temperatures_example

Outro

https://github.com/coderefinery/workshop-outro/blob/master/README.md


Q&A

:::info

Thank you to all!

:::


Funding

CodeRefinery is a project within the Nordic e-Infrastructure Collaboration (NeIC). NeIC is an organisational unit under NordForsk.


Contact

support@coderefinery.org
