Quick guide to collaborative development of scientific code

As summer students in the Orengo group, we will be working together on a set of projects related to the CATH database, and collaboration is key to their success.

The first step towards good collaboration is agreeing on a set of rules that everyone follows. Shared rules make code review easier and reduce the time spent writing code, because code written to a common standard is easier to reuse.

Write code for people, not just for computers.


The goal is code that is readable and reusable. Following coding style conventions makes the job of the people working with you much easier. Many projects have their own style guidelines, and these should be agreed on before any coding starts. Good practices include:

  • write good comments (but don’t overdo it);
  • be consistent with code style;
  • choose variable names that are easy to interpret.

The main programming language we will work in is Python, which has well-established style conventions covering indentation, line length, imports, the use of tabs versus spaces, and more. A useful guide can be found here.
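To make these practices concrete, here is a hypothetical comparison; the task and function names are our own illustration, not project code. Both functions compute the fraction of identical positions between two aligned sequences (using the alignment length, gaps included, as the denominator, which is only one of several possible definitions). Only the second uses descriptive names, a docstring, a comment explaining a non-obvious decision, and common PEP 8 conventions such as four-space indentation and `snake_case` function names.

```python
# Harder to read: cryptic names, no documentation.
def f(a, b):
    c = 0
    for i in range(len(a)):
        if a[i] == b[i] and a[i] != "-":
            c += 1
    return c / len(a)


# Easier to read: the same calculation with descriptive names, a docstring,
# type hints, input checks and a comment explaining a non-obvious decision.
def sequence_identity(aligned_a: str, aligned_b: str) -> float:
    """Return the fraction of alignment columns holding identical residues.

    Both strings must come from the same pairwise alignment (equal length,
    gaps written as '-'). The alignment length, gaps included, is used as
    the denominator.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError(
            "aligned sequences must be non-empty and of equal length"
        )

    # A gap facing a gap is not a residue match, so it is not counted.
    identical_columns = sum(
        1
        for residue_a, residue_b in zip(aligned_a, aligned_b)
        if residue_a == residue_b and residue_a != "-"
    )
    return identical_columns / len(aligned_a)
```

For example, `sequence_identity("AC-GT", "ACTGA")` returns `0.6`, because three of the five columns match. A reader of the second version can tell what it does, what it expects and why gaps are treated as they are without having to trace through the loop.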

 

Use Version Control

In bioinformatics and other scientific disciplines, version control is used to manage source code, track data files and keep other resources in order. Version control software such as Git records every change made to a project, so no code is ever overwritten. This means that several people can work on the same project at the same time without anyone's work being lost: everyone works independently, and Git merges (combines) their changes automatically, asking for help only when two people have edited the same part of a file. Version control also lets you travel back in time and trace a bug to the change that introduced it, which can save a lot of time and effort. There are several version control systems, but they all serve the same purpose: helping developers share their work and ensuring reproducibility.

For our projects we will use Git for version control and GitHub to host our repositories. A simple introductory guide to Git can be found here.

Best practices when using version control:

  • use the same version control system as the rest of the team;
  • make small, incremental changes;
  • write clear, descriptive commit messages;
  • push your code daily so that collaborators can see and build on your work.
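Commit messages are easier to scan when they share a consistent shape. One widely used convention (an assumption here; a team may agree on different rules) is a summary line of about 50 characters or fewer, followed by a blank line and then a body explaining what changed and why. The hypothetical helper below checks just those two rules as a minimal sketch:

```python
def check_commit_message(message, max_summary_length=50):
    """Return a list of problems found in a commit message (empty if none).

    Checks one common convention (an assumption, not a Git requirement):
    a short summary line, then a blank line, then an optional body.
    """
    lines = message.splitlines()
    if not lines or not lines[0].strip():
        return ["the summary (first line) is empty"]

    problems = []
    summary = lines[0]
    if len(summary) > max_summary_length:
        problems.append(
            f"summary is {len(summary)} characters long; "
            f"aim for {max_summary_length} or fewer"
        )
    # The second line, if present, should separate summary from body.
    if len(lines) > 1 and lines[1].strip():
        problems.append("leave a blank line between the summary and the body")
    return problems
```

For example, `check_commit_message("Fix bug\nmore text")` returns `["leave a blank line between the summary and the body"]`. A check like this could be called from a Git `commit-msg` hook, which Git runs with the path of the file containing the proposed message.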

 

Design your project

“What do we want to achieve?” is the first question to answer before starting any project. Once you have defined the scope of the project, that is, what should and should not be done, you can begin sketching a rough plan of the code.

Split your plan into smaller, simpler steps. This helps you spot mistakes early on. In addition, as tasks are split up they become more generic, so the code can be reused in the future. This approach is known as writing modular code.
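As a minimal sketch of modular code, assume a hypothetical task: report how many sequences in a FASTA file reach a length cut-off, and their mean length. Instead of one long script, the task is split into small functions that can each be read, tested and reused on their own (all names here are our own illustration):

```python
def parse_fasta(fasta_text):
    """Step 1: parse FASTA text into a {header: sequence} dictionary."""
    sequences = {}
    header = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            sequences[header] = ""
        elif header is not None:
            # Sequences may be wrapped over several lines; join them.
            sequences[header] += line
    return sequences


def filter_by_length(sequences, min_length):
    """Step 2: keep only sequences with at least min_length residues."""
    return {
        header: sequence
        for header, sequence in sequences.items()
        if len(sequence) >= min_length
    }


def mean_length(sequences):
    """Step 3: average sequence length (0.0 for an empty collection)."""
    if not sequences:
        return 0.0
    return sum(len(seq) for seq in sequences.values()) / len(sequences)


def summarise_fasta(fasta_text, min_length=50):
    """The whole task, built by chaining the small steps above."""
    sequences = parse_fasta(fasta_text)
    long_sequences = filter_by_length(sequences, min_length)
    return len(long_sequences), mean_length(long_sequences)
```

For example, `summarise_fasta(">seq1\nMKT\nLLV\n>seq2\nMA\n", min_length=5)` returns `(1, 6.0)`. Because each step is separate, `mean_length` or `filter_by_length` can be reused elsewhere, and the hand-written parser could later be swapped for a well-tested library one (such as Biopython's `Bio.SeqIO`) without touching the other steps.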

A road map is also a quick and easy way to communicate your goals to all members of the team and to newcomers.

Test your code

Unit tests check that a single unit of code (typically one function) returns correct results, and that the behavior of a program does not change when its implementation details are modified. This helps confirm that the research results you produce with your code are valid and verifiable. Unit tests also help others understand your code, since each test shows how a unit is expected to behave, and they make the code easier to modularize. Research code can become very complex, so finding errors as early as possible is crucial, and unit tests make it much easier to pinpoint where those errors are.
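As a minimal sketch, the tests below use Python's built-in `unittest` module to check a small hypothetical function, `parse_domain_id`, written only for illustration. It assumes a seven-character domain identifier made of a four-character PDB code, a one-character chain ID and a two-digit domain number (for example `1cukA01`); the real format should be checked before relying on this.

```python
import unittest


def parse_domain_id(domain_id: str) -> tuple[str, str, int]:
    """Split an ID such as '1cukA01' into (pdb_code, chain, domain_number).

    Assumes a 7-character layout: 4-character PDB code, 1-character
    chain ID, 2-digit domain number.
    """
    if len(domain_id) != 7 or not domain_id[5:].isdigit():
        raise ValueError(f"not a valid domain ID: {domain_id!r}")
    return domain_id[:4], domain_id[4], int(domain_id[5:])


class TestParseDomainId(unittest.TestCase):
    def test_splits_valid_id(self):
        self.assertEqual(parse_domain_id("1cukA01"), ("1cuk", "A", 1))

    def test_zero_padded_domain_number(self):
        self.assertEqual(parse_domain_id("1oaiA00"), ("1oai", "A", 0))

    def test_rejects_wrong_length(self):
        with self.assertRaises(ValueError):
            parse_domain_id("1cukA1")

    def test_rejects_non_numeric_domain_number(self):
        with self.assertRaises(ValueError):
            parse_domain_id("1cukAxx")
```

Saved in a file whose name starts with `test` (for instance a hypothetical `test_domain_ids.py`), these tests can be discovered and run with `python -m unittest`. Note that half of the tests check that bad input is rejected, not just that good input works.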

More information on testing in Python can be found here.

Inspired by:

https://mozillascience.github.io/codeReview/design.html

https://github.com/UofABioinformaticsHub/BestPractices

Miruna Serian

Summer student

ISMB, UCL

 
