Best Practices


MolSSI best practices provide a starting point for software development operations to help ensure your project is usable and maintainable. Following best practices and recommendations will increase your project’s long-term viability and the likelihood that others will be able to use and contribute to your project. 

Our best practices center around several pillars established through years of experience developing and contributing to open-source scientific software. However, we recommend careful consideration of each practice in your project. Depending on your software’s scope, you may adopt these to varying degrees.

You can get hands-on experience implementing most of the MolSSI-recommended best practices in MolSSI’s Best Practices Workshop.

MolSSI’s best practices center around the following pillars:

Best Practices

  • Version Control

    Version control keeps a complete history of your work on a given project. It facilitates collaboration: everyone can work freely on any part of the project without overwriting others’ changes. You can move between past versions and roll back when needed. You can also review your project’s history through commit messages describing each change and see exactly what has changed in the content. You can see who made the changes and when they happened.

    Version control is a powerful tool and a fundamental practice in software development. When combined with a code hosting service, it makes it easy for outside collaborators to contribute. Version control benefits both individuals and teams and should be adopted in almost all projects.

    Recommended Software:

    Novice Tutorials:

    Intermediate:

  • Sharing Code

    There are several situations in which we want to share code with others. Software projects are rarely solo efforts: more often, several people work on the same code, and sharing becomes critical when different people collaborate on the same codebase. For open-source software, sharing code also gives the public access to the code for reviewing, testing, and contributing to it.

    Sharing code alongside published papers also helps others understand it, improving its reproducibility, reusability, and extensibility. It also enables others to cite the software and credit its authors when they use it.

    MolSSI recommends using git for version control, and GitHub as a hosting service, though there are other options.

    Tutorials for sharing code with GitHub

  • Testing and Code Coverage

    Software should be tested regularly throughout the development cycle to ensure correct operation. Thorough testing is often an afterthought, but for larger projects it can be essential for ensuring that changes in one part of the code do not break other parts.

    Two main types of testing are strongly encouraged:

    • Regression tests: given a known input, does the software consistently return the correct values?
    • Unit tests: similar to general testing, except that testing is done on much smaller units (such as single functions or classes). This helps catch errors in rarely used parts of the code that general testing may skip. Unit tests can be added alongside new features, resulting in better code coverage.
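    As a minimal sketch of both kinds of test, the example below uses Python's built-in `unittest` module on a hypothetical `harmonic_bond_energy` function. The unit tests check properties of that single function, while the regression test compares its output against stored reference values. Here the references are hard-coded for illustration; a real project would record them from a trusted earlier run.

```python
import math
import unittest


def harmonic_bond_energy(r: float, r0: float, k: float) -> float:
    """Return the harmonic bond-stretch energy k * (r - r0)**2.

    Hypothetical example function; units are whatever the caller uses.
    """
    return k * (r - r0) ** 2


class TestHarmonicBondEnergy(unittest.TestCase):
    # Unit tests: exercise one small function in isolation.
    def test_zero_at_equilibrium(self) -> None:
        self.assertEqual(harmonic_bond_energy(1.0, 1.0, 500.0), 0.0)

    def test_symmetric_about_equilibrium(self) -> None:
        self.assertAlmostEqual(
            harmonic_bond_energy(1.1, 1.0, 500.0),
            harmonic_bond_energy(0.9, 1.0, 500.0),
        )

    # Regression test: known inputs must keep reproducing stored outputs.
    def test_matches_reference_values(self) -> None:
        # Hypothetical reference data: {bond length: expected energy}.
        reference = {1.2: 20.0, 0.8: 20.0, 1.5: 125.0}
        for r, expected in reference.items():
            self.assertTrue(
                math.isclose(harmonic_bond_energy(r, 1.0, 500.0), expected,
                             rel_tol=1e-9)
            )
```

    Saved as, say, `test_bonds.py`, the tests can be run with `python -m unittest test_bonds`; pytest also collects `unittest` test cases.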

    Recommended:

    Tutorials:

  • Continuous Integration

    Continuous integration (CI) automatically builds your code, runs tests on a variety of platforms, and deploys builds and documentation as desired. CI typically runs when new code is proposed (e.g., through GitHub pull requests) or committed to the repository. It is useful for catching bugs before they reach your end users and for testing on platforms that are not available to every developer.

    CI can be broken down into several stages. Most CI should at least build the code and then run unit tests. The build stage takes the source code and performs any compilation and dependency resolution/installation needed for the next stage. Compiled languages like C++ and Rust require this step to turn the source code into executables. Interpreted languages like Python or R do not usually need an explicit compilation step, but still typically need to install dependencies. The unit test stage runs a series of tests to ensure the code works as expected, without syntactical or logical errors. Most, if not all, codes should have this stage. Codes where accuracy is critical (especially in the scientific field) should also include a regression test stage, in which results are compared against previously computed reference values. Regression tests can take significantly longer than unit tests and may need to be relegated to infrequent CI runs or handled separately. Lastly, a deploy stage can take compiled and verified code and push it to the appropriate branch or service to make it available. Deployment can also include items such as documentation pages, APIs, and experimental/nightly builds.
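    The stage ordering described above can be sketched in a few lines of Python. This is a toy local model, not how any particular CI service is configured (GitHub Actions, for instance, uses YAML workflow files): each stage is a callable that reports success or failure, and later stages are skipped once one fails. The `command` helper, the stage names, and paths such as `tests/unit` are hypothetical placeholders.

```python
import subprocess
import sys
from typing import Callable, List, Tuple

# A stage is a (name, action) pair; the action returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def command(*args: str) -> Callable[[], bool]:
    """Wrap a command line as a stage action that succeeds on exit code 0."""
    return lambda: subprocess.run(list(args)).returncode == 0


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order, stopping at the first failure.

    Returns the names of the stages that completed successfully.
    """
    passed: List[str] = []
    for name, action in stages:
        print(f"--- stage: {name}")
        if not action():
            print(f"stage '{name}' failed; skipping remaining stages")
            break
        passed.append(name)
    return passed


# Hypothetical pipeline for a Python package; nothing runs until
# run_pipeline(EXAMPLE_PIPELINE) is called.
EXAMPLE_PIPELINE: List[Stage] = [
    ("build", command(sys.executable, "-m", "pip", "install", ".")),
    ("unit tests", command(sys.executable, "-m", "pytest", "tests/unit")),
    ("regression tests",
     command(sys.executable, "-m", "pytest", "tests/regression")),
]
```

    The fail-fast ordering mirrors the reasoning above: there is little point running slow regression tests on code that does not build or fails its unit tests.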

    GitHub now provides its own CI service, GitHub Actions, which can be configured for most repositories. However, there are also many other CI services, most of which offer webhooks for integration with GitHub, as well as CI services for repositories hosted elsewhere.

    Examples of CI Software/Services

  • Code Style

    Code that lives beyond its initial development will be read many more times than it is written as the project is maintained and new features are added. Establishing and following a standard style in your projects will increase readability, make maintenance easier, and can reduce onboarding time for new developers.

    While code style can be personal, languages usually have at least a few dominant coding styles which are familiar to most programmers in that language. When programming in Python, the most commonly followed style is some variation of PEP 8. In Python, you might also consider adopting type hinting for large projects. Documentation embedded in the code through documentation strings or comments is a crucial aspect of code style you should also establish for your projects.
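    The function below is a small illustration of these conventions in Python: PEP 8 naming and layout (snake_case names, 4-space indentation, short lines), type hints on the signature, and a docstring describing the parameters, return value, and errors. The NumPy docstring convention used here is one common choice rather than a MolSSI requirement, and `center_of_mass` is a hypothetical example function.

```python
from typing import List, Sequence


def center_of_mass(
    masses: Sequence[float],
    coordinates: Sequence[Sequence[float]],
) -> List[float]:
    """Compute the mass-weighted center of a set of particles.

    Parameters
    ----------
    masses : Sequence[float]
        Mass of each particle.
    coordinates : Sequence[Sequence[float]]
        Cartesian (x, y, z) coordinates of each particle, in the same
        order as ``masses``.

    Returns
    -------
    List[float]
        The (x, y, z) coordinates of the center of mass.

    Raises
    ------
    ValueError
        If the inputs differ in length or the total mass is zero.
    """
    if len(masses) != len(coordinates):
        raise ValueError("masses and coordinates must have the same length")
    total_mass = sum(masses)
    if total_mass == 0:
        raise ValueError("total mass must be nonzero")
    # One mass-weighted average per Cartesian axis.
    return [
        sum(m * xyz[axis] for m, xyz in zip(masses, coordinates)) / total_mass
        for axis in range(3)
    ]
```

    Tools such as an automatic formatter or a type checker can then enforce the layout and check the hints without manual review.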

    Automatic formatting tools can enforce a particular coding style and are often configurable for each project.

    Examples of coding style guides:

  • Documentation

    The importance of documentation in an organization often depends on multiple factors, including the software development practices adopted (waterfall, agile, etc.) and the size of the software being documented. Regardless, documentation is a showcase for the software: it reflects the software’s health, and regular updates show that its ecosystem is alive and active.

    Ideally, documentation offers not only brief, informative guides that help busy users achieve their goals quickly but also detailed user/developer manuals that provide deeper insight into the software’s infrastructure. The former type of documentation is often titled “Getting Started,” “Quick Guide,” “10 Minutes to …,” etc. The 10 minutes to pandas and 10 minutes to Dask guides are great examples of such documents. Documentation can also be complemented by short video tutorials or brief blog posts with practical examples.

    The detailed user/developer documentation, on the other hand, involves several components:

    • Build requirements and dependencies
    • How to compile/build/test/install
    • How to use the software
    • More detailed practical examples

    In addition, developer guides should document the application programming interface (API), which enables the developer community to collaborate on, extend, and maintain parts of the software. An API reference often documents internal files, function and class signatures, and the reasoning behind naming conventions and particular design choices. Mature scientific and engineering libraries such as Intel’s oneAPI Math Kernel Library (oneMKL) and oneAPI Deep Neural Network Library (oneDNN) provide great examples of this class of documentation.

    Keeping documentation up to date with the code is not easy, especially for large, fast-moving code bases developed with agile practices. However, slightly out-of-date documentation is generally preferable to no documentation. Examples in the documentation should be compiled and tested regularly so that the documentation stays accurate and useful over time.
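    For Python projects, one way to keep examples tested is to write them as doctests: interactive examples inside docstrings that the standard-library `doctest` module runs and checks against the shown output. The `kinetic_energy` function below is a hypothetical example; its doctests can be run with `python -m doctest -v file.py` or, if the project uses pytest, with its `--doctest-modules` option.

```python
import doctest


def kinetic_energy(mass: float, velocity: float) -> float:
    """Return the classical kinetic energy 0.5 * m * v**2.

    Examples
    --------
    >>> kinetic_energy(2.0, 3.0)
    9.0
    >>> kinetic_energy(1.0, 0.0)
    0.0
    """
    return 0.5 * mass * velocity ** 2


if __name__ == "__main__":
    # Runs every docstring example in this module; silent unless one fails.
    doctest.testmod()
```

    Because the examples are executed, a change in behavior that makes the documentation wrong shows up as a test failure rather than going unnoticed.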

    Popular documentation packages:

    Examples of good documentation:

  • Build Systems

    Much software must be at least partly compiled, and doing this in a clean (and possibly cross-platform) way is not trivial.

    However, having a somewhat standard build system makes uptake by new users and developers much easier and makes it more likely that the code will be maintained in the future. Therefore, use of common build systems is encouraged. For most compiled C/C++/Fortran code encountered in computational chemistry, CMake is recommended.

  • Best Practices in Software Design

    Software quality depends on many factors, such as functionality, usability, performance, reliability, portability, interoperability, scalability, and reusability (see full description here).

    Many aspects contribute to a good design and to the quality of your software. An important one is to follow best practices and give thought to the design of your software. Fortunately, experienced programmers have developed best practices over a substantial period of time, and these practices can help less experienced developers learn software design quickly.

    SOLID:

    The first thing you can learn that will immediately improve the quality of your software is to follow the SOLID Principles of Software Design. Following these five principles results in more understandable, flexible, and maintainable code. You can read more here:
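    As a brief Python illustration of three of the five principles (single responsibility, open/closed, and dependency inversion), the hypothetical sketch below has a high-level `EnergyModel` that depends only on an abstract `EnergyTerm` interface. Each term class does one job, and new terms can be added without modifying `EnergyModel`. All class names are invented for this example.

```python
from abc import ABC, abstractmethod
from typing import Iterable, List


class EnergyTerm(ABC):
    """Abstract interface that high-level code depends on."""

    @abstractmethod
    def energy(self, positions: List[float]) -> float:
        """Return this term's contribution for the given positions."""


class HarmonicRestraint(EnergyTerm):
    """Single responsibility: a harmonic restraint toward the origin."""

    def __init__(self, k: float) -> None:
        self.k = k

    def energy(self, positions: List[float]) -> float:
        return sum(self.k * x ** 2 for x in positions)


class ConstantOffset(EnergyTerm):
    """Another term, added without touching EnergyModel (open/closed)."""

    def __init__(self, offset: float) -> None:
        self.offset = offset

    def energy(self, positions: List[float]) -> float:
        return self.offset


class EnergyModel:
    """Depends on the EnergyTerm abstraction, not on concrete terms
    (dependency inversion)."""

    def __init__(self, terms: Iterable[EnergyTerm]) -> None:
        self.terms = list(terms)

    def total_energy(self, positions: List[float]) -> float:
        return sum(term.energy(positions) for term in self.terms)
```

    For example, `EnergyModel([HarmonicRestraint(2.0), ConstantOffset(1.0)]).total_energy([1.0, 2.0])` combines both terms without `EnergyModel` knowing either concrete class.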

    Design Patterns:

    Design Patterns are well-thought-out, reusable solutions to common, recurring problems that developers face during software development. They also give experienced developers a shared vocabulary. Design Patterns are general and can be applied in most programming languages. The following are some references to get you started.
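    As one concrete example, the Observer pattern lets an object notify any number of interested parties when something happens, without knowing what they do. The hypothetical sketch below applies it to a simulation loop that notifies observers (for example, loggers or trajectory writers) after every step. It uses plain Python callables as observers, a common simplification of the classic interface-based form.

```python
from typing import Callable, List

# An observer is anything callable with the current step number.
Observer = Callable[[int], None]


class Simulation:
    """Subject: notifies every attached observer after each step."""

    def __init__(self) -> None:
        self.step_count = 0
        self._observers: List[Observer] = []

    def attach(self, observer: Observer) -> None:
        self._observers.append(observer)

    def run(self, n_steps: int) -> None:
        for _ in range(n_steps):
            self.step_count += 1  # a real integration step would go here
            for notify in self._observers:
                notify(self.step_count)


class StepLogger:
    """Observer that records every `every`-th step it is told about."""

    def __init__(self, every: int) -> None:
        self.every = every
        self.logged_steps: List[int] = []

    def __call__(self, step: int) -> None:
        if step % self.every == 0:
            self.logged_steps.append(step)
```

    Adding a new kind of output means writing a new observer, with no change to `Simulation`.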

    Object Oriented Programming (OOP): 

    Object-oriented programming (OOP) is a way of structuring functions and data into objects that can help organize software projects. Its advantages include improved reusability and maintainability. Using OOP is highly encouraged in large-scale projects.
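    A minimal sketch of the idea in Python: the hypothetical `Molecule` class below keeps a molecule's data (atom symbols and coordinates) together with the operations on that data, and checks its own consistency when created. The Hill-system ordering used for the formula is a standard chemistry convention; the rest is invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple

Vector = Tuple[float, float, float]


@dataclass
class Molecule:
    """Atom symbols and Cartesian coordinates, plus methods that use them."""

    name: str
    symbols: List[str]
    coordinates: List[Vector]

    def __post_init__(self) -> None:
        # Validate once, at construction, so every method can trust the data.
        if len(self.symbols) != len(self.coordinates):
            raise ValueError("each atom needs exactly one coordinate")

    @property
    def num_atoms(self) -> int:
        return len(self.symbols)

    def formula(self) -> str:
        """Return the formula in Hill order (C, then H, then alphabetical;
        purely alphabetical if there is no carbon)."""
        counts = Counter(self.symbols)
        if "C" in counts:
            rest = sorted(s for s in counts if s not in ("C", "H"))
            order = ["C"] + (["H"] if "H" in counts else []) + rest
        else:
            order = sorted(counts)
        return "".join(s + (str(counts[s]) if counts[s] > 1 else "")
                       for s in order)

    def translate(self, shift: Vector) -> None:
        """Shift every atom by the same displacement vector."""
        dx, dy, dz = shift
        self.coordinates = [(x + dx, y + dy, z + dz)
                            for x, y, z in self.coordinates]
```

    Code that uses a `Molecule` calls `formula()` or `translate()` instead of manipulating parallel lists directly, so the data and the rules for handling it stay in one place.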

  • Containerization

    Containerization is a tool that allows you to launch and work with many different environments (known as “containers”) on a single computer, each of which might be running a different operating system, a different set of installed libraries, different environment variables, etc.  Each container operates in isolation from the others; effectively, you can think of each container as a totally different computer, which just happens to be sharing the same hardware as the other containers.  Although similar in some ways to Virtual Machines, containerization is based on fundamentally different technology, and is generally much easier to set up and has a dramatically smaller performance cost.  For most purposes, the performance cost of using containers is negligible.

    This opens up all sorts of possibilities.  Do you own a Windows machine, but want to test a code on Linux?  No problem – just launch a Linux container on your Windows machine, and test away!  Are you having trouble reproducing a bug someone else has encountered, and suspect the problem might be dependent on some detail of the runtime environment?  There’s no need to mess with (and potentially break) your own environment in pursuit of the bug – just try some tests in a few containers, leaving your own environment unchanged.  In fact, because of the clean isolation and reproducibility of environments that is provided by containerization, anything you can do in a container should probably be done in a container.

    Moreover, you can easily build and deploy containers.  For example, you could build a code you are developing (along with all of its dependencies) within a container, and then deploy the container.  Because the container contains your compiled software and everything needed to run it, including the operating system, the end user doesn’t need to install anything on their system.  All they need to do is launch your container and start running calculations.

    By far the most commonly used containerization tool is Docker, which is what we recommend getting started with.  It is important to note that using Docker effectively requires root access.  HPC centers are never going to give you root access to their expensive machines, which rules out Docker in an HPC context.  Fortunately, there are several containerization alternatives that have been developed specifically for use on HPC machines, with the most prominent being Apptainer.  If you are interested in using containerization on an HPC machine, ask the organization that operates the machine about which containerization solution(s) they recommend.

    Recommended Software (not usable with HPC):

    Recommended Hosting Service:

    Alternatives for HPC:

    Container Orchestration: