Software should be tested regularly throughout the entire development cycle, from the first written lines of code through production releases, to ensure correct operation. Thorough testing is typically an afterthought; however, it can be essential for ensuring changes in a given part of the code do not negatively affect other parts of the code. This is true for all sizes of projects, from big to small, and for all team sizes from one to hundreds.
With all of these test types, you should understand the concept of “expected behavior” instead of “accurate returns.” With scientific code, we expect functions like 2+2 to return 4, which we can think of as an “accurate return” (it is also “expected behavior” in this case), but if you pass in something like “Waffle” + 2, the “expected behavior” that should happen is either that an error is thrown, the 2 is cast to a string and appended, or something you want to have happen. In such a case, the code would still be behaving expectedly, even if it were not returning scientifically accurate information. Unexpected behavior may be “Waffle” + 2 not throwing an error, or calling some routine you did not expect.
Two main types of testing are strongly encouraged:
A third type of testing recommended based on how many dependents your code has, i.e. “How many other places is your code used?”
Code coverage measures how many code paths your tests touch, out of all the possible paths taken through your code. It is often reported in “Percent of lines executed relative to total lines of code,” but it should be read as “what percent of decision branches are executed”, as multiple choices/conditions can lead to the same code being executed. There is no hard and fast rule for what percent you should aim for. 90-95% is a good target, especially early on. 100% is admirable, but often not realistically attainable, especially when working with sprawling code, since touching every line of code with tests can require such esoteric test inputs that the calculations would become unreasonable. However, every code is different, and you should still strive for 100% coverage until you hit the limit of sane programming.
It’s also important to understand the limits of code coverage. Full coverage does not in any way ensure that the code was run correctly, it just states just that all the code was touched by the interpreter. You still have to write tests that assess the “expected behavior” of your code, not just that it ran.
Lastly, there is the concept of “number of tests” as a metric for how well tested a code base is. This metric is meaningless by itself. e.g. a test could be written that simply does “assert x == x”, and then repeats that test for every 232-1 32-bit unsigned integers. Tests should never be written with “how many tests” in mind, they should always be written within the context of three things: the type of the test (list above), whether it captures all reasonable expected behaviors, and how much coverage (code coverage) does the collection of tests have; in that order.
Recommended:
Tutorials: