ECE 2400 Testing and Debugging Strategy
==========================================================================

This document discusses the testing and debugging strategy we will be
using in the programming assignments for ECE 2400.

Testing Process
--------------------------------------------------------------------------

Testing is the process of checking whether a program behaves correctly.
Testing a large program can be hard because bugs may appear anywhere in
the program, and multiple bugs may interact. Good practice is to test
small parts of the program individually before testing the entire
program, since this makes bugs much easier to find and fix.

Unit testing is the process of individually testing a small part or unit
of a program, typically a function. A unit test is typically conducted by
creating a testbench, a.k.a. test harness, which is a separate program
whose sole purpose is to check that a function returns correct output
values for a variety of input values. Each unique set of input values is
known as a test vector. Manually examining a program's printed output is
cumbersome and error prone. A better test harness only prints a message
for incorrect output. We provide you a basic unit testing framework in
ece2400-stdlib.h/.c that organizes your tests into test programs, test
cases, and checks. (This paragraph is excerpted from the textbook.)

We will be using a mix of black-box and white-box testing. Black-box
testing is where your test cases only test the _interface_ of your
functions. Black-box testing does not _directly_ test any of the
internals within your functions, although it will of course _indirectly_
test the internals. White-box testing is where your test cases directly
test the internals. For example, if your function calls a helper
function, then the helper function is part of the implementation, not the
interface, so directly testing that helper function is an example of
white-box testing. White-box tests can only be used with a specific
implementation.

We also might do what I call "gray-box" testing. This is where you choose
specific test vectors that are carefully designed to trigger complex
behavior in a specific implementation. Since gray-box tests can be
applied to any implementation, they are like black-box tests. Since they
attempt to trigger complex implementation-specific behavior, they are
like white-box tests.

We will primarily be using directed testing and random testing. Directed
testing is where the programmer explicitly specifies the inputs and the
correct outputs. Directed tests are carefully crafted to achieve good
coverage of many different program behaviors. Random testing is where the
programmer randomly generates inputs and then verifies that the function
produces the right output. This of course raises the question, "How do we
know what the right output is, if we are randomly generating the input?"
There are two approaches. First, the programmer can assert that some
property holds on the output. For example, if the function is meant to
sort an input array, the random test can assert that the final array is
indeed sorted. Second, the programmer can use a golden reference
implementation. For example, the programmer might use a function from the
standard library to verify that their own function produces the same
outputs.
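To make this concrete, below is a minimal sketch of what a test program
in this style might look like. It is not the actual ece2400-stdlib
framework: the real ECE2400_CHECK macros have their own names and
signatures, so this sketch uses a simplified stand-in CHECK macro, and
max2 is just a hypothetical function under test. Test case 1 is a
directed test built from hand-picked test vectors; test case 2 is a
random test that asserts properties which must hold on the output.
Running the program with a test case number on the command line runs
just that test case.

    #include <stdio.h>
    #include <stdlib.h>

    static int fail_count = 0;

    // Simplified stand-in for the checks in ece2400-stdlib.h; like a
    // good test harness, it only prints a message for incorrect output
    #define CHECK( expr_ )                                          \
      do {                                                          \
        if ( !(expr_) ) {                                           \
          printf( "  FAILED line %d: %s\n", __LINE__, #expr_ );     \
          fail_count++;                                             \
        }                                                           \
      } while ( 0 )

    // Hypothetical function under test
    int max2( int x, int y )
    {
      return ( x > y ) ? x : y;
    }

    // Directed test case: each check uses one hand-picked test vector
    void test_case_1_directed()
    {
      printf( "test_case_1_directed\n" );
      CHECK( max2(  2,  3 ) ==  3 ); // basic case
      CHECK( max2(  3,  2 ) ==  3 ); // arguments swapped
      CHECK( max2(  5,  5 ) ==  5 ); // equal inputs
      CHECK( max2( -4, -9 ) == -4 ); // negative inputs
    }

    // Random test case: assert properties that must hold on the output
    void test_case_2_random()
    {
      printf( "test_case_2_random\n" );
      srand( 42 ); // fixed seed so failures are reproducible
      for ( int i = 0; i < 100; i++ ) {
        int x      = rand() % 1000;
        int y      = rand() % 1000;
        int result = max2( x, y );
        CHECK( ( result == x ) || ( result == y ) ); // output is one of the inputs
        CHECK( ( result >= x ) && ( result >= y ) ); // output is >= both inputs
      }
    }

    // Run all test cases, or just the test case given on the command line
    int main( int argc, char** argv )
    {
      int n = ( argc > 1 ) ? atoi( argv[1] ) : 0;
      if ( ( n == 0 ) || ( n == 1 ) ) test_case_1_directed();
      if ( ( n == 0 ) || ( n == 2 ) ) test_case_2_random();
      return ( fail_count == 0 ) ? 0 : 1;
    }

Note that the sketch returns a nonzero exit status when any check fails,
which is what lets an automated flow like `make check` flag failing test
programs.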
After our implementations pass all unit tests, we can evaluate how
effective our test suite is by measuring its code coverage. Code coverage
tells us how much of our source code the test suite executed during unit
testing. The higher the code coverage, the less likely it is that bugs
remain undetected. Code coverage is just one more piece of evidence you
can use to make a compelling case for the correct functionality of your
implementations. You are not required to achieve 100% code coverage. It
is far more important to use code coverage as a way to guide your
test-driven design than to overly focus on the specific code coverage
number.

Note that ad-hoc testing is _not_ an important part of your testing
strategy: it is neither automatic nor systematic.

Debugging Process
--------------------------------------------------------------------------

Step 1: Use `make check` to run all of the tests to get a high-level view
of which test cases are passing and which test cases are failing.

Step 2: Pick one failing test program to focus on. Pick the most basic
test program that is failing. Run just that test program using the Linux
command line like this:

    % ./example-test

Step 3: Pick one failing test case to focus on. Run just that test case
by specifying the test case number on the Linux command line like this:

    % ./example-test 1

Step 4: Look at the error message. Determine what the observable error
is. The error might be a failed ECE2400_CHECK or a segfault.

Step 5: Look at the actual test case source code. Make absolutely sure
you know what the test case is testing and that the test case is valid.
You have no hope of debugging your code if you do not understand what
correct execution you expect to happen!

Step 6: Your goal in this step is to narrow the focus of your bug hunt.
You want to look at each C/C++ statement in the test case. You will be
checking for one of three things:

 - (1) Are the statement's inputs incorrect and the outputs incorrect?
   If so, then the bug is earlier in the code and you need to continue
   working backwards to an earlier statement.

 - (2) Are the statement's inputs correct and the outputs incorrect? If
   so, then you have narrowed the bug to be at this statement; if the
   statement is a function call, then the bug is in the called function.

 - (3) Are the statement's inputs correct and the outputs correct? If
   so, then the bug is later in the code and you need to continue
   working forwards to a later statement.

For conditional statements, the "inputs" are the variables used in the
conditional statement and the "output" is which direction the "execution
arrow" goes (i.e., did the execution arrow go to the then or else
block?). For iteration statements, the "inputs" are the variables used in
the loop and the "output" is whether the loop body is executed or the
loop exits. You can use printf debugging, gdb single-stepping, or gdb
back traces to help examine the inputs and outputs of the statements in
your test case.

Step 6a: For printf debugging, insert printfs to determine the path
through the control flow and the values of certain variables. Start with
a printf right at the beginning of the test case. This should display
correctly. You can gradually continue adding printf statements working
_forwards_ until you find a statement which has correct inputs and
incorrect outputs. At the same time, you can add printf statements
starting from the observable error and working _backwards_ until you find
a statement which has correct inputs and incorrect outputs. So you might
add a printf right before the check which is failing to display the
values of the variables being used in the assertion and then work
backwards from there. A sketch of this technique is shown below.
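Here is a minimal sketch of Step 6a on a small hypothetical example: a
buggy function sum that is supposed to add up the first n elements of an
array but mistakenly starts its loop at index 1. The printfs work
forwards from the start of the test case and backwards from the
incorrect result until they trap the bug.

    #include <stdio.h>

    // Hypothetical buggy function: should return a[0] + ... + a[n-1]
    int sum( int* a, int n )
    {
      printf( "sum: n = %d\n", n );    // working forwards: input is correct
      int total = 0;
      for ( int i = 1; i < n; i++ ) {  // BUG: i should start at 0
        total += a[i];
        // the first iteration prints i = 1, so a[0] was skipped: this
        // loop has correct inputs but incorrect outputs, i.e., the bug
        // has been narrowed to this statement
        printf( "sum: i = %d, total = %d\n", i, total );
      }
      printf( "sum: returning %d\n", total ); // working backwards: output is wrong
      return total;
    }

    int main()
    {
      printf( "start of test case\n" );  // displays correctly
      int a[] = { 1, 2, 3 };
      int result = sum( a, 3 );
      printf( "result = %d, expected 6\n", result ); // right before the failing check
      return 0;
    }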
Step 6b: For gdb debugging, use `gdb -tui` to drop into the test case you
are focusing on. Single-step forward through the code. Observe the
control flow and use `print` commands to display the values of variables.
You are moving _forwards_ until you find a statement with correct inputs
and incorrect outputs. You can also move backwards using reverse
stepping.

Step 6c: For segfaults, you can also use either Step 6a or Step 6b to
determine where the segfault is happening. An alternative is to use gdb
to give you a stack trace (i.e., the sequence of nested function calls)
that led to the crash. This is also called a back trace. Start gdb and
then use the `run` command to run your test until it crashes. Then use
the `backtrace` command to get the sequence of function calls, including
line numbers, that led to the crash. Then use either Step 6a or 6b to
narrow the focus of your bug hunt.
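As a sketch, a gdb single-stepping session for Step 6b might look like
the following, reusing the hypothetical example-test program and sum
function from the earlier sketches. `break` stops execution at the start
of the function, `run 1` runs just test case 1, `next` single-steps one
statement at a time, and `print` displays the value of a variable.

    % gdb -tui ./example-test
    (gdb) break sum
    (gdb) run 1
    (gdb) next
    (gdb) print total

For a segfaulting test case (Step 6c), run the test until it crashes and
then ask for the back trace:

    % gdb ./example-test
    (gdb) run 1
    (gdb) backtrace

After the crash, the `backtrace` command prints the sequence of nested
function calls, with file names and line numbers, that led to the crash.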
Step 7: Once you find a statement which has correct inputs but incorrect
outputs, make a hypothesis about what should happen if you fix the bug.
Your hypothesis should not just be "fixing the bug will make the test
pass." It should instead be something like "fixing this bug should make
this specific variable be 1 instead of 0" or "fixing this bug should make
this specific if statement execute the else block". Fix the bug and see
what happens by looking at the output of your printf statements or by
using gdb. Do not just see if it passes the test -- literally check the
specific statement you identified in Step 6 using printf or gdb. One of
four things will happen:

 - (1) The test will pass and the printf/gdb behavior will match your
   hypothesis -- bug fixed!

 - (2) The test will fail and the printf/gdb behavior will not match
   your hypothesis -- you need to keep working -- your bug fix did not
   do what it was supposed to, and it did not fix the error -- undo the
   bug fix and go back to Step 6.

 - (3) The test will fail but the printf/gdb behavior _will_ match your
   hypothesis -- this means your bug fix did what you expected, but
   there might be another bug still causing trouble -- you need to keep
   working -- go back to Step 6.

 - (4) The test will pass and the printf/gdb behavior _will not_ match
   your hypothesis -- you need to keep working -- your bug fix did not
   do what you thought it would even though it caused the test to pass
   -- there might be something subtle going on -- go back to Step 6 to
   figure out why the bug fix did not do what you thought it would.

If you are passing all of your tests but failing the evaluation, you need
to craft a minimal test case which causes the same bug. Then you can
start this seven-step process. If at all possible, you want to avoid
debugging your code using the evaluation. Worst case, if you must debug
your code using the evaluation, you really must add a new test case once
you figure out the bug.

Note a couple of things about this systematic seven-step process. First,
it is a systematic process. It does not involve randomly trying things.
It does not involve randomly commenting out code. Second, the process
uses all of the tools at your disposal: output from the test program,
printfs, gdb single-stepping, and gdb back traces. You really need to use
all of these tools. Third, the process requires you to think critically
and make a hypothesis about what should change. Do not just change
something, pass the test, and move on. Change something and see if the
printf/gdb behavior changes in the way you expect. Otherwise you can
actually introduce more bugs even though you think you are fixing things.