ECE 2400 Testing and Debugging Strategy
==========================================================================

This document discusses the testing and debugging strategy we will be
using in the programming assignments for ECE 2400.

Testing Process
--------------------------------------------------------------------------

Testing is the process of checking whether a program behaves correctly.
Testing a large program can be hard because bugs may appear anywhere in
the program, and multiple bugs may interact. Good practice is to test
small parts of the program individually before testing the entire
program, since this makes bugs much easier to find and fix.

Unit testing is the process of individually testing a small part or unit
of a program, typically a function. A unit test is typically conducted by
creating a testbench, a.k.a. test harness, which is a separate program
whose sole purpose is to check that a function returns correct output
values for a variety of input values. Each unique set of input values is
known as a test vector. Manually examining a program's printed output is
cumbersome and error prone. A better test harness only prints a message
for incorrect output. We provide you a basic unit testing framework in
ece2400-stdlib.h/.c that organizes your tests into test programs, test
cases, and checks. (This paragraph is excerpted from the textbook.)

We will be using a mix of black-box and white-box testing. Black-box
testing is where your test cases only test the _interface_ of your
functions. Black-box testing does not _directly_ test any of the
internals within your functions, although it will of course _indirectly_
test the internals. White-box testing is where your test cases directly
test the internals. For example, if your function calls a helper
function, then the helper function is part of the implementation, not the
interface, so directly testing that helper function is an example of
white-box testing. White-box tests can only be used with a specific
implementation.

We also might do what I call "gray-box" testing. This is where you choose
specific test vectors that are carefully designed to trigger complex
behavior in a specific implementation. Since gray-box tests can be
applied to any implementation, they are like black-box tests. Since they
attempt to trigger complex implementation-specific behavior, they are
like white-box tests.

We will primarily be using directed testing and random testing. Directed
testing is where the programmer explicitly specifies the inputs and the
correct outputs. Directed tests are carefully crafted to achieve good
coverage of many different program behaviors. Random testing is where the
programmer randomly generates inputs and then verifies that the function
produces the right output. This of course raises the question, "How do we
know what the right output is, if we are randomly generating the input?"
There are two approaches. First, the programmer can assert that some
property holds on the output. For example, if the function is meant to
sort an input array, the random test can assert that the final array is
indeed sorted. Second, the programmer can use a golden reference
implementation. For example, the programmer might use a function from the
standard library to verify that their own function produces the same
outputs.
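To make this concrete, below is a minimal sketch of what a test program
in this style might look like. It is not the actual ece2400-stdlib
framework: the real ECE2400_CHECK macros have their own names and
signatures, so this sketch uses a simplified stand-in CHECK macro, and
max2 is just a hypothetical function under test. Test case 1 is a
directed test built from hand-picked test vectors; test case 2 is a
random test that asserts properties which must hold on the output.
Running the program with a test case number on the command line runs
just that test case.

    #include <stdio.h>
    #include <stdlib.h>

    static int fail_count = 0;

    // Simplified stand-in for the checks in ece2400-stdlib.h; like a
    // good test harness, it only prints a message for incorrect output
    #define CHECK( expr_ )                                          \
      do {                                                          \
        if ( !(expr_) ) {                                           \
          printf( "  FAILED line %d: %s\n", __LINE__, #expr_ );     \
          fail_count++;                                             \
        }                                                           \
      } while ( 0 )

    // Hypothetical function under test
    int max2( int x, int y )
    {
      return ( x > y ) ? x : y;
    }

    // Directed test case: each check uses one hand-picked test vector
    void test_case_1_directed()
    {
      printf( "test_case_1_directed\n" );
      CHECK( max2(  2,  3 ) ==  3 ); // basic case
      CHECK( max2(  3,  2 ) ==  3 ); // arguments swapped
      CHECK( max2(  5,  5 ) ==  5 ); // equal inputs
      CHECK( max2( -4, -9 ) == -4 ); // negative inputs
    }

    // Random test case: assert properties that must hold on the output
    void test_case_2_random()
    {
      printf( "test_case_2_random\n" );
      srand( 42 ); // fixed seed so failures are reproducible
      for ( int i = 0; i < 100; i++ ) {
        int x      = rand() % 1000;
        int y      = rand() % 1000;
        int result = max2( x, y );
        CHECK( ( result == x ) || ( result == y ) ); // output is one of the inputs
        CHECK( ( result >= x ) && ( result >= y ) ); // output is >= both inputs
      }
    }

    // Run all test cases, or just the test case given on the command line
    int main( int argc, char** argv )
    {
      int n = ( argc > 1 ) ? atoi( argv[1] ) : 0;
      if ( ( n == 0 ) || ( n == 1 ) ) test_case_1_directed();
      if ( ( n == 0 ) || ( n == 2 ) ) test_case_2_random();
      return ( fail_count == 0 ) ? 0 : 1;
    }

Note that the sketch returns a nonzero exit status when any check fails,
which is what lets an automated flow like `make check` flag failing test
programs.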
After our implementations pass all unit tests, we can evaluate how
effective our test suite is by measuring its code coverage. Code coverage
tells us how much of our source code the test suite executed during unit
testing. The higher the code coverage, the less likely it is that bugs
remain undetected. Code coverage is just one more piece of evidence you
can use to make a compelling case for the correct functionality of your
implementations. You are not required to achieve 100% code coverage. It
is far more important to use code coverage as a way to guide your
test-driven design than to overly focus on the specific code coverage
number.

Note that ad-hoc testing is _not_ an important part of your testing
strategy: it is neither automatic nor systematic.

Debugging Process
--------------------------------------------------------------------------

Step 1: Use `make check` to run all of the tests to get a high-level view
of which test cases are passing and which test cases are failing.

Step 2: Pick one failing test program to focus on. Pick the most basic
test program that is failing. Run just that test program using the Linux
command line like this:

    % ./example-test

Step 3: Pick one failing test case to focus on. Run just that test case
by specifying the test case number on the Linux command line like this:

    % ./example-test 1

Step 4: Look at the error message. Determine what the observable error
is. The error might be a failed ECE2400_CHECK or a segfault.

Step 5: Look at the actual test case source code. Make absolutely sure
you know what the test case is testing and that the test case is valid.
You have no hope of debugging your code if you do not understand what
correct execution you expect to happen!

Step 6: Your goal in this step is to narrow the focus of your bug hunt.
You want to look at each C/C++ statement in the test case. You will be
checking for one of three things:

 - (1) Are the statement's inputs incorrect and the outputs incorrect?
   If so, then the bug is earlier in the code and you need to continue
   working backwards to an earlier statement.

 - (2) Are the statement's inputs correct and the outputs incorrect? If
   so, then you have narrowed the bug to be at this statement; if the
   statement is a function call, then the bug is in the called function.

 - (3) Are the statement's inputs correct and the outputs correct? If
   so, then the bug is later in the code and you need to continue
   working forwards to a later statement.

For conditional statements, the "inputs" are the variables used in the
conditional statement and the "output" is which direction the "execution
arrow" goes (i.e., did the execution arrow go to the then or else
block?). For iteration statements, the "inputs" are the variables used in
the loop and the "output" is whether the loop body is executed or the
loop exits. You can use printf debugging, gdb single-stepping, or gdb
back traces to help examine the inputs and outputs of the statements in
your test case.

Step 6a: For printf debugging, insert printfs to determine the path
through the control flow and the values of certain variables. Start with
a printf right at the beginning of the test case. This should display
correctly. You can gradually continue adding printf statements working
_forwards_ until you find a statement which has correct inputs and
incorrect outputs. At the same time, you can add printf statements
starting from the observable error and working _backwards_ until you find
a statement which has correct inputs and incorrect outputs. So you might
add a printf right before the check which is failing to display the
values of the variables being used in the assertion and then work
backwards from there. A sketch of this technique is shown below.
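Here is a minimal sketch of Step 6a on a small hypothetical example: a
buggy function sum that is supposed to add up the first n elements of an
array but mistakenly starts its loop at index 1. The printfs work
forwards from the start of the test case and backwards from the
incorrect result until they trap the bug.

    #include <stdio.h>

    // Hypothetical buggy function: should return a[0] + ... + a[n-1]
    int sum( int* a, int n )
    {
      printf( "sum: n = %d\n", n );    // working forwards: input is correct
      int total = 0;
      for ( int i = 1; i < n; i++ ) {  // BUG: i should start at 0
        total += a[i];
        // the first iteration prints i = 1, so a[0] was skipped: this
        // loop has correct inputs but incorrect outputs, i.e., the bug
        // has been narrowed to this statement
        printf( "sum: i = %d, total = %d\n", i, total );
      }
      printf( "sum: returning %d\n", total ); // working backwards: output is wrong
      return total;
    }

    int main()
    {
      printf( "start of test case\n" );  // displays correctly
      int a[] = { 1, 2, 3 };
      int result = sum( a, 3 );
      printf( "result = %d, expected 6\n", result ); // right before the failing check
      return 0;
    }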
Step 6b: For gdb debugging, use `gdb -tui` to drop into the test case you
are focusing on. Single-step forward through the code. Observe the
control flow and use `print` commands to display the values of variables.
You are moving _forwards_ until you find a statement with correct inputs
and incorrect outputs. You can also move backwards using reverse
stepping.

Step 6c: For segfaults, you can also use either Step 6a or Step 6b to
determine where the segfault is happening. An alternative is to use gdb
to give you a stack trace (i.e., the sequence of nested function calls)
that led to the crash. This is also called a back trace. Start gdb and
then use the `run` command to run your test until it crashes. Then use
the `backtrace` command to get the sequence of function calls, including
line numbers, that led to the crash. Then use either Step 6a or 6b to
narrow the focus of your bug hunt.
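As a sketch, a gdb single-stepping session for Step 6b might look like
the following, reusing the hypothetical example-test program and sum
function from the earlier sketches. `break` stops execution at the start
of the function, `run 1` runs just test case 1, `next` single-steps one
statement at a time, and `print` displays the value of a variable.

    % gdb -tui ./example-test
    (gdb) break sum
    (gdb) run 1
    (gdb) next
    (gdb) print total

For a segfaulting test case (Step 6c), run the test until it crashes and
then ask for the back trace:

    % gdb ./example-test
    (gdb) run 1
    (gdb) backtrace

After the crash, the `backtrace` command prints the sequence of nested
function calls, with file names and line numbers, that led to the crash.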
Step 7: Once you find a statement which has correct inputs but incorrect
outputs, make a hypothesis about what should happen if you fix the bug.
Your hypothesis should not just be "fixing the bug will make the test
pass." It should instead be something like "fixing this bug should make
this specific variable be 1 instead of 0" or "fixing this bug should make
this specific if statement execute the else block". Fix the bug and see
what happens by looking at the output of your printf statements or by
using gdb. Do not just see if it passes the test -- literally check the
specific statement you identified in Step 6 using printf or gdb. One of
four things will happen:

 - (1) The test will pass and the printf/gdb behavior will match your
   hypothesis -- bug fixed!

 - (2) The test will fail and the printf/gdb behavior will not match
   your hypothesis -- you need to keep working -- your bug fix did not
   do what it was supposed to, and it did not fix the error -- undo the
   bug fix and go back to Step 6.

 - (3) The test will fail but the printf/gdb behavior _will_ match your
   hypothesis -- this means your bug fix did what you expected, but
   there might be another bug still causing trouble -- you need to keep
   working -- go back to Step 6.

 - (4) The test will pass and the printf/gdb behavior _will not_ match
   your hypothesis -- you need to keep working -- your bug fix did not
   do what you thought it would even though it caused the test to pass
   -- there might be something subtle going on -- go back to Step 6 to
   figure out why the bug fix did not do what you thought it would.

If you are passing all of your tests but failing the evaluation, you need
to craft a minimal test case which causes the same bug. Then you can
start this seven-step process. If at all possible, you want to avoid
debugging your code using the evaluation. Worst case, if you must debug
your code using the evaluation, you really must add a new test case once
you figure out the bug.

Note a couple of things about this systematic seven-step process. First,
it is a systematic process. It does not involve randomly trying things.
It does not involve randomly commenting out code. Second, the process
uses all of the tools at your disposal: output from the test program,
printfs, gdb single-stepping, and gdb back traces. You really need to use
all of these tools. Third, the process requires you to think critically
and make a hypothesis about what should change. Do not just change
something, pass the test, and move on. Change something and see if the
printf/gdb behavior changes in the way you expect. Otherwise you can
actually introduce more bugs even though you think you are fixing things.