Terminology, IEEE 610.12-1990

  • Fault — often referred to as a Bug
    • A static defect in software (incorrect lines of code)
  • Error
    • An incorrect internal state (unobserved)
  • Failure
    • External, incorrect behavior with respect to the expected behavior (observed)
  • These terms are not used consistently in the literature

Troubleshooting Steps

  • Identify the fault, and fix it
  • Identify a test case that does not execute the fault
  • Identify a test case that executes the fault, but does not result in an error
  • Identify a test case that results in an error, but not a failure
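The steps above can be sketched on a small function with one seeded fault. The shipping_cost functions, the pricing rule, and the cap of 20 are all hypothetical, chosen only so that each kind of test case exists; they do not come from the original material.

```python
def shipping_cost_faulty(weight_kg, express):
    """Hypothetical rule: 2 per kg, express doubles the cost, result capped at 20."""
    cost = 2 * weight_kg
    if express:
        cost = cost + 2  # FAULT: should be cost * 2
    return min(cost, 20)


def shipping_cost_fixed(weight_kg, express):
    """Same rule with the fault repaired."""
    cost = 2 * weight_kg
    if express:
        cost = cost * 2
    return min(cost, 20)


# (weight_kg, express, what the test case demonstrates)
cases = [
    (3, False, "does not execute the fault"),   # express branch is skipped
    (1, True, "executes the fault, no error"),  # 2 + 2 == 2 * 2, state stays correct
    (10, True, "error, but no failure"),        # 22 vs 40 internally, both capped to 20
    (3, True, "failure"),                       # 8 vs 12 reaches the output
]

for weight_kg, express, label in cases:
    got = shipping_cost_faulty(weight_kg, express)
    want = shipping_cost_fixed(weight_kg, express)
    print(f"{label}: faulty={got}, fixed={want}")
```

Only the last case produces different outputs from the two versions, so it is the only one a black-box test would flag.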

RIP Model

  • Three conditions must be present for a failure to be observed:
    • Reachability: the location(s) in the program that contain the fault must be reached
    • Infection: after executing the fault, the state of the program must be incorrect
    • Propagation: the infected state must propagate to cause some output of the program to be incorrect
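The three conditions can be checked mechanically when the internal state is observable. The sketch below assumes a hypothetical qualifies function that is instrumented to report whether the suspect statement ran and what total it computed. The seeded fault is multiplication (operator.mul) where addition (operator.add) was intended, and the threshold of 10 is arbitrary.

```python
import operator


def qualifies(base, bonus, apply_bonus, combine):
    """Return (reached, total, output) so the internal state can be inspected.

    Hypothetical rule: with a bonus, total = base + bonus; qualify if total >= 10.
    """
    reached = False
    total = base
    if apply_bonus:
        reached = True  # the (possibly faulty) statement is executed
        total = combine(base, bonus)
    return reached, total, total >= 10


def rip(base, bonus, apply_bonus):
    """Compare the faulty run (mul) with the intended run (add)."""
    reached, bad_total, bad_out = qualifies(base, bonus, apply_bonus, operator.mul)
    _, good_total, good_out = qualifies(base, bonus, apply_bonus, operator.add)
    infected = reached and bad_total != good_total
    propagated = infected and bad_out != good_out
    return {"reachability": reached, "infection": infected, "propagation": propagated}


print(rip(12, 3, False))  # fault not reached
print(rip(2, 2, True))    # reached, but 2 * 2 == 2 + 2
print(rip(6, 5, True))    # infected (30 vs 11), but both totals are >= 10
print(rip(5, 3, True))    # all three hold (15 vs 8), so a failure is observed
```

Each condition depends on the previous one, which is why infection and propagation are computed as conjunctions.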

How to deal with Faults, Errors, and Failures

Addressing Faults at Different Stages

  • Avoidance: better design, better programming languages
  • Detection: testing and debugging
  • Tolerance: redundancy and isolation
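Of the three stages, tolerance is the easiest to illustrate in a few lines. The following is a rough sketch, not a production design: tolerant_call and the square_* replicas are hypothetical names. Redundancy is modeled as a majority vote over independent implementations, and isolation as catching a replica's exception so that it cannot crash the caller.

```python
from collections import Counter


def tolerant_call(replicas, *args):
    """Run every replica, isolate crashes, and return the majority result."""
    results = []
    for replica in replicas:
        try:
            results.append(replica(*args))
        except Exception:
            continue  # isolation: a crashing replica does not take down the caller
    if not results:
        raise RuntimeError("all replicas failed")
    value, votes = Counter(results).most_common(1)[0]
    if votes <= len(replicas) // 2:
        raise RuntimeError("no majority among replicas")
    return value


def square_a(x):
    return x * x


def square_b(x):
    return x ** 2


def square_faulty(x):
    return x + x  # seeded fault: wrong for most inputs


def square_crashing(x):
    raise ValueError("simulated crash")


print(tolerant_call([square_a, square_b, square_faulty], 3))    # faulty replica is outvoted
print(tolerant_call([square_a, square_b, square_crashing], 3))  # crash is contained
```

The vote only helps when the replicas fail independently; replicas that share the same fault would outvote the correct one.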

Testing vs. Debugging

  • Testing: evaluating software by observing its execution
  • Debugging: finding a fault given a failure
  • Testing is hard:
    • Only specific inputs cause a fault to produce a failure
  • Debugging is hard:
    • Given a failure, it is difficult to locate the fault that caused it
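To give a sense of how few inputs may reveal a fault, the sketch below seeds a fault into a hypothetical is_leap_faulty function and compares it against calendar.isleap from the Python standard library over the years 1 to 2400. The function name and the year range are assumptions made for illustration.

```python
import calendar


def is_leap_faulty(year):
    """Seeded fault: the 'divisible by 400' exception is missing."""
    return year % 4 == 0 and year % 100 != 0


years = range(1, 2401)
failing = [y for y in years if is_leap_faulty(y) != calendar.isleap(y)]

print(failing)
print(len(failing), "of", len(years), "inputs reveal the fault")
```

Only the years divisible by 400 fail, so a handful of arbitrarily chosen test inputs would most likely miss the fault. The debugging side is visible too: the failing output for the year 2000 is just False, which by itself does not say which part of the condition is wrong.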