EECS 280: Programming and Introductory Data Structures

Section 03 Procedural Abstraction

University of Michigan at Ann Arbor

Last Edit Date: 01/09/2023

Disclaimer and Term of Use:

We do not guarantee the accuracy and completeness of the summary content. Some of the course material may not be included, and some of the content in the summary may not be correct. You should use this file properly and legally. We are not responsible for any results from using this file.

This personal note is adapted from Professor Amir Kamil, Andrew DeOrio, James Juett, Sofia Saleem, and Saquib Razak. Please contact us to delete this file if you think your rights have been violated.

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstraction¶

Abstraction is the principle of separating what something is or does from how it does it.

For example, in order to make a sandwich, we do not need to know how bread, peanut butter, or jelly is made. All we need to know is how to put those ingredients together to construct a sandwich. Thus, we rely on the abstractions of bread, peanut butter, and jelly to make a sandwich.

Usually there are many layers of abstraction in a complex system.

In the sandwich example above:

The upper layer is the consumer of the sandwich (they do not need to know how to make a sandwich).
The next lower layer is the maker of the sandwich (they do not need to know how to make bread or the other ingredients).
The next lower layer is the maker of jelly (they do not need to know how to grow strawberries).
The next lower layer is the farmer.

The same holds for computer programs. When it comes to computational procedures, functions are our mechanism for defining procedural abstractions. The user of a function need only know what the function does, without caring about how the function accomplishes its task.

Code Organization in C++¶

We usually decompose a big program into independent modules, defined in separate files.

In C++, a single module is further decomposed into a header file, which ends with .h, and a source file, which ends with .cpp.

The header file contains the interface of the module, while the source file contains the actual implementation.

Note: The extensions .hpp and .hxx are other common conventions for header files, and .cc and .cxx are also common for source files.

The following is an example of a program.

The following code is the header file stats.h, which only contains the declaration of the function mean().

    #include <vector>

    //REQUIRES: v is not empty
    //EFFECTS: returns the arithmetic mean of the numbers in v
    double mean(std::vector<double> v);

The following code is the actual definition of function mean() in stats.cpp.

    #include "stats.h"
    #include "p1_library.h"

    using namespace std;

    double mean(vector<double> v) {
      return sum(v) / count(v);
    }

Other source files that use the functions in the stats module need only include the header file (stats.h) with the #include directive:

    #include <iostream>
    #include "stats.h"

    using namespace std;

    int main() {
      vector<double> data = { 1, 2, 3 };
      cout << mean(data) << endl;
    }

A source file that uses the stats module only needs to know about the interface of the module. As long as the interface remains the same, the implementation of a module can change arbitrarily without affecting the behavior of a source file that uses it.

The #include directive actually pulls in the code from the target into the current file. So the end result is as if the declarations for the stats functions actually were written in this source file as well. We get access to those functions’ declarations without having to manually repeat them.
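As a rough sketch of what this means, the file below shows approximately what the compiler sees for a source file once #include "stats.h" has been processed. The helper mean_of_sample() is made up for illustration, and the definition of mean() at the bottom (which would normally live in stats.cpp) is repeated with a plain loop only so that the sketch builds as a single file.

```cpp
#include <cassert>  // assert(), used only in the definition at the bottom
#include <vector>   // stats.h itself includes <vector>

// ---- begin text pasted in from stats.h ----
//REQUIRES: v is not empty
//EFFECTS: returns the arithmetic mean of the numbers in v
double mean(std::vector<double> v);
// ---- end text pasted in from stats.h ----

// Code from the including source file. It may call mean() because the
// declaration above is now part of this file.
// (mean_of_sample is a hypothetical helper, not part of the course code.)
double mean_of_sample() {
  std::vector<double> data = { 1, 2, 3 };
  return mean(data);
}

// Normally this definition is in stats.cpp and is joined in by the linker.
// It is repeated here, as an assumption, so the sketch builds by itself.
double mean(std::vector<double> v) {
  assert(!v.empty());  // REQUIRES clause
  double total = 0;
  for (double x : v) {
    total += x;
  }
  return total / v.size();
}
```

Only the declaration has to appear before the call; the definition can come later in the file or from a different source file altogether.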

Library: Headers for library modules, such as vector and iostream, are surrounded by <angle brackets>.
Non-library: Non-library headers located in the same directory are surrounded by "double quotes".

Note: Only #include header files, either your own or those of the standard library. Never #include a .cpp source file.

The using namespace std; directive allows us to refer to standard-library entities without prefixing them with std::. An alternative is to avoid the prefix for specific entities with individual using declarations:

    #include <iostream>
    #include "stats.h"

    using std::cout;  // allow cout to be used without std::
    using std::endl;  // allow endl to be used without std::

    int main() {
      // must prefix vector with std::
      std::vector<double> data = { 1, 2, 3 };
      // can elide std:: with cout and endl
      cout << mean(data) << endl;
    }

Note: It is bad practice to put using namespace std; or using declarations in a header file, because they are pulled into every file that includes the header and are thereby forced on everyone who uses it.

The following picture is an example of how headers are included. Arrows are shown between header files and the source files that #include them.

When compiling the project, only source files are passed to the compiler. The header files are folded into the source files through the #include directive.

$ g++ --std=c++11 -pedantic -g -Wall -Werror p1_library.cpp stats.cpp main.cpp -o main.exe

Note: We can compile many source files at once, but exactly one of them must contain the main() function.

The Compilation Process¶

Consider the compilation command above. The elements of the command are:

g++ is the C++ compiler we are invoking.
The --std=c++11 argument tells it to compile according to the C++11 language standard.
The -pedantic argument tells the compiler to adhere strictly to the C++ standard. Without this flag, compilers often provide extensions or allow behavior that is not permitted by the standard.
The -g argument tells the compiler to produce an executable that facilitates debugging.
The -Wall argument asks the compiler to generate warnings about possible programming errors.
The -Werror argument configures the compiler to treat warnings as errors, so that it does not compile code that has warnings.
The arguments -o main.exe tell the compiler to produce the output file main.exe.
The remaining three arguments are the source files for our program – p1_library.cpp, stats.cpp, and main.cpp.

For the source files, the compiler will compile each of them separately, producing temporary object files. It will then attempt to link the object files together into the output executable. The linking step can fail if:

A function is declared and used in the source files, but no definition is found.

Undefined symbols for architecture x86_64:
  "percentile(std::__1::vector<double, std::__1::allocator<double> >, double)", referenced from:
      _main in main-dc223c.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)

Multiple definitions of the same function are found.

duplicate symbol _main in:
    /var/folders/gc/0lqwygqx381fmx9hhvj0373h0000gp/T/main-9eba7c.o
    /var/folders/gc/0lqwygqx381fmx9hhvj0373h0000gp/T/stats_tests-b74225.o
ld: 1 duplicate symbol for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)

Upon success, the result of the linking step is the final program executable, main.exe for the compilation command above.

Specification Comments (RMEs)¶

A function's documentation tells people what it does. The documentation format we usually use is an RME, which consists of a REQUIRES clause, a MODIFIES clause, and an EFFECTS clause.

    //REQUIRES: v is not empty
    //MODIFIES: nothing
    //EFFECTS: returns the arithmetic mean of the numbers in v
    double mean(std::vector<double> v);

REQUIRES¶

The REQUIRES clause lists what the function requires in order to accomplish its task. If the requirements are not met, then the function provides no guarantees – the behavior is undefined, and anything the function does (e.g. crashing your computer, stealing your credit-card info, etc.) is valid. Thus, a user should never call a function with arguments or in a state that violates the REQUIRES clause.

Within the function definition, the implementation is allowed to assume that the REQUIRES clause is met – again, a user should never call the function if the clause is violated.

A good practice is to #include <cassert>, and assert() the required condition before going into the content of the function.

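Note: assert() does not return a value. It evaluates the condition it is given; if the condition is false, the program prints a diagnostic message and aborts, and if it is true, execution simply continues.

The diagnostic message names the failed condition along with the source file and line number, so assert() tells users that they did something wrong and where the requirement was violated.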
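A minimal sketch of this practice, assuming mean() is implemented with a plain loop rather than the p1_library.h helpers used earlier:

```cpp
#include <cassert>
#include <vector>

//REQUIRES: v is not empty
//EFFECTS: returns the arithmetic mean of the numbers in v
double mean(std::vector<double> v) {
  // Check the REQUIRES clause first. If v is empty, the program stops
  // here with a message naming the failed condition, the file, and the line.
  assert(!v.empty());

  double total = 0;
  for (double x : v) {
    total += x;
  }
  return total / v.size();
}
```

With the check in place, calling mean() on an empty vector aborts immediately instead of silently dividing by zero. Note that assert() checks are removed when the program is compiled with NDEBUG defined, so they are a debugging aid rather than a substitute for meeting the REQUIRES clause.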

If a function doesn’t have any requirements, the REQUIRES clause may be elided. Such a function is called complete, while one that has requirements is called partial.

MODIFIES¶

The MODIFIES clause specifies the entities outside the function that might be modified by it. This includes pass-by-reference parameters, global variables (not in this course – only global constants are permitted), and input and output streams (e.g. cout, cin, a file stream, etc.).

The MODIFIES clause only specifies what entities may be modified, leaving out any details about what those modifications actually might be. The latter is the job of the EFFECTS clause. Instead, the purpose of the MODIFIES clause is for the user to quickly tell what items might be modified.

If no non-local entities are modified, the MODIFIES clause may be elided.

EFFECTS¶

The EFFECTS clause specifies what the function actually does. This includes details about what modifications are made, if any, as well as what the return value means, if there is one. All functions should have an EFFECTS clause – if a function does nothing, there is no point in writing the function.

The EFFECTS clause should generally only indicate what the function does without getting into implementation details (the how). It is part of the interface of a function, so it should not be affected if the implementation were to change.
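To illustrate how the three clauses fit together, here is a sketch of a function that modifies its argument. The function subtract_mean() is hypothetical and is not part of the course's stats module.

```cpp
#include <cassert>
#include <vector>

//REQUIRES: v is not empty
//MODIFIES: v
//EFFECTS: subtracts the arithmetic mean of the original elements of v
//         from every element of v
// (hypothetical function, written only to illustrate an RME)
void subtract_mean(std::vector<double> &v) {
  assert(!v.empty());  // REQUIRES clause

  double total = 0;
  for (double x : v) {
    total += x;
  }
  const double average = total / v.size();

  // v is passed by reference, so these writes are visible to the caller.
  for (double &x : v) {
    x -= average;
  }
}
```

The MODIFIES clause only names v, the EFFECTS clause says what happens to it, and neither mentions the loops used to do the work.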

Properties of Procedural Abstraction¶

As mentioned previously, the implementation of an abstraction should be able to change without affecting the way the abstraction is used.

Abstractions should also be local, meaning that it should be possible to understand the implementation of one abstraction without knowing anything about the implementation of other abstractions.
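As a sketch of the first property, the two functions below satisfy the same specification with different implementations. The namespaces loop_impl and accumulate_impl are made up here so that both versions can sit in one file; in a real project there would be a single mean() in stats.cpp, and either body could be swapped in without touching the callers.

```cpp
#include <cassert>
#include <numeric>  // std::accumulate
#include <vector>

namespace loop_impl {  // hypothetical namespace, for illustration only
  //REQUIRES: v is not empty
  //EFFECTS: returns the arithmetic mean of the numbers in v
  double mean(std::vector<double> v) {
    assert(!v.empty());
    double total = 0;
    for (double x : v) {
      total += x;
    }
    return total / v.size();
  }
}

namespace accumulate_impl {  // hypothetical namespace, for illustration only
  //REQUIRES: v is not empty
  //EFFECTS: returns the arithmetic mean of the numbers in v
  double mean(std::vector<double> v) {
    assert(!v.empty());
    // The initial value 0.0 (not 0) keeps the running sum a double.
    return std::accumulate(v.begin(), v.end(), 0.0) / v.size();
  }
}
```

A caller that relies only on the RME cannot tell which of the two versions it is linked against.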

Testing¶

The small scope hypothesis states that thorough testing with “small” test cases is sufficient to catch most bugs in a system. Thus, our test cases need not be large – in general, they should be small enough that we can compute the expected results by hand.

A few meaningfully different test cases are more effective than many redundant ones.

Two main categories of test cases:

Unit test: Test one piece of code at a time, often at the granularity of individual functions or small groups of functions.

This helps to find bugs early and makes them easier to debug, since a failed unit test identifies exactly which function is broken.

System test: Test an entire module or program as a whole.

This can identify bugs that occur when integrating multiple units together – it’s possible that two units appear to work individually, but one unit makes an incorrect assumption about the other that only manifests when they are tested together.

System tests can also be closer to the real use case of a program, providing a measure of confidence that the program works as a whole.

Three types of valid test cases:

Simple cases are for the “average” case.
Edge cases are those that test special cases in a unit’s behavior.
Stress tests are intensive tests that ensure that a system will work under a heavy load.
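A sketch of what small unit tests for mean() might look like, using assert() to compare against results computed by hand. The test function names are made up for illustration, mean() is reimplemented with a plain loop so the example stands alone, and a stress test is omitted for brevity.

```cpp
#include <cassert>
#include <vector>

// Function under test (stand-in implementation, assumed for this sketch).
//REQUIRES: v is not empty
//EFFECTS: returns the arithmetic mean of the numbers in v
double mean(std::vector<double> v) {
  assert(!v.empty());
  double total = 0;
  for (double x : v) {
    total += x;
  }
  return total / v.size();
}

// Simple case: a typical input whose answer is easy to compute by hand.
void test_mean_simple() {
  std::vector<double> data = { 1, 2, 3 };
  assert(mean(data) == 2.0);
}

// Edge case: the smallest input that the REQUIRES clause allows.
void test_mean_single_element() {
  std::vector<double> data = { 5 };
  assert(mean(data) == 5.0);
}

// Edge case: negative and positive values that cancel out.
void test_mean_cancels_to_zero() {
  std::vector<double> data = { -4, 4 };
  assert(mean(data) == 0.0);
}
```

The main() function of a test driver would call each of these functions in turn. The exact == comparisons are safe here only because the expected values are exactly representable as doubles; results that involve rounding would need a tolerance instead.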