Libc unit testing project

The libc unit testing project, organized jointly by musl libc and Openwall, aims to apply rigorous testing to implementations of the standard library functions, in order to:

ascertain musl's readiness for deployment, and
find and fix potentially security-critical bugs in multiple libc implementations

The recent discovery of the longstanding fnmatch/alloca vulnerability in glibc should serve as motivation.

This project focuses on testing rather than direct code audits. This way, the tests we produce can be applied to multiple implementations, used as regression tests during musl's rapid development, and run against other non-mainstream, less widely used libcs. Nonetheless, the project entails some level of libc source audit to identify the components that are likely to contain errors and that demand the most attention in testing.

Below is an outline of the proposed categories we aim to test.

0. Base definition tests

Test that the interfaces (type definitions, macros, and prototypes) defined in the headers are basically correct. The idea is to catch incorrect or inconsistent definitions when porting to new platforms.

Testing entails making simple calls to at least one function using each interface structure (stat, sigaction, flock, msghdr, …) and sanity-checking the results. A reasonable attempt should also be made to use as many of the bitflag or enum-like macros as possible (things like O_APPEND, etc.), testing their behavior to ensure that they were defined correctly.
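As an illustration of the kind of check described above, the following sketch exercises struct stat and O_APPEND together. It is only one possible shape for such a test: the function name, the temporary file name, and the numeric failure codes are all invented for this example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 0 if struct stat and O_APPEND behave consistently, otherwise a
   nonzero code naming the failed step (codes are arbitrary). */
int check_stat_and_o_append(void)
{
    char path[] = "/tmp/libc-basedef-XXXXXX";   /* hypothetical file name */
    struct stat st;
    int rc = 0;

    int fd = mkstemp(path);
    assert(fd >= 0);                 /* environment precondition, not under test */

    /* st_size and st_mode must reflect what was just written. */
    if (write(fd, "abcd", 4) != 4) rc = 1;
    else if (fstat(fd, &st) != 0) rc = 2;
    else if (st.st_size != 4 || !S_ISREG(st.st_mode)) rc = 3;
    close(fd);

    if (rc == 0) {
        fd = open(path, O_WRONLY | O_APPEND);
        if (fd < 0) {
            rc = 4;
        } else {
            int fl = fcntl(fd, F_GETFL);
            /* The flag must round-trip through F_GETFL ... */
            if (fl < 0 || !(fl & O_APPEND)) rc = 5;
            /* ... and a write after seeking to 0 must still land at the end. */
            else if (lseek(fd, 0, SEEK_SET) != 0) rc = 6;
            else if (write(fd, "ef", 2) != 2) rc = 7;
            else if (fstat(fd, &st) != 0 || st.st_size != 6) rc = 8;
            close(fd);
        }
    }
    unlink(path);
    return rc;
}
```

A wrongly defined O_APPEND value or a misplaced st_size field would show up here as a nonzero return; a full suite would repeat this pattern for each structure and flag.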

I. String operations testing

Test all functions defined in string.h for:

read beyond end of source
write beyond end of dest
returning the wrong value
ending with the wrong results in dest buffer

under all combinations, for both source and dest if applicable, of:

all possible alignments and length-alignments
small buffers (smaller than 1-2 machine words)
normal size buffers (at least several machine words)
large buffers (more than a page)
giant buffers (more than 2 GB/4 GB, only possible on 64-bit machines)
low and high byte content

And additionally, for strstr, possibly test needles with various periodicity properties that affect the way the two-way algorithm decomposes the needle.
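A minimal sketch of the combinatorial approach, restricted to memcpy and to small buffers, might look like the following. The constants and the function name are assumptions made for the example. Guard bytes catch writes outside the destination, but detecting reads past the end of the source would additionally need something like an inaccessible page placed right after the buffer, which is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAXOFF 16     /* number of start alignments tried (arbitrary) */
#define MAXLEN 64     /* largest length tried (arbitrary) */
#define GUARD  0xA5   /* fill value for bytes that must stay untouched */

/* Returns the number of (source offset, dest offset, length) combinations
   for which memcpy misbehaved. */
int check_memcpy_small(void)
{
    static unsigned char src[MAXOFF + MAXLEN], dst[MAXOFF + MAXLEN + 1];
    int failures = 0;

    /* Source pattern mixes low and high byte values. */
    for (size_t i = 0; i < sizeof src; i++)
        src[i] = (unsigned char)(i * 37 + 0x80);

    for (size_t so = 0; so < MAXOFF; so++)
    for (size_t d = 0; d < MAXOFF; d++)
    for (size_t len = 0; len <= MAXLEN; len++) {
        assert(so + len <= sizeof src && d + len < sizeof dst);
        memset(dst, GUARD, sizeof dst);

        void *ret = memcpy(dst + d, src + so, len);

        int bad = (ret != (void *)(dst + d));          /* wrong return value */
        for (size_t i = 0; i < len; i++)               /* wrong contents */
            bad |= dst[d + i] != src[so + i];
        for (size_t i = 0; i < d; i++)                 /* write before dest */
            bad |= dst[i] != GUARD;
        for (size_t i = d + len; i < sizeof dst; i++)  /* write past dest */
            bad |= dst[i] != GUARD;
        failures += bad;
    }
    return failures;
}
```

The contents are compared with a plain byte loop rather than memcmp so that a bug in one string.h function cannot hide a bug in another.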

II. Malloc testing

Test various patterns of memory allocation, aiming for various levels of fragmentation. Perform the tests both in single-threaded and multi-threaded environments.

Checks for correctness:

None of the allocated blocks should overlap, and all should be successfully writable for the requested number of bytes.
Allocations made by posix_memalign should be correctly aligned and freeable by free.
Arguments to calloc which would overflow size_t when multiplied should result in allocation failure, not under-allocation.
Allocating a block so large that subtracting two pointers within that block could overflow ptrdiff_t should not be possible.
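Three of the checks above can be sketched as follows. The function names, block count, and size pattern are arbitrary choices for illustration, and the sketch is single-threaded only.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NBLK 256   /* arbitrary number of live blocks */

struct blk { unsigned char *p; size_t n; };

static int cmp_blk(const void *a, const void *b)
{
    uintptr_t x = (uintptr_t)((const struct blk *)a)->p;
    uintptr_t y = (uintptr_t)((const struct blk *)b)->p;
    return (x > y) - (x < y);
}

/* Blocks must be fully writable and must not overlap. */
int check_no_overlap(void)
{
    static struct blk b[NBLK];
    int rc = 0;
    for (size_t i = 0; i < NBLK; i++) {
        b[i].n = 1 + (i * 7919) % 5000;          /* mixed sizes */
        b[i].p = malloc(b[i].n);
        assert(b[i].p != NULL);                  /* small requests should succeed */
        memset(b[i].p, (int)(i & 0xff), b[i].n); /* write every requested byte */
    }
    qsort(b, NBLK, sizeof b[0], cmp_blk);
    for (size_t i = 0; i + 1 < NBLK; i++)
        if ((uintptr_t)b[i].p + b[i].n > (uintptr_t)b[i + 1].p) rc = 1;
    for (size_t i = 0; i < NBLK; i++) free(b[i].p);
    return rc;
}

/* posix_memalign results must be aligned and accepted by free. */
int check_posix_memalign(void)
{
    for (size_t align = sizeof(void *); align <= 4096; align *= 2) {
        void *p = NULL;
        if (posix_memalign(&p, align, 100) != 0) return 1;
        if ((uintptr_t)p % align != 0) return 2;
        memset(p, 0x5A, 100);
        free(p);
    }
    return 0;
}

/* calloc must fail when nmemb * size wraps around size_t. */
int check_calloc_overflow(void)
{
    /* Call through a volatile pointer so the compiler cannot fold or
       remove the allocation. */
    void *(*volatile calloc_fn)(size_t, size_t) = calloc;
    size_t n = SIZE_MAX / 2 + 1;     /* n * 2 wraps to 0 */
    void *p = calloc_fn(n, 2);
    if (p != NULL) { free(p); return 1; }
    return 0;
}
```

Each function returns 0 on success. The ptrdiff_t check and the multi-threaded variants are left out here because they depend more heavily on the platform and on available memory.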

Further implementation-specific correctness checks: checking consistency of bookkeeping information before and after each allocated block.

Possible quality-of-implementation checks: Attempting to obtain pathological fragmentation and allocation failure where it should not happen.

III. Numeric parsing tests

Numeric parsing with the following functions should be tested: (str|wcs)to(umax|imax|u?l?l|ld|d|f), sscanf, fscanf, swscanf, fwscanf.

Working from the specifications for these functions, develop a number of corner case strings likely to be wrongly accepted or wrongly rejected. Especially worth testing are strings which are initial substrings of valid numeric strings, but which are not themselves valid numeric strings, the most basic example of which is “0x”.

For the scanf-family functions, encountering an initial subsequence of a valid numeric string followed by junk should result in a scan failure. For the (str|wcs)to… functions, sometimes an initial subsequence of the initial subsequence will still be a valid number, and it should be processed. Tests should check the end pointer these functions save to confirm that the right number of characters were accepted.

Also test overflow behavior: the value of errno, the “fake” values returned on overflow (ULONG_MAX, etc.) and so on. Some of the specified behavior, especially “negative” values for unsigned conversions, is a bit unintuitive so a careful reading of the specs is needed.
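A table-driven sketch for strtol and strtoul, covering the "0x" prefix case, the end pointer, and overflow, could look like this. The case table and names are made up for the example and are nowhere near exhaustive; the scanf-family and wide-character functions would need their own tables.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

#define BIG "99999999999999999999999999"   /* 26 digits: too large for a 64-bit long */

struct parse_case {
    const char *s;   /* input string */
    int base;        /* base passed to strtol */
    long want;       /* expected return value */
    size_t used;     /* expected number of characters consumed */
    int err;         /* expected errno (0 = unchanged) */
};

/* Returns the number of failing cases. */
int check_strtol(void)
{
    static const struct parse_case cases[] = {
        { "0x",       16, 0,        1,              0 },  /* only the "0" is a number */
        { "0x",        0, 0,        1,              0 },
        { "0xg",       0, 0,        1,              0 },
        { "0x1f",      0, 31,       4,              0 },
        { "  -12abc", 10, -12,      5,              0 },
        { BIG,        10, LONG_MAX, sizeof BIG - 1, ERANGE },
        { "-" BIG,    10, LONG_MIN, sizeof BIG,     ERANGE },
    };
    int failures = 0;
    for (size_t i = 0; i < sizeof cases / sizeof cases[0]; i++) {
        char *end = NULL;
        errno = 0;
        long got = strtol(cases[i].s, &end, cases[i].base);
        int err = errno;
        assert(end != NULL);   /* strtol always stores an end pointer */
        if (got != cases[i].want || err != cases[i].err ||
            (size_t)(end - cases[i].s) != cases[i].used)
            failures++;
    }
    return failures;
}

/* "-1" is a valid input for an unsigned conversion: the value is negated
   in the unsigned type, giving ULONG_MAX without setting errno. */
int check_strtoul_negative(void)
{
    const char *s = "-1";
    char *end = NULL;
    errno = 0;
    unsigned long got = strtoul(s, &end, 10);
    return !(got == ULONG_MAX && errno == 0 && end == s + 2);
}
```

Both functions return 0 when every expectation holds.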

IV. snprintf tests

Checking the return value of snprintf: Look for uncaught overflows. For example, check (on 64-bit) that attempting to format a single string of 4 GB + 1 bytes with %s results in -1 with errno set to EOVERFLOW, rather than a returned length of 1.
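The return-value check might be sketched as below. The helper takes the string length as a parameter so the same code can be run cheaply with small lengths and, on a 64-bit machine with enough memory, with a length above INT_MAX. The function names and return codes are assumptions for this example.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Formats a string of n bytes with %s into a zero-size buffer.
   Returns 0 if the result is as expected, 1 if not, -1 if the test
   string could not be allocated (test skipped). */
int check_snprintf_length(size_t n)
{
    assert(n < (size_t)-1);          /* n + 1 must not wrap */
    char *s = malloc(n + 1);
    if (s == NULL) return -1;
    memset(s, 'a', n);
    s[n] = '\0';

    errno = 0;
    int r = snprintf(NULL, 0, "%s", s);
    int err = errno;
    free(s);

    if (n > (size_t)INT_MAX)         /* length not representable in int */
        return (r < 0 && err == EOVERFLOW) ? 0 : 1;
    return (r >= 0 && (size_t)r == n) ? 0 : 1;
}

/* Truncation: the return value is the untruncated length, the output is
   NUL-terminated, and nothing past the given size is written. */
int check_snprintf_truncation(void)
{
    char buf[8];
    memset(buf, 'X', sizeof buf);
    int r = snprintf(buf, 4, "%s", "abcdef");
    if (r != 6) return 1;
    if (memcmp(buf, "abc", 4) != 0) return 2;   /* "abc" plus terminator */
    if (buf[4] != 'X') return 3;
    return 0;
}
```

For the overflow case described above, the helper would be called as check_snprintf_length(((size_t)1 << 32) + 1) on a 64-bit system.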

[This section incomplete.]

V. stdio tests

To begin, research the gnulib/autoconf tests to determine which tests are actually correct and which ones are looking for GNU-specific behavior. I believe they are already very thorough, so just setting them up to build and run as part of the larger test package may be sufficient. Of course, reading them may also give ideas for further testing of stdio.

[This section incomplete.]

VI. Functions which return strings in caller-provided buffers

Make a list of such functions, and design tests which arrange for each function to return a string just longer than the nominally-available buffer space. Check that the function has not written past the end of the buffer, and that it returns a failure code or indication of truncation if specified to do so.
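Using getcwd as one example of such a function, a test along these lines could be written as follows. The guard value, slack size, and function name are arbitrary; the sketch assumes the current directory path fits in 4096 bytes and reports a skip otherwise.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define GUARD 0xA5   /* fill value for bytes that must stay untouched */
#define SLACK 16     /* guard bytes placed after the nominal buffer */

/* Returns 0 on success, a positive code on failure, -1 if skipped. */
int check_getcwd_short(void)
{
    char full[4096];
    if (getcwd(full, sizeof full) == NULL) return -1;
    size_t need = strlen(full) + 1;          /* bytes required, incl. NUL */

    unsigned char *buf = malloc(need + SLACK);
    assert(buf != NULL);
    memset(buf, GUARD, need + SLACK);

    /* Offer one byte less than required. */
    errno = 0;
    char *r = getcwd((char *)buf, need - 1);
    int err = errno;

    int rc = 0;
    if (r != NULL) rc = 1;                   /* must report failure */
    else if (err != ERANGE) rc = 2;          /* with the specified error */
    for (size_t i = need - 1; i < need + SLACK; i++)
        if (buf[i] != GUARD) rc = 3;         /* wrote past the given size */
    free(buf);
    return rc;
}
```

The same structure (measure the real result, offer one byte less, verify guard bytes and the error indication) should carry over to the other functions on the list.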

VII. Functions which manipulate temp copies of an argument string

Make a list of such functions, which would include things like fnmatch, glob, regcomp, and anything that might need to precompute a case-altered version of its argument string. Then attempt to pass argument strings so long that allocation via malloc fails, or that on-stack allocation (alloca or VLA) wraps the stack pointer or moves it to point over top of other program data. Test that the function returns an error or works without allocation rather than crashing or clobbering memory.
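For fnmatch, the basic shape of such a test is sketched below with the length as a parameter. The function name and return codes are invented for the example. A vulnerable implementation is expected to crash rather than return, so a real harness would probably run this in a child process and treat death by signal as failure; that part is not shown.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdlib.h>
#include <string.h>

/* Matches a pattern of n literal characters against strings of the same
   length. Returns 0 on success, a positive code on a wrong answer, and
   -1 if the test strings could not be allocated (test skipped). */
int check_fnmatch_long(size_t n)
{
    assert(n > 0 && n < (size_t)-1);
    char *pat = malloc(n + 1);
    char *str = malloc(n + 1);
    if (pat == NULL || str == NULL) {
        free(pat);
        free(str);
        return -1;
    }
    memset(pat, 'a', n);
    pat[n] = '\0';
    memcpy(str, pat, n + 1);

    int rc = 0;
    /* Identical strings: a match (0) or an error code is acceptable,
       but FNM_NOMATCH is a wrong answer. */
    if (fnmatch(pat, str, 0) == FNM_NOMATCH) rc = 1;

    /* Differ in the last character: reporting a match is wrong. */
    str[n - 1] = 'b';
    if (fnmatch(pat, str, 0) == 0) rc = 2;

    free(pat);
    free(str);
    return rc;
}
```

Lengths worth trying range from a little under the stack size limit up to sizes where malloc is likely to fail; the locale in effect matters as well, since multibyte locales may take a different code path.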

VIII. Threaded setuid/setreuid race conditions

These tests will need root and/or a setuid-somebody binary to do anything useful.

Test 1: Set up RLIMIT_NPROC and fork a number of child processes which each change to the same uid to exhaust this limit. Create a bunch of threads in the parent process (with the original uid), then call setuid. At the same moment, have the child processes that were exhausting the process limit begin terminating. Test for setuid returning 0 (success) but failing to change the uid for some of the threads (you may need to have each thread call getuid, or read /proc/self/task, to evaluate the results).

Other tests in this group will involve creation or termination of threads within the process at the same time setuid is called, forking at the same time, etc. to look for race conditions around synchronization of threaded uid changes. Understanding the potential races and designing tests is most of the difficulty here.

IX. Signals

A number of pthread functions are specified to disallow returning EINTR. Attempt to have them blocked and interrupted by a non-restarting signal handler, and check that they do not return EINTR.

Test the behavior of sigsuspend, sigtimedwait, sigwait, and sigwaitinfo with regard to cancellation and being interrupted by an unmasked signal.

[This section incomplete.]

X. Recursion tests

Identify library functions which are likely to use recursion and try to make them recurse unboundedly, looking for stack overflow.

XI. Character conversion tests

Test the multibyte/wide character conversion functions in a UTF-8 locale, and the iconv functions with various character sets, to ensure:

invalid UTF-8 sequences are never accepted (see RFC 3629)
non-Unicode-scalar-value wchar_t values are not accepted
mbrtowc never returns -2 for a sequence that is not an initial subsequence of some valid sequence.

Most importantly, “over-long sequences” must not be accepted.
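The over-long sequence check for mbrtowc can be sketched as follows. The locale names tried, the function names, and the selection of byte sequences are assumptions for this example; if no UTF-8 locale is available the check reports a skip. Surrogates, values above U+10FFFF, and the -2 return value would need further cases.

```c
#include <assert.h>
#include <errno.h>
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* True if mbrtowc rejects the n-byte sequence with EILSEQ. */
static int rejected(const char *bytes, size_t n)
{
    mbstate_t st;
    wchar_t wc = 0;
    memset(&st, 0, sizeof st);       /* initial conversion state */
    errno = 0;
    size_t r = mbrtowc(&wc, bytes, n, &st);
    return r == (size_t)-1 && errno == EILSEQ;
}

/* Returns the number of over-long sequences accepted, 100 if the valid
   control sequence fails, or -1 if no UTF-8 locale could be selected. */
int check_overlong_rejected(void)
{
    static const struct { const char *b; size_t n; } bad[] = {
        { "\xC0\x80",         2 },   /* over-long U+0000 */
        { "\xC1\xBF",         2 },   /* over-long U+007F */
        { "\xE0\x80\xAF",     3 },   /* over-long U+002F */
        { "\xE0\x9F\xBF",     3 },   /* over-long U+07FF */
        { "\xF0\x80\x80\xAF", 4 },   /* over-long U+002F */
        { "\xF0\x8F\xBF\xBF", 4 },   /* over-long U+FFFF */
    };

    if (setlocale(LC_CTYPE, "C.UTF-8") == NULL &&
        setlocale(LC_CTYPE, "en_US.UTF-8") == NULL)
        return -1;

    /* Control: a well-formed 3-byte sequence (U+20AC) must be accepted,
       otherwise the rejections below prove nothing. */
    mbstate_t st;
    wchar_t wc = 0;
    memset(&st, 0, sizeof st);
    if (mbrtowc(&wc, "\xE2\x82\xAC", 3, &st) != 3) return 100;

    int failures = 0;
    for (size_t i = 0; i < sizeof bad / sizeof bad[0]; i++)
        if (!rejected(bad[i].b, bad[i].n))
            failures++;
    return failures;
}
```

The same table could be fed to iconv with UTF-8 as the source character set to cover that interface too.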

Perhaps some punycode DNS tests could also be added to this task, but that would be hard to test without dropping in a custom nameserver to serve bogus data.

[This section incomplete.]

XII. Advanced potential race conditions & deadlocks

The following operations all potentially deal with resources (counters, cached pid/tid, cached thread structures, etc.) shared by the whole process, and synchronization between them is sufficiently complex that it should be tested:

forking
thread creation and termination
set(r?e)?[ug]id
cancellation

In addition, fork() is async-signal-safe, which means it might be called from a signal handler while another fork, or any of the other above operations, is taking place. POSIX is rather unclear and contradictory on what restrictions, if any, are placed on such use.

Designing these tests requires reading implementation source to understand the potential race and deadlock issues.
