

1. Process Control

1.1 Creating new processes: fork()

1.1.1 What does fork() do?

#include <sys/types.h>
#include <unistd.h>

pid_t fork(void);

The fork() function creates a new process from an existing process. The new process is called the child process, and the existing process is called the parent. You can tell which is which by checking the return value from fork(): the parent gets the child's pid, while the child gets 0. The following simple code illustrates the basics.

pid_t pid;

switch (pid = fork())
{
case -1:
    /* Here pid is -1, the fork failed */
    /* Some possible reasons are that you're */
    /* out of process slots or virtual memory */
    perror("The fork failed!");
    break;

case 0:
    /* pid of zero is the child */
    /* Here we're the child...what should we do? */
    /* ... */
    /* but after doing it, we should do something like: */
    _exit(0);

default:
    /* pid greater than zero is parent getting the child's pid */
    /* pid_t has no printf format of its own, so cast it */
    printf("Child's pid is %ld\n", (long)pid);
}

Of course, one can use if()... else... instead of switch(), but the above form is a useful idiom.

It helps to know just what is and is not inherited by the child. This list can vary depending on the Unix implementation, so take it with a grain of salt. Note that the child gets copies of these things, not the real thing.

Inherited by the child from the parent:

Unique to the child:

1.1.2 What's the difference between fork() and vfork()?

Some systems have a system call vfork(), which was originally designed as a lower-overhead version of fork(). Since fork() involved copying the entire address space of the process, and was therefore quite expensive, the vfork() function was introduced (in 3.0BSD).

However, since vfork() was introduced, the implementation of fork() has improved drastically, most notably with the introduction of `copy-on-write', where the copying of the process address space is transparently faked by allowing both processes to refer to the same physical memory until either of them modifies it. This largely removes the justification for vfork(); indeed, a large proportion of systems now lack the original functionality of vfork() completely. For compatibility, though, there may still be a vfork() call present that simply calls fork() without attempting to emulate all of the vfork() semantics.

As a result, it is very unwise to actually make use of any of the differences between fork() and vfork(). Indeed, it is probably unwise to use vfork() at all, unless you know exactly why you want to.

The basic difference between the two is that when a new process is created with vfork(), the parent process is temporarily suspended, and the child process might borrow the parent's address space. This strange state of affairs continues until the child process either exits, or calls execve(), at which point the parent process continues.

This means that the child process of a vfork() must be careful to avoid unexpectedly modifying variables of the parent process. In particular, the child process must not return from the function containing the vfork() call, and it must not call exit() (if it needs to exit, it should use _exit(); actually, this is also true for the child of a normal fork()).

1.1.3 Why use _exit rather than exit in the child branch of a fork?

There are a few differences between exit() and _exit() that become significant when fork(), and especially vfork(), is used.

The basic difference between exit() and _exit() is that the former performs clean-up related to user-mode constructs in the library, and calls user-supplied cleanup functions, whereas the latter performs only the kernel cleanup for the process.

In the child branch of a fork(), it is normally incorrect to use exit(), because that can lead to stdio buffers being flushed twice, and temporary files being unexpectedly removed. In C++ code the situation is worse, because destructors for static objects may be run incorrectly. (There are some unusual cases, like daemons, where the parent should call _exit() rather than the child; the basic rule, applicable in the overwhelming majority of cases, is that exit() should be called only once for each entry into main.)

In the child branch of a vfork(), the use of exit() is even more dangerous, since it will affect the state of the parent process.

1.2 Environment variables

1.2.1 How can I get/set an environment variable from a program?

Getting the value of an environment variable is done by using getenv().

#include <stdlib.h>

char *getenv(const char *name);

Setting the value of an environment variable is done by using putenv().

#include <stdlib.h>

int putenv(char *string);

The string passed to putenv must not be freed or made invalid, since a pointer to it is kept by putenv(). This means that it must either be a static buffer or allocated off the heap. The string can be freed if the environment variable is redefined or deleted via another call to putenv().

Remember that environment variables are inherited; each process has a separate copy of the environment. As a result, you can't change the value of an environment variable in another process, such as the shell.

Suppose you wanted to get the value for the TERM environment variable. You would use this code:

char *envvar;

envvar=getenv("TERM");

printf("The value for the environment variable TERM is ");
if(envvar)
{
    printf("%s\n",envvar);
}
else
{
    printf("not set.\n");
}

Now suppose you wanted to create a new environment variable called MYVAR, with a value of MYVAL. This is how you'd do it.

static char envbuf[256];

/* snprintf() cannot overrun envbuf, unlike sprintf() */
snprintf(envbuf, sizeof envbuf, "MYVAR=%s", "MYVAL");

if(putenv(envbuf))
{
    printf("Sorry, putenv() couldn't find the memory for %s\n",envbuf);
    /* Might exit() or something here if you can't live without it */
}
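
To illustrate the point that each process has its own copy of the environment, here is a small sketch in which a child changes a variable and the parent checks that its own value is unaffected. The variable name FAQ_DEMO_VAR and the function name are invented for the example.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the child's change stayed private
   to the child, 0 if the parent saw it, -1 on failure. */
int child_change_is_private(void)
{
    /* static buffers, because putenv() keeps a pointer to the string */
    static char parent_val[] = "FAQ_DEMO_VAR=parent";
    static char child_val[] = "FAQ_DEMO_VAR=child";
    const char *v;
    pid_t pid;
    int status;

    if (putenv(parent_val) != 0)
        return -1;

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* the child modifies only its own copy of the environment */
        if (putenv(child_val) != 0)
            _exit(1);
        v = getenv("FAQ_DEMO_VAR");
        _exit(v != NULL && strcmp(v, "child") == 0 ? 0 : 1);
    }

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    /* the parent still sees the value it set before the fork */
    v = getenv("FAQ_DEMO_VAR");
    return v != NULL && strcmp(v, "parent") == 0;
}
```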

1.2.2 How can I read the whole environment?

If you don't know the names of the environment variables, then the getenv() function isn't much use. In this case, you have to dig deeper into how the environment is stored.

A global variable, environ, holds a pointer to an array of pointers to environment strings, each string in the form "NAME=value". A NULL pointer is used to mark the end of the array. Here's a trivial program to print the current environment (like printenv):

#include <stdio.h>

extern char **environ;

int main()
{
    char **ep = environ;
    char *p;
    while ((p = *ep++))
        printf("%s\n", p);
    return 0;
}

In general, the environ variable is also passed as the third, optional, parameter to main(); that is, the above could have been written:

#include <stdio.h>

int main(int argc, char **argv, char **envp)
{
    char *p;
    while ((p = *envp++))
        printf("%s\n", p);
    return 0;
}

However, while pretty universally supported, this method isn't actually defined by the POSIX standards. (It's also less useful, in general.)

1.3 How can I sleep for less than a second?

The sleep() function, which is available on all Unixes, only allows for a duration specified in seconds. If you want finer granularity, then you need to look for alternatives:

Of the above, select() is probably the most portable (and strangely, it is often much more efficient than usleep() or an itimer-based method). However, the behaviour may be different if signals are caught while asleep; this may or may not be an issue depending on the application.

Whichever route you choose, it is important to realise that you may be constrained by the timer resolution of the system (some systems allow very short time intervals to be specified, others have a resolution of, say, 10ms and will round all timings to that). Also, as for sleep(), the delay you specify is only a minimum value; after the specified period elapses, there will be an indeterminate delay before your process next gets scheduled.
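
A minimal sketch of the select() approach mentioned above: calling select() with no descriptors and only a timeout makes it act as a sub-second sleep. The name msleep() is just a label for this example, not a standard function.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Sleep for at least `ms' milliseconds (ms must not be negative).
   Returns 0 when the timeout expires, or -1 with errno set, for
   example EINTR if a signal was caught while asleep. */
int msleep(long ms)
{
    struct timeval tv;

    tv.tv_sec = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;
    return select(0, NULL, NULL, NULL, &tv);
}
```

As noted above, the delay is only a minimum; the actual wake-up time depends on the system's timer resolution and on scheduling.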

1.4 How can I get a finer-grained version of alarm()?

Modern Unixes tend to implement alarms using the setitimer() function, which has a higher resolution and more options than the simple alarm() function. One should generally assume that alarm() and setitimer(ITIMER_REAL) may be the same underlying timer, and accessing it both ways may cause confusion.

Itimers can be used to implement either one-shot or repeating signals; also, there are generally 3 separate timers available:

ITIMER_REAL
counts real (wall clock) time, and sends the SIGALRM signal
ITIMER_VIRTUAL
counts process virtual (user CPU) time, and sends the SIGVTALRM signal
ITIMER_PROF
counts user and system CPU time, and sends the SIGPROF signal; it is intended for interpreters to use for profiling.

Itimers, however, are not part of many of the standards, despite having been present since 4.2BSD. The POSIX realtime extensions define some similar, but different, functions.

1.5 How can a parent and child process communicate?

A parent and child can communicate through any of the normal inter-process communication schemes (pipes, sockets, message queues, shared memory), but also have some special ways to communicate that take advantage of their relationship as a parent and child.

One of the most obvious is that the parent can get the exit status of the child.

Since the child inherits file descriptors from its parent, the parent can open both ends of a pipe and fork; the parent then closes one end of the pipe and the child closes the other. This is what happens when you call the popen() routine to run another program from within yours, i.e. you can write to the file descriptor returned from popen() and the child process sees it as its stdin, or you can read from the file descriptor and see what the program wrote to its stdout. (The mode parameter to popen() defines which; if you want to do both, then you can do the plumbing yourself without too much difficulty.)
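
The pipe-and-fork arrangement just described might look roughly like this. The function name read_from_child() is made up for the example; here the child writes and the parent reads.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The child writes `msg' into the pipe; the parent reads it into `buf'
   (always NUL-terminated). Returns the number of bytes read, or -1. */
ssize_t read_from_child(const char *msg, char *buf, size_t buflen)
{
    int fds[2];
    pid_t pid;
    ssize_t n, total = 0;

    if (buflen == 0 || pipe(fds) < 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);              /* the child only writes */
        if (write(fds[1], msg, strlen(msg)) < 0)
            _exit(1);
        _exit(0);
    }

    close(fds[1]);                  /* the parent only reads */
    while ((size_t)total < buflen - 1 &&
           (n = read(fds[0], buf + total, buflen - 1 - total)) > 0)
        total += n;
    buf[total] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);          /* don't leave a zombie behind */
    return total;
}
```

Closing the unused end in each process matters: the parent only sees end-of-file once every copy of the write end has been closed.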

Also, the child process inherits memory segments mmapped anonymously (or by mmapping the special file `/dev/zero') by the parent; these shared memory segments are not accessible from unrelated processes.

1.6 How do I get rid of zombie processes?

1.6.1 What is a zombie?

When a program forks and the child finishes before the parent, the kernel still keeps some of its information about the child in case the parent might need it -- for example, the parent may need to check the child's exit status. To be able to get this information, the parent calls wait(); when this happens, the kernel can discard the information.

In the interval between the child terminating and the parent calling wait(), the child is said to be a `zombie'. (If you do `ps', the child will have a `Z' in its status field to indicate this.) Even though it's not running, it's still taking up an entry in the process table. (It consumes no other resources, but some utilities may show bogus figures for e.g. CPU usage; this is because some parts of the process table entry have been overlaid by accounting info to save space.)

This is not good, as the process table has a fixed number of entries and it is possible for the system to run out of them. Even if the system doesn't run out, there is a limit on the number of processes each user can run, which is usually smaller than the system's limit. This is one of the reasons why you should always check if fork() failed, by the way!

If the parent terminates without calling wait(), the child is `adopted' by init, which handles the work necessary to clean up after the child. (This is a special system program with process ID 1 -- it's actually the first program to run after the system boots up).

1.6.2 How do I prevent them from occurring?

You need to ensure that your parent process calls wait() (or waitpid(), wait3(), etc.) for every child process that terminates; or, on some systems, you can instruct the system that you are uninterested in child exit states.

Another approach is to fork() twice, and have the immediate child process exit straight away. This causes the grandchild process to be orphaned, so the init process is responsible for cleaning it up. For code to do this, see the function fork2() in the examples section.

To ignore child exit states, you need to do the following (check your system's manpages to see if this works):

    struct sigaction sa;
    sa.sa_handler = SIG_IGN;
#ifdef SA_NOCLDWAIT
    sa.sa_flags = SA_NOCLDWAIT;
#else
    sa.sa_flags = 0;
#endif
    sigemptyset(&sa.sa_mask);
    sigaction(SIGCHLD, &sa, NULL);

If this is successful, then the wait() functions are prevented from working; if any of them are called, they will wait until all child processes have terminated, then return failure with errno == ECHILD.

The other technique is to catch the SIGCHLD signal, and have the signal handler call waitpid() or wait3(). See the examples section for a complete program.

1.7 How do I get my program to act like a daemon?

A daemon process is usually defined as a background process that does not belong to a terminal session. Many system services are performed by daemons; network services, printing etc.

Simply invoking a program in the background isn't really adequate for these long-running programs; that does not correctly detach the process from the terminal session that started it. Also, the conventional way of starting daemons is simply to issue the command manually or from an rc script; the daemon is expected to put itself into the background.

Here are the steps to become a daemon:

  1. fork() so the parent can exit; this returns control to the command line or shell invoking your program. This step is required so that the new process is guaranteed not to be a process group leader. The next step, setsid(), fails if you're a process group leader.
  2. setsid() to become a process group and session group leader. Since a controlling terminal is associated with a session, and this new session has not yet acquired a controlling terminal, our process now has no controlling terminal, which is a Good Thing for daemons.
  3. fork() again so the parent (the session group leader) can exit. This means that we, as a non-session group leader, can never regain a controlling terminal.
  4. chdir("/") to ensure that our process doesn't keep any directory in use. Failure to do this could make it so that an administrator couldn't unmount a filesystem, because it was our current directory. [Equivalently, we could change to any directory containing files important to the daemon's operation.]
  5. umask(0) so that we have complete control over the permissions of anything we write. We don't know what umask we may have inherited. [This step is optional]
  6. close() fds 0, 1, and 2. This releases the standard in, out, and error we inherited from our parent process. We have no way of knowing where these fds might have been redirected to. Note that many daemons use sysconf() to determine the limit _SC_OPEN_MAX, which tells you the maximum number of open files per process. Then, in a loop, the daemon can close all possible file descriptors. You have to decide if you need to do this or not. If you think that there might be file descriptors open, you should close them, since there's a limit on the number of concurrent file descriptors.
  7. Establish new open descriptors for stdin, stdout and stderr. Even if you don't plan to use them, it is still a good idea to have them open. The precise handling of these is a matter of taste; if you have a logfile, for example, you might wish to open it as stdout or stderr, and open `/dev/null' as stdin; alternatively, you could open `/dev/console' as stderr and/or stdout, and `/dev/null' as stdin, or any other combination that makes sense for your particular daemon.

Almost none of this is necessary (or advisable) if your daemon is being started by inetd. In that case, stdin, stdout and stderr are all set up for you to refer to the network connection, and the fork()s and session manipulation should not be done (to avoid confusing inetd). Only the chdir() and umask() steps remain as useful.
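
For a daemon that is not started by inetd, the numbered steps above can be sketched as follows. The function name daemonize() is an arbitrary choice for this example, and `/dev/null' for all three standard descriptors is just one of the possible choices discussed in step 7.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 0 in the daemon process, or -1 if a step failed.
   The calling process itself exits during step 1. */
int daemonize(void)
{
    pid_t pid;

    /* Step 1: fork so the parent can exit */
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid > 0)
        _exit(0);

    /* Step 2: new session, hence no controlling terminal */
    if (setsid() < 0)
        return -1;

    /* Step 3: fork again so the session leader can exit */
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid > 0)
        _exit(0);

    /* Step 4: don't keep any directory in use */
    if (chdir("/") < 0)
        return -1;

    /* Step 5 (optional): take full control of file permissions */
    umask(0);

    /* Step 6: release the inherited stdin, stdout and stderr */
    close(0);
    close(1);
    close(2);

    /* Step 7: reopen them; open() and dup() return the lowest free
       descriptor, so this fills 0, 1 and 2 in that order */
    if (open("/dev/null", O_RDWR) != 0)
        return -1;
    if (dup(0) != 1 || dup(0) != 2)
        return -1;

    return 0;
}
```

The exiting parents use _exit() rather than exit(), in line with the rule from the discussion of fork() that exit() should run only once per entry into main.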

1.8 How can I look at processes in the system like ps does?

You really don't want to do this.

The most portable way, by far, is to do popen(pscmd, "r") and parse the output. (pscmd should be something like `"ps -ef"' on SysV systems; on BSD systems there are many possible display options: choose one.)

In the examples section, there are two complete versions of this; one for SunOS 4, which requires root permission to run and uses the `kvm_*' routines to read the information from kernel data structures; and another for SVR4 systems (including SunOS 5), which uses the `/proc' filesystem.

It's even easier on systems with an SVR4.2-style `/proc'; just read a psinfo_t structure from the file `/proc/PID/psinfo' for each PID of interest. However, this method, while probably the cleanest, is also perhaps the least well-supported. (On FreeBSD's `/proc', you read a semi-undocumented printable string from `/proc/PID/status'; Linux has something similar.)

1.9 Given a pid, how can I tell if it's a running program?

Use kill() with 0 for the signal number.

There are four possible results from this call:

The most-used technique is to assume that success or failure with EPERM implies that the process exists, and any other error implies that it doesn't.

An alternative exists, if you are writing specifically for a system (or all those systems) that provide a `/proc' filesystem: checking for the existence of `/proc/PID' may work.
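
The most-used kill() technique described above can be sketched like this (the function name is an assumption of this example). It treats success and EPERM as `exists', and anything else as `does not exist'.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/* Returns 1 if a process with this pid appears to exist, 0 otherwise. */
int process_exists(pid_t pid)
{
    if (pid <= 0)
        return 0;       /* 0 and negative values address process groups */

    if (kill(pid, 0) == 0)
        return 1;       /* exists, and we could signal it */

    return errno == EPERM;  /* exists, but belongs to someone else */
}
```

Bear in mind that the answer is only a snapshot: the process may exit (and its pid may even be reused) immediately after the check. A zombie also counts as existing.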

1.10 What's the return value of system/pclose/waitpid?

The return value of system(), pclose(), or waitpid() doesn't seem to be the exit value of my process... or the exit value is shifted left 8 bits... what's the deal?

The man page is right, and so are you! If you read the man page for waitpid() you'll find that the return code for the process is encoded. The value returned by the process is normally in the upper 8 bits of a 16-bit status value (which is why it looks shifted left by 8 bits), and the rest is used for other things. You can't rely on this though, not if you want to be portable, so the suggestion is that you use the macros provided. These are usually documented under wait() or wstat.

Macros defined for the purpose (in `<sys/wait.h>') include (stat is the value returned by waitpid()):

WIFEXITED(stat)
non-zero if child exited normally
WEXITSTATUS(stat)
exit code returned by child
WIFSIGNALED(stat)
Non-zero if child was terminated by a signal
WTERMSIG(stat)
signal number that terminated child
WIFSTOPPED(stat)
non-zero if child is stopped
WSTOPSIG(stat)
number of signal that stopped child
WIFCONTINUED(stat)
non-zero if status was for continued child
WCOREDUMP(stat)
If WIFSIGNALED(stat) is non-zero, this is non-zero if the process left behind a core dump.
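
Here is a small sketch of decoding a status with these macros instead of shifting bits by hand. The function name and the convention of returning a negated signal number are choices made for this example only.

```c
#include <limits.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with `code', or, if `sig' is non-zero,
   kills itself with that signal first. Returns the child's exit code,
   or minus the signal number that terminated it, or INT_MIN on error. */
int child_result(int code, int sig)
{
    pid_t pid;
    int stat;

    pid = fork();
    if (pid < 0)
        return INT_MIN;
    if (pid == 0) {
        if (sig != 0)
            raise(sig);
        _exit(code);
    }

    if (waitpid(pid, &stat, 0) < 0)
        return INT_MIN;

    if (WIFEXITED(stat))
        return WEXITSTATUS(stat);   /* low 8 bits of the exit value */
    if (WIFSIGNALED(stat))
        return -WTERMSIG(stat);
    return INT_MIN;                 /* stopped or continued: not expected here */
}
```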

1.11 How do I find out about a process' memory usage?

Look at getrusage(), if available.
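
A minimal sketch, assuming your system's struct rusage has the ru_maxrss field (common on BSD-derived systems and Linux, but not guaranteed by POSIX). The units differ between systems (kilobytes on some, bytes on others), so check your manpage before interpreting the number. The function name is invented.

```c
#include <sys/resource.h>
#include <sys/time.h>

/* Returns the peak resident set size of the calling process, in
   whatever units the system uses for ru_maxrss, or -1 on failure. */
long peak_rss(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) < 0)
        return -1;
    return ru.ru_maxrss;
}
```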

1.12 Why do processes never decrease in size?

When you free memory back to the heap with free(), on almost all systems that doesn't reduce the memory usage of your program. The memory free()d is still part of the process' address space, and will be used to satisfy future malloc() requests.

If you really need to free memory back to the system, look at using mmap() to allocate private anonymous mappings. When these are unmapped, the memory really is released back to the system. Certain implementations of malloc() (e.g. in the GNU C Library) automatically use mmap() where available to perform large allocations; these blocks are then returned to the system on free().

Of course, if your program increases in size when you think it shouldn't, you may have a `memory leak' -- a bug in your program that results in unused memory not being freed.

1.13 How do I change the name of my program (as seen by `ps')?

On BSDish systems, the ps program actually looks into the address space of the running process to find the current argv[], and displays that. That enables a program to change its `name' simply by modifying argv[].

On SysVish systems, the command name and usually the first 80 bytes of the parameters are stored in the process' u-area, and so can't be directly modified. There may be a system call to change this (unlikely), but otherwise the only way is to perform an exec(), or write into kernel memory (dangerous, and only possible if running as root).

Some systems (notably Solaris) may have two separate versions of ps, one in `/usr/bin/ps' with SysV behaviour, and one in `/usr/ucb/ps' with BSD behaviour. On these systems, if you change argv[], then the BSD version of ps will reflect the change, and the SysV version won't.

Check to see if your system has a function setproctitle().

1.14 How can I find a process' executable file?

This would be a good candidate for a list of `Frequently Unanswered Questions', because the fact of asking the question usually means that the design of the program is flawed. :-)

You can make a `best guess' by looking at the value of argv[0]. If this contains a `/', then it is probably the absolute or relative (to the current directory at program start) path of the executable. If it does not, then you can mimic the shell's search of the PATH variable, looking for the program. However, success is not guaranteed, since it is possible to invoke programs with arbitrary values of argv[0], and in any case the executable may have been renamed or deleted since it was started.
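
The `best guess' search could be sketched roughly as below. The function name find_in_path() is invented; it only checks execute permission with access(), so it does not distinguish regular files from directories, and it is subject to all the caveats just mentioned.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Guess the path of the program invoked as `name' (typically argv[0]).
   On success, stores the guess in `out' and returns 0; otherwise -1. */
int find_in_path(const char *name, char *out, size_t outlen)
{
    const char *path = getenv("PATH");
    const char *start, *end;
    size_t dirlen;
    int n;

    if (strchr(name, '/') != NULL) {
        /* contains a slash: the shell would not search PATH */
        n = snprintf(out, outlen, "%s", name);
        if (n < 0 || (size_t)n >= outlen)
            return -1;
        return access(out, X_OK) == 0 ? 0 : -1;
    }
    if (path == NULL)
        return -1;

    for (start = path; ; start = end + 1) {
        end = strchr(start, ':');
        dirlen = end != NULL ? (size_t)(end - start) : strlen(start);

        if (dirlen == 0)    /* an empty element means the current directory */
            n = snprintf(out, outlen, "./%s", name);
        else
            n = snprintf(out, outlen, "%.*s/%s", (int)dirlen, start, name);

        if (n > 0 && (size_t)n < outlen && access(out, X_OK) == 0)
            return 0;
        if (end == NULL)
            break;
    }
    return -1;
}
```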

If all you want is to be able to print an appropriate invocation name with error messages, then the best approach is to have main() save the value of argv[0] in a global variable for use by the entire program. While there is no guarantee whatsoever that the value in argv[0] will be meaningful, it is the best option available in most circumstances.

The most common reason people ask this question is in order to locate configuration files with their program. This is considered to be bad form; directories containing executables should contain nothing except executables, and administrative requirements often make it desirable for configuration files to be located on different filesystems to executables.

A less common, but more legitimate, reason to do this is to allow the program to call exec() on itself; this is a method used (e.g. by some versions of sendmail) to completely reinitialise the process (e.g. if a daemon receives a SIGHUP).

1.14.1 So where do I put my configuration files then?

The correct directory for this usually depends on the particular flavour of Unix you're using; `/var/opt/PACKAGE', `/usr/local/lib', `/usr/local/etc', or any of several other possibilities. User-specific configuration files are usually hidden `dotfiles' under $HOME (e.g. `$HOME/.exrc').

From the point of view of a package that is expected to be usable across a range of systems, this usually implies that the location of any sitewide configuration files will be a compiled-in default, possibly using a `--prefix' option on a configure script (Autoconf scripts do this). You might wish to allow this to be overridden at runtime by an environment variable. (If you're not using a configure script, then put the default in the Makefile as a `-D' option on compiles, or put it in a `config.h' header file, or something similar.)

User-specific configuration should be either a single dotfile under $HOME, or, if you need multiple files, a dot-subdirectory. (Files or directories whose names start with a dot are omitted from directory listings by default.) Avoid creating multiple entries under $HOME, because this can get very cluttered. Again, you can allow the user to override this location with an environment variable. Programs should always behave sensibly if they fail to find any per-user configuration.

1.15 Why doesn't my process get SIGHUP when its parent dies?

Because it's not supposed to.

SIGHUP is a signal that means, by convention, "the terminal line got hung up". It has nothing to do with parent processes, and is usually generated by the tty driver (and delivered to the foreground process group).

However, as part of the session management system, there are exactly two cases where SIGHUP is sent on the death of a process:

1.16 How can I kill all descendants of a process?

There isn't a fully general approach to doing this. While you can determine the relationships between processes by parsing ps output, this is unreliable in that it represents only a snapshot of the system.

However, if you're launching a subprocess that might spawn further subprocesses of its own, and you want to be able to kill the entire spawned job at one go, the solution is to put the subprocess into a new process group, and kill that process group if you need to.

The preferred function for creating process groups is setpgid(). Use this if possible rather than setpgrp() because the latter differs between systems (on some systems `setpgrp();' is equivalent to `setpgid(0,0);', on others, setpgrp() and setpgid() are identical).

See the job-control example in the examples section.

Putting a subprocess into its own process group has a number of effects. In particular, unless you explicitly place the new process group in the foreground, it will be treated as a background job with these consequences:

In many applications input and output will be redirected anyway, so the most significant effect will be the lack of keyboard signals. The parent application should arrange to catch at least SIGINT and SIGQUIT (and preferably SIGTERM as well) and clean up any background jobs as necessary.
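
As a rough sketch of the process-group technique (this is not the job-control example from the examples section), the function below starts a child in its own process group, lets it spawn a grandchild, and then signals the whole group by passing a negative pid to kill(). The function name is invented for the example.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start a small `job' in its own process group and kill the whole
   group with signal `sig'. Returns 1 if the child was terminated by
   that signal, 0 if it ended some other way, -1 on failure. */
int kill_job_demo(int sig)
{
    pid_t pid;
    int status;

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        setpgid(0, 0);          /* become leader of a new process group */
        if (fork() < 0)         /* the grandchild inherits the group */
            _exit(1);
        for (;;)
            pause();            /* child and grandchild both idle here */
    }

    /* Also set the group from the parent, so that it exists no matter
       which of the two processes gets to run first. */
    setpgid(pid, pid);

    if (kill(-pid, sig) < 0)    /* negative pid: signal the whole group */
        return -1;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) && WTERMSIG(status) == sig;
}
```

Only the immediate child can be waited for; the grandchild is killed by the same signal and then cleaned up by init. A descendant that moves itself into yet another process group escapes this technique, which is one reason there is no fully general approach.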

