

2. General File handling (including pipes and sockets)

See also the Sockets FAQ, available at:

http://www.lcg.org/sock-faq/

2.1 How to manage multiple connections?

I have to monitor more than one (fd/connection/stream) at a time. How do I manage all of them?

Use select() or poll().

Note: select() was introduced in BSD, whereas poll() is an artifact of SysV STREAMS. As such, there are portability issues; pure BSD systems may still lack poll(), whereas some older SVR3 systems may not have select(). SVR4 added select(), and the Posix.1g standard defines both.

select() and poll() essentially do the same thing, just differently. Both of them examine a set of file descriptors to see if specific events are pending on any, and then optionally wait for a specified time for an event to happen.

[Important note: neither select() nor poll() do anything useful when applied to plain files; they are useful for sockets, pipes, ptys, ttys & possibly other character devices, but this is system-dependent.]

There the similarity ends....

2.1.1 How do I use select()?

The interface to select() is primarily based on the concept of an fd_set, which is a set of FDs (usually implemented as a bit-vector). In times past, it was common to assume that FDs were smaller than 32, and just use an int to store the set, but these days, one usually has more FDs available, so it is important to use the standard macros for manipulating fd_sets:

fd_set set;
FD_ZERO(&set);      /* empties the set */
FD_SET(fd,&set);    /* adds FD to the set */
FD_CLR(fd,&set);    /* removes FD from the set */
FD_ISSET(fd,&set)   /* true if FD is in the set */

In most cases, it is the system's responsibility to ensure that fdsets can handle the whole range of file descriptors, but in some cases you may have to predefine the FD_SETSIZE macro. This is system-dependent; check your select() manpage. Also, some systems have problems handling more than 1024 file descriptors in select().

The basic interface to select is simple:

int select(int nfds, fd_set *readset, 
                     fd_set *writeset,
                     fd_set *exceptset, struct timeval *timeout);

where

nfds
the number of FDs to examine; this must be greater than the largest FD in any of the fdsets, not the actual number of FDs specified
readset
the set of FDs to examine for readability
writeset
the set of FDs to examine for writability
exceptset
the set of FDs to examine for exceptional status (note: errors are not exceptional statuses)
timeout
NULL for infinite timeout, or points to a timeval specifying the maximum wait time (if tv_sec and tv_usec both equal zero, then the status of the FDs is polled, but the call never blocks)

The call returns the number of `ready' FDs found, and the three fdsets are modified in-place, with only the ready FDs left in the sets. Use the FD_ISSET macro to test the returned sets.

Here's a simple example of testing a single FD for readability:

int isready(int fd)
{
    int rc;
    fd_set fds;
    struct timeval tv;

    FD_ZERO(&fds);
    FD_SET(fd,&fds);
    tv.tv_sec = tv.tv_usec = 0;

    rc = select(fd+1, &fds, NULL, NULL, &tv);
    if (rc < 0)
      return -1;

    return FD_ISSET(fd,&fds) ? 1 : 0;
}

Note that we can pass NULL for fdsets that we aren't interested in testing.
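The same pattern extends to several descriptors. The following is a minimal sketch, not a definitive implementation: the helper name wait_readable() and its millisecond timeout parameter are assumptions made for illustration. It rebuilds the fd_set on every call (because select() modifies the set in place) and passes the largest descriptor plus one as nfds.

```c
#include <sys/types.h>
#include <sys/time.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms for any of the n
   descriptors in fds[] to become readable.  Returns the first ready
   descriptor, -1 on timeout (or if n == 0), or -2 on error. */
int wait_readable(const int *fds, int n, long timeout_ms)
{
    fd_set set;
    struct timeval tv;
    int i, maxfd = -1, rc;

    FD_ZERO(&set);
    for (i = 0; i < n; i++)
    {
        if (fds[i] < 0 || fds[i] >= FD_SETSIZE)
            return -2;          /* would not fit in an fd_set */
        FD_SET(fds[i], &set);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* nfds is the largest FD plus one, not the number of FDs */
    rc = select(maxfd + 1, &set, NULL, NULL, &tv);
    if (rc < 0)
        return -2;
    if (rc == 0)
        return -1;              /* timed out, nothing ready */

    for (i = 0; i < n; i++)
        if (FD_ISSET(fds[i], &set))
            return fds[i];
    return -1;
}
```

A caller would typically loop, calling wait_readable() and then read() on whichever descriptor it returns.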

2.1.2 How do I use poll()?

poll() accepts a pointer to a list of struct pollfd, in which the descriptors and the events you wish to poll for are stored. The events are specified via a bitwise mask in the events field of the structure. On return, the revents field of each structure is filled in with any events which occurred. Macros defined by `poll.h' on SVR4 (probably older versions as well) are used to specify the events in the field. A timeout may be specified in milliseconds, although, somewhat perplexingly, its type is a plain int. A timeout of 0 causes poll() to return immediately; a value of -1 will suspend poll() until an event is found to be true.

struct pollfd {
    int fd;        /* The descriptor. */
    short events;  /* The event(s) is/are specified here. */
    short revents; /* Events found are returned here. */
};

Much like select(), a positive return value reflects how many descriptors were found to satisfy the events requested. A return value of zero means that the timeout period was reached before any of the specified events occurred. A negative value signifies an error, and should immediately be followed by a check of errno.

If no events are found, revents is cleared, so there's no need for you to do this yourself.

To find out whether a particular event occurred, test the returned revents field against the corresponding event macro.

Here's an example:

/* Poll on two descriptors for Normal data, or High priority data.
   If any found call function handle() with appropriate descriptor
   and priority. Don't timeout, only give up if error, or one of the
   descriptors hangs up. */

#include <stdlib.h>
#include <stdio.h>

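#include <sys/types.h>
#include <poll.h>   /* older SysV systems may also want <stropts.h>,
                       which many current systems no longer ship */

#include <unistd.h>
#include <errno.h>
#include <string.h>

#define NORMAL_DATA 1
#define HIPRI_DATA 2

/* Supplied by the caller: deals with data of the given priority on fd. */
extern void handle(int fd, int priority);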

int poll_two_normal(int fd1,int fd2)
{
    struct pollfd poll_list[2];
    int retval;

    poll_list[0].fd = fd1;
    poll_list[1].fd = fd2;
    poll_list[0].events = POLLIN|POLLPRI;
    poll_list[1].events = POLLIN|POLLPRI;

    while(1)
    {
        retval = poll(poll_list,(unsigned long)2,-1);
        /* Retval will always be greater than 0 or -1 in this case.
           Since we're doing it while blocking */

        if(retval < 0)
        {
            fprintf(stderr,"Error while polling: %s\n",strerror(errno));
            return -1;
        }

        if(((poll_list[0].revents&POLLHUP) == POLLHUP) ||
           ((poll_list[0].revents&POLLERR) == POLLERR) ||
           ((poll_list[0].revents&POLLNVAL) == POLLNVAL) ||
           ((poll_list[1].revents&POLLHUP) == POLLHUP) ||
           ((poll_list[1].revents&POLLERR) == POLLERR) ||
           ((poll_list[1].revents&POLLNVAL) == POLLNVAL)) 
          return 0;

        if((poll_list[0].revents&POLLIN) == POLLIN)
          handle(poll_list[0].fd,NORMAL_DATA);
        if((poll_list[0].revents&POLLPRI) == POLLPRI)
          handle(poll_list[0].fd,HIPRI_DATA);
        if((poll_list[1].revents&POLLIN) == POLLIN)
          handle(poll_list[1].fd,NORMAL_DATA);
        if((poll_list[1].revents&POLLPRI) == POLLPRI)
          handle(poll_list[1].fd,HIPRI_DATA);
    }
}
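For comparison with the select() example in 2.1.1, here is a sketch of the same single-descriptor readability test written with poll(). The function name isready_poll() is an assumption made for illustration.

```c
#include <poll.h>
#include <unistd.h>

/* poll() counterpart of isready(): returns 1 if fd has data to read,
   0 if it does not, -1 if poll() itself fails. */
int isready_poll(int fd)
{
    struct pollfd pfd;
    int rc;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    rc = poll(&pfd, 1, 0);      /* timeout 0: return immediately */
    if (rc < 0)
        return -1;

    return (pfd.revents & POLLIN) ? 1 : 0;
}
```

Note that there is no set to rebuild and no nfds arithmetic; the array length is passed directly.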

2.1.3 Can I use SysV IPC at the same time as select or poll?

No. (Except on AIX, which has an incredibly ugly kluge to allow this.)

In general, trying to combine the use of select() or poll() with using SysV message queues is troublesome. SysV IPC objects are not handled by file descriptors, so they can't be passed to select() or poll(). There are a number of workarounds, of varying degrees of ugliness:

(Other methods exist.)

2.2 How can I tell when the other end of a connection shuts down?

If you try to read from a pipe, socket, FIFO etc. when the writing end of the connection has been closed, you get an end-of-file indication (read() returns 0 bytes read). If you try and write to a pipe, socket etc. when the reading end has closed, then a SIGPIPE signal will be delivered to the process, killing it unless the signal is caught. (If you ignore or block the signal, the write() call fails with EPIPE.)
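Both cases can be seen in a small relay routine. This is a sketch under stated assumptions: the helper name copy_chunk() and its return codes are invented for illustration, and the caller is assumed to have already ignored SIGPIPE with signal(SIGPIPE, SIG_IGN), so that write() fails with EPIPE instead of killing the process.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical helper: copy one chunk from 'in' to 'out'.
   Returns  1 if a chunk was copied,
            0 if every writer on 'in' has closed (end-of-file),
           -1 if the reader on 'out' has gone away (EPIPE),
           -2 on any other error.
   Partial writes are not retried; this is only a sketch. */
int copy_chunk(int in, int out)
{
    char buf[512];
    ssize_t n = read(in, buf, sizeof buf);

    if (n == 0)
        return 0;               /* read() returned 0: EOF */
    if (n < 0)
        return -2;

    if (write(out, buf, (size_t)n) < 0)
        return errno == EPIPE ? -1 : -2;

    return 1;
}
```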

2.3 Best way to read directories?

While historically there have been several different interfaces for this, the only one that really matters these days is the Posix.1 standard `<dirent.h>' functions.

The function opendir() opens a specified directory; readdir() reads directory entries from it in a standardised format; closedir() does the obvious. Also provided are rewinddir(), telldir() and seekdir() which should also be obvious.

If you are looking to expand a wildcard filename, then most systems have the glob() function; also check out fnmatch() to match filenames against a wildcard, or ftw() to traverse entire directory trees.
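As a minimal sketch of the opendir()/readdir()/closedir() sequence (the helper name list_dir() is an assumption made for illustration):

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: print every entry of 'path' except "." and
   "..".  Returns the number of entries printed, or -1 if the
   directory cannot be opened. */
int list_dir(const char *path)
{
    DIR *dir = opendir(path);
    struct dirent *ent;
    int count = 0;

    if (!dir)
        return -1;

    /* readdir() returns NULL at the end of the directory (and also on
       error; this sketch does not distinguish the two) */
    while ((ent = readdir(dir)) != NULL)
    {
        if (strcmp(ent->d_name, ".") == 0 ||
            strcmp(ent->d_name, "..") == 0)
            continue;
        printf("%s\n", ent->d_name);
        count++;
    }

    closedir(dir);
    return count;
}
```

The entries come back in no particular order; sort them yourself if order matters.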

2.4 How can I find out if someone else has a file open?

This is another candidate for `Frequently Unanswered Questions' because, in general, your program should never be interested in whether someone else has the file open. If you need to deal with concurrent access to the file, then you should be looking at advisory locking.

This is, in general, too hard to do anyway. Tools like fuser and lsof that find out about open files do so by grovelling through kernel data structures in a most unhealthy fashion. You can't usefully invoke them from a program, either, because by the time you've found out that the file is/isn't open, the information may already be out of date.

2.5 How do I `lock' a file?

There are three main file locking mechanisms available. All of them are `advisory'[*], which means that they rely on programs co-operating in order to work. It is therefore vital that all programs in an application should be consistent in their locking regime, and great care is required when your programs may be sharing files with third-party software.

[*] Well, actually some Unices permit mandatory locking via the sgid bit -- RTFM for this hack.

Some applications use lock files -- something like `FILENAME.lock'. Simply testing for the existence of such files is inadequate though, since a process may have been killed while holding the lock. The method used by UUCP (probably the most notable example: it uses lock files for controlling access to modems, remote systems etc.) is to store the PID in the lockfile, and test if that pid is still running. Even this isn't enough to be sure (since PIDs are recycled); it has to have a backstop check to see if the lockfile is old, which means that the process holding the lock must update the file regularly. Messy.
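A rough sketch of the PID-in-lockfile approach follows. Everything specific here is an assumption made for illustration: the helper name take_lockfile(), its return codes, the use of open() with O_CREAT|O_EXCL to create the file atomically, and the use of kill() with signal 0 to test whether the recorded PID exists. It does not implement the age-based backstop check described above.

```c
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper.  Returns
     0 if we created the lock file and now hold the lock,
     1 if it exists and the PID stored in it is still running,
     2 if it exists but that PID is gone (a stale lock),
    -1 on error, or if the file holds no usable PID (which can also
       happen if we race with a process that is just creating it). */
int take_lockfile(const char *name)
{
    char buf[32];
    long pid;
    int len, ok, got;
    FILE *fp;
    int fd = open(name, O_WRONLY | O_CREAT | O_EXCL, 0644);

    if (fd >= 0)
    {
        len = snprintf(buf, sizeof buf, "%ld\n", (long)getpid());
        ok = write(fd, buf, (size_t)len) == (ssize_t)len;
        close(fd);
        return ok ? 0 : -1;
    }
    if (errno != EEXIST)
        return -1;

    fp = fopen(name, "r");
    if (!fp)
        return -1;
    got = fscanf(fp, "%ld", &pid);
    fclose(fp);
    if (got != 1 || pid <= 0)
        return -1;

    /* Signal 0 sends nothing; it only checks that the PID exists.
       EPERM means it exists but belongs to someone else. */
    if (kill((pid_t)pid, 0) == 0 || errno == EPERM)
        return 1;
    return 2;
}
```

Because PIDs are recycled, a return value of 1 does not prove that the original holder is still alive, which is exactly the weakness noted above.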

The locking functions are:

    flock();
    lockf();
    fcntl();

flock() originates with BSD, and is now available in most (but not all) Unices. It is simple and effective on a single host, but doesn't work at all with NFS. It locks an entire file. Perhaps rather deceptively, the popular Perl programming language implements its own flock() where necessary, conveying the illusion of true portability.

fcntl() is the only POSIX-compliant locking mechanism, and is therefore the only truly portable lock. It is also the most powerful, and the hardest to use. For NFS-mounted file systems, fcntl() requests are passed to a daemon (rpc.lockd), which communicates with the lockd on the server host. Unlike flock() it is capable of record-level locking.

lockf() is merely a simplified programming interface to the locking functions of fcntl().

Whatever locking mechanism you use, it is important to sync all your file IO while the lock is active:

    lock(fd);
    write_to(some_function_of(fd));
    flush_output_to(fd); /* NEVER unlock while output may be buffered */
    unlock(fd);
    do_something_else;   /* another process might update it */
    lock(fd);
    seek(fd, somewhere); /* because our old file pointer is not safe */
    do_something_with(fd);
    ...

A few useful fcntl() locking recipes (error handling omitted for simplicity) are:

#include <fcntl.h>
#include <unistd.h>

struct flock *file_lock(short type, short whence);  /* defined below */

int read_lock(int fd)   /* a shared lock on an entire file */
{
    return fcntl(fd, F_SETLKW, file_lock(F_RDLCK, SEEK_SET));
}

int write_lock(int fd)  /* an exclusive lock on an entire file */
{
    return fcntl(fd, F_SETLKW, file_lock(F_WRLCK, SEEK_SET));
}

int append_lock(int fd) /* a lock on the _end_ of a file -- other
                           processes may access existing records */
{ 
    return fcntl(fd, F_SETLKW, file_lock(F_WRLCK, SEEK_END));
}

The function file_lock used by the above is

struct flock* file_lock(short type, short whence) 
{
    static struct flock ret ;
    ret.l_type = type ;
    ret.l_start = 0 ;
    ret.l_whence = whence ;
    ret.l_len = 0 ;
    ret.l_pid = getpid() ;
    return &ret ;
}
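Since fcntl() is capable of record-level locking, a lock need not cover the whole file. The following sketch locks and unlocks a byte range without blocking, using F_SETLK instead of F_SETLKW. The helper names try_lock_record() and unlock_record() are assumptions made for illustration.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Apply a lock of the given type to 'len' bytes starting at offset
   'start' (measured from the beginning of the file), without
   blocking. */
static int set_record_lock(int fd, short type, off_t start, off_t len)
{
    struct flock fl;

    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;
    fl.l_pid = 0;               /* only meaningful for F_GETLK */

    return fcntl(fd, F_SETLK, &fl);
}

/* Hypothetical helper: returns 0 if the exclusive lock was obtained,
   -1 if not; errno is EACCES or EAGAIN when another process holds a
   conflicting lock.  fd must be open for writing. */
int try_lock_record(int fd, off_t start, off_t len)
{
    return set_record_lock(fd, F_WRLCK, start, len);
}

/* Hypothetical helper: release a lock on the same range. */
int unlock_record(int fd, off_t start, off_t len)
{
    return set_record_lock(fd, F_UNLCK, start, len);
}
```

Remember that these locks belong to the process, not to the descriptor: a second lock request from the same process on the same range simply succeeds.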

2.6 How do I find out if a file has been updated by another process?

This is close to being a Frequently Unanswered Question, because people asking it are often looking for some notification from the system when a file or directory is changed, and there is no portable way of getting this. (IRIX has a non-standard facility for monitoring file accesses, but I've never heard of it being available in any other flavour.)

In general, the best you can do is to use fstat() on the file. (Note: the overhead on fstat() is quite low, usually much lower than the overhead of stat().) By watching the mtime and ctime of the file, you can detect when it is modified, or deleted/linked/renamed. This is a bit kludgy, so you might want to rethink why you want to do it.
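A sketch of such a check is shown here. The helper name file_changed() is an assumption made for illustration, and comparing st_size as well as the two timestamps is an addition of this sketch (it helps when two updates fall within the same timestamp tick).

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: compare the file's current state with the
   snapshot in *last, then refresh the snapshot.  Returns 1 if the
   mtime, ctime or size differs, 0 if not, -1 if fstat() fails. */
int file_changed(int fd, struct stat *last)
{
    struct stat now;
    int changed;

    if (fstat(fd, &now) < 0)
        return -1;

    changed = now.st_mtime != last->st_mtime ||
              now.st_ctime != last->st_ctime ||
              now.st_size  != last->st_size;

    *last = now;
    return changed;
}
```

Take the first snapshot with a plain fstat() call, then call file_changed() at whatever interval suits the application.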

2.7 How does the `du' utility work?

du simply traverses the directory structure calling stat() (or more accurately, lstat()) on every file and directory it encounters, adding up the number of blocks consumed by each.
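The traversal can be sketched in a few lines. This is an illustration of the idea only, not the code of any real du: the function name tree_blocks() is an assumption, the result is in st_blocks units, and unlike a real du it counts a file with several hard links once per link.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: returns the number of blocks (st_blocks) used
   by 'path' and everything beneath it, or -1 if 'path' itself cannot
   be examined.  Directories that cannot be opened are counted but
   not descended into. */
long long tree_blocks(const char *path)
{
    struct stat sb;
    struct dirent *ent;
    long long total, sub;
    DIR *dir;

    if (lstat(path, &sb) < 0)   /* lstat(): do not follow symlinks */
        return -1;
    total = sb.st_blocks;

    if (!S_ISDIR(sb.st_mode))
        return total;

    dir = opendir(path);
    if (!dir)
        return total;

    while ((ent = readdir(dir)) != NULL)
    {
        char child[4096];

        if (strcmp(ent->d_name, ".") == 0 ||
            strcmp(ent->d_name, "..") == 0)
            continue;
        if (snprintf(child, sizeof child, "%s/%s", path, ent->d_name)
            >= (int)sizeof child)
            continue;           /* name too long for this sketch */

        sub = tree_blocks(child);
        if (sub > 0)
            total += sub;
    }

    closedir(dir);
    return total;
}
```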

If you want more detail about how it works, then the simple answer is:

Use the source, Luke!

Source for BSD systems (FreeBSD, NetBSD and OpenBSD) is available as unpacked source trees on their FTP distribution sites; source for GNU versions of utilities is available from any of the GNU mirrors, but you have to unpack the archives yourself.

2.8 How do I find the size of a file?

Use stat(), or fstat() if you have the file open.

These calls fill in a data structure containing all the information about the file that the system keeps track of; that includes the owner, group, permissions, size, last access time, last modification time, etc.

The following routine illustrates how to use stat() to get the file size.

#include <stdlib.h>
#include <stdio.h>

#include <sys/types.h>
#include <sys/stat.h>

int get_file_size(const char *path,off_t *size)
{
  struct stat file_stats;

  if(stat(path,&file_stats))
    return -1;

  *size = file_stats.st_size;
  return 0;
}

2.9 How do I expand `~' in a filename like the shell does?

The standard interpretation for `~' at the start of a filename is: if alone or followed by a `/', then substitute the current user's home directory; if followed by the name of a user, then substitute that user's home directory. If no valid expansion can be found, then shells will leave the filename unchanged.

Be wary, however, of filenames that actually start with the `~' character. Indiscriminate tilde-expansion can make it very difficult to specify such filenames to a program; while quoting will prevent the shell from doing the expansion, the quotes will have been removed by the time the program sees the filename. As a general rule, do not try and perform tilde-expansion on filenames that have been passed to the program on the command line or in environment variables. (Filenames generated within the program, obtained by prompting the user, or obtained from a configuration file, are good candidates for tilde-expansion.)

Here's a piece of C++ code (using the standard string class) to do the job:

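#include <string>
#include <cstdlib>      // getenv
#include <sys/types.h>
#include <pwd.h>        // getpwuid, getpwnam
#include <unistd.h>     // getuid

using std::string;

string expand_path(const string& path)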
{
    if (path.length() == 0 || path[0] != '~')
      return path;

    const char *pfx = NULL;
    string::size_type pos = path.find_first_of('/');

    if (path.length() == 1 || pos == 1)
    {
        pfx = getenv("HOME");
        if (!pfx)
        {
            // Punt. We're trying to expand ~/, but HOME isn't set
            struct passwd *pw = getpwuid(getuid());
            if (pw)
              pfx = pw->pw_dir;
        }
    }
    else
    {
        string user(path,1,(pos==string::npos) ? string::npos : pos-1);
        struct passwd *pw = getpwnam(user.c_str());
        if (pw)
          pfx = pw->pw_dir;
    }

    // if we failed to find an expansion, return the path unchanged.

    if (!pfx)
      return path;

    string result(pfx);

    if (pos == string::npos)
      return result;

    if (result.length() == 0 || result[result.length()-1] != '/')
      result += '/';

    result += path.substr(pos+1);

    return result;
}

2.10 What can I do with named pipes (FIFOs)?

2.10.1 What is a named pipe?

A named pipe is a special file that is used to transfer data between unrelated processes. One (or more) processes write to it, while another process reads from it. Named pipes are visible in the file system and may be viewed with `ls' like any other file. (Named pipes are also called fifos; this term stands for `First In, First Out'.)

Named pipes may be used to pass data between unrelated processes, while normal (unnamed) pipes can only connect parent/child processes (unless you try very hard).

Named pipes are strictly unidirectional, even on systems where anonymous pipes are bidirectional (full-duplex).

2.10.2 How do I create a named pipe?

To create a named pipe interactively, you'll use either mknod or mkfifo. On some systems, mknod will be found in /etc. In other words, it might not be on your path. See your man pages for details.

To make a named pipe within a C program use mkfifo():

/* set the umask explicitly, you don't know where it's been */
umask(0);
if (mkfifo("test_fifo", S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP))
{
    perror("mkfifo");
    exit(1);
}

If you don't have mkfifo(), you'll have to use mknod():

/* set the umask explicitly, you don't know where it's been */
umask(0);
if (mknod("test_fifo",
            S_IFIFO | S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP,
           0))
{
    perror("mknod");
    exit(1);
}

2.10.3 How do I use a named pipe?

To use the pipe, you open it like a normal file, and use read() and write() just as though it was a plain pipe.

However, the open() of the pipe may block. The following rules apply:

When reading and writing the FIFO, the same considerations apply as for regular pipes and sockets, i.e. read() will return EOF when all writers have closed, and write() will raise SIGPIPE when there are no readers. If SIGPIPE is blocked or ignored, the call fails with EPIPE.
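The blocking behaviour of open() on a FIFO can be controlled with O_NONBLOCK. Under POSIX, an open for reading (O_RDONLY) normally blocks until some process opens the FIFO for writing, but with O_NONBLOCK it succeeds at once; an open for writing (O_WRONLY) normally blocks until some process opens the FIFO for reading, but with O_NONBLOCK it fails with ENXIO if there is no reader. The sketch below shows both; the helper names and return codes are assumptions made for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: open a FIFO for reading without waiting for a
   writer.  Returns the descriptor, or -1 on error. */
int open_fifo_reader(const char *path)
{
    return open(path, O_RDONLY | O_NONBLOCK);
}

/* Hypothetical helper: open a FIFO for writing without waiting for a
   reader.  Returns the descriptor, -1 if nobody has the FIFO open
   for reading yet (ENXIO), or -2 on any other error. */
int open_fifo_writer(const char *path)
{
    int fd = open(path, O_WRONLY | O_NONBLOCK);

    if (fd >= 0)
        return fd;
    return errno == ENXIO ? -1 : -2;
}
```

Descriptors opened this way stay in non-blocking mode for read() and write() too, unless the flag is cleared afterwards with fcntl().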

2.10.4 Can I use a named pipe across NFS?

No, you can't. There is no facility in the NFS protocol to do this. (You may be able to use a named pipe on an NFS-mounted filesystem to communicate between processes on the same client, though.)

2.10.5 Can multiple processes write to the pipe simultaneously?

If each piece of data written to the pipe is less than PIPE_BUF in size, then they will not be interleaved. However, the boundaries of writes are not preserved; when you read from the pipe, the read call will return as much data as possible, even if it originated from multiple writes.

The value of PIPE_BUF is guaranteed (by Posix) to be at least 512. It may or may not be defined in `<limits.h>', but it can be queried for individual pipes using pathconf() or fpathconf().
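Querying the limit for an open pipe or FIFO takes one call; the wrapper name pipe_buf_for() is an assumption made for illustration.

```c
#include <unistd.h>

/* Returns the PIPE_BUF value that applies to the pipe or FIFO open
   on fd, or -1 if it cannot be determined. */
long pipe_buf_for(int fd)
{
    return fpathconf(fd, _PC_PIPE_BUF);
}
```

Writers that must not be interleaved should keep each write() no larger than this value.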

2.10.6 Using named pipes in applications

How can I implement two way communication between one server and several clients?

It is possible that more than one client is communicating with your server at once. As long as each command they send to the server is smaller than PIPE_BUF (see above), they can all use the same named pipe to send data to the server. All clients can easily know the name of the server's incoming fifo.

However, the server can not use a single pipe to communicate with the clients. If more than one client is reading the same pipe, there is no way to ensure that the appropriate client receives a given response.

A solution is to have the client create its own incoming pipe before sending data to the server, or to have the server create its outgoing pipes after receiving data from the client.

Using the client's process ID in the pipe's name is a common way to identify them. Using fifos named in this manner, each time the client sends a command to the server, it can include its PID as part of the command. Any returned data can be sent through the appropriately named pipe.
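As a sketch of the naming scheme, the client could build and create its reply FIFO like this. The directory (/tmp), the name pattern client.PID.fifo and both helper names are assumptions made for illustration; any convention works as long as client and server agree on it.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: build the reply FIFO name for a given client
   PID.  The server calls this with the PID found in the command.
   Returns 0 on success, -1 if the buffer is too small. */
int reply_fifo_name(char *buf, size_t size, pid_t client)
{
    int n = snprintf(buf, size, "/tmp/client.%ld.fifo", (long)client);

    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Hypothetical helper: the client creates its own incoming FIFO
   before sending anything to the server.  Returns 0 on success,
   -1 on failure; the name is left in buf. */
int make_reply_fifo(char *buf, size_t size)
{
    if (reply_fifo_name(buf, size, getpid()) < 0)
        return -1;
    return mkfifo(buf, S_IRUSR | S_IWUSR);
}
```

The client should unlink() its FIFO when it has finished, or stale pipes will accumulate.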

