--- a/thesis/tex/5-Improvements.tex	Thu Feb 05 21:46:25 2009 +0100
+++ b/thesis/tex/5-Improvements.tex	Fri Feb 06 12:09:07 2009 +0100
@@ -282,13 +282,11 @@

 The two approaches for spam handling were already presented to the reader in section~\ref{sec:functional-requirements} as \RF\,8 and \RF\,9. Here they are described in more detail:

-\begin{description}
-\item[Refusing spam during the \SMTP\ dialog]
-This is the way it was meant by the designers of the \SMTP\ protocol. They thought checking the sender's and recipient's mail addresses would be enough, but as they are forgeable, it is not. More and more complex checks are needed to be done. Checking needs time, but \SMTP\ dialogs time out if it takes too long. Thus during the \SMTP\ dialog, only limited time can be used for checking if a message seems to be spam. The advantage of this approach is that bad messages can simply get refused---no responsibility for them is taken and no further system load is added. See \RFC\,2505 (especially section 1.5) for detail.
+\begin{enumerate}
+\item Refusing spam during the \SMTP\ dialog: This is the way it was meant by the designers of the \SMTP\ protocol. They thought checking the sender's and recipient's mail addresses would be enough, but as these are forgeable, it is not. Increasingly complex checks are needed. Checking takes time, but \SMTP\ dialogs time out if it takes too long. Thus only limited time is available during the \SMTP\ dialog to check whether a message seems to be spam. The advantage of this approach is that bad messages can simply get refused---no responsibility for them is taken and no further system load is added. See \RFC\,2505 (especially section 1.5) for details.

-\item[Checking for spam after the mail was accepted and queued]
-Here it is possible to invest more processing time, thus more detailed checks can be done. But, as responsibility for messages was taken, it is no choice to simply delete spam mail. Checks for spam do not lead to sure results, they just indicate the possibility the message is unwanted mail. \person{Eisentraut} lists actions to take after a message is recognized as probably spam \cite[pages 18--20]{eisentraut05}. For mail the \MTA\ is responsible for, the only acceptable action is adding further or rewriting existing header lines. Thus all further work on the spam messages is the same as for non-spam messages.
-\end{description}
+\item Checking for spam after the mail was accepted and queued: Here it is possible to invest more processing time, thus more detailed checks can be done. But, as responsibility for the messages was taken, simply deleting spam mail is not an option. Spam checks do not yield certain results; they only indicate the probability that a message is unwanted mail. \person{Eisentraut} lists actions to take after a message is recognized as probably spam \cite[pages 18--20]{eisentraut05}. For mail the \MTA\ is responsible for, the only acceptable action is adding further header lines or rewriting existing ones. Thus all further work on spam messages is the same as for non-spam messages.
+\end{enumerate}

 Modern \MTA{}s use both techniques in combination. Checks during the \SMTP\ dialog tend to be implemented in the \MTA\ to make them fast; checks after the message was queued are often done using external programs (\name{spamassassin} is a well-known one). \person{Eisentraut} considers the checks during the \SMTP\ dialog essential: ``Ganz ohne Analyse w\"ahrend der \SMTP-Phase kommt sowieso kein \MTA\ aus, und es ist eine Frage der Einsch\"atzung, wie weit man diese Phase belasten m\"ochte.'' \cite[page 25, (translated: ``No \MTA\ can go without analysis during the \SMTP\ phase anyway, and it is a matter of judgment how much load one wants to put on this phase.'')]{eisentraut05}

@@ -354,7 +352,7 @@
 Now follows a description of the modules of the new architecture. They are described in the same order in which a message passes through them.


-\paragraph{\name{Receiver modules}}
+\paragraph{Receiver modules}
 They are the communication interface between external senders and the \name{queue-in} module. Each protocol needs a corresponding \name{receiver module} to be supported. The most popular ones are the \name{sendmail} module, which is a command to be called from the local host, and the \name{smtpd} module, which usually listens on port 25. Other modules to support other protocols may be added as needed. Receiver modules that need to listen on ports should get invoked by \name{inetd}, or by \person{Bernstein}'s more secure \name{ucspi-tcp}. This makes it possible to run them with least privilege.


@@ -370,7 +368,7 @@
 This module takes messages from the \name{outgoing} queue, queries information about the online state, and passes the messages to the correct transport module. Successfully transferred messages are removed from the \name{outgoing} queue. The \masqmail\ specific tasks of the route management are handled by this module, too.


-\paragraph{\name{Transport modules}}
+\paragraph{Transport modules}
 These modules send outgoing mail; they are the interface between \name{queue-out} and remote hosts or local commands. The most popular modules of this kind are the \name{smtp} module which acts as an \SMTP\ client and the \name{pipe} module to interface gateways to other systems or networks like \NAME{FAX} and \NAME{UUCP}. A module for local delivery is not included; \masqmail\ passes this job to an \NAME{MDA} which gets invoked through the \name{pipe} module. (See section~\ref{sec:functional-requirements} for reasons.)


@@ -378,21 +376,17 @@

 \subsubsection*{The queue}

-%XXX
-
-The queuing system consists of two queues and a message pool. The queues store the spool files---unprocessed in \name{incoming} and in complete and valid form in \name{outgoing}. The \name{pool} is the storage of data files, the message bodies of queued messages. The three parts are represented by three directories within the queue path on disk.
+The queuing system consists of two queues and a message pool. The queues store the spool files---in unprocessed form in \name{incoming} and in complete and valid form in \name{outgoing}. The \name{pool} is the storage of the data files. On disk, the three parts of the queuing system are represented by three directories within the queue path.

-The representation of queued files on disk is basically the same as the one in current \masqmail: one file for the envelope and message header information (the ``spool file''), a second file for the message body (the ``data file''). The spool file's internal structure of current \masqmail\ can be remain.
+The representation of queued messages on disk is basically the same as in current \masqmail: One file for the envelope and message header information (the ``spool file'') and a second file for the message body (the ``data file'').

-Following is a sample spool file from current \masqmail. The first part is the envelope and meta information. The annotations in parenthesis are added afterwards to ease the understanding. The second part after the empty line is the message header.
+The internal structure of the spool files that is currently used can be kept. Following is a sample spool file from current \masqmail. The first part is the envelope and meta information. The annotations in parentheses are only added to ease understanding. The second part, after the empty line, is the message header.

 \codeinput{input/sample-spool-file.txt}

-The spool file is written into the \name{incoming} queue. The \name{scanning} modules reads it, processes it, and writes a new one into the \name{outgoing} queue; the file in \name{incoming} is deleted then. \name{queue-out} finally takes the spool file from \name{outgoing} and the data file from the \name{pool} to generate the resulting message.
-
 The spool file owner's executable bit shows whether a file is ready for further processing: The module that writes the file into the queue sets the bit as its last action. Modules that read from the queue process only messages that have the bit set. This approach is derived from \postfix.
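
The readiness marker described above can be sketched as follows. This is only an illustrative Python sketch under assumptions, not code from \masqmail\ or \postfix: the function names (write_spool_file, is_ready, ready_files) and the file modes are hypothetical, and a POSIX file system is assumed.

```python
import os
import stat


def write_spool_file(path, content):
    """Write a spool file and mark it ready as the very last action."""
    # Create the file without the executable bit: readers will skip it.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(content)
    # Only now set the owner's executable bit to signal "complete".
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | stat.S_IXUSR)


def is_ready(path):
    """Return True if the owner's executable bit is set."""
    return bool(os.stat(path).st_mode & stat.S_IXUSR)


def ready_files(queue_dir):
    """List the names of all spool files in queue_dir that are ready."""
    return sorted(
        name for name in os.listdir(queue_dir)
        if is_ready(os.path.join(queue_dir, name))
    )
```

A reading module would iterate over ready_files() for its queue directory. A file that is still being written is ignored because its bit is not yet set.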

-The data file is stored in a separate data pool. It is written by \name{queue-in}; \name{scanning} can read it if necessary; \name{queue-out} reads it to generate the outgoing message and deletes it after successful transfer. Data files do not change at all within the system. They are written in default local text format. Required translation is done in the receiver and transport modules.
+The data file is stored in the \name{pool} by \name{queue-in}; it is never modified until it is deleted by \name{queue-out}. Data files consist of data in local default text format.


@@ -402,28 +396,30 @@

 Communication between modules is required to exchange data and status information. This is also called ``Inter-process communication'' (short: \NAME{IPC}) because the modules are independent programs in this case and processes are programs in execution.

-The connections between \name{queue-in} and \name{scanning}, as well as between \name{scanning} and \name{queue-out} is provided by the queues, only sending signals to trigger runs may be useful. Communication between receiving and transport modules and the outside world are done using the specific protocol they do handle.
+The connections between \name{queue-in} and \name{scanning}, as well as between \name{scanning} and \name{queue-out}, are provided by the queues; only signals might additionally be useful to trigger runs. Communication between receiver and transport modules and the outside world is organized by their specific protocol (e.g.\ \SMTP).

-Left is only communication between the receiver modules and \name{queue-in}, and between \name{queue-out} and the transport modules. Data is exchanged using \unix\ pipes and a simple protocol. Figure~\ref{fig:ipc-protocol} shows a state diagram for the protocol. Solid lines indicate client actions, dashed lines indicate server responses.
+What remains is the communication between the receiver modules and \name{queue-in}, and between \name{queue-out} and the transport modules. A simple protocol with data exchange through \unix\ pipes is suggested for this communication. Figure~\ref{fig:ipc-protocol} shows a state diagram for the protocol.

 \begin{figure}
 	\begin{center}
 		\includegraphics[scale=0.75]{img/ipc-protocol.eps}
 	\end{center}
-	\caption{State diagram of the \NAME{IPC} protocol}
+	\caption{State diagram of the \NAME{IPC} protocol. (Solid lines indicate client actions, dashed lines indicate server responses.)}
 	\label{fig:ipc-protocol}
 \end{figure}

+The protocol is described in more detail now:
+
 \paragraph{Timing}
-One dialog consists of exactly three phases: connection attempt, envelope and header transfer, and transfer of the message body. The order is always the same. The three phases are all initiated by the client process; after each phase the server process sends a success or error reply. Timeouts for each phase need to be implemented.
+One dialog consists of exactly three phases: (1) the connection attempt, (2) the envelope and header transfer, and (3) the transfer of the message body. The order is always the same. The three phases are all initiated by the client process. After each phase the server process sends a success or failure reply. Timeouts for each phase need to be implemented.

 \paragraph{Semantics}
-The connection attempt is simply opening the connection. This starts the dialog. A positive reply by the server leads to the transfer of envelope and message header. If the server again sends a positive reply, the message data is transferred too. A last server reply ends the dialog.
+The connection attempt is simply opening the connection. This starts the dialog. A positive reply by the server leads to the transfer of the envelope and the message header. If the server again sends a positive reply, the message data is transferred. A last server reply ends the dialog.

-The client indicates the end of each data transfer with a special terminator sequence. The appearance of this terminator sequence tells the server process that the data transfer is complete and makes the server send a reply. The server process takes responsibility of the data in sending a success reply. A failure reply immediately stops the dialog and resets both client and server to the state before the connection attempt.
+The client indicates the end of each data transfer with a special terminator sequence. The appearance of this terminator sequence tells the server process that the data transfer is complete. The server then needs to send its reply. By sending a success reply, the server process takes responsibility for the data. A failure reply immediately stops the dialog and resets both client and server to the state before the connection attempt.

 \paragraph{Syntax}
-Data transfer is done by sending plain text data. \name{Line Feed}---the native line separator on \unix---is used as line separator. The terminator sequence used to indicate the end of the data transfer is the \NAME{ASCII} \name{null} character (`\texttt{\textbackslash0}'). Replies are one-digit numbers with `\texttt{0}' meaning success and any other number (`\texttt{1}'--`\texttt{9}') indicate failure.
+Data transfer is done by sending plain text data. \name{Line Feed} (`\texttt{\textbackslash{}n}')---the native line separator on Unix---is used as line separator. The terminator sequence used to indicate the end of the data transfer is the \NAME{ASCII} \name{null} character (`\texttt{\textbackslash0}'). Replies are one-digit numbers with `\texttt{0}' meaning success and any other number (`\texttt{1}'--`\texttt{9}') indicating failure.
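
The three paragraphs above (timing, semantics, syntax) can be illustrated with a minimal Python sketch of one dialog. It makes several assumptions: all function names are hypothetical, a connected socket stands in for the \unix\ pipes, the server's reply to the connection attempt is sent as a single digit like the other replies, and the required timeouts are left out for brevity.

```python
import socket
import threading

TERMINATOR = b"\0"   # ASCII null ends each data transfer
SUCCESS = b"0"       # any other digit (1-9) means failure


def read_until_terminator(conn):
    """Read bytes from conn up to (not including) the terminator."""
    data = bytearray()
    while True:
        byte = conn.recv(1)
        if not byte:
            raise ConnectionError("peer closed the connection early")
        if byte == TERMINATOR:
            return bytes(data)
        data += byte


def serve_one_dialog(conn, check_header=lambda header: True):
    """Server side: return (header, body), or None if the header is refused."""
    conn.sendall(SUCCESS)                 # reply to the connection attempt
    header = read_until_terminator(conn)  # envelope and message header
    if not check_header(header):
        conn.sendall(b"1")                # failure reply stops the dialog
        return None
    conn.sendall(SUCCESS)
    body = read_until_terminator(conn)    # message body
    conn.sendall(SUCCESS)                 # responsibility is taken here
    return header, body


def send_message(conn, header, body):
    """Client side: return True if the server accepted the whole message."""
    for payload in (None, header, body):  # None models the connection attempt
        if payload is not None:
            conn.sendall(payload + TERMINATOR)
        if conn.recv(1) != SUCCESS:
            return False
    return True
```

In this sketch the client stops at the first reply that is not `0', which corresponds to the reset to the state before the connection attempt.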


@@ -431,23 +427,23 @@

 \subsubsection*{Rights and permissions}

-The set of system users that is required for \qmail\ seems to be too complex for \masqmail. One system user, like \postfix\ uses, is more appropriate. \name{root} privilege and \name{setuid} permission should to be avoided as feasible.
+The set of system users that is required for \qmail\ seems to be too complex for \masqmail. One system user, like \postfix\ uses, is more appropriate. \name{root} privilege and \name{setuid} permission should be avoided if feasible.

-The \name{queue-in} module is the part of the system that is most critical about permission. It either needs to run as deamon (as a specific user) or be \name{setuid} or \name{setgid} in order to avoid a world-writable queue. \person{Ian~R.\ Justman} recommends to use \name{setgid} in this situation:
+The \name{queue-in} module is the part of the system that is most critical with regard to permissions. It either needs to run as a daemon or be \name{setuid} or \name{setgid} in order to avoid a world-writable queue. \person{Ian~R.\ Justman} recommends using \name{setgid} in this situation:

 \begin{quote}
 But if all you need to do is post a file into an area which does not have world writability but does have group writability, and you want accountability, the best, and probably easiest, way to accomplish this without the need for excess code for uid switching (which is tricky to deal with especially with setuid-to-root programs) is the setgid bit and a group-writable directory.
 \hfill\cite{justman:bugtraq}
 \end{quote}

-\person{Bernstein} chose \name{setuid} for the \name{qmail-queue} module, \person{Venema} uses \name{setgid} in \postfix, the differences are small. But each of them is better than running the module as a deamon. A deamon needs more resources and therefore become inefficient on systems with low mail amount like the ones \masqmail\ will probably run on. Short running processes are additionally higher obstacles for intruders because if an intruder managed to take one over it will die soon.
+\person{Bernstein} chose \name{setuid} for the \name{qmail-queue} module, \person{Venema} uses \name{setgid} in \postfix, yet the differences are small. Either of them is better than running the module as a daemon. A daemon needs more resources and therefore becomes inefficient on systems with low mail volume, like the ones \masqmail\ will probably run on. Short-running processes are additionally a higher obstacle for intruders, because a process that an intruder manages to take over will die soon.


-The modules \name{scanning} and \name{queue-out} are candidates for all-time running daemon processes. But they could also get periodically started by \name{cron}. Another possibility is to run a master process as daemon which starts and restarts the system parts. \postfix\ has such a master process, \qmail\ lacks it. The jobs of a master process can be done by the other tools of the operating system too, thus making the master process abdicable. \masqmail\ does probably better go without a master process because it aims to save resources, not to get the best performance.
+The modules \name{scanning} and \name{queue-out} are candidates for permanently running daemon processes. Alternatively they could be started by \name{cron} to do single runs.

-%The \name{scanning} module can run in background and look for new mail in regular intervals or signals may be sent to it by \name{queue-in}. Alternatively it can be called by \name{cron} to do single runs.
+Another possibility is to run a master process as a daemon which starts and restarts the system parts. \postfix\ has such a master process, \qmail\ lacks it. The jobs of a master process can be done by other tools of the operating system too, thus making a master process dispensable. \masqmail\ is probably better off without a master process, because it aims to save resources, not to get the best performance.

-In general is a sane permission management very important for secure software. The \name{principle of least privilege}, as it is often called, should be respected. If it is possible to use lower privilege then it should be done. An example for doing so is the \name{smtpd} module. It is a server module which listens on a port. One way is to start it as root, let it bind to the port, and drop all privilege before it does any other work. But root privilege is avoidable completely if \name{inetd} or one of its substitutes listens on the port instead of the \name{smtpd} module. The \name{smtpd} module gets launched by \name{inetd} to handle the connection when a connection attempt to the port is made. The \name{smtpd} module needs no privilege at all this way.
+Sane permission management is very important for secure software in general. The \name{principle of least privilege}, as it is often called, should be respected. If it is possible to use lower privilege, then it should be done. An example of doing so is the \name{smtpd} module. It is a server module which listens on a port. One way is to start it as root, let it bind to the port, and then drop all privileges before it does any other work. But root privilege is completely avoidable if \name{inetd}, or one of its substitutes, listens on the port instead of the \name{smtpd} module. \name{inetd} will then launch the \name{smtpd} module to handle the connection whenever a connection attempt to the port is made. The \name{smtpd} module needs no privilege at all this way.