# HG changeset patch # User meillo@marmaro.de # Date 1233950961 -3600 # Node ID 16d8eacf60e101511e8d3b2efbadb95e7fca5d49 # Parent b4b06bc050597881d3600b4f66e2399c65330ba9 created index (it is not finished) diff -r b4b06bc05059 -r 16d8eacf60e1 thesis/tex/1-Introduction.tex --- a/thesis/tex/1-Introduction.tex Fri Feb 06 21:08:49 2009 +0100 +++ b/thesis/tex/1-Introduction.tex Fri Feb 06 21:09:21 2009 +0100 @@ -94,7 +94,7 @@ \person{Kurth} abandoned \masqmail\ after 2005 and no one adopted the project since then. Thus, the author of this thesis decided to take over responsibility for \masqmail\ now. He received \person{Kurth}'s permission to do so in private telephone conversation with \person{Kurth} on September 4, 2008. -The program's new homepage \citeweb{masqmail:homepage} includes a collection of available information about this \MTA. +The program's new homepage\index{masqmail!homepage} \citeweb{masqmail:homepage} includes a collection of available information about this \MTA. @@ -102,7 +102,7 @@ \subsection{Target field} \label{sec:masqmail-target-field} -\person{Kurth}'s intention when creating \masqmail\ is best told in his own words: +\person{Kurth}'s intention when creating \masqmail\ is best told in his own words:\index{masqmail!design intention} \begin{quote} MasqMail is a mail server designed for hosts that do not have a permanent internet connection eg. a home network or a single host at home. It has special support for connections to different \NAME{ISP}s. It replaces sendmail or other \MTA{}s such as qmail or exim. @@ -111,7 +111,7 @@ It is intended to cover a specific niche: non-permanent Internet connection and different \name{Internet Service Providers} (short: \NAME{ISP}s). -Although it can basically replace other \MTA{}s it is not \emph{generally} aimed to do so. 
The package description of \masqmail\ within \debian\ states this more clearly by changing the last sentence to: +Although it can basically replace other \MTA{}s it is not \emph{generally} aimed to do so. The package description of \masqmail\ within \debian\ states this more clearly by changing the last sentence to:\index{debian!masqmail package} \begin{quote} In these cases, MasqMail is a slim replacement for full-blown \MTA{}s such as sendmail, exim, qmail or postfix. @@ -120,9 +120,9 @@ The program is a good replacement ``in these cases'' but not generally, since it lacks essential features for running on publically accessable mail servers. It is primarily not secure enough for being accessible from untrusted locations. -\masqmail\ is best used in home networks which are non-permanently connected to the Internet. It is easy configurable for situations which are rarely solvable with the common \MTA{}s. Such include different handling of mail to local or remote destination and respecting different routes of online connection. These features are explained in more detail in section~\ref{sec:masqmail-features}. +\masqmail\ is best used in home networks which are non-permanently connected to the Internet\index{non-permanent}. It is easy configurable for situations which are rarely solvable with the common \MTA{}s. Such include different handling of mail to local or remote destination and respecting different routes of online connection. These features are explained in more detail in section~\ref{sec:masqmail-features}. -While many other \MTA{}s are general purpose \MTA{}s, \masqmail\ aims on special situations. Nevertheless, it can be used as general purpose \MTA\ too. Especially this was a design goal of \masqmail: To be a replacement for \sendmail\ or similar \MTA{}s. +While many other \MTA{}s are general purpose \MTA{}s, \masqmail\ aims on special situations. Nevertheless, it can be used as general purpose \MTA\ too. 
Especially this was a design goal of \masqmail: To be a replacement for \sendmail\ or similar \MTA{}s.\index{masqmail!sendmail replacement} \masqmail\ is designed to run on workstations and on servers in small networks, like they are common in \NAME{SOHO}s (\name{Small Offices/Home Offices}). @@ -130,7 +130,7 @@ \subsubsection*{Typical usage scenarios} -This section describes three common setups that make sensible use of \masqmail. The first two are shown in figure~\ref{fig:masqmail-typical-usage}. +This section describes three common setups that make sensible use of \masqmail. The first two are shown in figure~\ref{fig:masqmail-typical-usage}.\index{masqmail!common setups} \begin{figure} \begin{center} @@ -147,27 +147,38 @@ \label{scenario1} If no server is present, every workstation would be equipped with \masqmail. Mail transfer within the same machine or within the local net works straight forward using direct transfer. Outgoing mail to the Internet is sent to an \name{Internet Service Provider} (short: \NAME{ISP}) for relaying whenever the router goes online. The configuration of \masqmail\ would be the same on every computer; only host names would differ. To receive mail from the Internet requires a mailbox on the \NAME{ISP}'s mail server. Mail needs to be fetched from the \NAME{ISP}'s server onto the workstation using the \NAME{POP3} or \NAME{IMAP} protocol. +\index{isp} +\index{pop3} +\index{imap} \item[Scenario 2:] \label{scenario2} In the same network but with a server, one could have \masqmail\ running on the server and using simple forwarders (see section~\ref{subsec:relay-only}) on the workstations to transfer mail to the server. The server would then, dependent on the destination of the message, deliver locally or relay to an \NAME{ISP}'s server for further relay. This setup does only support mail transfer to the server but not back to a workstation. 
However, this can be solved by mounting the user's mailbox from the server to the workstation or by using \NAME{POP3} or \NAME{IMAP}. Mail transfer from the \NAME{ISP} to the local server needs \NAME{POP3} or \NAME{IMAP} as well. +\index{isp} +\index{pop3} +\index{imap} \item[Scenario 3:] \label{scenario3} A third scenario is unrelated as it is about notebooks. Notebooks are usually used as mobile workstations. One uses them to work at different locations. With the increasing popularity of wireless networks this becomes more and more common. Different networks demand for different setups: In one network it is best to send mail to an \NAME{ISP} for relay. In another network it might be preferred to use a local mail server. A third network may have no Internet access at all, hence using a local mail server is required. All these different setups can be configured once and then used by simply telling the online state to \masqmail, even automatically within a network setup script. +\index{isp} +\index{notebook} \end{description} In general, all kinds of usage scenarios within a trusted network are possible. Important to notice is that mail can not be sent from outside into the trusted network then. For using \masqmail\ on notebooks it is suggested to only accept mail from local users because notebooks are often in untrusted environments. +\index{untrusted environments} \subsubsection*{Limitations} +\index{masqmail!limitations} Although \masqmail\ is seen as a replacement for other general purpose \MTA{}s, it should not be used on large mail servers. The reasons are that it implements only a basic subset of features and that its performance and security is not as good as needed for such usage. 
The author, \person{Kurth}, warns on the old project's website about using \masqmail\ to accept connections from the Internet because of the risk of being an open relay: +\index{open relay} \begin{quote} MasqMail is not designed to run on a host with a permanent internet connection. It does not have the ability to check for spam mail and it will relay everything from everywhere to everywhere. Use another mail server such as exim for permanent connections. @@ -175,6 +186,7 @@ \end{quote} The actual problem is not the permanent Internet connection but listening for incoming mail on it. If a firewall is closed for incoming mail, then the permanent Internet connection is no problem. To use \masqmail\ for permanent Internet connections it needs to be secured with care. +\index{firewall} The Internet is the common example for an untrusted network but other networks may be untrusted too. @@ -197,12 +209,23 @@ \subsubsection*{The source code} \masqmail\ is written in the C programming language. The program, as of version 0.2.21, consists of 34 source code and eight header files which contain about 9\,000 lines of code\footnote{Measured with \name{sloccount} by David A.\ Wheeler \citeweb{sloccount}.}. Additionally, it includes a \name{base64} implementation (about 300 lines) and \name{md5} code (about 150 lines). For systems that do not provide \name{libident}, this library is distributed as well (circa 600 lines); an available shared library has higher precedence in linking, though. +\index{c} +\index{lines of code} +\index{base64} +\index{md5} +\index{libident} The only mandatory dependency is \name{glib}---a cross-platform software utility library, originated in the \NAME{GTK+} project. It provides safe replacements for many standard library functions, especially for the string functions. It also offers handy data containers, easy-to-use implementations of data structures, and much more. 
+\index{glib} +\index{masqmail!dependencies} Some parts of \masqmail's functionality can be included or excluded at compile time by defining symbols. To enable maildir support for example, one has to add \verb_--enable-maildir_ to the configure call. Otherwise the concerning code gets removed during preprocessing. +\index{exclude code} +\index{maildir} With \masqmail\ comes the small tool \path{mservdetect}; it helps setting up a configuration that uses the \name{mserver} system for online state detection. Two other binaries get compiled for testing purposes: \path{readtest} and \path{smtpsend}. These three additional programs use parts of \masqmail's source code; they only add a file with a \verb+main()+ function each. +\index{mserver} +\index{test program} @@ -210,19 +233,29 @@ \label{sec:masqmail-features} \masqmail\ supports two channels for incoming mail: +\index{masqmail!incoming channels} \begin{enumerate} -\item Standard input which is used when \path{masqmail} (or the \path{sendmail} link) is executed on the command line -\item A \NAME{TCP} socket which is used by local or remote clients that talk \SMTP + \item Standard input which is used when \path{masqmail} (or the \path{sendmail} link) is executed on the command line + \item A \NAME{TCP} socket which is used by local or remote clients that talk \SMTP \end{enumerate} +\index{sendmail!command} +\index{tcp socket} The outgoing channels for mail are: \begin{enumerate} -\item Direct delivery to local mailboxes (in \name{mbox} or \name{maildir} format) -\item Local pipes to pass mail to a program (e.g.\ to \MDA{}s or to gateways to \NAME{UUCP} or fax) -\item \NAME{TCP} sockets to transfer mail to other \MTA{}s using the \SMTP\ protocol + \item Direct delivery to local mailboxes (in \name{mbox} or \name{maildir} format) + \item Local pipes to pass mail to a program (e.g.\ to \MDA{}s or to gateways to \NAME{UUCP} or fax) + \item \NAME{TCP} sockets to transfer mail to other \MTA{}s using the \SMTP\ protocol 
\end{enumerate} +\index{tcp socket} +\index{local delivery} +\index{mbox} +\index{maildir} +\index{uucp} +\index{fax} +\index{gateway} Figure~\ref{fig:masqmail-channels} shows this as a picture. (The ``online state'' input is explained a bit later.) @@ -232,36 +265,48 @@ \end{center} \caption{Incoming and outgoing channels of \masqmail} \label{fig:masqmail-channels} + \index{figure!incoming and outgoing channels of \masqmail} \end{figure} Outgoing \SMTP\ connections feature \SMTP-\NAME{AUTH} and \SMTP-after-\NAME{POP} authentication but incoming connections do not. Using wrappers for outgoing connections is supported. This allows encrypted communication through a gateway application like \name{openssl}. +\index{auth!smtp-auth} +\index{auth!smtp-after-pop} Mail queuing is essential for \masqmail\ and thus supported of course, alias expansion is also supported. +\index{alias expansion} The \masqmail\ executable can be called by various names for sendmail-compatibility reasons. As many programs expect the \MTA\ to be located at \path{/usr/lib/sendmail} or \path{/usr/sbin/sendmail}, symbolic links are pointing from there to the \masqmail\ executable. Furthermore does \sendmail\ support calling it with a different name instead of supplying command line arguments. The best known of these shortcuts is \path{mailq} which is equivalent to calling it with the argument \verb+-bq+. \masqmail\ recognizes the shortcuts \path{mailq}, \path{smtpd}, \path{mailrm}, \path{runq}, \path{rmail}, and \path{in.smtpd}. The first two are inspired by \sendmail. Not implemented yet is the shortcut \path{newaliases} because \masqmail\ does not generate binary representations of the alias file.\footnote{A shell script named \path{newaliases} that invokes \texttt{masqmail -bi} can provide the command to satisfy strict requirements.} \path{hoststat} and \path{purgestat} are missing for complete sendmail-compatibility. 
-%masqmail: mailq, mailrm, runq, rmail, smtpd/in.smtpd -%sendmail: hoststat, mailq, newaliases, purgestat, smtpd +\index{sendmail!compatibility} +\index{symbolic link} +\index{shortcuts} -Additional to the \MTA\ job, \masqmail\ also offers mail retrieval services by acting as a \NAME{POP3} client. It can fetch mail from different remote locations, also dependent on the active online connection. Such functionality is especially useful in a setup like \name{Scenario 2} on page \pageref{scenario2}. +Additional to the \MTA\ job, \masqmail\ also offers mail retrieval services by acting as a \NAME{POP3} client. It can fetch mail from different remote locations, also dependent on the active online connection. Such functionality is especially useful in a setup like \name{Scenario 2} on page~\pageref{scenario2}. +\index{pop3} \subsubsection*{Online detection and online routes} \label{sec:masqmail-routes} +\index{masqmail!online routes} \masqmail\ focuses on handling different non-permanent online connections, thus a concept of online routes is used. One may configure any number of routes to send mail. Each route can have criteria to determine if some message is allowed to be sent over it. Mail to destinations outside the local network gets queued until a suitable online connections is available. +\index{non-permanent} The idea behind this concept is sending mail to the Internet through the mail server of the same \NAME{ISP} over which one had dialed in. It was quite common that \NAME{ISP}s accepted mail for relay only if it came from a online connection they managed. This means, it was not possible to relay mail through the mail server of one \NAME{ISP} while being online through the connection of another \NAME{ISP}. \masqmail\ is a solution to the wish of switching the relaying mail server easily. +\index{isp} Related is \masqmail's ability to rewrite the sender's email address dependent on which \NAME{ISP} is used. This prevents mail from being likely classified as spam. 
+\index{spam} To react on the different situations, \masqmail\ needs to query the current online state. Is an online connection available? And if it is: Which one? Three methods are implemented: +\index{online state} \begin{enumerate} -\item Reading from a file -\item Reading the output of a command -\item Querying an \name{mserver} system + \item Reading from a file + \item Reading the output of a command + \item Querying an \name{mserver} system \end{enumerate} +\index{mserver} Each method may return a string naming the route that is online or returning nothing to indicate offline state. @@ -276,14 +321,21 @@ \section{Why \masqmail\ is worth it} +\index{masqmail!reasons to revive} First of all, \masqmail\ is better suited for its target field of operation (multiple non-permanent online connections) than every other \MTA. Especially is such usage easy to set up because \masqmail\ was designed for that. Many alternative \MTA{}s were not designed for those scenarios at all as the following two example show: ``Exim is designed for use on a network where most messages can be delivered at the first attempt.'' \cite[page~30]{hazel01}. ``qmail was designed for well-connected hosts: those with high-speed, always-on network connectivity.'' \cite[page9]{sill02}. +\index{non-permanent} +\index{qmail} +\index{exim} %fixme: hikernet Additionally does \masqmail\ make it easy to run an \MTA\ on workstations or notebooks. There is no need to do complex configuration or to be a mail server expert. Only a handful of options need to be set; the host name, the local networks, and one route for relaying are sufficient in most times. %fixme: is that true? +\index{notebook} Probably users say it best; in this case \person{Derek Broughton}: +\index{masqmail!users} + \begin{quote} No kidding. The whole point is that you \_have\_ to have an \MTA\ and you don't want to configure Postfix/Exim/Sendmail/Qmail (almost all of which I've @@ -300,16 +352,21 @@ Not to forget \masqmail's size. 
\masqmail\ is much smaller than full-blown \MTA{}s like \sendmail, \postfix, or \exim, and still smaller than \qmail. (See section~\ref{sec:mta-comparison} for details.) This makes \masqmail\ a good choice for workstations or even embedded computers. Again words of a user who chose \masqmail\ as \MTA\ on his old laptop with a 75 megahertz processor and eight megabytes of \NAME{RAM}: + \begin{quote} Masqmail appears to be a great sendmail replacement in this case. It's small and is built to support sending mail ``off-line'', and to connecting to the \SMTP\ servers of several \NAME{ISP}s. \hfill\citeweb{stosberg:low-mem-laptop} \end{quote} +\index{isp} +\index{notebook} Although the development on \masqmail\ has been stopped in 2003, \masqmail\ still has its users. Having users is already reason enough for further development and maintenance. This applies especially when the software covers a niche and when requirements for such software in general changed. Both is the case for \masqmail. It is difficult to get numbers about users of Free Software because no one needs to tell anyone when he uses some software. \debian's \name{popcon} statistics \citeweb{popcon.debian} are a try to provided numbers. For January 2009, the statistics report 60 \masqmail\ installations of which 49 are in active use. If it is assumed that one third of all \debian\ users report their installed software\footnote{One third is a high guess as it means there would be only about 230 thousand \debian\ installations in total. But according to the \name{Linux Counter} \citeweb{counter.li.org} between 490 thousand and 12 million \debian\ users can be estimated.}, there would be in total around 150 active \masqmail\ installations in \debian. \name{Ubuntu} which also does \name{popcon} statistics \citeweb{popcon.ubuntu}, counts 82 installations with 13 active ones. If here also one third of all systems submit their data, 40 active installations can be added. 
Including a guessed amount of additional 30 installations on other \unix\ operating systems makes about 220 \masqmail\ installations in total. Of course one person may have \masqmail\ installed on more than one computer, but a total of 150 different users seems to be realistic. +\index{debian!popcon} +\index{masqmail!users} %The increasing number of systems using \masqmail, as it is shown on the \name{popcon} graph \citeweb{popcon.debian:masqmail}, seems to be impressive in the beginning as \masqmail\ was not developed during that time. But it might come from the increasing popularity of \name{popcon} over the time. @@ -327,8 +384,10 @@ \section{Problems to solve} +\index{masqmail!problems} A program that is neglected for more than five years in a field of operation that changed during this time surely needs improvement. Security and spam have highly increased in importance since 2003. Dial-up connections became rare, instead broadband flat rates are common now. Other \MTA{}s evolved in respect to theses changes---\masqmail\ did not. +\index{dial-up connections} The current market situation and trends for the future need to be identified. Looks at other \MTA{}s need to be taken. Required work on \masqmail\ needs to be defined in combination with the evaluation of strategies to do this work. And a plan for further development should be created. @@ -341,8 +400,10 @@ This thesis is neither a installation guide for \masqmail\ nor a detailed explanation of \masqmail's source code. Installation and setup guides can be found on \masqmail's homepage \citeweb{masqmail:homepage}. The \NAME{POP3} functionality of \masqmail\ receives few regard in this document because it is not directly related to the core of \masqmail\ which is being an \MTA. +\index{pop3} The \name{mserver} system to query the online state is also only mentioned but not regarded further. 
It seems best to move this functionality into a separate program which is run through the shell command interface, anyway. +\index{mserver} diff -r b4b06bc05059 -r 16d8eacf60e1 thesis/tex/2-MarketAnalysis.tex --- a/thesis/tex/2-MarketAnalysis.tex Fri Feb 06 21:08:49 2009 +0100 +++ b/thesis/tex/2-MarketAnalysis.tex Fri Feb 06 21:09:21 2009 +0100 @@ -8,6 +8,7 @@ \section{Electronic communication technologies} Electronic communication is ``communication by computer'', according to the \name{WordNet} database of the \name{Princeton University} \citeweb{wordnet}. Mobile phones and fax machines should be seen as computers here too. The \name{Science Glossary} of the \name{Pennsylvania Department of Education} \citeweb{science-glossary-pa} describes electronic communication as ``System for the transmission of information using electronic technology (e.g., digital cameras, cellular telephones, Internet, television, fiber optics).'' +\index{electronic communication} Electronic communication needs no transport of tangible things, only electrons, photons, or radio waves need to be transmitted. Thus electronic communication is fast in general. With costs mainly for infrastructure and very low costs for data transmission is electronic communication also cheap communication. Primary the Internet is used as underlying transport infrastructure. Thus electronic communication is available nearly everywhere around the world. These properties---fast, cheap, available---make electronic communication well suited for long distance communication. @@ -17,7 +18,9 @@ \subsection{Classification} + Electronic communication technologies can be divided in synchronous and asynchronous communication. Synchronous communication is direct dialog with little delay. Telephone conversation is an example. Asynchronous communication consists of independent messages. Dialogs are possible as well, but not in the same direct fashion. 
These two groups can also be split by the time which is needed for data delivery. Synchronous communication requires nearly real-time delivery, whereas for asynchronous communication message delivery times of several seconds or minutes are sufficient. +\index{electronic communication!classification of} Another possible separation is to distinguish recorded and written information. Recorded information, like audio or video data, is accessible only in a linear way by spooling and replay. Written information, on the other hand, can be accessed in arbitrary sequence, detail and speed. @@ -31,16 +34,20 @@ \end{center} \caption{Classification of electronic communication} \label{fig:comm-classification} + \index{figure!Classification of electronic communication} \end{figure} One might be surprised to find Instant \emph{Messaging} not in the group of \emph{message} communication. Instant Messaging could be put in both groups because it allows asynchronous communication additional to being a chat system. The reasons why it is classified as dialog communication are its primary use for dialog communication and the very fast---instant---delivery time. Email is not limited to written information, at least not anymore since the advent of \NAME{MIME}, which allows to include multimedia content in textual email messages. Thus recorded information can be sent as sub parts of emails. The same applies to Instant Messaging too, where file transfer is an additional sub service offered by most systems. In general recorded information can be transmitted in an encoded textual form. +\index{mime} \subsection{Life cycle analysis} +\index{life cycle analysis} + Life cycle analysis are common for products but also for technologies. This one here is for electronic communication technologies. The first dimension regarded is the life time of the subject. It is segmented into the introduction, growth, mature, saturation, and decline phases. 
The second dimension can display market share, importance, or similar values. The graph has always an S-line shape, with a slow start, a rapidly increasing first half, the highest level in the fourth fifths, and a slowly declining end. Reaching the end of the life cycle means that the subject gets superseded by successors or the market situation changed thus it is old-fashioned. The current position on the life cycle of some selected communication technologies is shown in figure~\ref{fig:comm-lifecycle}. It is important to notice that the time dimension can be different for each technology---some life cycles are shorter than others---the shape of the graph, however, is the same. @@ -51,6 +58,7 @@ \end{center} \caption{Life cycle of electronic communication technologies} \label{fig:comm-lifecycle} + \index{figure!Life cycle of electronic communication technologies} \end{figure} Video messages and voice mail are technologies in the introduction phase. Voice over \NAME{IP} is heavily growing these days. Instant Messaging has reached maturation and is still growing. Email is an example for a technology in the saturation phase. Telefax, for instance, is a declining technology. @@ -58,19 +66,28 @@ Email ranges in the saturation phase which is defined by a saturated market. No more products are needed: there is no more growth. This means, email is a technology which is used by everyone who want to use it. It is a standard technology. The current form of email in the current market is on the top of its life cycle. The future is decline, sooner or later. But life cycles positions change as the subject or the market changes. An examples is the \name{Flash} animation software \citeweb{flash:homepage}. The product's change from a drawing and animation system to a technology for website creation, advertising, and movie distribution, and the thus changing target market, made it slip back on the life cycle. 
If the email system would evolve to become the basis for Unified Messaging (see section~\ref{sec:unified-messaging}), a similar slip back would be the consequence. +\index{flash} +\index{um} The \NAME{DVD} standards \NAME{DVD+} and \NAME{DVD$-$} are an example for a changing market. With the upcoming next generation formats \name{Blu-ray Disc} \citeweb{wikipedia:bluray} and \NAME{HD-DVD} \citeweb{wikipedia:hddvd}, a much sooner decline of \NAME{DVD+} and \NAME{DVD$-$} started, even before they reached their last improvement steps in storage size. Such can happen to email too, if Unified Messaging is a revolution to the email system instead of an evolution. +\index{dvd} +\index{um} \subsection{Trends} + Following are the trends for electronic communication. They are shown from the view point of \MTA{}s. Nevertheless are these trends common for all of the communication technology. +\index{electronic communication!trends} \subsubsection*{Consolidation} + There is a consolidation of communication technologies with similar transport characteristics going on, nowadays. Email is the most flexible kind of asynchronous communication technology in major use. Hence email is the best choice for transferring messages of any kind today. But in future it probably will be \name{Unified Messaging}, which tries to group all types of asynchronous messaging into one communication system. It aims to provide transparent transport for all kinds of content and flexible access interfaces for all kinds of clients. Unified Messaging seems to have the potential to be the successor of all asynchronous communication technologies, including email. +\index{um} Today email still is the major asynchronous communication technology and it probably will be it for the next years. Unified Messaging needs similar transfer facilities as email, thus it seems to be rather an evolution to the current technology than a revolution. 
Hence \MTA{}s will still be of importance in future, though maybe in a modified form. +\index{mta!future importance of} \subsubsection*{Integration} @@ -83,6 +100,7 @@ Communication hardware comes from two different roots: On one side, the telephone, now available as mobile phones. This group centers around recorded data and dialog but messages are also supported by the answering machine and \NAME{SMS}. On the other side, mail and its relatives like email, which use computers as main hardware. This part centers around document messages but also supports dialog communication in Instant Messaging and Voice over \NAME{IP}. The last years finally brought the two groups together, with \name{smart phones} being the merging hardware element. Smart phones are computers in the size of mobile phones or mobile phones with the capabilities of computers, however one likes to see it. They provide both functions by being telephones \emph{and} computers. +\index{smart phone} Smart phones match well the requirements of recorded data for which they were designed. Text is difficult to write with their minimal keyboards, but speech to text converters may provide help in future. This leads to a need for ordinary computers for the field of exchanging text documents and as better input hardware for all written information. @@ -90,6 +108,7 @@ \subsubsection*{Unified Communication} +\index{uc} \name{Unified Communication} is the technology that aims to consolidate and integrate all electronic communication and to provide access for all kinds of hardware clients. Unified Communication tries to bring the three trends here mentioned together. The \NAME{PC} \name{Magazine} has the following definition in its Encyclopedia: ``[Unified Communications is t]he real-time redirection of a voice, text or e-mail message to the device closest to the intended recipient at any given time.'' \citeweb{pcmag:uc}. 
The main goal is to integrate all kinds of communication (synchronous and asynchronous) into one system, hence this requires real-time delivery of data. @@ -101,6 +120,7 @@ \subsubsection*{Unified Messaging} \label{sec:unified-messaging} +\index{um} \name{Unified Messaging}, although often used exchangeable with Unified Communications, is only a subset of it. It does not require real-time data transmission and is therefore only usable for asynchronous communication \citeweb{wikipedia:uc}. Unified Messaging's basic function is: Receiving incoming messages from various channels, converting them into a common format, and storing them into a single memory. The stored messages can then be accessed from different devices \citeweb{wikipedia:um}. @@ -119,6 +139,7 @@ \section{Electronic mail} +\index{email} %fixme: add short summery: where exactly is masqmail's position within e-comm? @@ -128,13 +149,16 @@ \subsection{SWOT analysis} \label{sec:swot-analysis} +\index{swot analysis} A \NAME{SWOT} analysis regards the strengths and weaknesses of a subject against the opportunities and threats of its market. The slightly altered form called \name{Dialectical} \NAME{SWOT} \name{analysis}, which is used here, is described in \cite{powerof2x2}. \NAME{SWOT} analysis should always focus on a specific goal which is to reach. In this case, the main goal is to make email future-safe. The two dimension---the subject and the market---are regarded in relation to each other by the analysis. Here the analysis shall be driven by the market's dimension. Thus first threats of the market are identified and split into being strengths or weaknesses of email. Then the same is done for opportunities of the market. +\index{the market} \subsubsection*{Threats} The market's main threat is \emph{spam}, also named \name{junk mail} or \name{unsolicited commercial email} (\NAME{UCE}). 
\person{David~A.\ Wheeler} is clear about it: +\index{spam} \begin{quote} Since \emph{receivers} pay the bulk of the costs for spam (including most obviously their time to delete all that incoming spam), spam use will continue to rise until effective technical and legal countermeasures are deployed, \emph{or} until people can no longer use email. @@ -142,6 +166,7 @@ \end{quote} The amount of spam is huge. Panda Security and Commtouch state in their \name{Email Threats Trend Report} for the second Quarter of 2008: ``Spam levels throughout the second quarter averaged 77\,\%, ranging from a low of 64\,\% to a peak of 94\,\% of all email [...]'' \cite[page 4]{panda:email-threats}. The report sees the main source of spam in bot nets consisting of zombie computers: ``Spam and malware levels remain high for yet another quarter, powered by the brawny yet agile networks of zombie \NAME{IP}s.'' \cite[page 1]{panda:email-threats}. This is supported by IronPort Systems: ``More than 80 percent of spam now comes from a `zombie'---an infected \NAME{PC}, typically in a consumer broadband network, that has been hijacked by spammers.'' \cite{ironport:zombie-computers}. Positive for \MTA{}s is that they are not the main source for spam, but it is only a small delight. Spam is a general weakness of the email system because it is not stoppable. +\index{spam!sources of} @@ -151,11 +176,13 @@ Opportunities of the market are large data transfers, originating in multimedia content, which becomes popular. If email is used as basis for Unified Messaging, lots of voice and video mail will be transferred. Email is weak related to this kind of data: The data needs to be encoded to \NAME{ASCII} which stresses mail servers a lot. %fixme: ref to store-and-forward +\index{um} The use of different hardware to access mail is another opportunity of the market. But as more hardware gets involved, the networks become more complex. 
Thus the need for more software and infrastructure to transfer mail within the growing network might be a weakness of the email system. %fixme: think about that An opportunity of the market and at the same time a strength of electronic mail is its standardization. Few other communication technologies are standardized, and thus freely available, in a similar way. %fixme: ref Another opportunity and strength is the modular and extensible structure of electronic mail; it can easily evolve to new requirements. %fixme: ref +\index{email!standardization} The increasing integration of communication channels is an opportunity for the market. But deciding whether it is a weakness or strength of email is difficult. Due to the impossibility to integrate synchronous stream data and large binary data, it is a weakness. But it is also a strength, because arbitrary asynchronous communication data already can be integrated. On the other hand, the integration might be a threat too, because integration often leads to complexity of software. Complex software is more error prone and thus less reliable. This, however, could again be a strength of electronic mail because its modular design decreases complexity. @@ -167,8 +194,10 @@ \end{center} \caption{\NAME{SWOT} analysis for email} \label{fig:email-swot} + \index{figure:\NAME{SWOT} analysis for email} \end{figure} + \subsubsection*{Resulting strategies} The result of a \NAME{SWOT} analysis is a set of strategies that advise how best to react to the identified opportunities and threats, dependent on whether they are strengths or weaknesses of the subject. These strategies are what should be done to achieve the overall goal---here making email future-safe. @@ -187,25 +216,36 @@ \subsection{Trends for electronic mail} \label{sec:email-trends} +\index{email!trends} Nothing remains the same, neither does the email technology. Emailing in future will probably differ from emailing today.
This section tries to identify possible trends that affect the future of electronic mail. \subsubsection*{Provider independence} + Today's email structure is heavily dependent on email providers. This means, most people have email addresses from some provider. These can be providers that offer email accounts in addition to their regular services, for example online connections. \NAME{AOL} and \name{T\mbox{-}On\-line} for instance do so. Or specialized email providers that commonly offer free mail as well as enhanced mail services for which one has to pay. Examples for specialized email providers are \NAME{GMX} and \name{Yahoo}. %fixme: check for non-breakable dash +\index{mail provider} Outgoing mail is send either with the web mail client of the provider or by using an \MUA\ which sends it to the provider for relay. Incoming mail is read with the web mail client or retrieved from the provider via \NAME{POP3} or \NAME{IMAP} to the local computer to be read using the \MUA. This means all mail sending and receiving work is done by the provider. +\index{pop3} +\index{imap} +\index{mua} The reason therefore is originated in the time when people used dial-up connections to the Internet. A mail server needs to be online to receive email. Sending mail is no problem, but receiving it is hardly possible with an \MTA\ which is few time online. Internet service providers had servers that were all day long connected to the Internet. So they offered email service, and they still do. +\index{dial-up} +\index{isp} Nowadays, dial-up Internet access became rare; the majority of the users has broadband Internet access. As a flat rate is payed for it, the time being online does not affect costs anymore, even traffic is unlimited. Today it is possible to have an own mail server running at home. The remaining technical problem is the changing \NAME{IP} addresses one gets assigned every 24 hours\footnote{This, at least, is the situation in Germany.}. 
But this is solvable with one of the dynamic \NAME{DNS} services; they provide the mapping of a fixed domain name to the changing \NAME{IP} addresses. +\index{changing ip addresses} Home servers become popular for central data storage and multimedia services, these days. Being assembled of energy efficient hardware, power consumption is no big problem anymore. These home servers will replace video recorders and \NAME{CD} music collections in the near future. It is also realistic that they will manage heating systems and intercoms too. Given the future leads to this direction, it will be a logical step to have email and other communication provided by the own home server as well. +\index{home server} After years in which \MTA{}s have not been popular for users, the next years might bring the \MTA{}s back to the users. Maybe in a few years nearly everyone will have one, or many, running at home. \subsubsection*{Pushing versus polling} +\index{push email} The retrieval of email is a field that is also about to change these days. The old way is to fetch email by polling the server that holds the personal mailbox. This polling is normally done in regular intervals, often once every five to thirty minutes. The mail transfer from the mailbox to the \MUA\ is initiated from the user side. The disadvantage herewith is the delay between the arrival of mail on the server and the time when the user finally has the message on his screen. @@ -221,15 +261,19 @@ Changing requirements for email communication lead to the need for new concepts and new protocols that cover these requirements. One of these concepts to redesign the email system is named \name{Internet Mail 2000}. It was proposed by \person{Daniel~J.\ Bernstein}, the creator of \qmail. Similar approaches were independently introduced by others too. 
%FIXME: add references for IM2000 +\index{Internet Mail 2000} As main change, the sender has the responsibility for mail storage; only a notification about a mail message gets sent to the recipient. The recipient can then fetch the message from the sender's server. This is in contrast to the \NAME{SMTP} mail architecture where mail and the responsibility for it are transferred from the sender to the receiver (see \name{store-and-forward}). %fixme: reference to the store-and-forward concept +\index{smtp!store-and-forward} \MTA{}s are still important in this new email architecture, but in a slightly different way. They do not transfer mail itself anymore, but they transport the notifications about new mail to the destinations. This is a quite similar job as in the \NAME{SMTP} model. The real transfer of the mail, however, can be done in an arbitrary way, for example via \NAME{FTP} or \NAME{SCP}. A second concept, this one primarily to arm against spam, is \person{David~A.\ Wheeler}'s \name{Guarded Email} \cite{wheeler03}. It requires messages to be recognized as Ham (non-spam) to be accepted, otherwise a challenge-response authentication will be initiated. +\index{Guarded Email} \name{Hashcash} by \person{Adam Back}---a third concept---tries to limit spam and denial of service attacks \cite{back02}. It requests payment for email. The costs are computing time for the generation of hash values. Thus sending spam becomes expensive. Further information about \name{Hashcash} can be found on \citeweb{hashcash:homepage}. +\index{Hashcash} New concepts, like the ones presented here, are invented to remove problems of the email technology. \name{Internet Mail 2000}, for instance, removes the spam problem and the problem of large message transfers. @@ -243,22 +287,28 @@ \paragraph{Easy configuration} Provider independence through running an own mail server at home asks for easy configuration of the \MTA.
Providers have specialists to configure the systems, but ordinary people do not. Solutions are either having some home service system for computer configuration established with specialists coming to ones home to set up the systems; like it is already common for problems with the power and water supply systems. Or configuration needs to be easy and fool-proof, so it can be done by the owner himself. The latter solution depends on standardized parts that fit together seamlessly. The technology must not be a problem itself. Only settings that are custom to the users environment should be left open for him to set. This of course needs to be doable using a simple configuration interface like a web interface. Non-technical educated users should be able to configure the system. +\index{easy configuration} Complex configuration itself is not a problem if simplification wrappers provide an easy interface. The approach of wrappers to make it look easier to the outside is a good concept in general. %FIXME: add ref It still lets the specialist do complex and detailed configuration while also a simple configuration interface to novices is offered. \sendmail\ took this approach with the \name{m4} macros. %fixme: add ref Further more is this approach well suited to provide various wrappers with different user interfaces (e.g.\ graphical programs, websites, command line programs; all of them either in a questionnaire style or interactive). +\index{sendmail!m4 macros} \paragraph{Performance} When \MTA{}s become popular on home servers and maybe even on workstations and smart phones, then performance will be less important. Providers need \MTA{}s that process large amounts of mail in short time. There is no need for home servers and workstations to handle that much mail; they need to process far less email messages per time unit. Thus performance will probably not be a main requirement for an \MTA\ in future, given they mainly run on private machines. 
+\index{performance} \paragraph{Flexibility} New mailing concepts and architectures like push email or \name{Internet Mail 2000} will, if they succeed, require \MTA{}s to adopt the new technology. \MTA{}s that are not able to change are going to be sorted out by evolution. Thus it is important \emph{not} to focus too much on one use case, but to stay flexible. \person{Allman} saw the flexibility of \sendmail\ one reason for its huge success (see section~\ref{sec:sendmail}). +\index{flexibility} \paragraph{Security} Another important requirement for all kinds of software is security. There is a constant trend coming from completely non-secured software, in the 70s and 80s, over growing security awareness, in the 90s, to security being a primary goal, now. This leads to the conclusion that software security will be even more important within the next years. As more clients get connected to the Internet and especially more computers are listening for incoming connections (like an \MTA\ in a home server), there are more possibilities to break into systems. Securing of software systems will require increasing effort in future. +\index{security} \paragraph{Out-of-the-box usage} \name{Plug-and-play}-able hardware with preconfigured software can be expected to become popular. Like someone buys a set-top box to watch Pay-\NAME{TV} today, he might be buying a mail server box in a few years. He plugs the power cable in, inserts his email address in a web interface, and selects the clients (computers or smart phones) to which mail should be send and from which mail is accepted for relay. That's all. It would just work then, like everyone expects it from a set-top box today. Secure and robust software is a precondition for such boxes to make this vision possible. 
+\index{out-of-the-box usage} diff -r b4b06bc05059 -r 16d8eacf60e1 thesis/tex/3-MailTransferAgents.tex --- a/thesis/tex/3-MailTransferAgents.tex Fri Feb 06 21:08:49 2009 +0100 +++ b/thesis/tex/3-MailTransferAgents.tex Fri Feb 06 21:09:21 2009 +0100 @@ -7,7 +7,9 @@ \section{Types of MTAs} + ``Mail transfer agent'' is a term that covers a variety of programs. One thing is common to them: They transfer email from a sender to one or many recipients. +\index{mta!definition} This is how \person{Bryan Costales} defines an \MTA: @@ -26,6 +28,8 @@ \person{Dent} and \person{Hafiz} agree \cite[page 19]{dent04} \cite[pages 3-5]{hafiz05}. Common to all \MTA{}s is the transport of mail; this is the actual job. Besides this similarity, \MTA{}s can be very different. Some of them have \NAME{POP3} and/or \NAME{IMAP} servers included. Some can fetch mails through these protocols. Others have all features one can think of. And maybe there are some that do nothing else but transporting email. +\index{pop3} +\index{imap} Following is a classification of \MTA{}s into groups of similar programs, regarding what is viewable from the outside. @@ -34,6 +38,9 @@ \label{subsec:relay-only} Also called \name{forwarders}. This is the most simple kind of an \MTA. It transfers mail only to defined \name{smart hosts}\footnote{\name{smart host}s are mail servers that receive email and route it to the actual destination.}. Relay-only \MTA{}s do not receive mail from outside the system and they do not deliver locally. All they do is transfer mail to a specified smart host for further relay. +\index{forwarder} +\index{relay-only mta} +\index{smart host} Most \MTA{}s can be configured to act as such a \name{forwarder}. But this is usually an additional functionality. @@ -43,6 +50,8 @@ \subsubsection*{Groupware} +\index{groupware} + Normally the term ``groupware'' does not mean one single program, but a suite of programs. 
They build a framework which is then populated with various modules that provide the actual functionality. Modules for mail transfer, file storage, calendars, resource management, Instant Messaging, and more, are commonly available. These program suites are used if the main work to do is providing integrated communication facilities and team working support for a group of people. Mail transfer is only one part of the problem to solve. The most common scenario are companies. They use \name{groupware} to provide adequate services for their teams to work efficiently. But one may use \name{groupware} on the home server for the family members too. @@ -51,6 +60,8 @@ \subsubsection*{``Real'' MTAs} +\index{real mta} + There is a third type of \MTA{}s in between the minimalistic \name{relay-only} \MTA{}s and the feature loaded \name{groupware}. Those programs may be named ``real \MTA{}s'', or ``proper \MTA{}s'', though there is no common name. They are what is meant with the term ``mail transfer agent''---programs that transfer mail between hosts. Common to them is their focus on the email transfer, while they are able to act as smart hosts. Their variety ranges from ones mostly restricted to mail transfer (e.g.\ \qmail) to others having interfaces for adding further mail processing modules (e.g.\ \postfix). This group covers everything in between the other two groups. @@ -59,11 +70,14 @@ \subsubsection*{Other segmenting} + \MTA{}s can also be split in other ways. Due to \sendmail's significance in the early times of email, compatibility interfaces to \sendmail\ are important for Unix \MTA{}s. The reason is that many mail applications simply assume the \sendmail\ \MTA\ to be installed on the system. Being not \name{sendmail-compatible} may not matter for some fields of action, but makes the program ineligible for serving as a general purpose \MTA\ on \unix\ systems. Hence being sendmail-compatible is a major property of an \MTA. 
\MTA{}s without \name{sendmail-compatible} interfaces, or at least compatibility add-ons, will not be covered here. One example for such a program is \name{Apache James}. %FIXME: check if correct +\index{sendmail!compatibility} Another separation can be done between Free Software \MTA{}s and proprietary ones. Many of the \MTA{}s for Unix systems are Free Software. Only these are regarded throughout this thesis, because comparing Free Software with proprietary or commercial software is not what typical users of programs like \masqmail\ do. Comparison with non-free programs may be a point for large Free Software projects that try to step into the business world. Small projects, mostly used by individuals at home, need to be compared against other projects of similar shape. The document is seen from \masqmail's point of view---an \MTA\ for Unix systems on home servers and workstations---so non-free software is out of the way. +\index{freesw} @@ -71,6 +85,7 @@ \subsubsection*{\masqmail's position} +\index{masqmail!position of} Now, where does \masqmail\ fit in? It is not groupware nor a simple forwarder, thus it belongs to the ``real \MTA{}s''. Additionally, it is Free Software and is sendmail-compatible to a large degree. This makes it similar to \sendmail, \exim, \qmail, and \postfix. \masqmail\ is intended to be a replacement for those \MTA{}s. @@ -94,6 +109,7 @@ \subsection{Market share analysis} \label{sec:market-share} +\index{mta!market share analysis} \MTA\ statistics are rare, differ, and good data is hard to collect. These points are bad if good statistics are wanted. Thus it is obvious there are only few available. @@ -104,6 +120,7 @@ \input{tbl/mta-market-share.tbl} \end{center} \caption{Market share of \MTA{}s} + \index{table!Market share of \MTA{}s} \label{tab:mta-market-share} \end{table} @@ -114,6 +131,7 @@ All surveys show \sendmail\ to be the most popular \MTA. \postfix, \qmail, and \exim\ are among the top six in each. 
\exim\ has slightly smaller shares than the other two. The four programs together share more than half of the market according to \person{Bernstein} and the \name{MailRadar} statistics. \name{O'ReillyNet} has their share to be somewhere between a third and the half. This uncertainty comes from the large amount of unidentifiable \MTA{}s. The 22 percent of \name{mail security layers} in the \name{O'ReillyNet} survey is remarkable. Mail security layers are software guards between the network and the \MTA\ that filter unwanted mail before it reaches the \MTA. This increases security by filtering malicious content and by blocking attacks against the \MTA. The large share here may be a result of only regarding business mail servers. The problem concerning the survey is the disguise of the \MTA{}s that run behind the security layer. It seems wrong to assume equal shares for the \MTA{}s behind the guards as for the unguarded \MTA{}s, because mail security layers will be more often used to guard weak \MTA{}s, as strong ones do not need them so much. This needs to be kept in mind when looking at the \name{O'ReillyNet} survey. +\index{mail security layer} The date of the \name{Mailradar} statistics is not known; a mail to \name{Mailradar} with a request for information has not been replied, unfortunately. However, it seems quite sure that the statistics were published after 2001, caused by the \sendmail\ and \postfix\ shares. But to decide whether before or after the one from \name{O'ReillyNet} would be just guessing. Possibly it receives constant input and thus displays a current state. @@ -126,6 +144,7 @@ \subsubsection*{sendmail} \label{sec:sendmail} +\index{sendmail} \sendmail\ is the best known \MTA, since it was one of the first and surely the one that made \MTA{}s popular. It also was shipped as default \MTA{}s by many Unix system vendors \citeweb{wikipedia:sendmail}. 
@@ -134,44 +153,57 @@ \sendmail\ is designed to transfer mails between different protocols and networks, this lead to a very flexible, though complex, configuration. The program was first released with \NAME{BSD} 4.1c in 1983. The latest version is 8.14.3 from May 2008. The program is distributed under the \name{Sendmail License} as both, free and proprietary software. +\index{bsd} %fixme: write about its importance and about sendmail-compat Further development will go into the project \name{MeTA1} which succeeds \sendmail. The former name of this new project was \name{sendmail~X}. +\index{meta1} +\index{sendmailx} More information can be found on the \sendmail\ homepage \citeweb{sendmail:homepage} and in the, so called, \name{Bat Book} \cite{costales97}. +\index{sendmail!homepage} \subsubsection*{exim} \label{sec:exim} +\index{exim} \exim\ was started in 1995 by \person{Philip Hazel} at the \name{University of Cambridge}. It is a fork of \name{smail-3}, and inherited the monolithic architecture which is similar to \sendmail's. But having no architecture-given separation of the individual components of the system did not hurt. Its security is quite good \cite{blanco05}. \exim\ is highly configurable, especially in the field of mail policies. This makes it easy to specify how mail is routed through the system and who is allowed to send email to whom. Interfaces to integrate spam and malware checkers are provided by design too. -The program is \freesw, released under the \NAME{GPL}. The latest stable version is 4.69 from December 2007. +The program is Free Software, released under the \NAME{GPL}. The latest stable version is 4.69 from December 2007. +\index{gpl} One finds \exim\ on its homepage \citeweb{exim:homepage}. The standard literature is \person{Hazel}'s \exim\ book \cite{hazel01}. 
+\index{exim!homepage} \subsubsection*{qmail} \label{sec:qmail} +\index{qmail} \qmail\ is seen by its community as ``a modern \SMTP\ server which makes sendmail obsolete'' \citeweb{qmail:homepage2}. It was written by \person{Daniel~J.\ Bernstein}, starting in 1995. His primary goal was to create a secure \MTA\ to replace the popular, but vulnerable, \sendmail. His own words are: ``This is why I started writing qmail: I was sick of the security holes in sendmail and other \MTA{}s.'' \citeweb{qmail:homepage1}. \qmail\ first introduced many innovative concepts in \MTA\ design. The most obvious contrast to \sendmail\ and \exim\ is its modular design. But \qmail\ was not the first modular \MTA. \NAME{MMDF}, which predates even \sendmail, was modular too. Regardless of \NAME{MMDF}'s modular architecture, \qmail\ is generally seen as the first security-aware \MTA\ \citeweb{wikipedia:qmail}. The latest release of \qmail\ is version 1.03 from July 1998. Afterwards, in November 2007, \qmail's source was put into the \name{public domain}. This made it Free Software. +\index{public domain} Because of \person{Bernstein}'s inactivity, though the requirements changed since 1998, ``[a] motley krewe of qmail contributors (see the \NAME{README}) has put together a netqmail-1.06 distribution of qmail. It is derived from Daniel Bernstein's qmail-1.03 plus bug fixes, a few feature enhancements, and some documentation.'' \citeweb{netqmail:homepage}. +\index{netqmail} \qmail's homepages are \citeweb{qmail:homepage1} and \citeweb{qmail:homepage2}. The best book about \qmail, from \person{Bernstein}'s view, is \person{Dave Sill}'s handbook \cite{sill02}. His free available guide ``Life with qmail'' is another valuable source \cite{lifewithqmail}. +\index{qmail!homepage} \subsubsection*{postfix} \label{sec:postfix} +\index{postfix} + The \postfix\ project started in 1999 at \NAME{IBM} \name{research}, then called \name{VMailer} or \NAME{IBM} \name{Secure Mailer}. 
\person{Wietse Venema}'s program ``attempts to be fast, easy to administer, and secure. The outside has a definite Sendmail-ish flavor, but the inside is completely different.'' \citeweb{postfix:homepage}. In fact, \postfix\ was mainly designed after qmail's architecture to gain security. But in contrast to \qmail\ it aims much more on being fast and full-featured. Today \postfix\ is taken by many \unix\ systems and \gnulinux\ distributions as default \MTA. @@ -179,6 +211,7 @@ The latest stable version is numbered 2.5.6 from December 2008. \postfix\ is covered by the \NAME{IBM} \name{Public License 1.0} which is a Free Software license. Additional information can be retrieved from the program's homepage \citeweb{postfix:homepage}. \person{Dent}'s \postfix\ book \cite{dent04} claims to be ``the definitive guide'', and it is. +\index{postfix!homepage} @@ -187,6 +220,7 @@ \section{Comparison of MTAs} \label{sec:mta-comparison} +\index{mta!comparison} This section does not try to provide a throughout \MTA\ comparison, because this is already done by others. Remarkable comparisons are the one by \person{Dan Shearer} \cite{shearer06} and a discussion on the mailing list \name{plug@lists.q-linux.com} \cite{plug:mtas}. Tabular overviews may be found at \citeweb{mailsoftware42}, \citeweb{wikipedia:comparison-of-mail-servers}, and \cite[section 1.9]{lifewithqmail}. @@ -197,11 +231,13 @@ \input{tbl/mta-comparison.tbl} \end{center} \caption{Comparison of \MTA{}s} + \index{table!Comparison of \MTA{}s} \label{tab:mta-comparison} \end{table} \subsubsection*{Architecture} +\index{mta!architecture} Architecture is most important when comparing \MTA{}s. Many other properties of a program depend on its architecture. \person{Munawar Hafiz} discusses in detail on \MTA\ architecture, comparing \sendmail, \qmail, \postfix, and \name{sendmail~X} \cite{hafiz05}. \person{Jonathan de Boyne Pollard}'s \MTA\ review \cite{jdebp} is a source too. 
@@ -212,6 +248,7 @@ Modular \MTA{}s are \NAME{MMDF}, \qmail, \postfix, and \name{MeTA1}. They consist of several programs, each doing only a part of the overall job. The different programs run with the least permissions they need; \emph{setuid root} can be avoided completely. The architecture does not directly define the program's security, but ``[t]he goal of making a software secure can be better achieved by making the design simple and easier to understand and verify'' \cite[chapter~6]{hafiz05}. \exim, though being monolithic, has a fairly clean security record. But it is very hard to keep the security up as the program grows. \person{Wietse Venema} (the author of \postfix) says it was the architecture that enabled \postfix\ to grow without running into security problems \cite[page 13]{venema:postfix-growth}. +\index{security} The modular design, with each sub-program doing one part of the overall job, conforms to the \name{Unix Philosophy}. The Unix Philosophy \cite{gancarz95} demands ``small is beautiful'' and ``make each program do one thing well''. Monolithic \MTA{}s fail here. @@ -219,23 +256,28 @@ \subsubsection*{Spam checking and content processing} +\index{spam} Spam and malware increased during the last years. Today it is important for an \MTA\ to be able to provide checking for bad mail. This can be done by implementing functionality into the \MTA\ or by invoking external programs to do this job. \sendmail\ invented \name{milter}\footnote{``milter'' is a common abbreviation for ``sendmail mail filter \NAME{API}''.}, which is used to interface external programs of various kind. \postfix\ adopted the \name{milter} interface but is also able to easily include scanning modules into its modular structure. \qmail\ is pretty old and did not evolve with the changing market situation. Anyhow, its modular structure enables external scanners to be included into \qmail.
\exim\ has the advantage that it was designed with the goal to provide extensive scanning facilities; it is therefore very good suited to scan itself or invoke external scanners. +\index{milter} \subsubsection*{Future trends} In chapter~\ref{chap:market-analysis}, it was tried to figure out trends and future requirements for \MTA{}s. The four programs are compared on these possible future requirements now. +\index{email!trends} \paragraph{Provider independence} The first trend was provider independence, which requires easy configuration. \postfix\ seems to do best here. It uses primary two configuration files (\path{master.cf} and \path{main.cf}) which are easy to manage. \sendmail\ appears to have a bad position. Its configuration file \path{sendmail.cf} is cryptic and very complex (it has legendary Turing-completeness) thus it needs simplification wrappers around it to provide easier configuration. They exist in form of the \name{m4} macros that generate the \path{sendmail.cf} file. Unfortunately, adjusting the generated result by hand appears to be necessary for non-trivial configurations. \qmail's configuration files are simple but the whole system is complex to set up; it requires various system users and \qmail\ is hardly usable without applying several patches that add functionality which is required nowadays. \name{netqmail} is the community's effort to help in the latter point. \exim\ has only one single configuration file (\path{exim.conf}) which suffers most from its flexibility---like in \sendmail's case. Flexibility and easy configuration are almost always contrary goals. \paragraph{Performance} +\index{performance} As second trend, the decreasing necessity for high performance was identified. This goes along with the move of \MTA{}s from service providers to home servers. \postfix\ focuses much on performance, this might not be an important point in the future. 
Of course there will still be the need for high performance \MTA{}s, but a growing share of the market will not require high performance. Energy and space efficiency is related to performance; it is a similar goal in a different direction. But optimization, be it for performance or other efficiencies, is often in contrast to simplicity and clarity; these two improve security. Optimizing does in most times decrease the simplicity and clarity. Simple \MTA{}s that do not aim for high performance are what is needed in future. The simple design of \qmail\footnote{\qmail\ is still fast} is a good example. \paragraph{Security} +\index{security} The third trend (even more security awareness) is addressed by each of the four programs. It seems as if all widely used \MTA{}s provide good security nowadays. Even \sendmail\ can be configured to be secure today. However, the modular architecture, used by \qmail\ and \postfix, is generally seen to be conceptually more secure. \sendmail's creators have started \name{MeTA1}, a modular \MTA\ that merges the best of \qmail\ and \postfix, to replace the old \sendmail. It will be interesting to watch \exim's future---will it become modular too? diff -r b4b06bc05059 -r 16d8eacf60e1 thesis/tex/4-MasqmailsFuture.tex --- a/thesis/tex/4-MasqmailsFuture.tex Fri Feb 06 21:08:49 2009 +0100 +++ b/thesis/tex/4-MasqmailsFuture.tex Fri Feb 06 21:09:21 2009 +0100 @@ -5,16 +5,21 @@ \section{The goal} +\index{development goal} Before requirements can be identified and further development can be discussed, it is important to clearly specify the goal to achieve. This means: What shall \masqmail\ be like in, for instance, five years? +\index{masqmail!in five years} Should \masqmail\ become more specific to a more narrow niche or rather become more general and move a bit out of its niche? Or should it even become a totally general \MTA\ like \sendmail, \exim, \qmail, and \postfix? 
Becoming completely general seems to be no choice because the competitors are too many and they are already too strong. It would require a strong base of developers and superior features to establish. There also seems to be no need for another general purpose \MTA\ additional to those four programs. Thus the effort would most likely remain a try. \person{Venema} stated: ``It is becoming less and less likely that someone will write another full-featured Postfix or Sendmail \MTA\ \emph{from scratch} (100 kloc).'' \cite{venema:postfix-growth}. At least \masqmail\ is not going to try that. +\index{postfix!no second postfix} \masqmail\ was intended to be a small ``real'' \MTA\ which covers the niche of managing the relay over several smart hosts. Small and resource friendly software is still important for workstations, home servers, and especially for embedded computers. Other software that focuses on the same niche is not known. Dial-up connections have become rare but mobile computers that move between different networks are popular. So, the niche is still present. What has changed in general is the security that is needed for software. \person{Graff} and \person{van Wyk} describe the situation well: ``[I]n today's world, your software is likely to have to operate in a very hostile security environment.'' \cite[page~33]{graff03}. Additionally they say: ``By definition, mail software processes information from potentially untrusted sources. Therefore, mail software must be written with great care, even when it runs with user privileges and even when it does not talk directly to a network.'' \cite[page~90]{graff03}. As \masqmail\ is mail software and trusted environments become rare, it is best for \masqmail\ to become a secure \MTA. +\index{hostile environment} +\index{security} In summary, the goal for \masqmail\ is to stay in the current niche with respect to modern usage scenarios and to become a secure \MTA. 
@@ -32,6 +37,7 @@ \subsection{Functional requirements} \label{sec:functional-requirements} +\index{functional requirements} Functional requirements are about the function of the software. They define what the program can do and in what way. %fixme: add ref @@ -41,14 +47,20 @@ \paragraph{\RF1: Incoming and outgoing channels} \label{rf1} \sendmail-compatible \MTA{}s must support at least two incoming channels: mail submitted using the \path{sendmail} command, and mail received on a \NAME{TCP} port. Thus it is common to split the incoming channels into local and remote. This is done by \qmail\ and \postfix. The same way is \person{Hafiz}'s view \cite{hafiz05}. +\index{incoming channels} +\index{sendmail!command} \SMTP\ is the primary mail transport protocol today, but with the increasing need for new protocols (see section~\ref{sec:what-will-be-important}) in mind, support for more than just \SMTP\ is good to have. New protocols will show up; maybe multiple protocols need to be supported then. This would lead to multiple remote channels, one for each supported protocol as it was done in other \MTA{}s. Best would be interfaces to add further protocols as modules. +\index{smtp} Outgoing mail is commonly either sent using \SMTP, piped into local commands (for example \path{uucp}), or delivered locally by appending to a mailbox. Outgoing channels are similar for \qmail, \postfix, and \name{sendmail~X}: All of them have a module to send mail using \SMTP, and one for writing into a local mailbox. +\index{outgoing channels} +\index{uucp} %fixme: is the def of MTA: transfer between machines, or transfer between users? Local mail delivery is a job that uses root privilege to be able to switch to any user in order to write to his mailbox. It is possible to deliver without being root privilege, but delivery to user's home folders is not generally possible then. Thus even the modular \MTA{}s \qmail\ and \postfix\ use root privilege for this job. 
As mail delivery to local users is \emph{not} included in the basic job of an \MTA{} and introduces a lot of new complexity, why should the \MTA\ bother? In order to keep the system simple, reduce privilege, and to have programs that do one job well, the local delivery job should be handed over to a specialist: the \NAME{MDA}. \NAME{MDA}s know about the various mailbox formats and are aware of the problems of concurrent write access and the like. Hence passing the message, and the responsibility for it, over to an \NAME{MDA} seems to be best. +\index{local delivery} This means an outgoing connection that pipes mail into local commands is required. To other outgoing channels applies what was already said about incoming channels. @@ -57,6 +69,7 @@ \includegraphics[scale=0.75]{img/mta-channels.eps} \end{center} \caption{Required incoming and outgoing channels} + \index{figure!Required incoming and outgoing channels} \label{fig:mta-channels} \end{figure} @@ -69,16 +82,22 @@ \paragraph{\RF2: Mail queuing} \label{rf2} +\index{mail queue} Mail queuing removes the need to deliver instantly as a message is received. The queue provides fail-safe storage of mails until they are delivered. Mail queues are probably used in all \MTA{}s, even in some simple forwarders. The mail queue is essential for \masqmail, as \masqmail\ is intended for non-permanent online connections. This means mail must be queued until an online connection is available to send the message. This may be after a reboot. Hence the mail queue must provide persistence. +\index{forwarder} +\index{non-permanent} The mail queue and the module(s) to manage it are the central part of the whole system. This demands robustness and reliability in particular, as a failure here can lead to mail loss. An \MTA\ takes over responsibility for mail by accepting it, hence losing mail messages must be avoided at all costs. This covers any kind of crash situation too.
The worst acceptable outcome is that an already sent mail is sent again. +\index{reliability} \paragraph{\RF3: Header sanitizing} \label{rf3} +\index{header sanitizing} Mail coming into the system often lacks important header lines. At least the required ones must be added by the \MTA. One example is the \texttt{Date:} header, another is the, not required but recommended, \texttt{Message-ID:} header. Apart from adding missing headers, rewriting headers is important too. Changing the locally known domain part of email addresses to globally known ones is an example. \masqmail\ needs to be able to rewrite the domain part dependent on the route used to send the message, to prevent messages from being classified as spam. +\index{masqmail!online routes} Generating the envelope is a related job. The envelope specifies the actual recipient of the mail, no matter what the \texttt{To:}, \texttt{Cc:}, and \texttt{Bcc:} headers contain. Multiple recipients lead to multiple different envelopes, all containing the same mail message. @@ -87,6 +106,7 @@ \paragraph{\RF4: Aliasing} \label{rf4} +\index{aliases} Email addresses can have aliases, thus they need to be expanded. Aliases can be of different kinds: another local user, a remote user, a list of local and remote users, or a command. Most important are the aliases in the \path{aliases} file, usually located at \path{/etc/aliases}. Addresses expanding to lists of users lead to more envelopes. Aliases changing the recipient's domain part may require a different route to be used. @@ -94,6 +114,7 @@ \paragraph{\RF5: Route management} \label{rf5} +\index{online routes} One key feature of \masqmail\ is its ability to send mail out over different routes. The online state defines the active route to be used. A specific route may not be suited for all messages, thus these messages are held back until a suitable route is active. For more information on this concept see section~\ref{sec:masqmail-routes}.
@@ -102,21 +123,29 @@ \paragraph{\RF6: Authentication} \label{rf6} \label{requirement-authentication} +\index{auth} One thing to avoid is being an \name{open relay}. Open relays allow to relay mail from everywhere to everywhere. This is a source of spam. The solution is restricting relay\footnote{Relaying is passing mail, that is not from and not for the own system, through it.} access. It may also be wanted to refuse all connections to the \MTA\ except ones from a specific set of hosts. +\index{open relay} +\index{spam} Several ways to restrict access are available. The most simple one is restriction by the \NAME{IP} address. No extra complexity is added this way but the \NAME{IP} addresses need to be static or within known ranges. This approach is often used to allow relaying for local nets. The access check can be done by the \MTA\ or by a guard (e.g.\ \NAME{TCP} \name{Wrappers}) before. The main advantage here is the minimal setup and maintenance work needed. This kind of access restriction is important to be implemented. +\index{access restriction} This authentication based on \NAME{IP} addresses is impossible in situations where hosts with changing \NAME{IP} addresses, that are not part of a known sub net, need access. Then a authentication mechanism based on some \emph{secret} is required. Three common approaches exist: \begin{enumerate} \item \SMTP-after-\NAME{POP}: Uses authentication on the \NAME{POP} protocol to permit incoming \SMTP\ connections for a limited time afterwards. The variant \SMTP-after-\NAME{IMAP} exists too. +\index{auth!smtp-after-pop} \item \SMTP\ authentication: An extension to \SMTP. It allows to request authentication before mail is accepted. Here no helper protocols are needed. +\index{auth!smtp-auth} \item Certificates: The identity of a user or a host is confirmed by certificates that are signed by trusted authorities. 
Certificates are closely related to encryption, they do normally satisfy both needs: encrypt the data transmission and identify the remote user/host. +\index{certificates} \end{enumerate} Static authentication is the preferred type for authenticating clients. It should be chosen if possible. This means if the \MTA\ resides within a trusted network or it is possible to define trusted network segments on basis of \NAME{IP} addresses, then static authentication is the best choice. If the \MTA\ does its job in an untrusted network, if it can be expected that forged \NAME{IP} addresses will appear, or if mobile clients need access, then dynamic authentication should be used. +\index{untrusted environment} Any combination is possible too. For example, it is preferred to allow relay access only to authenticated users. Either clients in local networks which are authenticated by their \NAME{IP} addresses or remote clients that authenticate by a secret-based method. @@ -127,21 +156,30 @@ \paragraph{\RF7: Encryption} \label{rf7} \label{requirement-encryption} +\index{enc} Electronic mail is vulnerable to sniffing attacks, because in generic \SMTP\ all data transfer is unencrypted. The message's body, the header, and the envelope are all unencrypted. But also some authentication dialogs transfer plain text passwords (e.g.\ \NAME{PLAIN} and \NAME{LOGIN}). Hence encryption is throughout important. +\index{auth} The common way to encrypt \SMTP\ dialogs is using \name{Transport Layer Security} (short: \TLS, the successor of \NAME{SSL}). \TLS\ encrypts the datagrams of the \name{transport layer}. This means it works below the application protocols and can be used with any of them \citeweb{wikipedia:tls}. +\index{tls} +\index{ssl} Using secure tunnels that are provided by external programs should be preferred over including encryption into the application, because the application needs not to bother with encryption then. 
Outgoing \SMTP\ connections can get encrypted using a secure tunnel, created by an external application (like \name{openssl}). But incoming connections can not use external secure tunnels, because the remote \NAME{IP} address is hidden then; all connections would appear to come from localhost instead. Figure~\ref{fig:stunnel} depicts the situation of using an application like \name{stunnel} for incoming connections. The connection to port 25 comes from localhost and this information reaches the \MTA. Authentication based on \NAME{IP} addresses and many spam prevention methods are useless then. +\index{secure tunnel} +\index{stunnel} \begin{figure} \begin{center} \includegraphics[scale=0.75]{img/stunnel.eps} \end{center} \caption{Using \name{stunnel} for incoming connections} + \index{figure!Using \name{stunnel} for incoming connections} \label{fig:stunnel} \end{figure} To provide encrypted incoming channels, the \MTA\ could implement encryption and listen on a port that is dedicated to encrypted \SMTP\ (\NAME{SMTPS}). This approach would be possible, but it is deprecated in favor for \NAME{STARTTLS}. \RFC\,3207 ``\SMTP\ Service Extension for Secure \SMTP\ over Transport Layer Security'' shows this by not mentioning \NAME{SMTPS} on port 465. Also port 465 is not even reserved for \NAME{SMTPS} anymore \citeweb{iana:port-numbers}. +\index{smtps} +\index{starttls} \NAME{STARTTLS}---defined in \RFC\,2487---is what \RFC\,3207 recommends to use for secure \SMTP. The connection then goes over port 25, but gets encrypted when the \NAME{STARTTLS} keyword is issued. Email depends on compatibility---only encryption methods that client and server support can be used. Hence it is best to act after the recommendations of the \RFC\ documents. This means \NAME{STARTTLS} encryption should be supported for incoming and for outgoing connections. 
@@ -149,15 +187,23 @@ \paragraph{\RF8: Spam handling} \label{rf8} +\index{spam} Spam is a major threat nowadays, but it is a war that is hard to win. The goal is to provide state-of-the-art spam protection, but not more. (See section~\ref{sec:swot-analysis}.) As spam is, by increasing the amount of mail messages, not just a nuisance for end users but also for the infrastructure---the \MTA{}s---they need to protect themselves. Filtering spam can be done by either refusing it during the \SMTP\ dialog or by checking for spam after the mail was accepted and queued. Both ways have advantages and disadvantages, so modern \MTA{}s use them in combination. +\index{smtp!dialog} Spam is usually identified by the results of a set of checks. Static rules, database querying (e.g.\ \NAME{DNS} blacklists \cite{cole07} \cite{levine08}), requesting special client behavior (e.g.\ \name{greylisting} \cite{harris03}, \name{hashcash} \cite{back02}), or statistical analysis (e.g.\ \name{bayesian filters} \cite{graham02}) are checks that may be used. Running more checks leads to better results, but takes more system resources and more time. +\index{dns blacklist} +\index{greylisting} +\index{hashcash} +\index{bayesian filter} Doing some basic checks during the \SMTP\ dialog seems to be a must \cite[page~25]{eisentraut05}. Including these checks into the \MTA\ makes them fast to avoid \SMTP\ dialog timeouts. For modularity and reusability reasons internal interfaces to specialized modules seem to be best. \person{Raymond} says: ``Modularity (simple parts, clean interfaces) is a way to organize programs to make them simpler.'' \cite[chapter~1]{raymond03}. +\index{smtp!dialog} +\index{modularity} More detailed checks after the message is queued should be done by external scanners. Interfaces to invoke them need to be defined. (See also the remarks about \name{amavis} in the next section.) 
@@ -167,9 +213,11 @@ \paragraph{\RF9: Malware handling} \label{rf9} +\index{malware} Related to spam is malicious content (short: \name{malware}) like viruses, worms, and trojan horses. They, in contrast to spam, do not affect the \MTA\ itself, as they are in the mail's body. \MTA{}s that search for malware are equal to post offices that open letters to check if they contain something that could harm the recipient. This is not a mail transport job. But by many people the \MTA\ which is responsible for the recipient is seen to be at a good position to do this work, thus it is often done there. Though, it is nice to have interfaces to such scanners within the \MTA. In any way should malware checking be performed by external programs that may be invoked by the \MTA. However, \NAME{MDA}s are better points to invoke content scanners. +\index{content scanner} A popular email filter framework is \name{amavis} which integrates various spam and malware scanners. The common setup includes a receiving \MTA\ which sends mail to \name{amavis} using \SMTP, \name{amavis} processes the mail and sends it then to a second \MTA\ that does the outgoing transfer. (This setup with two \MTA\ instances is discussed in more detail in section~\ref{sec:current-code-security}.) @@ -177,11 +225,13 @@ \paragraph{\RF10: Archiving} \label{rf10} +\index{archiving} Mail archiving and auditability become more important as email establishes as technology for serious business communication. Archiving is a must for companies in many countries. In the United States, the \name{Sarbanes-Oxley Act} \cite{sox} covers this topic. It is a goal to have the ability to archive verbatim copies of every mail coming into and every mail going out of the system, with relation between them. \postfix\ for example has a \name{always\_bcc} feature, to send a copy of every outgoing mail to a definable recipient. 
At least this functionality should be given, although a more complete approach, like \qmail\ provides, is preferable. \qmail\ is able to save copies of all sent and received messages and additionally complete \SMTP\ dialogs \cite[page~12]{sill02}. +\index{smtp!dialog} But if archiving is of high importance, a dedicated archiving solution is advisable, anyway. @@ -189,6 +239,7 @@ \subsection{Non-functional requirements} +\index{non-functional requirement} Now follows a list of non-functional requirements for \masqmail. These requirements specify the quality properties of software. The list is based on \person{Hafiz} \cite[page~2]{hafiz05}, with inspiration from \person{Spinellis} \cite[page~6]{spinellis06} and \person{Kan} \cite{kan03}. %fixme: refer to ch01 and ch02 @@ -196,36 +247,47 @@ \paragraph{\RG1: Security} +\index{security} \MTA{}s are critical points for computer security as they are accessible from external networks. They must be secured with high effort. Properties like the need for high privilege level, work load influenced from outside, work on unsafe data, and demand for reliability, increase the need for security. This is best done by modularization, also called \name{compartmentalization}, as described in section~\ref{sec:discussion-mta-arch}. +\index{compartmentalization} \masqmail\ needs to be secure enough for its target field of operation. \masqmail\ is targeted to workstations and private networks, with explicit warning to not use it on permanent online hosts \citeweb{masqmail:homepage2}. But as non-permanent online connections and trustable environments become rare, \masqmail's security should be so good that it is usable with permanent online connections and in unsafe environments. For example, mails with bad content should not be able to break \masqmail. +\index{masqmail!security} \paragraph{\RG2: Reliability} +\index{reliability} Reliability is the second essential quality property for an \MTA.
Mail for which the \MTA\ took responsibility must never get lost while it is within the \MTA's responsibility. The \MTA\ must not be \emph{the cause} of any mail loss, no matter what happens. Unreliable \MTA{}s are of no value. However, as the mail transport infrastructure is a distributed system, one of the communication partners or the transport medium may crash at any time during mail transfer. Thus reliability is needed for mail transfer communication too. +\index{mail loss} The goal is to transfer exactly one copy of the message. \person{Tanenbaum} evaluates the situation and comes to the conclusion that ``in general, there is no way to arrange this.'' \cite[pages~377--379]{tanenbaum02}. Only strategies where no mail gets lost are acceptable; he identifies three of them, but one generates more duplicates than the others, so two strategies remain. (1) The client always reissues the transfer. The server first sends an acknowledgment and then handles the transfer. (2) The client reissues the transfer only if no acknowledgment was received. The server first handles the transfer and sends the acknowledgment afterwards. The first strategy does not need acknowledgments at all, however, it will lose mail if the second transfer fails too. Hence, mail transfer between two processes should use the strategy: The client reissues if it receives no acknowledgment. The server first handles the message and then sends the acknowledgment. This strategy only leads to duplicates if a crash happens in the time between the message is fully transferred to the server and the acknowledgment is received by the client. No mail will get lost. +\index{duplicates} \paragraph{\RG3: Robustness} +\index{robustness} Being robust means handling errors properly. Small errors may get corrected, large errors may kill a process. Killed processes should get restarted automatically and lead to a clean state again. Log messages should be written in every case. 
Robust software does not need a special environment; it creates a friendly environment itself. \person{Raymond}'s \name{Rule of Robustness} and his \name{Rule of Repair} are good descriptions \cite[pages~18--21]{raymond03}. \paragraph{\RG4: Extendability} +\index{extendability} \masqmail's architecture needs to be extendable to allow new features to be added afterwards. The reasons for this need are the changing requirements. New requirements will appear, like more efficient mail transfer of large messages or a final solution for the spam problem. Extendability is the ability of software to include new functionality with little work. \paragraph{\RG5: Maintainability} +\index{maintainability} Maintaining software takes much time and effort. \person{Spinellis} guesses ``40\,\% to 70\,\% of the effort that goes into a software system is expended after the system is written first time.'' \cite[page~1]{spinellis03}. This work is called \emph{maintaining}. Hence making software good to maintain will ease all further work. \paragraph{\RG6: Testability} +\index{testability} Good testability makes maintenance easier too, because functionality is directly verifiable when changes are made, thus removing the uncertainty. Modularized software makes testing easier, because parts can be tested without external influences. \person{Spinellis} sees testability as a sub-quality of maintainability. \paragraph{\RG7: Performance} +\index{performance} Also called ``efficiency''. Efficient software requires little time and few resources. The merge of communication hardware and its move from service providers to homes and to mobile devices demand smaller and more resource-friendly software. The amount of mail will be lower even if much more mail will be sent, thus time performance is less important. \masqmail\ is not a program to be used on large servers, but on small devices. Thus more important for \masqmail\ will be energy and heat saving, maybe also system resources.
As performance improvements are in contrast to many other quality properties (reliability, maintainability, usability, capability \cite[page~5]{kan03}), jeopardizing these to gain some more performance should not be done. \person{Kernighan} and \person{Pike} state clearly: ``[T]he first principle of optimization is \emph{don't}.'' \cite[page~165]{kernighan99}. Simplicity and clarity are of higher value. @@ -233,15 +295,18 @@ \paragraph{\RG8: Availability} +\index{availability} Availability is important for server programs. They must stay operational by blocking \name{denial of service} attacks and the like. Automated restarts into a clean state after fatal errors are also required. \paragraph{\RG9: Portability} -Source code that compiles and runs on various operating systems is called portable. Portability can be achieved by using standard features of the programming language and common libraries. Basic rules to achieve portable code are defined by \person{Kernighan} and \person{Pike} \cite{kernighan99}. Portable code lets software spread faster. Portability among the various flavors of \unix\ systems is a goal for \masqmail, because these systems are the ones \MTA{}s usually run on. No special care needs to be taken for non-\unix\ platforms. +\index{portability} +Source code that compiles and runs on various operating systems is called portable. Portability can be achieved by using standard features of the programming language and common libraries. Basic rules to achieve portable code are defined by \person{Kernighan} and \person{Pike} \cite{kernighan99}. Portable code lets software spread faster. Portability among the various flavors of Unix systems is a goal for \masqmail, because these systems are the ones \MTA{}s usually run on. No special care needs to be taken for non-\unix\ platforms.
\paragraph{\RG10: Usability} +\index{usability} Usability, not mentioned by \person{Hafiz} (he focuses on architecture) but by \person{Spinellis} and \person{Kan}, is a property which is very important from the user's point of view. Software with bad usability is rarely used, no matter how good it is. If substitutes with better usability exist, the user will switch to one of them. Here, usability includes setting up and configuring; the term ``users'' includes administrators. Having \MTA{}s on home servers and workstations requires easy and standardized configuration. The common setups should be configurable with little action by the user. Complex configuration should be possible, but the focus should be on the most common form of configuration: choosing one of several common setups. %fixme: << masqmail as portable app? >> @@ -250,13 +315,18 @@ \subsection{Architecture} \label{sec:discussion-mta-arch} +\index{architecture} %fixme: what's this section to do with requirements? \masqmail's current architecture is monolithic like \sendmail's and \exim's. But more than the other two is it one block of interweaved code. \exim\ has a highly structured code with many internal interfaces, a good example is the interface for authentication ``modules''. %fixme: add ref \sendmail\ provides now, with its \name{milter} interface, standardized connection channels to external modules. \masqmail\ has none of them---it is what \sendmail\ was in the beginning: a single large block. +\index{milter} +\index{masqmail!architecture} Figure~\ref{fig:masqmail-arch} is a call graph generated from \masqmail's source code. It gives an impression of how interweaved the internals are. There are no compartments at all. +\index{masqmail!call graph} +\index{call graph} \begin{figure} \begin{center} @@ -266,14 +336,18 @@ \includegraphics[scale=0.75]{img/bb.eps} \end{center} \caption{Internal structure of \masqmail, showed by a call graph. 
(Logging functions are ignored; test and \NAME{POP3} code is excluded.)} + \index{figure!Internal structure of \masqmail.} \label{fig:masqmail-arch} \end{figure} \sendmail\ improved its old architecture by adding the milter interface, to include further functionality by invoking external programs. \exim\ was designed, and is carefully maintained, with a modular-like code structure in mind. \qmail\ started from scratch with a ``security-first'' approach, \postfix\ improved on it, and \name{sendmail~X}/\name{MeTA1} tries to adopt the best of \qmail\ and \postfix\ to completely replace the old \sendmail\ architecture. \person{Hafiz} describes this evolution of \MTA\ architecture very well \cite{hafiz05}. +\index{security} Every one of these programs is more modular, or became more modular over time, than \masqmail\ is. Modern requirements like spam protection and probable future requirements like the use of new mail transport protocols demand for modular designs in order to keep the software simple. Simplicity is a key property for security. ``[T]he essence of security engineering is to build systems that are as simple as possible.'' \cite[page 45]{graff03}. +\index{modularity} \person{Hafiz} agrees: ``The goal of making software secure can be better achieved by making the design simple and easier to understand and verify.'' \cite[page 64]{hafiz05}. He identifies the security of \qmail\ to come from it's \name{compartmentalization}, which goes hand in hand with modularity: +\index{compartmentalization} \begin{quote} A perfect example is the contrast between the feature envy early \sendmail\ architecture implemented as one process and the simple, modular architecture of \qmail. The security of \qmail\ comes from its compartmentalized simple processes that perform one task only and are therefore testable for security. 
@@ -281,10 +355,12 @@ \end{quote} Equal does \person{Dent} see the situation for \postfix: ``The modular architecture of Postfix forms the basis for much of its security.'' \cite[page 7]{dent04}. +\index{modularity} Modularity is also needed to satisfy modern \MTA\ requirements in providing a clear interface to add functionality without increasing the overall complexity much. Modularity is no direct requirement but a goal that has positive influence on important requirements like security, testability, extendability, maintainability, and not least simplicity. These quality properties then, on their part, make it easier to achieve the functional requirements. +\index{security} Hence, aspiration for modularity, by compartmentalization, improves the overall quality and function of the software. It can be seen as an architectural requirement for a secure and modern \MTA. @@ -305,46 +381,65 @@ \paragraph{\RF1: In/out channels} +\index{incoming channels} +\index{outgoing channels} The incoming and outgoing channels that \masqmail\ already has (depicted in figure~\ref{fig:masqmail-channels} on page \pageref{fig:masqmail-channels}) are the ones required for an \MTA{}s at the moment. Currently, support for other protocols seems not to be necessary, although new protocols and mailing concepts are likely to appear (see section~\ref{sec:email-trends}). As other protocols are not required today, \masqmail\ is regarded to fulfill \RF1. Without any support in \masqmail\ for adding further protocols, the best strategy is to delaying such work until the functionality is essential, anyway. %fixme: << smtp submission >> %fixme \paragraph{\RF2: Queuing} +\index{mail queue} One single mail queue is used in \masqmail. It satisfies all current requirements. \paragraph{\RF3: Header sanitizing} +\index{header sanitizing} The envelope and mail headers are generated when the mail is put into the queue. The requirements are fulfilled. 
\paragraph{\RF4: Aliasing} +\index{aliases} Aliasing is done on delivery. All common kinds of aliases in the global aliases file are supported. So-called \name{.forward} aliasing is not supported, but this is less common and seldom used. \paragraph{\RF5: Route management} +\index{online routes} Querying the name of the active route is done on delivery. Headers can get rewritten a second time then. This part does provide all the functionality required. \paragraph{\RF6: Authentication} +\index{auth} Static authentication, based on \NAME{IP} addresses, can be achieved with \person{Venema}'s \NAME{TCP} \name{Wrapper} \cite{venema92}, by editing the \path{hosts.allow} and \path{hosts.deny} files. This is only relevant to authenticate hosts that try to submit mail into the system. Dynamic (secret-based) \SMTP\ authentication is already supported in form of \NAME{SMTP-AUTH} and \SMTP-after-\NAME{POP}, but only for outgoing connections. For incoming connections only address-based authentication is supported. +\index{auth!smtp-after-pop} +\index{auth!smtp-auth} \paragraph{\RF7: Encryption} +\index{enc} The situation is similar for encryption, which is also only available for outgoing channels; here a tunnel application, like \name{openssl}, is needed. A secure tunnel can be created to send mail through. State-of-the-art, however, is using \NAME{STARTTLS}, but this is not supported. For incoming channels, no encryption is available. The only possible setup to provide encryption of incoming channels is using an application like \name{stunnel} to translate between the secure connection to the remote host and the plain connection to the \MTA. Unfortunately, this suffers from the problem explained on page \pageref{fig:stunnel} in figure~\ref{fig:stunnel}. Anyway, this would still not be \NAME{STARTTLS} support. +\index{secure tunnel} \paragraph{\RF8: Spam handling} +\index{spam!handling} \masqmail\ does not provide special support for spam filtering.
Spam prevention by not accepting spam during the \SMTP\ dialog is not possible at all. Spam filtering is only possible by using two \masqmail\ instances with an external spam filter in between. The mail flow is from the receiving \MTA\ instance, which accepts mail, to the filter application that processes and possibly modifies it, to the second \MTA\ which is responsible for further delivery of the mail. This is a concept that works in general, and it is good to separate different work with clear interfaces. But the need of two instances of the same \MTA, with doubled setup, makes it rather a work-around. It is better to have this data flow respected in the \MTA\ design, like it was done in \postfix. Anyway, the more important part of spam handling, for sure, is done during the \SMTP\ dialog by completely refusing unwanted mail. \paragraph{\RF9: Malware handling} +\index{malware!handling} Nearly the same applies to malware handling as to spam handling, except that all checks are done after mail is accepted. The possible setup is the same with the two \MTA\ instances and the filter in between. \masqmail\ does support such a setup, but not in a nice way. \paragraph{\RF10: Archiving} +\index{archiving} There is currently no way of archiving every message that goes through \masqmail. \paragraph{\RG1: Security} +\index{security} \masqmail's current security is bad. However, it seems acceptable for using \masqmail\ on workstations and private networks, if the environment is trustable and \masqmail\ is protected against remote attacks. In environments where untrusted components or persons have access to \masqmail, its security is too low. Its author states that \masqmail\ ``is not designed to'' be used in such a way \citeweb{masqmail:homepage2}. This is a clear indicator for being careful. Issues like high memory consumption, low performance, and denial-of-service attacks---things not regarded by design---may cause serious problems.
In any case, a security report that confirms \masqmail's security level is missing. +\index{masqmail!security} \masqmail\ uses conditional compilation to exclude unneeded functionality from the executable at compile time. Excluding code means excluding all bugs and weaknesses within this code too. Excluding unused code is a good concept to improve security. +\index{conditional compilation} \paragraph{\RG2: Reliability} +\index{reliability} Its reliability is also not good enough. Situations where only one part of a sent message was removed from the queue and the other part remained as garbage showed up \citeweb{debian:bug245882}. Problems with large mail messages in conjunction with small bandwidth were also reported \citeweb{debian:bug216226}. Fortunately, lost email has not been a big problem yet, but \person{Kurth} warns: +\index{masqmail!bugs} \begin{quote} There may still be serious bugs in [masqmail], so mail might get lost. But in the nearly two years of its existence so far there was only one time a bug which caused mail retrieved via pop3 to be lost in rare circumstances. @@ -355,36 +450,47 @@ %fixme: state machine \paragraph{\RG3: Robustness} +\index{robustness} The logging behavior of \masqmail\ is good, although it does not cover the whole code. For example, if the queue directory is world writeable by accident (or as action of an intruder), any user can remove messages from the queue or replace them with their own. \masqmail\ does not even write a debug message in this case. The origin of this problem, however, is \masqmail's trust in its environment. %fixme: rule of robustness, rule of repair \paragraph{\RG4: Extendability} +\index{extendability} \masqmail's extendability is very poor. This is a general problem of monolithic software, but extendability can still be provided with high effort. \exim\ is an example for good extendability in a monolithic program.
\paragraph{\RG5: Maintainability} +\index{maintainability} The maintainability of \masqmail\ is equivalent to other software of similar kind. Missing modularity and therefore more complexity make the maintainer's work harder. Conditional compilation might be good for security, but \name{ifdef}s scattered throughout the source code are a pain for maintenance. In summary, \masqmail's maintainability is bearable, as in average Free Software projects. \paragraph{\RG6: Testability} +\index{testability} The testability suffers from missing modularity, too. Testing program parts is hard to do. Nevertheless, it is done by compiling parts of the source to two special test programs: One tests reading input from a socket, the other tests constructing messages and sending them directly. Neither is designed for automated testing of source parts, they are rather to help the programmer during development. Two additional scripts exist to send a set of mails to different kinds of recipients. They can be used for automated testing, but both check only the function of the whole system, not its parts. +\index{test program} %fixme: think about clean-room testing \paragraph{\RG7: Performance} +\index{performance} The performance---efficiency---of \masqmail\ is good enough for its target field of operation, where this is a minor goal. \paragraph{\RG8: Availability} +\index{availability} This applies equally to availability. Hence no further work needs to be done here. \paragraph{\RG9: Portability} -The code's portability is good with view on \unix-like operation systems. At least \name{Debian}, \name{Red Hat}, \NAME{SUSE}, \name{Slackware}, \name{Free}\NAME{BSD}, \name{Open}\NAME{BSD}, and \name{Net}\NAME{BSD} are reported to be able to compile and run \masqmail\ \citeweb{masqmail:homepage2}. Special requirements for the underlying file system are not known. Thus, the portability is already good. +\index{portability} +The code's portability is good with regard to Unix-like operating systems.
At least \name{Debian}, \name{Red Hat}, \NAME{SUSE}, \name{Slackware}, \name{Free}\NAME{BSD}, \name{Open}\NAME{BSD}, and \name{Net}\NAME{BSD} are reported to be able to compile and run \masqmail\ \citeweb{masqmail:homepage2}. Special requirements for the underlying file system are not known. Thus, the portability is already good. +\index{masqmail!supported systems} \paragraph{\RG10: Usability} +\index{usability} The usability is very good, from the administrator's point of view. \masqmail\ was developed to suit a specific, limited job---its configuration matches it perfectly. The user's view does not reach to the \MTA, as it is hidden behind the \MUA. Configuration could be eased even more by providing configuration generators that enable \masqmail\ to be used right ``out of the box'' after running one of several configuration scripts for common setups. This would improve \masqmail's usability for people without a technical background. +\index{out-of-the-box usage} @@ -399,38 +505,49 @@ \input{tbl/requirements.tbl} \end{center} \caption{Importance of and pending work for requirements} + \index{table!Importance of and pending work for requirements} \label{tab:requirements} \end{table} The importance is ranked from `-{}-' (not important) to `++' (very important). The pending work is ranked from `-{}-' (nothing) to `++' (very much). Large work tasks with high importance need to receive much attention, they need to be in focus. In contrast, small, low importance work tasks should receive little attention. Here the focus for a task is calculated by summing up the importance and the pending work with equal weight. Normally, tasks with high focus are the ones of high priority and should be done first. The functional requirements that receive highest attention are \RF\,6 (authentication), \RF\,7 (encryption), and \RF\,8 (spam handling). Of the non-functional requirements, \RG\,1 (security), \RG\,2 (reliability), and \RG\,4 (extendability) rank highest.
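The focus calculation described above can be sketched in a few lines. This is only an illustration under stated assumptions: the numeric mapping of the rating symbols is a guess (the text names only the end points), the task names and ratings are hypothetical and are not the values of the requirements table, and ties in focus are broken by importance.

```python
# Assumed numeric mapping of the rating symbols; only the end points
# '--' and '++' are named in the text, the steps in between are a guess.
SCALE = {"--": -2, "-": -1, "0": 0, "+": 1, "++": 2}


def focus(importance, pending_work):
    """Sum importance and pending work with equal weight."""
    return SCALE[importance] + SCALE[pending_work]


def todo_order(ratings):
    """Order task names by focus, then by importance, highest first."""
    return sorted(
        ratings,
        key=lambda name: (focus(*ratings[name]), SCALE[ratings[name][0]]),
        reverse=True,
    )


# Hypothetical ratings as (importance, pending work) pairs.
# These are NOT the values from the thesis table.
example = {
    "task C": ("+", "-"),
    "task B": ("+", "++"),
    "task A": ("++", "+"),
}

print(todo_order(example))
```

In this made-up example, task A and task B have the same focus, so the higher importance of task A puts it first.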
+\index{requirements!ranking} These tasks are presented in more detail in a todo list, now. The list is sorted by focus and then by importance. \subsubsection*{\TODO1: Encryption (\RF7)} +\index{enc} Encryption is chosen for number one as it is essential to provide privacy. Using \NAME{STARTTLS} for encryption is definitely needed and should be added first; encrypted data transfer is hardly possible without support for it. \subsubsection*{\TODO2: Authentication (\RF6)} +\index{auth} Authentication of incoming \SMTP\ connections is also highly needed and should be added second. It is important to restrict access and to prevent relaying. For workstations and local networks, this has only medium importance and address-based authentication is sufficient most of the time. But secret-based authentication is mandatory to receive mail from the Internet. Additionally it is a guard against spam. \subsubsection*{\TODO3: Security (\RG1)} +\index{security} \masqmail's security is bad, thus the program is forced into a limited field of operation. This field of operation even shrinks as security becomes more important and networking and interaction increase. Secure and trusted environments become rare, thus improving security is an important thing to do. The focus should be on adding compartments to split \masqmail\ into separate modules. (See section~\ref{sec:discussion-mta-arch}.) Furthermore, \masqmail's security should be tested thoroughly to get a definitive view how good it really is and where the weak spots are. +\index{modularity} \subsubsection*{\TODO4: Reliability (\RG2)} +\index{reliability} Reliability also needs to be improved. It is a key quality property for an \MTA, and not good enough in \masqmail. Reliability is strongly related to the queue, thus improvements there are favorable. Applying ideas of \name{crash-only software} \cite{candea03} will be a good step. \person{Candea} and \person{Fox} see in killing the process the best way to stop a running program.
Doing so inevitably demands for good reliability of the queue, and the start up process inevitably demands for good recovery. Those critical situations for reliability are nothing special anymore, they are common. Hence they are regularly tested and will definitely work. +\index{crash-only software} \subsubsection*{\TODO5: Spam handling (\RF8)} +\index{spam!handling} As authentication can be a guard against spam, filter facilities have lower priority. But basic spam filtering and interfaces for external tools should be implemented in future. Configuration guides for a setup of two \masqmail\ instances with a spam scanner in between should be written. And at least a basic kind of spam prevention during the \SMTP\ dialog should be implemented. \subsubsection*{\TODO6: Extendability (\RG4)} +\index{extendability} \masqmail\ lacks an interface to plug in modules with additional functionality. There exists no add-on or module system. The code is only separated by function into various source files. Some functional parts can be included or excluded by conditional compilation. But the \name{ifdef}s are scattered through all the code. This situation needs to be improved by collecting related function into single places that interact through clear interfaces with other parts. Also should these interfaces allow efficient adding of further functionality. +\index{conditional compilation} @@ -442,6 +559,7 @@ \section{Ways for further development} +\index{development strategies} Knowing what needs to be done is only one part, the other is deciding \emph{how} to do it by focusing on a global development strategy. @@ -450,12 +568,14 @@ Further development of software can always go three different ways: \begin{enumerate} -\item Improve the current code base. (S\,1) -\item Add wrappers or interposition filters. (S\,2) -\item Redesign the software from scratch and rebuild it. (S\,3) + \item Improve the current code base. (S\,1) + \item Add wrappers or interposition filters. 
(S\,2) + \item Redesign the software from scratch and rebuild it. (S\,3) \end{enumerate} The first two strategies base on the available source code and can be applied in combination. The third strategy splits from the old code base and starts over again. Wrappers and interposition filters would be outright included into a new architecture; they are a subset of a new design. Of course, parts of existing code can be used in a new design if appropriate. +\index{wrapper} +\index{interposition filter} The requirements are now regarded, each on its own, and are linked to the development strategy that is preferred to reach each specific requirement. If some requirement is well achievable by using different strategies then it is linked to all of them. Implementing encryption (\TODO1) and authentication (\TODO2), for example, are limited to a narrow region in the code. Such features are addable to the current code base without much problem. In contrast can quality properties like reliability (\TODO4), extendability (\TODO6), and maintainability hardly be added to code afterwards---if at all. Security (\TODO3) is improvable in a new design, of course, but also with wrappers or interposition filters. @@ -467,6 +587,7 @@ \input{tbl/strategies.tbl} \end{center} \caption{Development strategies and their suitability for requirements} + \index{figure!Development strategies and their suitability for requirements} \label{tab:strategies} \end{table} @@ -491,12 +612,15 @@ \subsubsection*{Quality improvements} +\index{quality improvement} Most quality properties can hardly be added afterwards. Hence, if reliability, extendability, or maintainability shall be improved, a redesign of \masqmail\ is the best way to take. The wish to improve quality inevitably point towards a modular architecture. Modularity with internal and external interfaces is highly preferred from the architectural point of view (see section~\ref{sec:discussion-mta-arch}). 
The need for further features, especially ones that require changes in \masqmail's structure, supports the decision for a new design, too. Hence a rewrite is favored if \masqmail\ should become a modern \MTA\ with good quality properties. +\index{modularity} \subsubsection*{Security} +\index{security} The situation is similar for security. Security comes from good design, as \person{Graff} and \person{van Wyk} explain: @@ -506,23 +630,29 @@ %Bad design makes life easier for attackers and harder for the good guys, especially if it contributes to a false sense of security while obscuring pertinent failings. \hfill\cite[page 55]{graff03} \end{quote} +\index{good design} They also suggest adding wrappers and interposition filters \emph{around} applications, but more as repair techniques if it is not possible to design security \emph{into} a software in the first place \cite[pages~71--72]{graff03}. +\index{wrapper} +\index{interposition filter} \person{Hafiz} adds: ``The major idea is that security cannot be retrofitted \emph{into} an architecture.'' \cite[page 64, (emphasis added)]{hafiz05}. +\index{security!retrofitted} \subsubsection*{Effort estimation} +\index{effort estimation} Although a strategy might lead to the best result, one may choose another one if the required effort is too high. The effort for a redesign and rebuild is estimated now. \person{Wheeler}'s program \name{sloccount} calculates the following estimates for \masqmail's code base as of version 0.2.21 (excluding library code): +\index{masqmail!development effort} \codeinput{input/masqmail-sloccount.txt} -The development costs in money are not relevant for a \freesw\ project with volunteer developers, but the development time is. About 24 man-months are estimated. The current code base was written almost completely by \person{Oliver Kurth} within four years in his spare time. This means he needed around twice as much time.
Of course, he programmed as a volunteer developer not as an employee with eight work-hours per day. +The development costs in money are not relevant for a Free Software project with volunteer developers, but the development time is. About 24 man-months are estimated. The current code base was written almost completely by \person{Oliver Kurth} within four years in his spare time. This means he needed around twice as much time. Of course, he programmed as a volunteer developer not as an employee with eight work-hours per day. Given the assumptions that (1) an equal amount of code needs to be produced for a newly designed \masqmail, (2) a third of the existing code can be reused plus concepts and knowledge, and (3) development speed is like \person{Kurth}'s, then it would take between two and three years for one programmer to produce a redesigned new \masqmail\ with the same features that \masqmail\ now has. Less time would be needed if a simpler architecture allows faster development, better testing, and fewer bugs. Of course more developers would speed it up too. @@ -530,6 +660,7 @@ \subsubsection*{Risks} +\index{risks} The gained result of a new design might still outweigh the development effort. But risks are something more to consider. @@ -542,12 +673,15 @@ \subsubsection*{Existing code is precious} +\index{existing code} If a new design needs much effort and additionally is a risk, what about the existing code base then? Adding new functionality to an existing code base seems to be a secure and cheap strategy. The existing code is known to work and features can often be added in small increments. Risks like wasted effort if a new design fails are hardly existent, and the faults in the current design are already made and most probably fixed. Functionality that is hard to add incrementally into the application, like support for new protocols, may be addable to the outside. \masqmail\ can be secured to a large extent by guarding it with wrappers that block attackers.
Spam and malware scanners can be included by running two instances of \masqmail. All those methods are based on the current code which they can indirectly improve. +\index{wrapper} +\index{extendability} The required effort is probably under one third of a new design and work directly shows results. These are strong arguments against a new design. @@ -555,6 +689,7 @@ \subsubsection*{Repairing} +\index{repairing} Besides these advantages of existing code, one must not forget that further work on it is often repair work. Small bug fixes are not the problem, but adding something for which the software originally was not designed will cause problems. Such work often destroys the clear concepts of the software, especially in interweaved monolithic code. @@ -563,34 +698,42 @@ Repair strategies are useful, but only in the short-time view and in times of trouble. If the future is bright, however, one does best by investing into a software. As shown in section~\ref{sec:market-analysis-conclusion}, the future for \MTA{}s is bright. This means it is time to invest into a redesign with the intention to build up a more modern product. In the author's view, \masqmail\ has needed this redesign since about 2003, when the old design was still quite suitable \dots\ it has already been delayed too long. +\index{masqmail!redesign} %Clinging too much to existing code will be no help, it is an indicator for fear. Having the courage to throw bad code away to make it better, shows the view forward. Anyway, further development on the basis of current code needs to improve the quality properties too. Some quality requirements can be satisfied by adding wrappers or interposition filters to the outside. For those, the development effort is approximately equal to a solution with a new design. But for adding quality requirements like extendability or maintainability which affect the source code throughout, the effort does increase with exponential rate as development proceeds.
In case these properties are not improved, development will likely come to a dead end sooner or later. +\index{quality improvement} \subsubsection*{A guard against dead ends} +\index{dead ends} A new design does protect against such dead ends. Changing requirements are one possible dead end if the software does not evolve with them. A famous example is \sendmail, which had a near monopoly for a long time. But when security became important, \sendmail\ was only repaired instead of removing the source of the problem---its insecure design. Thus security problems reappeared and over the years \sendmail's market share shrank as more secure \MTA{}s became available. \sendmail's reaction to the new requirements, in form of \name{sendmail~X} and \name{MeTA1}, came much too late---the users had already switched to other \MTA{}s. +\index{sendmail} Redesigning a software as requirements change helps to keep it alive. % fixme: add quote: ``one thing surely remains: change'' (something like that) +\index{redesign} Another danger is the dead end of complexity which is likely to appear by constant work on the same code base. It is even more likely if the code base has a monolithic architecture. A good example for simplicity is \qmail\ which consists of small independent modules, each with only about one thousand lines of code. Such simple code makes it easy to understand what it does. The \name{suckless} project \citeweb{suckless.org} for example advertises such a philosophy of small and simple software by following the thoughts of the \unix\ inventors \cite{kernighan84} \cite{kernighan99}. Simple, small, and clear code avoids complexity and is thus also a strong prerequisite for security. +\index{suckless} \subsubsection*{Modularity} +\index{modularity} The avoidance of dead ends is essential for further development on current code too. Hence it is mandatory to refactor the existing code base sooner or later.
Most important is the intention to modularize it, as modularity improves many quality properties, eases further development, and essentially improves security. One example of how a modular structure makes it easy to add further functionality is described by \person{Sill}: He says that integrating the \name{amavis} filter framework into the \qmail\ system can be done by simply renaming the \path{qmail-queue} module to \path{qmail-queue-real} and renaming the \path{amavis} executable to \path{qmail-queue} \cite[section~12.7.1]{sill02}. Nothing more in the \qmail\ system needs to be changed. This is a very admirable ability which is only possible in a modular system that consists of independent executables. +\index{modularity} This thesis showed several times that modularity is a key property for good software design. Modularity can hardly be retrofitted into software, hence development on the basis of current code will need a thorough restructuring too, to modularize the source code. Thus a new design is similar to such a thorough refactoring, except for the dependence on current code. @@ -600,6 +743,7 @@ \subsubsection*{Function versus quality} +\index{quality improvement} The distribution of functional and non-functional requirements to the strategies is remarkable. The strategies for current code (S\,1+2) have a functional to non-functional ratio of 10 to 3. The new design strategy (S\,3) has a ratio of 5 to 12. @@ -613,10 +757,12 @@ \subsubsection*{Break Even} +\index{Break Even} It is important to keep the time dimension in mind. This includes the separation into a short-time and a long-time view. The short-time view shall cover between two and four years, here. The long-time view is the following time. % fixme: find sources! In the short-time view, the effort for improving the existing code is much smaller than the effort for a new design plus improvements.
But to have similar quality properties at the end of the short-time frame, a version that is based on current code will probably require nearly as much effort as a newly designed version will take. For all further development afterwards, the new design will scale well while the old code will require exponentially more work. +\index{existing code} In the long-time view, a restructuring for modularity is necessary anyway. The question is when it should be done: Right at the start in a new design, or later as restructuring work. @@ -625,12 +771,15 @@ \subsubsection*{The problem with ``good enough''} +\index{good enough} The decision for later restructuring is problematic. Functionality is often more wanted than quality, thus more function is preferred over better quality, as quality is still ``good enough''. But it might be still ``good enough'' the next time, and the time after that one, and so on. Quality improvement is no popular work, but it is required to avoid dead ends. As more code increases the work that needs to be done for quality and modularity improvements, it is better to do these improvements early. Afterwards, all further development will profit from it. +\index{quality improvement} Also, if some design is bad one should never hesitate to erase it and rebuild it in a sane way. +%fixme: doubled speech! Again \person{Doug McIlroy} gives valuable advice: ``Don't hesitate to throw away the clumsy parts and rebuild them.'' \cite{mcilroy78}. @@ -641,14 +790,17 @@ \subsubsection*{Good software, good feelings} +\index{good feeling} One last argument shall be added. This one is more common to Free Software but can also be found in non-free software. Free Software ``sells'' if it has a good user base. For example: Although \qmail\ is somewhat outdated and its author has not released any new version for about ten years, \qmail\ still has a very strong user base and community.
+\index{qmail} Good concepts, sound design, and a sane philosophy give users good feelings for the software and faith in it. They become interested in using it and in contributing. In contrast, constant repair work and the reappearance of weaknesses leave a bad feeling. The motivation of most volunteer developers is their wish to do good work with the goal to create good software. Projects that follow admirable plans towards a good product will motivate volunteers to help. More helpers can get the 2.5 man-years for a new design done in less absolute time. Additionally, a good developer base is the best start for a good user base, and users define a software's value. +\index{motivation} @@ -664,11 +816,12 @@ Strategy 3 (A new design) is slightly preferred over the combination of strategy 1 (Improve existing code) and 2 (Add wrappers and interposition filters), from the requirement's point of view. The discussion afterwards did generally support the new design strategy. But some arguments stood against it. These were: +\index{development strategy} \begin{enumerate} -\item The development time and effort -\item The time delay until new features can be added -\item The risk of failure + \item The development time and effort + \item The time delay until new features can be added + \item The risk of failure \end{enumerate} The first two arguments are only relevant for the short-time view, because both will become \emph{support arguments} for the new design, once the Break Even point is reached. @@ -677,10 +830,11 @@ With respect to the current situation, the suggested further development plan for \masqmail\ is split into a short-time plan and a long-time plan: +\index{development goal} \begin{enumerate} -\item The short-time plan: Add the most needed features, namely encryption, authentication, and security wrappers, to the current code base. -\item The long-time plan: Design a new architecture that satisfies the modern requirements, especially the quality requirements.
+ \item The short-time plan: Add the most needed features, namely encryption, authentication, and security wrappers, to the current code base. + \item The long-time plan: Design a new architecture that satisfies the modern requirements, especially the quality requirements. \end{enumerate} The background thought for this development plan is to first do the most needed stuff on the existing code to keep it usable. This satisfies the urgent needs and removes the time pressure from the development of the new design. After this is done, a newly designed \masqmail\ should be developed from scratch. This is the work for the future. It shall, after it is usable and thoroughly tested, supersede the old \masqmail. @@ -688,5 +842,6 @@ The basics of this development idea can be described as: Recurrent development of a new design from scratch, while the old version is still in use and gets repaired. Hence a modern design will replace an old one in periodic intervals. This is a very future-proof concept that combines the best of short-term and long-term planning. The price to pay is only the increased work, which gets covered by volunteers that \emph{want} to do it. +\index{motivation} diff -r b4b06bc05059 -r 16d8eacf60e1 thesis/tex/5-Improvements.tex --- a/thesis/tex/5-Improvements.tex Fri Feb 06 21:08:49 2009 +0100 +++ b/thesis/tex/5-Improvements.tex Fri Feb 06 21:09:21 2009 +0100 @@ -16,26 +16,37 @@ \subsection{Encryption} +\index{enc} Encryption (\TODO\,1) should be the first functionality to be added to the current code. The requirement was already discussed on page~\pageref{requirement-encryption}. As explained there, \NAME{STARTTLS} encryption---defined in \RFC\,2487---should be added to \masqmail. +\index{starttls} This work requires changes mainly in three source files: \path{smtp_in.c}, \path{smtp_out.c}, and \path{conf.c}. The first file includes the functionality for the \SMTP\ server.
It needs to offer \NAME{STARTTLS} support to clients and needs to initiate the encryption when the client requests it. Additionally, the server should be able to insist on encryption before it accepts any message, if the administrator wishes so. %fixme +\index{smtp} The second file includes the functionality for the \SMTP\ client. It should start the encryption by issuing the \NAME{STARTTLS} keyword if the server supports it. It should be possible to send messages only over encrypted channels, if the administrator wants this. %fixme The third file controls the configuration files. New configuration options need to be added. The encryption policy for incoming connections needs to be defined. Three choices seem necessary: no encryption, offer encryption, insist on encryption. The encryption policy for outgoing connections should be part of each route setup. The options are the same: never encrypt, encrypt if possible, insist on encryption. \subsubsection*{Dependencies} + \NAME{STARTTLS} uses \NAME{TLS} encryption which is based on certificates. Thus the \MTA\ needs its own certificate. This should be generated during installation. A third party application like \name{openssl} should be used for this job. The encryption itself should also be done using an available library. \name{openssl} or a substitute like \name{gnutls} then becomes a dependency for \masqmail. \name{gnutls} seems to be the better choice because the \name{openssl} license is incompatible with the \NAME{GPL}, under which \masqmail\ and \name{gnutls} are covered. +\index{tls} +\index{certificates} +\index{openssl} +\index{gnutls} +\index{gpl} User definable paths to \masqmail's secret key, \masqmail's certificate, and the public certificates of trusted \name{Certificate Authorities} (short: \NAME{CA}s) are also nice to have.
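The three encryption policies named above (no encryption, offer or encrypt if possible, insist on encryption) can be modeled as a small decision table for both the server and the client side. The following is a minimal sketch and not masqmail code: all names and return values are hypothetical, and the choice to defer a message when encryption is insisted on but unavailable is an assumption.

```python
from enum import Enum


class EncryptionPolicy(Enum):
    # Hypothetical policy names; they are not masqmail configuration options.
    NEVER = "never"    # no encryption
    OFFER = "offer"    # offer encryption / encrypt if possible
    INSIST = "insist"  # insist on encryption


def server_offers_starttls(policy):
    """Incoming side: advertise STARTTLS unless encryption is switched off."""
    return policy is not EncryptionPolicy.NEVER


def server_accepts_message(policy, channel_encrypted):
    """Incoming side: with INSIST, messages are only accepted after encryption started."""
    return policy is not EncryptionPolicy.INSIST or channel_encrypted


def client_next_step(policy, server_supports_starttls):
    """Outgoing side (per route): decide how to continue after the server's greeting."""
    if policy is EncryptionPolicy.NEVER:
        return "send-plain"
    if server_supports_starttls:
        return "starttls"
    if policy is EncryptionPolicy.INSIST:
        # Assumption: the message stays queued instead of being sent unencrypted.
        return "defer"
    return "send-plain"
```

Keeping the policy decision apart from the actual \NAME{TLS} handshake would let the incoming side and each outgoing route share the same three values.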
\subsubsection*{Existing code} +\index{existing code} \person{Frederik Vermeulen} wrote an encryption patch for \qmail\ which adds \NAME{STARTTLS} support \citeweb{qmail:tls-patch}. This patch includes about 500 lines of code. +\index{qmail} Adding this code in a similar form to \masqmail\ will be fairly easy. It will save a lot of work as it is not necessary to write the code completely from scratch. @@ -45,18 +56,23 @@ \subsection{Authentication} +\index{auth} Authentication (\TODO\,2) is the second function to be added. It is important to restrict the access to \masqmail, especially for mail relay. The requirements for authentication were identified on page~\pageref{requirement-authentication}. Static access restriction, based on the \NAME{IP} address, is already possible by using \NAME{TCP} \name{Wrappers}. This makes it easy to refuse all connections from outside the local network for example, which is a good prevention against being an open relay. More detailed static restrictions, like splitting between mail for users on the system and mail for relay, should \emph{not} be added to the current code. This is a concern for the new design. +\index{tcp wrappers} \subsubsection*{One of the dynamic methods} Of the three dynamic, secret based, authentication methods (\SMTP-after-\NAME{POP}, \SMTP\ authentication, and certificates) the first one drops out as it requires a \NAME{POP} server running on the same or a trusted host. \NAME{POP} servers are rare on workstations and home servers do not regularly include them either. Thus it is no option for \masqmail. +\index{auth!methods} Authentication based on certificates does suffer from the certificate infrastructure that is required. Although certificates are already used for encryption, their management overhead has prevented widespread usage for authentication. \SMTP\ authentication (also referred to as \NAME{SMTP-AUTH}) support is most easily attained by using a \name{Simple Authentication and Security Layer} (short: \NAME{SASL}) implementation.
\SMTP\ authentication (also referred to as \NAME{SMTP-AUTH}) support is easiest attained by using a \name{Simple Authentication and Security Layer} (short: \NAME{SASL}) implementation. \person{Dent} sees in \NAME{SASL} the best solution for dynamic authentication of users: +\index{smtp-auth} +\index{sasl} \begin{quote} %None of these add-ons is an ideal solution. They require additional code compiled into your existing daemons that may then require special write accesss to system files. They also require additional work for busy system administrators. If you cannot use any of the nonauthenticating alternatives mentioned earlier, or your business requirements demand that all of your users' mail pass through your system no matter where they are on the Internet, SASL is probably the solution that offers the most reliable and scalable method to authenticate users. @@ -66,9 +82,14 @@ These days \NAME{SMTP-AUTH}---defined in \RFC\,2554---is supported by most email clients. If encryption is used then even insecure authentication methods like \NAME{PLAIN} and \NAME{LOGIN} become secure. + \subsubsection*{Simple Authentication and Security Layer} +\index{sasl} \masqmail\ best uses an available \NAME{SASL} library. \name{Cyrus} \NAME{SASL} is used by \postfix\ and \sendmail. It is a complete framework that makes use of existing authentication concepts like the \path{passwd} file or \NAME{PAM}. As advantage it can be included in existing user data bases. \name{gsasl} is an alternative. It comes as a library which helps with the decision for a method and with generating the appropriate dialog data; the actual transmission of the data and the authentication against some database is left open to the programmer. \name{gsasl} is used, for instance, by \name{msmtp}. It seems best to give both concepts a try and decide then which one to use. +\index{cyrus sasl} +\index{pam} +\index{gsasl} Currently, outgoing connections already feature \SMTP-\NAME{AUTH} but only in a hand-coded way. 
It remains to be decided whether this should remain as it is or should get replaced by the \NAME{SASL} approach that will be used for incoming connections. The decision should be influenced by the estimated time until the new design is usable. @@ -80,6 +101,7 @@ \subsubsection*{Authentication backend} +\index{auth!backend} For a small \MTA\ like \masqmail, it seems preferable to store the login data in a text file under \masqmail's control. This is the simplest choice for many usage scenarios. But using a central authentication facility has advantages in larger setups, too. \name{Cyrus} \NAME{SASL} supports both, so there is no problem. If \name{gsasl} is chosen, it seems best to start with an authentication file under \masqmail's control. @@ -92,12 +114,17 @@ \subsection{Security} \label{sec:current-code-security} +\index{security} Improvements to \masqmail's security (\TODO\,3) are an important requirement and are the third task to be worked on. Retrofitting security \emph{into} \masqmail\ is hardly possible, if at all, as was explained in section~\ref{sec:discussion-further-devel}. But adding wrappers and interposition filters can be a large step towards security. +\index{wrapper} +\index{interposition filter} \subsubsection*{Mail security layers} At first mail security layers like \name{smap} come to mind. The market share analysis in section~\ref{sec:market-share} identified such software. Mail security layers are interposition filters that are located between the untrusted network and the \MTA. They accept mail in place of the \MTA\ in order to separate the \MTA\ from the untrusted network. Thus they are \name{proxies}. +\index{mail security layer} +\index{smap} The work \name{smap} does is described in \cite{cabral01}: \name{smap} accepts messages as proxy for the \MTA\ and puts them into a queue. \name{smapd}, a companion program, runs as a daemon and watches for new messages in this queue, which it then submits to the \MTA.
@@ -106,8 +133,11 @@ The advantage of mail security layers is that the \MTA\ itself needs not to bother much with untrusted environments. The proxy cares for this. \name{smap} is non-free software and thus no general choice for \masqmail. A way to achieve a similar setup is to copy \masqmail\ and strip one copy to the bare minimum of what is needed for the proxy job. \name{setuid} could be removed, and root privilege too if \name{inetd} is used. This hardens the proxy instance. +\index{inetd} +\index{proxy} Mail from outside would then come through the proxy into the system. Mail from the local host and from the local network could be directly accepted by the normal \masqmail, if those locations are considered trusted. But it seems better to have them use the proxy too, or maybe a second proxy instance with different policy. +\index{policy} The here described setup comes close to the structure of the incoming channels in the new design which is described in section~\ref{sec:new-design}. This shows the capabilities of the here chosen approach. @@ -115,17 +145,22 @@ \subsubsection*{A concrete setup} A stripped down proxy needs to be created. It should only be able to receive mail via \SMTP, encrypt the communication, authenticate clients, and send mail out via \SMTP\ to an internal socket (named ``X'' in the figure). This is a straight forward task. The normal \masqmail\ instance runs on the system too. It takes input from \name{stdin} (when the \path{sendmail} command is invoked) and via \SMTP\ where it listens on an internal socket (named ``X'' in the figure). Outgoing mail is handled without difference to a regular setup. Figure~\ref{fig:proxy-setup} depicts the setup. 
+\index{auth} +\index{enc} \begin{figure} \begin{center} \includegraphics[scale=0.75]{img/proxy-setup.eps} \end{center} \caption{A setup with a proxy} + \index{figure!A setup with a proxy} \label{fig:proxy-setup} \end{figure} \subsubsection*{Spam and malware handling} +\index{spam!handling} +\index{malware!handling} The presented setup is the same as the one with two \MTA\ instances and a scanner application in between, which was suggested as a way to add spam and malware scanners to an \MTA\ afterwards. This is a fortunate coincidence, because a scanner like \name{amavis} can simply be put in place of the internal socket ``X''.
-\person{Finch} proposes a similar approach \cite{finch-sendmail}: He wants the \texttt{sendmail} command to be a simple \SMTP\ client that contacts the \SMTP\ daemon of the \MTA, like it is done by connections from remote. The advantage here is to have one single module where all \SMTP\ dialog with submitters is done. Hence one single point to accept or refuse incoming mail. Additionally does the module which puts mail into the queue not need to be \name{setuid} or \name{setgid}, because it is only invoked from the \SMTP\ daemon. The \MTA's architecture would become simpler and common tasks are not duplicated in modules that do similar jobs. +\person{Finch} proposes a similar approach \cite{finch-sendmail}: He wants the \path{sendmail} command to be a simple \SMTP\ client that contacts the \SMTP\ daemon of the \MTA, like it is done by connections from remote. The advantage here is to have one single module where all \SMTP\ dialog with submitters is done. Hence one single point to accept or refuse incoming mail. Additionally does the module which puts mail into the queue not need to be \name{setuid} or \name{setgid}, because it is only invoked from the \SMTP\ daemon. The \MTA's architecture would become simpler and common tasks are not duplicated in modules that do similar jobs. +\index{sendmailx} +\index{smtp} +\index{setuid} But merging the input channels in the \SMTP\ daemon makes the \MTA\ heavily dependent on \SMTP. To \qmail\ and \postfix\ new protocol handlers may be added without change in other parts of the system. The \SMTP\ modules can even be removed if it is not needed. It is better to have a larger number of independent modules if each one is simpler then. The need to implement \SMTP\ clients in every module for internal communication makes them more complicated. 
+\index{qmail} +\index{postfix} With the increasing need for new protocols in mind, it seems better to have single modules for each incoming channel, although this leads to duplicated acceptance checks. Independent checks in different modules, however, have the advantage to be able to simply apply different policies. Thus it is possible to run two \SMTP\ modules that listen on different ports: one accessible from the Internet which requires authentication, the other one only accessible from the local network without authentication. The approach of simple independent modules, one for each incoming channel, should be taken. A module which is a \NAME{POP} or \NAME{IMAP} client to import contents of other mailboxes into the system may be added afterwards as it is desired. +\index{pop3} +\index{imap} \subsubsection*{Outgoing channels} +\index{outgoing channels} Outgoing mail is commonly either sent using \SMTP, piped into local commands (for example \path{uucp}), or delivered locally by appending to a mailbox. The requirements were identified on page~\pageref{rf1}. +\index{uucp} Outgoing channels are similar for \qmail, \postfix, and \name{sendmail~X}: All of them have a module to send mail using \SMTP\ and one for writing into a local mailbox. Local mail delivery is a job that should have root privilege to be able to switch to any user in order to write to his mailbox. Modular \MTA{}s do not require \name{setuid root} but the local delivery process (or its parent) should run as root. root privilege is not a mandatory requirement but any other approach has some disadvantages thus commonly root privilege is used. +\index{setuid} Local mail delivery should not be done by the \MTA, but by an \NAME{MDA} instead. This decision was discussed in section~\ref{sec:functional-requirements}. This means only an outgoing channel that pipes mail into a local command is required for local delivery. 
+\index{local delivery} Other outgoing channels, one for each supported protocol, should be designed as in other \MTA{}s. \subsubsection*{Mail queuing} +\index{mail queue} The mail queue is the central part of an \MTA. This demands particular robustness and reliability, as a failure here can lead to mail loss. (See \RF\,2 on page~\pageref{rf2}.) Common \MTA{}s feature one or more mail queues; sometimes they effectively have several queues within one physical representation. \MTA\ setups that include content scanning tend to require two separate queues. To use \sendmail\ in such setups requires two independent instances, each with its own queue. \exim\ can handle it with special \name{router} and \name{transport} rules but the data flow gets complicated. Hence an idea is to use two queues (\name{incoming} and \name{active} in \postfix's terminology) and have the content scanning within the move from the one to the other. +\index{exim} +\index{postfix} \sendmail, \exim, \qmail, and \masqmail\ all use at least two files to store one message in the queue: one file contains the message body, another the envelope and header information. The one containing the mail body is not modified at all. \postfix\ takes a different approach in storing queued messages in an internal format within one file. \person{Finch} suggests yet another approach: The whole queue should be stored in one single file with pointers to separating positions \cite{finch-queue}. All of the presented \MTA{}s use the file system to hold the queue; none uses a database to hold it. A database could improve the reliability of the queue through better persistence. This might be a choice for larger \MTA{}s but is none for \masqmail\ which should be kept small and simple. A running database system is likely to require far more resources than \masqmail\ itself does.
And as the queue's job is more storing data, than running data selection queries, a database does not gain enough to outweigh its costs. +\index{database system} Hence the choice here is having a directory with simple text files in it. This is straight forward, simple, clear, and general \dots\ and thus a good basis for reliability. It is additionally always an advantage if data is stored in the operating system's natural form, which is plain text in the Unix' case. @@ -218,23 +272,30 @@ \subsubsection*{Mail sanitizing} +\index{mail sanitizing} Mail coming into the system may be malformed, lacking headers, or can be an attempt to exploit the system. Care must be taken. In \postfix, mail is sanitized by the \name{cleanup} module, which invokes \name{rewrite}. The position in the message flow is after the message comes from one of the several incoming channels and before the message is stored into the \name{incoming} queue. \name{cleanup} does a complete check to make the mail header complete and valid. +\index{postfix} \qmail\ has the principle of ``don't parse'' which propagates the avoidance of parsing as much as possible. The reason is that parsing is a highly complex task which likely makes code exploitable. +\index{qmail} In \masqmail's new design, mail should be stored into the queue without parsing. A scanning module should then parse the message with high care. It seems best to use a \name{parser generator} for this work. The parsed data should then get modified if needed and written into a second queue. This approach has several advantages. First, the receiving parts of the system are independent from content, they simply store it into the queue. Second, one single module does the parsing and generates new messages that contain only valid data. Third, the sending parts of the system will thus only work on messages that consist of valid data. 
Of course, it must be ensured that each message passes through the \name{scanning} module, but this is already required for spam and malware scanning. +%fixme: ref for parser generator +\index{parser generator} The mail body will never get modified, except for removing and adding transfer protocol specific requirements like dot stuffing or special line ending characters. These translations are only done in receiving and sending modules. \person{Jon Postel}'s robustness principle (``Be liberal in what you accept, and conservative in what you send.''), which can be found in this wording in \RFC\,1122 and in different wordings in numerous \RFC{}s, should be respected in the \name{scanning} module. The module should parse the given input in a liberal way and generate clean output. \person{Raymond}'s \name{Rule of Repair} (``Repair what you can -- but when you must fail, fail noisily and as soon as possible.'') \cite[page~18]{raymond03} can be applied too. But it is important to repair only obvious problems, because repairing functionality is likely a target for attacks. +\index{robustness!principle of} \subsubsection*{Aliasing} +\index{aliases} The functional requirements were identified under \RF\,4 on page~\pageref{rf4}. From the architectural point of view, the main question about aliasing is: Where should aliases get expanded? @@ -247,6 +308,7 @@ \subsubsection*{Route management} +\index{online routes} The online state is only important for the sending modules of the system, thus it should be queried in the \name{queue-out} module which selects ready messages from the \name{outgoing} queue and transfers them to the appropriate sending module. Route-based aliasing, which was described in the last section, %fixme: is this still true? should be done in the same go. 
@@ -254,10 +316,12 @@ \subsubsection*{Archiving} +\index{archiving} The best point to archive copies of every incoming mail is the \name{queue-in} module, respectively the \name{queue-out} module for copies of outgoing mail. But the changes that are made by the receiving modules (adding further headers) and sending modules (address rewrites) are not respected with this approach. \qmail\ has the ability to log complete \SMTP\ dialogs. Logging the complete data transaction into and out of the system is a great feature which should be implemented into each receiving and sending module. Though, as this will produce a huge amount of output, it should be disabled by default. +\index{smtp!dialog} Archiving's functional requirements were described as \RF\,10 on page~\pageref{rf10}. @@ -266,10 +330,14 @@ \subsubsection*{Authentication and Encryption} +\index{auth} +\index{enc} The topics were discussed as \RF\,6 and \RF\,7 on several places throughout this thesis remarkable ones are on page~\pageref{rf6} and \pageref{rf7}. Authentication should be done within the receiving and sending modules. To encryption applies the same as to authentication here. Only receiving and sending modules should come in contact with it. +\index{incoming channels} +\index{outgoing channels} In order to avoid code duplicates, the actual implementation of both functions should be provided by a central source, for example a library, which is used in the various modules. @@ -279,18 +347,23 @@ \subsubsection*{Spam and malware handling} +\index{spam!handling} +\index{malware!handling} The two approaches for spam handling were already presented to the reader in section~\ref{sec:functional-requirements} as \RF\,8 and \RF\,9. Here they are described in more detail: \begin{enumerate} -\item Refusing spam during the \SMTP\ dialog: This is the way it was meant by the designers of the \SMTP\ protocol. 
They thought checking the sender's and recipient's mail addresses would be enough, but as they are forgeable, it is not. More and more complex checks are needed to be done. Checking needs time, but \SMTP\ dialogs time out if it takes too long. Thus during the \SMTP\ dialog, only limited time can be used for checking if a message seems to be spam. The advantage of this approach is that bad messages can simply get refused---no responsibility for them is taken and no further system load is added. See \RFC\,2505 (especially section 1.5) for detail. + \item Refusing spam during the \SMTP\ dialog: This is the way it was meant by the designers of the \SMTP\ protocol. They thought checking the sender's and recipient's mail addresses would be enough, but as they are forgeable, it is not. More and more complex checks are needed to be done. Checking needs time, but \SMTP\ dialogs time out if it takes too long. Thus during the \SMTP\ dialog, only limited time can be used for checking if a message seems to be spam. The advantage of this approach is that bad messages can simply get refused---no responsibility for them is taken and no further system load is added. See \RFC\,2505 (especially section 1.5) for detail. +\index{smtp!dialog} -\item Checking for spam after the mail was accepted and queued: Here it is possible to invest more processing time, thus more detailed checks can be done. But, as responsibility for messages was taken, it is no choice to simply delete spam mail. Checks for spam do not lead to sure results, they just indicate the possibility the message is unwanted mail. \person{Eisentraut} lists actions to take after a message is recognized as probably spam \cite[pages 18--20]{eisentraut05}. For mail the \MTA\ is responsible for, the only acceptable action is adding further or rewriting existing header lines. Thus all further work on the spam messages is the same as for non-spam messages. 
+ \item Checking for spam after the mail was accepted and queued: Here it is possible to invest more processing time, thus more detailed checks can be done. But, as responsibility for messages was taken, it is no choice to simply delete spam mail. Checks for spam do not lead to sure results, they just indicate the possibility the message is unwanted mail. \person{Eisentraut} lists actions to take after a message is recognized as probably spam \cite[pages 18--20]{eisentraut05}. For mail the \MTA\ is responsible for, the only acceptable action is adding further or rewriting existing header lines. Thus all further work on the spam messages is the same as for non-spam messages. \end{enumerate} Modern \MTA{}s use both techniques in combination. Checks during the \SMTP\ dialog tend to be implemented in the \MTA\ to make them fast; checks after the message was queued are often done using external programs (\name{spamassassin} is a well known one). \person{Eisentraut} sees the checks during the \SMTP\ dialog to be essential: ``Ganz ohne Analyse w\"ahrend der \SMTP-Phase kommt sowieso kein \MTA\ aus, und es ist eine Frage der Einsch\"atzung, wie weit man diese Phase belasten m\"ochte.'' \cite[page 25, (translated: ``No \MTA\ can go without analysis during the \SMTP\ phase anyway, but the amount of stress one likes to put on this phase is left to his discretion.'')]{eisentraut05} Checks before a message is accepted, like \NAME{DNS} blacklists and \name{greylisting}, need to be invoked from within the receiving modules. Like for authentication and encryption, the implementation of this functionality should be provided by a central source. +\index{dns blacklist} +\index{greylisting} All checks on queued messages should be done by pushing the message through external scanners like \name{spamassassin}. The \name{scanning} module is the best place to handle this. Hence this module needs interfaces to external scanners. 
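The rule that accepted mail may only be tagged, never deleted, can be illustrated with a minimal sketch of the interface between the \name{scanning} module and external scanners. Everything here is an assumption made for illustration: the function \texttt{tag\_message}, the header name \texttt{X-Scan-Flag}, and the representation of a scanner as a plain callable are hypothetical. A real scanner such as \name{spamassassin} would be run as an external program instead.

```python
def tag_message(header_lines, body, scanners):
    """Run every scanner and add one header line per positive verdict.

    header_lines: list of header lines (strings without line endings)
    body:         the message body as bytes; it is never modified
    scanners:     list of (name, check) pairs, where check(header_lines, body)
                  returns True if the message looks suspicious

    The header name "X-Scan-Flag" is a made-up example.
    """
    new_header = list(header_lines)  # work on a copy of the header
    for name, check in scanners:
        if check(header_lines, body):
            new_header.append("X-Scan-Flag: %s" % name)
    # The message is always passed on; it is only annotated.
    return new_header, body
```

Because the function can only append header lines, a false positive costs one extra header line and nothing more; the decision about what to do with tagged mail is left to later stages such as the \NAME{MDA}.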
@@ -339,10 +412,13 @@ \includegraphics[width=\textwidth]{img/masqmail-arch-new.eps} \end{center} \caption{The new designed architecture for \masqmail} + \index{figure!The new designed architecture for \masqmail} \label{fig:masqmail-arch-new} \end{figure} This architecture is heavily influenced by the ones of \qmail\ and \postfix. Both have different incoming channels which merge in the module that puts mail into the queue; central is the queue (or more of them); and one module takes mail from the queue and passes it to one of the outgoing channels. But mail processing is built into the architecture in a more explicit way in this design than it was done in \qmail\ and \postfix. +\index{qmail} +\index{postfix} Special regard was put on addable support for further mail transfer protocols. Here the design appears to be most similar to \qmail, which was designed to handle multiple protocols. @@ -353,10 +429,12 @@ \paragraph{Receiver modules} +\index{incoming channels} They are the communication interface between external senders and the \name{queue-in} module. Each protocol needs a corresponding \name{receiver module} to be supported. Most popular is the \name{sendmail} module, which is a command to be called from the local host, and the \name{smtpd} module which usually listens on port 25. Other modules to support other protocols may be added as needed. Receiving modules that need to listen on ports should get invoked by \name{inetd}, or by \person{Bernstein}'s more secure \name{ucspi-tcp}. This makes it possible to run them with least privilege. \paragraph{The \name{queue-in} module} +\index{mail queue} Its job is to store new messages into the queue. When one of the receiving modules has a new message, it invokes the \name{queue-in} module which creates a spool file in the \name{incoming} queue and a data file in the \name{pool}. The receiver module then sends the envelope, the message header, and the message body. 
The \name{queue-in} modules writes the first two into the spool file, the latter one into the \name{pool}. @@ -365,20 +443,25 @@ \paragraph{The \name{queue-out} module} +\index{mail queue} This module takes messages from the \name{outgoing} queue, queries information about the online state, and passes the messages to the correct transport module. Successfully transferred messages are removed from the \name{outgoing} queue. The \masqmail\ specific tasks of the route management are handled by this module, too. \paragraph{Transport modules} +\index{outgoing channels} These modules send outgoing mail; they are the interface between \name{queue-out} and remote hosts or local commands. The most popular modules of this kind are the \name{smtp} module which acts as an \SMTP\ client and the \name{pipe} module to interface gateways to other systems or networks like \NAME{FAX} and \NAME{UUCP}. A module for local delivery is not included; \masqmail\ passes this job to an \NAME{MDA} which gets invoked through the \name{pipe} module. (See section~\ref{sec:functional-requirements} for reasons.) \subsubsection*{The queue} +\index{mail queue} The queuing system consists of two queues and a message pool. The queues store the spool files---in unprocessed form in \name{incoming} and in complete and valid form in \name{outgoing}. The \name{pool} is the storage of the data files. On disk, the three parts of the queuing system are represented by three directories within the queue path. The representation of queued messages on disk is basically the same as in current \masqmail: One file for the envelope and message header information (the ``spool file'') and a second file for the message body (the ``data file''). +\index{spool file} +\index{data file} The currently used internal structure of the spool files can remain. Following is a sample spool file from current \masqmail. The first part is the envelope and meta information. 
The annotations in parentheses are only added to ease the understanding. The second part, after the empty line, is the message header. @@ -393,6 +476,7 @@ \subsubsection*{Inter-module communication} +\index{ipc} Communication between modules is required to exchange data and status information. This is also called ``Inter-process communication'' (short: \NAME{IPC}) because the modules are independent programs in this case and processes are programs in execution. @@ -405,10 +489,12 @@ \includegraphics[scale=0.75]{img/ipc-protocol.eps} \end{center} \caption{State diagram of the \NAME{IPC} protocol. (Solid lines indicate client actions, dashed lines indicate server responses.)} + \index{figure!State diagram of the \NAME{IPC} protocol.} \label{fig:ipc-protocol} \end{figure} The protocol is described in more detail now: +\index{protocol} \paragraph{Timing} One dialog consists of exactly three phases: (1) The connection attempt, (2) The envelope and header transfer, and (3) The transfer of the message body. The order is always the same. The three phases are all initiated by the client process. After each phase the server process sends a success or failure reply. Timeouts for each phase need to be implemented. @@ -426,10 +512,12 @@ \subsubsection*{Rights and permissions} +\index{permission} The set of system users that is required for \qmail\ seems to be too complex for \masqmail. A single system user, as \postfix\ uses, is more appropriate. \name{root} privilege and \name{setuid} permission should be avoided if feasible. The \name{queue-in} module is the part of the system that is most critical with regard to permissions. It either needs to run as a daemon or be \name{setuid} or \name{setgid} in order to avoid a world-writable queue.
\person{Ian~R.\ Justman} recommends using \name{setgid} in this situation: +\index{setuid} \begin{quote} But if all you need to do is post a file into an area which does not have world writability but does have group writability, and you want accountability, the best, and probably easiest, way to accomplish this without the need for excess code for uid switching (which is tricky to deal with especially with setuid-to-root programs) is the setgid bit and a group-writable directory. @@ -437,11 +525,14 @@ \end{quote} \person{Bernstein} chose \name{setuid} for the \name{qmail-queue} module, \person{Venema} uses \name{setgid} in \postfix, yet the differences are small. Either of them is better than running the module as a daemon. A daemon needs more resources and therefore becomes inefficient on systems with low mail volume, like the ones \masqmail\ will probably run on. Short-running processes additionally pose higher obstacles for intruders, because a process that an intruder manages to take over will soon die. +\index{qmail} +\index{postfix} The modules \name{scanning} and \name{queue-out} are candidates for permanently running daemon processes. Alternatively they could be started by \name{cron} to do single runs. Another possibility is to run a master process as daemon which starts and restarts the system parts. \postfix\ has such a master process, \qmail\ lacks it. The jobs of a master process can be done by other tools of the operating system too, thus making a master process dispensable. \masqmail\ is probably better off without a master process, because it aims to save resources, not to get the best performance. +\index{master process} Sane permission management is very important for secure software in general. The \name{principle of least privilege}, as it is often called, should be respected. If it is possible to use lower privilege then it should be done. An example of doing so is the \name{smtpd} module. It is a server module which listens on a port.
One way is to start it as root and let it bind to the port and drop all privilege before it does any other work. But root privilege is avoidable completely if \name{inetd}, or one of its substitutes, listens on the port instead of the \name{smtpd} module. \name{inetd} will then launch the \name{smtpd} module to handle the connection whenever a connection attempt to the port is made. The \name{smtpd} module needs no privilege at all this way.
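A module started by \name{inetd} never opens a socket itself: \name{inetd} accepts the connection and hands it to the program as standard input and standard output. The following minimal sketch shows the idea. It assumes a hypothetical function \texttt{serve} and implements only the greeting and \texttt{QUIT}; it is written in Python for brevity and is not \masqmail\ code.

```python
def serve(inp, out, hostname="localhost"):
    """Speak a minimal SMTP dialog over two already opened text streams.

    Under inetd, inp and out would be sys.stdin and sys.stdout, which
    inetd has connected to the client. No bind() and no root privilege
    are needed inside this program.
    """
    out.write("220 %s ESMTP\r\n" % hostname)   # greeting
    out.flush()
    for line in inp:
        verb = line.strip().upper()
        if verb == "QUIT":
            out.write("221 Bye\r\n")           # closing the channel
            out.flush()
            break
        # A real module would handle EHLO, MAIL, RCPT, DATA here.
        out.write("502 Command not implemented\r\n")
        out.flush()
```

Since the function only sees two streams, it can be exercised without any network access, for example with \texttt{io.StringIO} objects in place of the connection.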