% thesis/tex/1-Introduction.tex @ 378:c9a6cbce35fd
% inserted non-break spaces where appropriate
% author: meillo@marmaro.de
% date: Tue, 03 Feb 2009 18:01:33 +0100
\chapter{Introduction}
\label{chap:introduction}

This chapter introduces some basic email concepts that are essential for understanding the remainder of the thesis. Then \masqmail---the program of interest---is presented. Its history, typical usage, and the functions it provides are described. After an explanation of \masqmail's relevance, its weaknesses are pointed out. Solving these weaknesses is the topic covered throughout this thesis.

\section{Email prerequisites}

Electronic mail is a service on the Internet and is thus, like other Internet services, defined and standardized by \name{Requests For Comments}\index{rfc} (short: \RFC{}s\index{rfc}) under the management of the \name{Internet Engineering Task Force}\index{ietf} (short: \NAME{IETF}). \RFC{}s are highly technical documents, and readers of this thesis are not required to be familiar with them.

This section gives an introduction to the basic internals of the email system in non-technical language. It is intended to familiarize the reader with the essential concepts of email, as they are needed throughout the thesis.

\subsubsection{Mail agents}
\index{mail agents}

This thesis will frequently use three terms, \MTA, \MUA{}, and \MDA{}, which name the three different kinds of nodes in the email infrastructure. Here, they are explained with reference to the ``snail mail'' system, which is known from everyday life. Figure~\ref{fig:mail-agents} shows the relation between these three mail agents and the path an email message takes when passing through the system.

\begin{description}
\item[\MTA:]
\index{mta}
\name{Mail Transfer Agents} are the post offices for electronic mail. The basic job of an \MTA\ is to transport mail from senders to recipients, or, more precisely, from \MTA\ to \MTA. \sendmail, \exim, \qmail, \postfix, and, of course, \masqmail\ are \MTA{}s. \MTA{}s are explained in more detail in chapter~\ref{chap:mail-transfer-agents}.

\item[\MUA{}:]
\index{mua}
\name{Mail User Agents} are the software that users deal with directly. A user writes and reads email with an \MUA{}. The \MUA{} passes outgoing mail to the nearest \MTA. The \MUA{} also displays the contents of the user's mailbox. Well-known \MUA{}s are \name{Mozilla Thunderbird} and \name{mutt} on \unix\ systems, and \name{Microsoft Outlook} on \name{Windows}.

\item[\MDA{}:]
\index{mda}
\name{Mail Delivery Agents} correspond to postmen in the real world. They receive mail destined for the recipients they are responsible for from an \MTA\ and deliver it to the mailboxes of those recipients. Many \MTA{}s include their own \MDA{}, but independent ones exist: \name{procmail} and \name{maildrop} are examples.
\end{description}

\begin{figure}
\begin{center}
\includegraphics[scale=0.75]{img/mail-agents.eps}
\end{center}
\caption{Mail agents and the path a mail message takes}
\label{fig:mail-agents}
\end{figure}

\subsubsection{Mail transfer with SMTP}

Today, most email is transferred using the \name{Simple Mail Transfer Protocol}\index{smtp} (short: \SMTP), which is defined in \RFC\,821 and its successors \RFC\,2821 and \RFC\,5321. A good entry point for further information is \citeweb{wikipedia:smtp}.

A selection of important concepts of \SMTP\index{smtp!concepts of} is explained here.

The first is the \name{store and forward}\index{smtp!store and forward} transfer concept. This means mail messages are sent from \MTA\ to \MTA\ until the final \MTA\ (the one which is responsible for the recipient) is reached. The message is stored for some time on each \MTA\ until it is forwarded to the next \MTA.

This leads to the concept of \name{responsibility}\index{smtp!responsibility}. A mail message is always the responsibility of exactly one system. First it is the \MUA\index{mua}. When the message is transferred to an \MTA, this \MTA\ takes over the responsibility for it. The \MUA{} can then delete its copy of the message. This is the same for each transfer---from \MTA\ to \MTA\ and finally from \MTA\ to the \MDA{}---the message gets transferred and, if the transfer was successful, the responsibility for the message is transferred as well. The responsibility chain ends at a user's mailbox, where the user has control over the message.

A third concept is about failure handling. At any step on the way an \MTA\ may receive a message it is unable to handle. In such a case this receiving \MTA\ will \name{reject}\index{smtp!rejecting} the message before it takes responsibility for it. The sending \MTA\ still has responsibility for the message and may try other ways of sending it. If none succeeds, the \MTA\ will send a \name{bounce message}\index{smtp!bouncing} back to the original sender with information on the type of failure. Bounces are only sent if the failure is expected to be permanent or if the transfer was still unsuccessful after many attempts.

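
To make the hand-over of responsibility concrete, the following Python sketch models a single transfer step. It is an illustration only and is not taken from \masqmail\ or any other \MTA: the class \verb+Node+, the function \verb+forward+, and the return value \verb+"bounce"+ are hypothetical names, and repeated attempts over time are left out.

\begin{verbatim}
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical mail node and the messages it is responsible for."""
    name: str
    accepts: bool = True
    queue: list = field(default_factory=list)


def forward(message, holder, candidates):
    """Hand message from holder to the first candidate that accepts it.

    Returns the name of the node that is responsible afterwards, or
    "bounce" if every candidate rejected the message.
    """
    for nxt in candidates:
        if nxt.accepts:
            nxt.queue.append(message)     # the receiver stores the message
            holder.queue.remove(message)  # only then the sender drops its copy
            return nxt.name
    # Nobody took over: holder stays responsible and has to notify the sender.
    return "bounce"
\end{verbatim}

A rejected transfer leaves the message in the queue of the sending node, which mirrors the rule that responsibility only moves on success.
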
\subsubsection{Mail messages}

Mail messages\index{mail message} consist of text in a specific format. This format is specified in \RFC\,822 and its successors \RFC\,2822 and \RFC\,5322.

A message has two parts, the \name{header}\index{mail message!header} and the \name{body}\index{mail message!body}. The header of an email message is similar to the header of a (formal) letter. It spans the first lines of the message up to the first empty line. The header consists of several lines, called \name{header lines}\index{mail message!header lines} or simply \name{headers}. They specify the sender, the recipient(s), the date, and possibly further information. Their order is irrelevant. Headers are named after the part of the line that precedes the colon, for example the ``\texttt{Date:}'' header. Users may write the header themselves, but normally the \MUA{} does this job.

The body is the payload\index{mail message!payload} of the message. It is under full control of the user. From the viewpoint of the \SMTP\ protocol, it must consist of only 7-bit \NAME{ASCII}\index{ascii} text. Arbitrary content can nevertheless be included by encoding it to 7-bit \NAME{ASCII}. \NAME{MIME}\index{mime} is the common standard for handling such conversion automatically in \MUA{}s.

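
The encoding step can be illustrated with base64, one of the encodings commonly used for this purpose. The following Python sketch is an illustration under that assumption; the function names \verb+to_seven_bit+ and \verb+from_seven_bit+ are hypothetical.

\begin{verbatim}
import base64


def to_seven_bit(payload: bytes) -> str:
    """Encode arbitrary bytes as base64 text made of 7-bit ASCII characters."""
    return base64.b64encode(payload).decode("ascii")


def from_seven_bit(text: str) -> bytes:
    """Recover the original bytes from the base64 text."""
    return base64.b64decode(text)
\end{verbatim}

The three bytes \verb+00 FF 80+ (hexadecimal), which are not valid 7-bit text, become the four characters \verb|AP+A| and can be restored unchanged on the receiving side.
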
The following is a sample mail message with four header lines (\texttt{From:}, \texttt{To:}, \texttt{Date:}, and \texttt{Subject:}) and three lines of message body.

\codeinput{input/sample-email.txt}\index{mail message!example}

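
How a program separates header and body at the first empty line can be sketched with the \verb+email+ module of Python's standard library. The message text in \verb+RAW+ is a made-up stand-in and not the content of the sample file; the function name \verb+split_message+ is hypothetical as well.

\begin{verbatim}
from email import message_from_string

# Hypothetical message: four header lines, an empty line, then the body.
RAW = (
    "From: markus@host01\n"
    "To: alice@host02, bob@host03\n"
    "Date: Thu, 01 Jan 2009 12:00:00 +0000\n"
    "Subject: Hello\n"
    "\n"
    "First body line.\n"
    "Second body line.\n"
)


def split_message(raw):
    """Return (headers, body); the header ends at the first empty line."""
    msg = message_from_string(raw)
    return dict(msg.items()), msg.get_payload()
\end{verbatim}
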
Email messages are put into \name{envelopes}\index{mail message!envelope} for transfer. This concept is also derived from the real world, so it is easy to understand. The envelope is used to route the message from sender to recipient. It contains the sender's address and the addresses of one or more recipients. Envelopes are generated by \MTA{}s, usually from mail header data. The user does not have to deal with them.

Each \MTA\ on the way reads the envelopes it receives and generates new ones. If a message has recipients on different hosts, then the message gets copied and sent within multiple envelopes, one for each host.

The sample message would lead to two envelopes\index{mail message!more envelopes}, one from \name{markus@host01} to \name{alice@host02}, the other from \name{markus@host01} to \name{bob@host03}. Both envelopes would contain the same message.

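
The rule of one envelope per destination host can be sketched in a few lines of Python. This is a simplified illustration, not the way any particular \MTA\ implements it; the function name \verb+make_envelopes+ and the representation of an envelope as a pair are assumptions.

\begin{verbatim}
from collections import defaultdict


def make_envelopes(sender, recipients):
    """Return one (sender, recipients) envelope per destination host."""
    by_host = defaultdict(list)
    for rcpt in recipients:
        host = rcpt.rsplit("@", 1)[1]   # the part after the last "@"
        by_host[host].append(rcpt)
    return [(sender, rcpts) for rcpts in by_host.values()]
\end{verbatim}

Called with the sender \verb+markus@host01+ and the two recipients named above, the function returns two envelopes; two recipients on the same host would share one.
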
\section{The \masqmail\ project}
\label{sec:masqmail}

The \masqmail\ project\index{masqmail!the project} was initiated by \person{Oliver Kurth} in 1999. His aim was to create a small \MTA\ that is especially focused on computers with dial-up Internet connections\index{dial-up}. Throughout the next four years he worked steadily on it, releasing new versions every few weeks. During the active phase of development, 53 versions were released. On average, this is a new version every 20 days.

This thesis is based on the latest release of \masqmail---version 0.2.21, dated November 2005\index{masqmail!latest release}. It was released after 28 months of inactivity. The source code of 0.2.21 is the same as that of 0.2.20, with only build documents modified. The homepage of \masqmail\ \citeweb{masqmail:homepage2}\index{masqmail!homepage} does not include this latest release, but it can be retrieved from the \debian\ package pool\index{debian!package pool}\footnote{The \NAME{URL} is:\\\url{http://ftp.de.debian.org/debian/pool/main/m/masqmail/masqmail_0.2.21.orig.tar.gz}} \citeweb{packages.debian}.

\masqmail\ is covered by the \name{General Public License}\index{gpl} (short: \NAME{GPL}) version two or any later version \cite{fsf:gpl}. This qualifies \masqmail\ as Free Software\index{free software} \cite{fsf:freesw-definition}.

\person{Kurth} abandoned \masqmail\ after 2005 and no one has adopted the project since then. Thus, the author of this thesis decided to take over responsibility for \masqmail. He received \person{Kurth}'s permission to do so in a private telephone conversation on September 4, 2008.

The program's new homepage \citeweb{masqmail:homepage} includes a collection of available information about this \MTA.

\subsection{Target field}
\label{sec:masqmail-target-field}

\person{Kurth}'s intention when creating \masqmail\ is best told in his own words:

\begin{quote}
MasqMail is a mail server designed for hosts that do not have a permanent internet connection eg. a home network or a single host at home. It has special support for connections to different \NAME{ISP}s. It replaces sendmail or other \MTA{}s such as qmail or exim.
\hfill\citeweb{masqmail:homepage2}
\end{quote}

It is intended to cover a specific niche: non-permanent Internet connections and multiple \name{Internet Service Providers} (short: \NAME{ISP}s).

Although it can basically replace other \MTA{}s, it is not \emph{generally} intended to do so. The package description of \masqmail\ within \debian\ states this more clearly by changing the last sentence to:

\begin{quote}
In these cases, MasqMail is a slim replacement for full-blown \MTA{}s such as sendmail, exim, qmail or postfix.
\hfill\citeweb{packages.debian:masqmail}
\end{quote}

The program is a good replacement ``in these cases'' but not generally, since it lacks essential features for running on publicly accessible mail servers. Primarily, it is not secure enough to be accessible from untrusted locations.

\masqmail\ is best used in home networks which are non-permanently connected to the Internet. It is easily configurable for situations which are rarely solvable with the common \MTA{}s. These include different handling of mail to local or remote destinations and respecting different routes of online connection. These features are explained in more detail in section~\ref{sec:masqmail-features}.

While many other \MTA{}s are general purpose \MTA{}s, \masqmail\ aims at special situations. Nevertheless, it can be used as a general purpose \MTA\ too. In fact, this was a design goal of \masqmail: to be a replacement for \sendmail\ or similar \MTA{}s.

\masqmail\ is designed to run on workstations and on servers in small networks, as they are common in \NAME{SOHO}s (\name{Small Offices/Home Offices}).

\subsubsection*{Typical usage scenarios}

This section describes three common setups that make sensible use of \masqmail. The first two are shown in figure~\ref{fig:masqmail-typical-usage}.

\begin{figure}
\begin{center}
\includegraphics[scale=0.75]{img/masqmail-typical-usage.eps}
\end{center}
\caption{Typical usage scenarios for \masqmail}
\label{fig:masqmail-typical-usage}
\end{figure}

Imagine an Internet-connected home network consisting of some workstations.

\begin{description}
\item[Scenario 1:]
\label{scenario1}
If no server is present, every workstation would be equipped with \masqmail. Mail transfer within the same machine or within the local net works in a straightforward way using direct transfer. Outgoing mail to the Internet is sent to an \name{Internet Service Provider} (short: \NAME{ISP}) for relaying whenever the router goes online. The configuration of \masqmail\ would be the same on every computer; only host names would differ.
Receiving mail from the Internet requires a mailbox on the \NAME{ISP}'s mail server. Mail needs to be fetched from the \NAME{ISP}'s server onto the workstation using the \NAME{POP3} or \NAME{IMAP} protocol.

\item[Scenario 2:]
\label{scenario2}
In the same network but with a server, one could have \masqmail\ running on the server and use simple forwarders (see section~\ref{subsec:relay-only}) on the workstations to transfer mail to the server. The server would then, depending on the destination of the message, deliver locally or relay to an \NAME{ISP}'s server for further relay. This setup only supports mail transfer to the server but not back to a workstation. However, this can be solved by mounting the user's mailbox from the server on the workstation or by using \NAME{POP3} or \NAME{IMAP}. Mail transfer from the \NAME{ISP} to the local server needs \NAME{POP3} or \NAME{IMAP} as well.

\item[Scenario 3:]
\label{scenario3}
The third scenario is unrelated to the first two, as it concerns notebooks. Notebooks are usually used as mobile workstations. One uses them to work at different locations. With the increasing popularity of wireless networks this becomes more and more common. Different networks demand different setups: in one network it is best to send mail to an \NAME{ISP} for relay. In another network it might be preferred to use a local mail server. A third network may have no Internet access at all, hence using a local mail server is required. All these different setups can be configured once and then used by simply telling \masqmail\ the online state, even automatically within a network setup script.
\end{description}

In general, all kinds of usage scenarios within a trusted network are possible. It is important to note that mail cannot then be sent from outside into the trusted network. For using \masqmail\ on notebooks, it is suggested to accept mail only from local users because notebooks are often in untrusted environments.

\subsubsection*{Limitations}

Although \masqmail\ is seen as a replacement for other general purpose \MTA{}s, it should not be used on large mail servers. The reasons are that it implements only a basic subset of features and that its performance and security are not as good as needed for such usage.

The author, \person{Kurth}, warns on the project's old website against using \masqmail\ to accept connections from the Internet because of the risk of being an open relay:

\begin{quote}
MasqMail is not designed to run on a host with a permanent internet connection. It does not have the ability to check for spam mail and it will relay everything from everywhere to everywhere. Use another mail server such as exim for permanent connections.
\hfill\citeweb{masqmail:homepage2}
\end{quote}

The actual problem is not the permanent Internet connection but listening for incoming mail on it. If a firewall is closed for incoming mail, then the permanent Internet connection is no problem. To use \masqmail\ for permanent Internet connections, it needs to be secured with care.

The Internet is the common example of an untrusted network, but other networks may be untrusted too.

\subsection{Features}

This thesis considers version 0.2.21 of \masqmail. This is the last version released by \person{Oliver Kurth}.

\subsubsection*{The source code}

\masqmail\ is written in the C programming language. The program, as of version 0.2.21, consists of 34 source files and eight header files which contain about 9\,000 lines of code\footnote{Measured with \name{sloccount} by David A.\ Wheeler \citeweb{sloccount}.}. Additionally, it includes a \name{base64} implementation (about 300 lines) and \name{md5} code (about 150 lines). For systems that do not provide \name{libident}, this library is distributed as well (circa 600 lines); an available shared library has higher precedence in linking, though.

The only mandatory dependency is \name{glib}---a cross-platform software utility library which originated in the \NAME{GTK+} project. It provides safe replacements for many standard library functions, especially for the string functions. It also offers handy data containers, easy-to-use implementations of data structures, and much more.

Some parts of \masqmail's functionality can be included or excluded at compile time by defining symbols. To enable maildir support, for example, one has to add \verb_--enable-maildir_ to the configure call. Otherwise, the corresponding code is removed during preprocessing.

With \masqmail\ comes the small tool \path{mservdetect}; it helps in setting up a configuration that uses the \name{mserver} system for online state detection. Two other binaries get compiled for testing purposes: \path{readtest} and \path{smtpsend}. These three additional programs use parts of \masqmail's source code; they only add a file with a \verb+main()+ function each.

\subsubsection*{Features}
\label{sec:masqmail-features}

\masqmail\ supports two channels for incoming mail:

\begin{enumerate}
\item Standard input which is used when \path{masqmail} (or the \path{sendmail} link) is executed on the command line
\item A \NAME{TCP} socket which is used by local or remote clients that talk \SMTP
\end{enumerate}

The outgoing channels for mail are:

\begin{enumerate}
\item Direct delivery to local mailboxes (in \name{mbox} or \name{maildir} format)
\item Local pipes to pass mail to a program (e.g.\ to \MDA{}s or to gateways to \NAME{UUCP} or fax)
\item \NAME{TCP} sockets to transfer mail to other \MTA{}s using the \SMTP\ protocol
\end{enumerate}

Figure~\ref{fig:masqmail-channels} shows this as a picture. (The ``online state'' input is explained a bit later.)

\begin{figure}
\begin{center}
\includegraphics[scale=0.75]{img/masqmail-channels.eps}
\end{center}
\caption{Incoming and outgoing channels of \masqmail}
\label{fig:masqmail-channels}
\end{figure}

Outgoing \SMTP\ connections feature \SMTP-\NAME{AUTH} and \SMTP-after-\NAME{POP} authentication but incoming connections do not. Using wrappers for outgoing connections is supported. This allows encrypted communication through a gateway application like \name{openssl}.

Mail queuing is essential for \masqmail\ and is therefore supported; alias expansion is supported as well.

The \masqmail\ executable can be called by various names for sendmail-compatibility reasons. As many programs expect the \MTA\ to be located at \path{/usr/lib/sendmail} or \path{/usr/sbin/sendmail}, symbolic links point from there to the \masqmail\ executable. Furthermore, \sendmail\ supports being called by a different name instead of being given command line arguments. The best known of these shortcuts is \path{mailq}, which is equivalent to calling it with the argument \verb+-bp+. \masqmail\ recognizes the shortcuts \path{mailq}, \path{smtpd}, \path{mailrm}, \path{runq}, \path{rmail}, and \path{in.smtpd}. The first two are inspired by \sendmail. The shortcut \path{newaliases} is not implemented yet because \masqmail\ does not generate binary representations of the alias file.\footnote{A shell script named \path{newaliases} that invokes \texttt{masqmail -bi} can provide the command to satisfy strict requirements.} \path{hoststat} and \path{purgestat} are also missing for complete sendmail-compatibility.
%masqmail: mailq, mailrm, runq, rmail, smtpd/in.smtpd
%sendmail: hoststat, mailq, newaliases, purgestat, smtpd

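
Selecting a mode of operation by the name under which a program was invoked can be sketched as follows. \masqmail\ itself is written in C; this Python fragment only illustrates the technique, and the mode labels as well as the function name \verb+mode_for+ are hypothetical placeholders, not \masqmail's real option names.

\begin{verbatim}
import os

# Shortcut names from the text, mapped to hypothetical mode labels.
MODES = {
    "mailq": "list-queue",
    "runq": "run-queue",
    "mailrm": "remove-mail",
    "rmail": "read-mail-from-stdin",
    "smtpd": "smtp-daemon",
    "in.smtpd": "smtp-on-stdin",
}


def mode_for(argv0, default="parse-arguments"):
    """Pick the mode from the invocation name, ignoring the directory part."""
    name = os.path.basename(argv0)
    return MODES.get(name, default)
\end{verbatim}

A call through a symbolic link such as \path{/usr/bin/mailq} selects the queue listing, while any other name, for example \path{/usr/sbin/sendmail}, falls back to ordinary argument parsing.
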
In addition to the \MTA\ job, \masqmail\ also offers mail retrieval services by acting as a \NAME{POP3} client. It can fetch mail from different remote locations, depending on the active online connection. Such functionality is especially useful in a setup like \name{Scenario 2} on page \pageref{scenario2}.

\subsubsection*{Online detection and online routes}
\label{sec:masqmail-routes}

\masqmail\ focuses on handling different non-permanent online connections, thus a concept of online routes is used. One may configure any number of routes to send mail. Each route can have criteria to determine whether a message is allowed to be sent over it. Mail to destinations outside the local network gets queued until a suitable online connection is available.

The idea behind this concept is sending mail to the Internet through the mail server of the same \NAME{ISP} over which one has dialed in. It was quite common that \NAME{ISP}s accepted mail for relay only if it came from an online connection they managed. This means it was not possible to relay mail through the mail server of one \NAME{ISP} while being online through the connection of another \NAME{ISP}. \masqmail\ is a solution to the wish of switching the relaying mail server easily.

Related is \masqmail's ability to rewrite the sender's email address depending on which \NAME{ISP} is used. This makes it less likely that mail is classified as spam.

To react to the different situations, \masqmail\ needs to query the current online state. Is an online connection available? And if it is: Which one? Three methods are implemented:

\begin{enumerate}
\item Reading from a file
\item Reading the output of a command
\item Querying an \name{mserver} system
\end{enumerate}

Each method returns either a string naming the route that is online or nothing, which indicates the offline state.

Mail for hosts inside the local network or for users on the local machine is not touched by this concept; such mail is always sent immediately.

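
The first detection method and the resulting decision can be sketched in Python. This is a simplified illustration and not \masqmail's implementation: the function names \verb+route_from_file+ and \verb+action_for+ and the three action labels are hypothetical, and the per-route criteria are left out.

\begin{verbatim}
def route_from_file(path):
    """Return the route name stored in the file, or None when offline.

    A missing, unreadable, or empty file counts as the offline state.
    """
    try:
        with open(path, encoding="utf-8") as fh:
            name = fh.read().strip()
    except OSError:
        return None
    return name or None


def action_for(destination_is_local, online_route):
    """Local mail is delivered at once; remote mail waits for a route."""
    if destination_is_local:
        return "deliver"
    return "send" if online_route else "queue"
\end{verbatim}

A network setup script could write the name of the active route into such a file when a connection comes up and empty the file when it goes down.
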
\section{Why \masqmail\ is worth it}

First of all, \masqmail\ is better suited for its target field of operation (multiple non-permanent online connections) than any other \MTA. Such usage is especially easy to set up because \masqmail\ was designed for it. Many alternative \MTA{}s were not designed for those scenarios at all, as the following two examples show: ``Exim is designed for use on a network where most messages can be delivered at the first attempt.'' \cite[page~30]{hazel01}. ``qmail was designed for well-connected hosts: those with high-speed, always-on network connectivity.'' \cite[page~9]{sill02}.

%fixme: hikernet

Additionally, \masqmail\ makes it easy to run an \MTA\ on workstations or notebooks. There is no need to do complex configuration or to be a mail server expert. Only a handful of options need to be set; the host name, the local networks, and one route for relaying are sufficient in most cases. %fixme: is that true?

Probably users say it best; in this case \person{Derek Broughton}:
\begin{quote}
No kidding. The whole point is that you \_have\_ to have an \MTA\ and you don't
want to configure Postfix/Exim/Sendmail/Qmail (almost all of which I've
actually done).

I now use masqmail -- it's really simple, my configuration is all in debconf,
it's supported by whereami, and it's really simple :-)

I'm sure you can make any \MTA\ behave nicely when offline, but it was a chore
with all of them.
\hfill\citeweb[post~\#8]{ubuntuforums:simple-mailer}
\end{quote}

\masqmail's size should not be forgotten either. \masqmail\ is much smaller than full-blown \MTA{}s like \sendmail, \postfix, or \exim, and still smaller than \qmail. (See section~\ref{sec:mta-comparison} for details.) This makes \masqmail\ a good choice for workstations or even embedded computers.

Again, the words of a user who chose \masqmail\ as the \MTA\ on his old laptop with a 75 megahertz processor and eight megabytes of \NAME{RAM}:
\begin{quote}
Masqmail appears to be a great sendmail replacement in this case. It's small and is built to support sending mail ``off-line'', and to connecting to the \SMTP\ servers of several \NAME{ISP}s.
\hfill\citeweb{stosberg:low-mem-laptop}
\end{quote}

Although the development of \masqmail\ was stopped in 2003, \masqmail\ still has its users. Having users is already reason enough for further development and maintenance. This applies especially when the software covers a niche and when the requirements for such software in general have changed. Both are the case for \masqmail.

It is difficult to get numbers about users of Free Software because no one has to report which software they use. \debian's \name{popcon} statistics \citeweb{popcon.debian} are an attempt to provide numbers. For January 2009, the statistics report 60 \masqmail\ installations of which 49 are in active use. If it is assumed that one third of all \debian\ users report their installed software\footnote{One third is a high guess as it means there would be only about 230 thousand \debian\ installations in total. But according to the \name{Linux Counter} \citeweb{counter.li.org} between 490 thousand and 12 million \debian\ users can be estimated.}, there would be around 150 active \masqmail\ installations in \debian\ in total. \name{Ubuntu}, which also collects \name{popcon} statistics \citeweb{popcon.ubuntu}, counts 82 installations with 13 active ones. If one third of all systems submit their data here as well, 40 active installations can be added. Including an estimated 30 additional installations on other \unix\ operating systems makes about 220 \masqmail\ installations in total. Of course, one person may have \masqmail\ installed on more than one computer, but a total of 150 different users seems realistic.

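
The estimate can be retraced step by step. Each count of active installations is multiplied by three, following the assumed reporting rate of one third, and rounded as in the text; the 30 installations on other systems remain a guess.

\[
\begin{array}{rcll}
49 \cdot 3 & = & 147 \approx 150 & \mbox{(Debian)} \\
13 \cdot 3 & = & 39 \approx 40 & \mbox{(Ubuntu)} \\
150 + 40 + 30 & = & 220 & \mbox{(total)}
\end{array}
\]
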
%The increasing number of systems using \masqmail, as it is shown on the \name{popcon} graph \citeweb{popcon.debian:masqmail}, seems to be impressive in the beginning as \masqmail\ was not developed during that time. But it might come from the increasing popularity of \name{popcon} over the time.

One thing became clear now: \masqmail\ has users. And software that is used should be developed and maintained.

% alternative: http://anfi.homeunix.org/sendmail/dialup10.html

\section{Problems to solve}

A program that has been neglected for more than five years in a field of operation that changed during this time surely needs improvement. Security and spam have greatly increased in importance since 2003. Dial-up connections have become rare; instead, broadband flat rates are common now. Other \MTA{}s evolved with respect to these changes---\masqmail\ did not.

The current market situation and trends for the future need to be identified. Other \MTA{}s need to be examined. The required work on \masqmail\ needs to be defined, together with an evaluation of strategies for doing this work. Finally, a plan for further development should be created.

\section{Delimitation}

This thesis is neither an installation guide for \masqmail\ nor a detailed explanation of \masqmail's source code. Installation and setup guides can be found on \masqmail's homepage \citeweb{masqmail:homepage}.

The \NAME{POP3} functionality of \masqmail\ receives little attention in this document because it is not directly related to the core of \masqmail, which is being an \MTA.

The \name{mserver} system for querying the online state is likewise only mentioned and not discussed further. In any case, it seems best to move this functionality into a separate program that is run through the shell command interface.