comparison thesis/tex/5-Improvements.tex @ 340:a13392b4fee8
some rework and fixes
author | meillo@marmaro.de |
---|---|
date | Mon, 26 Jan 2009 13:36:51 +0100 |
parents | 3f5088841807 |
children | a5f167ca2a01 |
339:f9f925c5e2d1 | 340:a13392b4fee8 |
---|---|
129 The presented setup is the same as the one with two \MTA\ instances and a scanner application in between, which was suggested for adding spam and malware scanners to an existing \MTA. This is a fortunate coincidence, because a scanner like \name{amavis} can simply be put in place of the internal socket ``X''. | 129 The presented setup is the same as the one with two \MTA\ instances and a scanner application in between, which was suggested for adding spam and malware scanners to an existing \MTA. This is a fortunate coincidence, because a scanner like \name{amavis} can simply be put in place of the internal socket ``X''. |
130 | 130 |
131 | 131 |
132 | 132 |
133 | 133 |
134 \subsubsection*{Conditional compilation} | |
135 << conditional compilation >> | |
136 | |
137 | |
138 | |
139 | |
140 | 134 |
141 | 135 |
142 | 136 |
143 | 137 |
144 | 138 |
159 | 153 |
160 | 154 |
161 | 155 |
162 \subsection{Design decisions} | 156 \subsection{Design decisions} |
163 | 157 |
164 This section describes and discusses architectural decisions that were made for the new design. At some points functionality matters too, but the focus is on architecture. | 158 This section describes and discusses architectural decisions that were made for the new design. Functional requirements are only referred to here, because they were already identified in chapter \ref{chap:present-and-future}. %fixme: At some points functionality matters too, but the focus is on architecture. |
165 | 159 |
166 A number of major design ideas lead the development of the new architecture: | 160 A number of major design ideas lead the development of the new architecture: |
167 \begin{enumerate} | 161 \begin{enumerate} |
168 \item compartmentalization throughout | 162 \item compartmentalization throughout |
169 \item free the internal system from the in and out channels | 163 \item free the internal system from the in and out channels |
171 \item have a single point through which all mail passes for scanning | 165 \item have a single point through which all mail passes for scanning |
172 \item concentrate on the mail transfer job; use specialized external programs for other jobs | 166 \item concentrate on the mail transfer job; use specialized external programs for other jobs |
173 \item keep it simple, clear, and general | 167 \item keep it simple, clear, and general |
174 \end{enumerate} | 168 \end{enumerate} |
175 | 169 |
| | 170 %fixme: << conditional compilation >> |
176 | 171 |
177 | 172 |
178 \subsubsection*{Incoming channels} | 173 \subsubsection*{Incoming channels} |
179 | 174 |
180 \sendmail-compatible \mta{}s must support at least two incoming channels: mail submitted using the \sendmail\ command, and mail received via the \SMTP\ daemon. It is therefore common to split the incoming channels into local and remote ones, as \qmail\ and \postfix\ do. \person{Hafiz} takes the same view \cite{hafiz05}. %fixme: specify page | 175 The functional requirements were already discussed as \RF\,1 on page \pageref{rf1}. At least two incoming channels were identified: the \path{sendmail} command for local mail submission and the \SMTP\ daemon for remote connections. |
181 | 176 |
182 In contrast, \name{sendmail X} passes locally submitted messages to the \SMTP\ daemon, which is the only connection to the mail queue. %fixme: is it a smtp dialog? or a back door? | 177 The structure of \name{sendmail X} differs at this point: locally submitted messages go to the \SMTP\ daemon, which is the only connection to the mail queue. %fixme: is it a smtp dialog? or a back door? |
183 \person{Finch} proposes a similar approach. He wants the \texttt{sendmail} command to be a simple \SMTP\ client that contacts the \MTA's \SMTP\ daemon just as remote clients do. The advantage is one single module that conducts all \SMTP\ dialogs with submitters, and hence one single point to accept or refuse incoming mail. Additionally, the module that puts mail into the queue does not need to be \name{setuid} or \name{setgid}, because it is only invoked by the \SMTP\ daemon. The \MTA's architecture would become simpler, and common tasks would not be duplicated in modules that do similar jobs. | 178 \person{Finch} proposes a similar approach \cite{finch-sendmail}. He wants the \texttt{sendmail} command to be a simple \SMTP\ client that contacts the \MTA's \SMTP\ daemon just as remote clients do. The advantage is one single module that conducts all \SMTP\ dialogs with submitters, and hence one single point to accept or refuse incoming mail. Additionally, the module that puts mail into the queue does not need to be \name{setuid} or \name{setgid}, because it is only invoked by the \SMTP\ daemon. The \MTA's architecture would become simpler, and common tasks would not be duplicated in modules that do similar jobs. |
184 | 179 |
185 But merging the input channels in the \SMTP\ daemon makes the \MTA\ heavily dependent on \SMTP. In \qmail\ and \postfix, new modules supporting other ways of message reception can be added without changing other parts of the system. Also, the \SMTP\ modules can be removed if \SMTP\ is not needed. And it is better to have more independent modules if each of them is simpler as a result; having each module implement an \SMTP\ client makes the modules more complicated. | 180 But merging the input channels in the \SMTP\ daemon makes the \MTA\ heavily dependent on \SMTP. In \qmail\ and \postfix, new modules supporting other ways of message reception can be added without changing other parts of the system. Also, the \SMTP\ modules can be removed if \SMTP\ is not needed. And it is better to have more independent modules if each of them is simpler as a result; having each module implement an \SMTP\ client makes the modules more complicated. |
186 | 181 |
187 With the increasing need for new protocols in mind, it seems better to have a separate module for each incoming channel, although this leads to duplicated acceptance checks. Independent checks in different modules, however, also have the advantage that different policies can easily be applied. It is thus possible to run two \SMTP\ modules listening on different ports: one accessible from the Internet and requiring authentication, the other accessible only from the local network and not requiring authentication. | 182 With the increasing need for new protocols in mind, it seems better to have a separate module for each incoming channel, although this leads to duplicated acceptance checks. Independent checks in different modules, however, also have the advantage that different policies can easily be applied. It is thus possible to run two \SMTP\ modules listening on different ports: one accessible from the Internet and requiring authentication, the other accessible only from the local network and not requiring authentication. |
188 | 183 |
204 | 199 |
205 | 200 |
206 | 201 |
207 \subsubsection*{The mail queue} | 202 \subsubsection*{The mail queue} |
208 | 203 |
209 The mail queue is the central part of an \MTA. This places special demands on robustness and reliability, as a failure here can lead to losing mail. | 204 The mail queue is the central part of an \MTA. This places special demands on robustness and reliability, as a failure here can lead to losing mail. (See \RF\,2 on page \pageref{rf2}.) |
210 | 205 |
211 %\sendmail, \exim, \qmail, \name{sendmail X}, and \masqmail\ feature one single mail queue. \postfix\ has more of them. | |
212 Common \MTA{}s feature one or more mail queues; some of them effectively have several queues within one physical representation. | 206 Common \MTA{}s feature one or more mail queues; some of them effectively have several queues within one physical representation. |
213 | 207 |
214 \MTA\ setups that include content scanning tend to require two separate queues. Using \sendmail\ in such setups requires two independent instances with two separate queues. \exim\ can handle it with special \name{router} and \name{transport} rules, but the data flow gets complicated. Hence one idea is to use two queues, \name{incoming} and \name{active} in \postfix's terminology, and to do the content scanning while moving mail from \name{incoming} to \name{active}. | 208 \MTA\ setups that include content scanning tend to require two separate queues. Using \sendmail\ in such setups requires two independent instances with two separate queues. \exim\ can handle it with special \name{router} and \name{transport} rules, but the data flow gets complicated. Hence one idea is to use two queues, \name{incoming} and \name{active} in \postfix's terminology, and to do the content scanning while moving mail from \name{incoming} to \name{active}. |
215 | 209 |
216 \sendmail, \exim, \qmail, and \masqmail\ all use at least two files to store one message in the queue: one file contains the message body, another the envelope and header information. The file containing the message body is not modified at all. \postfix\ takes a different approach: it stores queued messages in an internal format within one file. \person{Finch} suggests yet another approach: storing the whole queue in one single file with pointers to separating positions \cite{finchFIXME}. %fixme: check, cite, and think about | 210 \sendmail, \exim, \qmail, and \masqmail\ all use at least two files to store one message in the queue: one file contains the message body, another the envelope and header information. The file containing the message body is not modified at all. \postfix\ takes a different approach: it stores queued messages in an internal format within one file. \person{Finch} suggests yet another approach: storing the whole queue in one single file with pointers to separating positions \cite{finch-queue}. |
217 | 211 |
218 All of the presented \MTA{}s use the file system to hold the queue; none of them uses a database. A database could improve the reliability of the queue through better persistence. This might be an option for larger \MTA{}s, but not for \masqmail, which should be kept small and simple. A running database system likely requires far more resources than \masqmail\ itself. And as the queue's job is storing data rather than running selection queries, the gain from a database does not outweigh its costs. | 212 All of the presented \MTA{}s use the file system to hold the queue; none of them uses a database. A database could improve the reliability of the queue through better persistence. This might be an option for larger \MTA{}s, but not for \masqmail, which should be kept small and simple. A running database system likely requires far more resources than \masqmail\ itself. And as the queue's job is storing data rather than running selection queries, the gain from a database does not outweigh its costs. |
219 | 213 |
220 Hence the choice here is a directory containing simple text files. This is straightforward, simple, clear, and general \dots\ and thus a good basis for reliability. It is additionally always an advantage if data is stored in the operating system's natural form, which in the case of \unix\ is plain text. | 214 Hence the choice here is a directory containing simple text files. This is straightforward, simple, clear, and general \dots\ and thus a good basis for reliability. It is additionally always an advantage if data is stored in the operating system's natural form, which in the case of \unix\ is plain text. |
221 | 215 |
240 | 234 |
241 | 235 |
242 | 236 |
243 \subsubsection*{Aliasing} | 237 \subsubsection*{Aliasing} |
244 | 238 |
245 The main question about aliasing is: Where should aliases get expanded? | 239 The functional requirements were identified under \RF\,4 on page \pageref{rf4}. From the architectural point of view, the main question about aliasing is: Where should aliases get expanded? |
246 | 240 |
247 Two facts are important to consider: Addresses expanding to lists of users lead to more envelopes. And aliases changing the recipient's domain part may make the message unsuitable for a specific online route. | 241 Two facts are important to consider: Addresses expanding to lists of users lead to more envelopes. And aliases changing the recipient's domain part may make the message unsuitable for a specific online route. |
248 | 242 |
249 Aliasing is often handled by expanding the alias and re-injecting the mail into the system. Unfortunately, the mail is then processed twice; additionally, the system has to handle more mail this way. If the new recipient address should be checked for acceptance and all processing should be done again, then re-injecting is the best choice. But already accepted messages may then get rejected in the second pass because of a replacement address that comes from within the system. This seems undesirable. | 243 Aliasing is often handled by expanding the alias and re-injecting the mail into the system. Unfortunately, the mail is then processed twice; additionally, the system has to handle more mail this way. If the new recipient address should be checked for acceptance and all processing should be done again, then re-injecting is the best choice. But already accepted messages may then get rejected in the second pass because of a replacement address that comes from within the system. This seems undesirable. |
250 | 244 |
263 | 257 |
264 The best point to archive copies of all incoming mail is the \name{queue-in} module; for copies of outgoing mail it is the \name{queue-out} module. This approach, however, does not capture the changes made by the receiving modules (adding further headers) and the sending modules (address rewrites). | 258 The best point to archive copies of all incoming mail is the \name{queue-in} module; for copies of outgoing mail it is the \name{queue-out} module. This approach, however, does not capture the changes made by the receiving modules (adding further headers) and the sending modules (address rewrites). |
265 | 259 |
266 \qmail\ is able to log complete \SMTP\ dialogs. Logging the complete data transfer into and out of the system to a separate log file is a valuable feature that should be implemented in each receiving and sending module. But as it produces a huge amount of output, it should be disabled by default. | 260 \qmail\ is able to log complete \SMTP\ dialogs. Logging the complete data transfer into and out of the system to a separate log file is a valuable feature that should be implemented in each receiving and sending module. But as it produces a huge amount of output, it should be disabled by default. |
267 | 261 |
| | 262 Archiving's functional requirements were described as \RF\,10 on page \pageref{rf10}. |
| | 263 |
268 | 264 |
269 | 265 |
270 | 266 |
271 | 267 |
272 \subsubsection*{Authentication and Encryption} | 268 \subsubsection*{Authentication and Encryption} |
273 | 269 |
274 Both topics were discussed several times throughout this thesis, among other places on pages \pageref{} and \pageref{}. | 270 Both topics were discussed as \RF\,6 and \RF\,7 in several places throughout this thesis; notable ones are on pages \pageref{rf6} and \pageref{rf7}. |
275 | 271 |
276 Authentication should be done within the receiving modules. Similarly, authentication for outgoing connections should be handled by the sending modules. The same applies to encryption: only receiving and sending modules should come in contact with it. | 272 Authentication should be done within the receiving modules. Similarly, authentication for outgoing connections should be handled by the sending modules. The same applies to encryption: only receiving and sending modules should come in contact with it. |
277 | 273 |
278 To avoid duplicated code, the actual implementation of both functions should be provided by a central component that is invoked by the various modules. | 274 To avoid duplicated code, the actual implementation of both functions should be provided by a central component that is invoked by the various modules. |
279 | 275 |
282 | 278 |
283 | 279 |
284 | 280 |
285 \subsubsection*{Spam and malware handling} | 281 \subsubsection*{Spam and malware handling} |
286 | 282 |
287 The two approaches for spam handling were already presented to the reader in section \ref{}. Here they are described in more detail: | 283 The two approaches for spam handling were already presented to the reader in section \ref{sec:functional-requirements} as \RF\,8 and \RF\,9. Here they are described in more detail: |
288 | 284 |
289 \begin{enumerate} | 285 \begin{enumerate} |
290 \item Refusing spam during the \SMTP\ dialog. This is the way intended by the designers of the \SMTP\ protocol. They thought checking the sender and recipient addresses would be enough, but as these can be forged, it is not. Increasingly complex checks need to be done. Checking takes time, but \SMTP\ dialogs time out if it takes too long. Thus only limited time can be spent during the \SMTP\ dialog on checking whether a message seems to be spam. The advantage is that bad messages can simply be refused: no responsibility for the message is taken and no further system load is added. See \RFC2505 (especially section 1.5) for details. | 286 \item Refusing spam during the \SMTP\ dialog. This is the way intended by the designers of the \SMTP\ protocol. They thought checking the sender and recipient addresses would be enough, but as these can be forged, it is not. Increasingly complex checks need to be done. Checking takes time, but \SMTP\ dialogs time out if it takes too long. Thus only limited time can be spent during the \SMTP\ dialog on checking whether a message seems to be spam. The advantage is that bad messages can simply be refused: no responsibility for the message is taken and no further system load is added. See \RFC2505 (especially section 1.5) for details. |
291 | 287 |
292 \item Checking for spam after the mail was accepted and queued. Here more processing time can be invested, so more detailed checks can be done. But, as responsibility for messages was taken by accepting them, simply deleting spam mail is not an option. Checks for spam do not lead to certain results; they only indicate the possibility that the message is unwanted mail. \person{Eisentraut} lists actions to take after a message is recognized as probable spam \cite[pages 18--20]{eisentraut05}. The only acceptable one, for mail the \MTA\ is responsible for, is adding further header lines or rewriting existing ones. Thus all further work on the message is the same as for non-spam messages. | 288 \item Checking for spam after the mail was accepted and queued. Here more processing time can be invested, so more detailed checks can be done. But, as responsibility for messages was taken by accepting them, simply deleting spam mail is not an option. Checks for spam do not lead to certain results; they only indicate the possibility that the message is unwanted mail. \person{Eisentraut} lists actions to take after a message is recognized as probable spam \cite[pages 18--20]{eisentraut05}. The only acceptable one, for mail the \MTA\ is responsible for, is adding further header lines or rewriting existing ones. Thus all further work on the message is the same as for non-spam messages. |
305 | 301 |
306 | 302 |
307 | 303 |
308 \subsubsection*{The scanning module} | 304 \subsubsection*{The scanning module} |
309 | 305 |
310 A lot of work was put onto the \name{scanning} module. This is not desirable. Thus splitting it into separate parts appears to be necessary. But the decision on how to split it is deferred to the prototyping stage. | 306 A problem an attentive reader has probably noticed is the large amount of work that was put onto the \name{scanning} module. This is not desirable. Thus splitting this module into a set of separate modules appears to be necessary. |
311 | 307 |
312 << fixme >> %fixme | 308 The decision on how to split the module is not discussed here. It is deferred to the prototyping stage, because trying out different approaches is helpful in such situations. |
| | 309 |
313 | 310 |
314 | 311 |
315 | 312 |
316 | 313 |
317 | 314 |
406 | 403 |
407 The connections between \name{queue-in} and \name{scanning}, as well as between \name{scanning} and \name{queue-out}, are provided by the queues; only sending signals to trigger queue runs may be useful. Communication between the receiving and transport modules and the outside world is done using the specific protocol each of them handles. | 404 The connections between \name{queue-in} and \name{scanning}, as well as between \name{scanning} and \name{queue-out}, are provided by the queues; only sending signals to trigger queue runs may be useful. Communication between the receiving and transport modules and the outside world is done using the specific protocol each of them handles. |
408 | 405 |
409 What remains is the communication between the receiver modules and \name{queue-in}, and between \name{queue-out} and the transport modules. Data is exchanged using \unix\ pipes and a simple protocol. Figure \ref{fig:ipc-protocol} shows a state diagram for the protocol. Solid lines indicate client actions; dashed lines indicate server responses. | 406 What remains is the communication between the receiver modules and \name{queue-in}, and between \name{queue-out} and the transport modules. Data is exchanged using \unix\ pipes and a simple protocol. Figure \ref{fig:ipc-protocol} shows a state diagram for the protocol. Solid lines indicate client actions; dashed lines indicate server responses. |
410 | 407 |
411 \begin{figure} | 408 \begin{figure}[hbt] |
412 \begin{center} | 409 \begin{center} |
413 \includegraphics[scale=0.75]{img/ipc-protocol.eps} | 410 \includegraphics[scale=0.75]{img/ipc-protocol.eps} |
414 \end{center} | 411 \end{center} |
415 \caption{State diagram of the \NAME{IPC} protocol} | 412 \caption{State diagram of the \NAME{IPC} protocol} |
416 \label{fig:ipc-protocol} | 413 \label{fig:ipc-protocol} |