comparison thesis/tex/5-Improvements.tex @ 340:a13392b4fee8

some rework and fixes
author meillo@marmaro.de
date Mon, 26 Jan 2009 13:36:51 +0100
parents 3f5088841807
children a5f167ca2a01
comparison
equal deleted inserted replaced
339:f9f925c5e2d1 340:a13392b4fee8
129 The presented setup is the same as the one with two \MTA\ instances and a scanner application in between, which was suggested to add spam and malware scanner afterwards to an \MTA. This is a fortunate coincidence, because a scanner like \name{amavis} can simply be put in replace for the internal socket ``X''. 129 The presented setup is the same as the one with two \MTA\ instances and a scanner application in between, which was suggested to add spam and malware scanner afterwards to an \MTA. This is a fortunate coincidence, because a scanner like \name{amavis} can simply be put in replace for the internal socket ``X''.
130 130
131 131
132 132
133 133
134 \subsubsection*{Conditional compilation}
135 << conditional compilation >>
136
137
138
139
140 134
141 135
142 136
143 137
144 138
159 153
160 154
161 155
162 \subsection{Design decisions} 156 \subsection{Design decisions}
163 157
164 This section describes and discusses architectural decision that were made for the new design. At some points function is of matter too, but it is mostly about architecture. 158 This section describes and discusses architectural decision that were made for the new design. To functional requirements is refered to, as they were already identified in chapter \ref{chap:present-and-future}. %fixme: At some points function is of matter too, but it is mostly about architecture.
165 159
166 A number of major design ideas lead the development of the new architecture: 160 A number of major design ideas lead the development of the new architecture:
167 \begin{enumerate} 161 \begin{enumerate}
168 \item compartmentalization throughout 162 \item compartmentalization throughout
169 \item free the internal system from the in and out channels 163 \item free the internal system from the in and out channels
171 \item have a single point where all mail goes through for scanning 165 \item have a single point where all mail goes through for scanning
172 \item concentrate on the mail transfer job; use specialized external programs for other jobs 166 \item concentrate on the mail transfer job; use specialized external programs for other jobs
173 \item keep it simple, clear, and general 167 \item keep it simple, clear, and general
174 \end{enumerate} 168 \end{enumerate}
175 169
170 %fixme: << conditional compilation >>
176 171
177 172
178 \subsubsection*{Incoming channels} 173 \subsubsection*{Incoming channels}
179 174
180 \sendmail-compatible \mta{}s must support at least two incoming channels: mail submitted using the \sendmail\ command, and mail received via the \SMTP\ daemon. It is therefore common to split the incoming channel into local and remote. This is done by \qmail\ and \postfix. The same way is \person{Hafiz}'s view \cite{hafiz05}. %fixme: specify page 175 The functional requirements were already discussed as \RF\,1 on page \pageref{rf1}. At least two incoming channels were identified: the \path{sendmail} command for local mail submission and the \SMTP\ daemon for remote connections.
181 176
182 In contrast is \name{sendmail X}: Its locally submitted messages go to the \SMTP\ daemon, which is the only connection towards the mail queue. %fixme: is it a smtp dialog? or a back door? 177 A bit different is the structure of \name{sendmail X} at that point: Locally submitted messages go to the \SMTP\ daemon, which is the only connection towards the mail queue. %fixme: is it a smtp dialog? or a back door?
183 \person{Finch} proposes a similar approach. He wants the \texttt{sendmail} command to be a simple \SMTP\ client that contacts the \SMTP\ daemon of the \MTA\ like it is done by connections from remote. The advantage here is one single module where all \SMTP\ dialog with submitters is done. Hence one single point to accept or refuse incoming mail. Additionally does the module which puts mail into the queue not need to be \name{setuid} or \name{setgid} because it is only invoked from the \SMTP\ daemon. The \MTA's architecture would become simpler and common tasks are not duplicated in modules that do similar jobs. 178 \person{Finch} proposes a similar approach \cite{finch-sendmail}. He wants the \texttt{sendmail} command to be a simple \SMTP\ client that contacts the \SMTP\ daemon of the \MTA\ like it is done by connections from remote. The advantage here is one single module where all \SMTP\ dialog with submitters is done. Hence one single point to accept or refuse incoming mail. Additionally does the module which puts mail into the queue not need to be \name{setuid} or \name{setgid} because it is only invoked from the \SMTP\ daemon. The \MTA's architecture would become simpler and common tasks are not duplicated in modules that do similar jobs.
184 179
185 But merging the input channels in the \SMTP\ daemon makes the \MTA\ heavily dependent on \SMTP. To \qmail\ and \postfix\ new modules to support other ways of message reception may be added without change of other parts of the system. Also the \SMTP\ modules can be removed if it is not needed. And it is better to have more independent modules if each one is simpler then---it makes the modules more complicated if each one needs to implement an \SMTP\ client. 180 But merging the input channels in the \SMTP\ daemon makes the \MTA\ heavily dependent on \SMTP. To \qmail\ and \postfix\ new modules to support other ways of message reception may be added without change of other parts of the system. Also the \SMTP\ modules can be removed if it is not needed. And it is better to have more independent modules if each one is simpler then---it makes the modules more complicated if each one needs to implement an \SMTP\ client.
186 181
187 With the increasing need for new protocols in mind, it seems better to have single modules for each incoming channel, although this leads to duplicated acceptance checks. Independent checks in different modules, however, have also the advantage to simply apply different policies. Thus it is possible to run two \SMTP\ modules that listen on different ports; one accessible from the Internet but requires authentication, the other only accessible from the local network but does not require authentication. 182 With the increasing need for new protocols in mind, it seems better to have single modules for each incoming channel, although this leads to duplicated acceptance checks. Independent checks in different modules, however, have also the advantage to simply apply different policies. Thus it is possible to run two \SMTP\ modules that listen on different ports; one accessible from the Internet but requires authentication, the other only accessible from the local network but does not require authentication.
188 183
204 199
205 200
206 201
207 \subsubsection*{The mail queue} 202 \subsubsection*{The mail queue}
208 203
209 The mail queue is the central part of an \MTA. This demands especially for robustness and reliability as a failure here can lead to loosing mail. 204 The mail queue is the central part of an \MTA. This demands especially for robustness and reliability as a failure here can lead to loosing mail. (See \RF\,2 on page \pageref{rf2}.)
210 205
211 %\sendmail, \exim, \qmail, \name{sendmail X}, and \masqmail\ feature one single mail queue. \postfix\ has more of them.
212 Common \MTA{}s feature one or more mail queues, they sometimes have effectively several queues within one physical representation. 206 Common \MTA{}s feature one or more mail queues, they sometimes have effectively several queues within one physical representation.
213 207
214 \MTA\ setups that include content scanning tend to require two separate queues. To use \sendmail\ in such setups requires two independent instances with two separate queues. \exim\ can handle it with special \name{router} and \name{transport} rules but the data flow gets complicated. Hence an idea is to use two queues, \name{incoming} and \name{active} in \postfix's terminology, with the content scanning within the move from \name{incoming} to \name{active}. 208 \MTA\ setups that include content scanning tend to require two separate queues. To use \sendmail\ in such setups requires two independent instances with two separate queues. \exim\ can handle it with special \name{router} and \name{transport} rules but the data flow gets complicated. Hence an idea is to use two queues, \name{incoming} and \name{active} in \postfix's terminology, with the content scanning within the move from \name{incoming} to \name{active}.
215 209
216 \sendmail, \exim, \qmail, and \masqmail\ all use at least two files to store one message in the queue: one file contains the message body, another the envelope and header information. The one containing the mail body is not modified at all. \postfix\ takes a different approach in storing queued messages in an internal format within one file. \person{Finch} suggest yet another approach: storing the whole queue in one single file with pointers to separating positions \cite{finchFIXME}. %fixme: check, cite, and think about 210 \sendmail, \exim, \qmail, and \masqmail\ all use at least two files to store one message in the queue: one file contains the message body, another the envelope and header information. The one containing the mail body is not modified at all. \postfix\ takes a different approach in storing queued messages in an internal format within one file. \person{Finch} suggest yet another approach: storing the whole queue in one single file with pointers to separating positions \cite{finch-queue}.
217 211
218 All of the presented \MTA{}s use the file system to hold the queue; none uses a database to hold it. A database could improve the reliability of the queue through better persistence. This might be a choice for larger \MTA{}s but is none for \masqmail\ which should be kept small and simple. A running database system does likely require much more resources than \masqmail\ itself does. And as the queue's job is more storing data than running data selection queries, a database does not gain so much that it outweighs its costs. 212 All of the presented \MTA{}s use the file system to hold the queue; none uses a database to hold it. A database could improve the reliability of the queue through better persistence. This might be a choice for larger \MTA{}s but is none for \masqmail\ which should be kept small and simple. A running database system does likely require much more resources than \masqmail\ itself does. And as the queue's job is more storing data than running data selection queries, a database does not gain so much that it outweighs its costs.
219 213
220 Hence here the choice is having a directory with simple text files in it. This is straight forward, simple, clear, and general \dots\ and thus a good basis for reliability. It is additionally always of advantage if data is stored in the operation system's natural form, which in the case of \unix\ is plain text. 214 Hence here the choice is having a directory with simple text files in it. This is straight forward, simple, clear, and general \dots\ and thus a good basis for reliability. It is additionally always of advantage if data is stored in the operation system's natural form, which in the case of \unix\ is plain text.
221 215
240 234
241 235
242 236
243 \subsubsection*{Aliasing} 237 \subsubsection*{Aliasing}
244 238
245 The main question about aliasing is: Where should aliases get expanded? 239 The functional requirements were identified under \RF\,4 on page \pageref{rf4}. From the architectural point of view, the main question about aliasing is: Where should aliases get expanded?
246 240
247 Two facts are important to consider: Addresses expanding to lists of users lead to more envelopes. And aliases changing the recipient's domain part may make the message unsuitable for a specific online route. 241 Two facts are important to consider: Addresses expanding to lists of users lead to more envelopes. And aliases changing the recipient's domain part may make the message unsuitable for a specific online route.
248 242
249 Aliasing is often handled in expanding the alias and re-injecting the mail into the system. Unfortunately, the mail is processed twice then; additionally does the system have to handle more mail this way. If it is wanted to check the new recipient address for acceptance and do all processing again, then re-injecting it is the best choice. But already accepted messages may get rejected in the second go, because of an replacement address from within the system. This seems not to be wanted. 243 Aliasing is often handled in expanding the alias and re-injecting the mail into the system. Unfortunately, the mail is processed twice then; additionally does the system have to handle more mail this way. If it is wanted to check the new recipient address for acceptance and do all processing again, then re-injecting it is the best choice. But already accepted messages may get rejected in the second go, because of an replacement address from within the system. This seems not to be wanted.
250 244
263 257
264 The best point to archive copies of every incoming mail is the \name{queue-in} module, respectively the \name{queue-out} module for copies outgoing mail. But not respected with this approach are the changes that are made by the receiving modules (adding further headers) and sending modules (address rewrites). 258 The best point to archive copies of every incoming mail is the \name{queue-in} module, respectively the \name{queue-out} module for copies outgoing mail. But not respected with this approach are the changes that are made by the receiving modules (adding further headers) and sending modules (address rewrites).
265 259
266 \qmail\ has the ability to log complete \SMTP\ dialogs. Logging the complete data transaction into and out of the system into a separate log file is a great feature which should be implemented into each receiving and sending module. But as it will produce a huge amount of output, it should be cared to disabled it by default. 260 \qmail\ has the ability to log complete \SMTP\ dialogs. Logging the complete data transaction into and out of the system into a separate log file is a great feature which should be implemented into each receiving and sending module. But as it will produce a huge amount of output, it should be cared to disabled it by default.
267 261
262 Archiving's functional requirements were described as \RF\,10 on page \pageref{rf10}.
263
268 264
269 265
270 266
271 267
272 \subsubsection*{Authentication and Encryption} 268 \subsubsection*{Authentication and Encryption}
273 269
274 Both topics were discussed several time throughout this thesis, among other places on page \pageref{} and \pageref{}. 270 Both topics were discussed as \RF\,6 and \RF\,7 on several places throughout this thesis remarkable ones are on page \pageref{rf6} and \pageref{rf7}.
275 271
276 Authentication should be done within the receiving modules. Similar should authentication for outgoing connections be handled by the sending modules. To encryption applies the same as to authentication here. Only receiving and sending modules should come in contact with it. 272 Authentication should be done within the receiving modules. Similar should authentication for outgoing connections be handled by the sending modules. To encryption applies the same as to authentication here. Only receiving and sending modules should come in contact with it.
277 273
278 In order to avoid code duplicates, the actual implementation of both functions should be provided by a central source which gets invoked by the various modules. 274 In order to avoid code duplicates, the actual implementation of both functions should be provided by a central source which gets invoked by the various modules.
279 275
282 278
283 279
284 280
285 \subsubsection*{Spam and malware handling} 281 \subsubsection*{Spam and malware handling}
286 282
287 The two approaches for spam handling were already presented to the reader in section \ref{}. Here they are described in more detail: 283 The two approaches for spam handling were already presented to the reader in section \ref{sec:functional-requirements} as \RF\,8 and \RF\,9. Here they are described in more detail:
288 284
289 \begin{enumerate} 285 \begin{enumerate}
290 \item Refusing spam during the \SMTP\ dialog. This is the way it was meant by the designers of the \SMTP\ protocol. They thought checking the sender and recipient mail addresses would be enough, but as they are forgeable it is not. More and more complex checks need to be done. Checking needs time, but \SMTP\ dialogs time out if it takes too long. Thus only limited time can be used, during the \SMTP\ dialog, for checking if a message seems to be spam. The advantage is that acceptance of bad messages can be simply refused---no responsibility for the message is taken and no further system load is added. See \RFC2505 (especially section 1.5) for detail. 286 \item Refusing spam during the \SMTP\ dialog. This is the way it was meant by the designers of the \SMTP\ protocol. They thought checking the sender and recipient mail addresses would be enough, but as they are forgeable it is not. More and more complex checks need to be done. Checking needs time, but \SMTP\ dialogs time out if it takes too long. Thus only limited time can be used, during the \SMTP\ dialog, for checking if a message seems to be spam. The advantage is that acceptance of bad messages can be simply refused---no responsibility for the message is taken and no further system load is added. See \RFC2505 (especially section 1.5) for detail.
291 287
292 \item Checking for spam after the mail was accepted and queued. Here more processing time can be invested, so more detailed checks can be done. But, as responsibility for messages was taken by accepting them, it is no choice to simply delete spam mail. Checks for spam do not lead to sure results, they just indicate the possibility the message is unwanted mail. \person{Eisentraut} indicates actions to take after a message is recognized as probably spam \cite[pages 18--20]{eisentraut05}. The only acceptable one, for mail the \MTA\ is responsible for, is adding further or rewriting existent header lines. Thus all further work on the message is the same as for non-spam messages. 288 \item Checking for spam after the mail was accepted and queued. Here more processing time can be invested, so more detailed checks can be done. But, as responsibility for messages was taken by accepting them, it is no choice to simply delete spam mail. Checks for spam do not lead to sure results, they just indicate the possibility the message is unwanted mail. \person{Eisentraut} indicates actions to take after a message is recognized as probably spam \cite[pages 18--20]{eisentraut05}. The only acceptable one, for mail the \MTA\ is responsible for, is adding further or rewriting existent header lines. Thus all further work on the message is the same as for non-spam messages.
305 301
306 302
307 303
308 \subsubsection*{The scanning module} 304 \subsubsection*{The scanning module}
309 305
310 A lot of work was put onto the \name{scanning} module. This is not what is desired. Thus splitting it up into single parts appears to be necessary. But the decision how to split is left up to the time of prototyping. 306 A problem, which gets probably noticed by a attentive reader, is the lot of work that was put onto the \name{scanning} module. This is not what is desired. Thus splitting this module into a set of single modules appears to be necessary.
311 307
312 << fixme >> %fixme 308 The decision how to split shall not be discussed here. It is left up to the time of prototyping, because trying different approaches is good in such situations.
309
313 310
314 311
315 312
316 313
317 314
406 403
407 The connections between \name{queue-in} and \name{scanning}, as well as between \name{scanning} and \name{queue-out} is provided by the queues, only sending signals to trigger runs may be useful. Communication between receiving and transport modules and the outside world are done using the specific protocol they do handle. 404 The connections between \name{queue-in} and \name{scanning}, as well as between \name{scanning} and \name{queue-out} is provided by the queues, only sending signals to trigger runs may be useful. Communication between receiving and transport modules and the outside world are done using the specific protocol they do handle.
408 405
409 Left is only communication between the receiver modules and \name{queue-in}, and between \name{queue-out} and the transport modules. Data is exchanged using \unix\ pipes and a simple protocol. Figure \ref{fig:ipc-protocol} shows a state diagram for the protocol. Solid lines indicate client actions, dashed lines indicate server responses. 406 Left is only communication between the receiver modules and \name{queue-in}, and between \name{queue-out} and the transport modules. Data is exchanged using \unix\ pipes and a simple protocol. Figure \ref{fig:ipc-protocol} shows a state diagram for the protocol. Solid lines indicate client actions, dashed lines indicate server responses.
410 407
411 \begin{figure} 408 \begin{figure}[hbt]
412 \begin{center} 409 \begin{center}
413 \includegraphics[scale=0.75]{img/ipc-protocol.eps} 410 \includegraphics[scale=0.75]{img/ipc-protocol.eps}
414 \end{center} 411 \end{center}
415 \caption{State diagram of the \NAME{IPC} protocol} 412 \caption{State diagram of the \NAME{IPC} protocol}
416 \label{fig:ipc-protocol} 413 \label{fig:ipc-protocol}