
.\".if n .pl 1000i
.de XX
.pl 1v
..
.em XX
.\".nr PI 0
.\".if t .nr PD .5v
.\".if n .nr PD 1v
.nr lu 0
.de CW
.nr PQ \\n(.f
.if t .ft CW
.ie ^\\$1^^ .if n .ul 999
.el .if n .ul 1
.if t .if !^\\$1^^ \&\\$1\f\\n(PQ\\$2
.if n .if \\n(.$=1 \&\\$1
.if n .if \\n(.$>1 \&\\$1\c
.if n .if \\n(.$>1 \&\\$2
..
.ds [. \ [
.ds .] ]
.\"----------------------------------------
.TL
Why the Unix Philosophy still matters
.AU
markus schnalke <meillo@marmaro.de>
.AB
.ti \n(.iu
This paper discusses the importance of the Unix Philosophy in software design.
Today, few software designers are aware of these concepts,
and thus most modern software is limited and does not make use of software leverage.
Knowing and following the tenets of the Unix Philosophy makes software more valuable.
.AE

.\".if t .2C

.FS
.ps -1
This paper was prepared for the seminar ``Software Analysis'' at the University of Ulm.
The mentor was Professor Schweiggert. 2010-02-05
.br
You may get this document from my website
.CW \s-1http://marmaro.de/docs
.FE

.NH 1
Introduction
.LP
Building software is a process that leads from an idea of the software's purpose
to its release.
No matter \fIhow\fP the process is run, two things are common:
the initial idea and the release.
The process in between can be of any shape.
The maintenance work after the release is ignored for the moment.
.PP
The process of building consists mainly of two parts:
planning what and how to build, and implementing the plan by writing code.
This paper focuses on the planning part \(en the designing of the software.
.PP
Software design is the plan of how the internals and externals of the software should look,
based on the requirements.
This paper discusses the recommendations of the Unix Philosophy about software design.
.PP
The ideas discussed here can be applied in any development process.
The Unix Philosophy does recommend what the software development process should look like,
but this is not the topic here.
Similarly, the question of how to write the code is out of scope.
.PP
The name ``Unix Philosophy'' has already been mentioned several times, but it has not been explained yet.
The Unix Philosophy is the essence of how the Unix operating system and its toolchest were designed.
It is not a fixed set of rules, but what people see as common to typical Unix software.
Several people have stated their view on the Unix Philosophy.
The best known are:
.IP \(bu
Doug McIlroy's summary: ``Write programs that do one thing and do it well.''
.[
%A M. D. McIlroy
%A E. N. Pinson
%A B. A. Tague
%T UNIX Time-Sharing System: Foreword
%J The Bell System Technical Journal
%D 1978
%V 57
%N 6
%P 1902
.]
.IP \(bu
Mike Gancarz' book ``The UNIX Philosophy''.
.[
%A Mike Gancarz
%T The UNIX Philosophy
%D 1995
%I Digital Press
.]
.IP \(bu
Eric S. Raymond's book ``The Art of UNIX Programming''.
.[
%A Eric S. Raymond
%T The Art of UNIX Programming
%D 2003
%I Addison-Wesley
%O .CW \s-1http://www.faqs.org/docs/artu/
.]
.LP
These different views on the Unix Philosophy have much in common.
In particular, the main concepts are similar in all of them.
But there are also points on which they differ.
This only underlines what the Unix Philosophy is:
a retrospective view on the main concepts of Unix software,
especially those that were successful and unique to Unix.
.\" really?
.PP
Before we look at concrete concepts,
we discuss why software design is important
and what problems bad design introduces.


.NH 1
Importance of software design in general
.LP
Why should we design software at all?
It is common knowledge that even a bad plan is better than no plan.
Ignoring software design is programming without a plan.
This will almost certainly lead to horrible results.
.PP
The design of software is its internal and external shape.
The design talked about here has nothing to do with visual appearance.
If we see a program as a car, then its color does not matter.
Its design would be the car's size, its shape, the number and position of doors,
the ratio of passenger to cargo space, and so forth.
.PP
Software design is about quality properties.
Each car may be able to drive from A to B,
but it depends on its properties whether it is a good car for passenger transport or not.
It also depends on its properties whether it is a good choice for a rough mountain area.
.PP
Requirements for software are twofold: functional and non-functional.
Functional requirements are easier to define and to verify.
They directly describe the software's functions.
Functional requirements are the reason why software gets written.
Someone has a problem and needs a tool to solve it.
Being able to solve the problem is the main functional requirement.
It is the driving force behind all programming effort.
.PP
On the other hand, there are also non-functional requirements.
They are also called \fIquality\fP requirements.
The quality of software is about properties that are not directly related to
the software's basic functions.
Quality aspects are about the properties that are overlooked at first sight.
.PP
Quality matters little when the software is first built,
but it matters a lot in the use and maintenance of the software.
A short-sighted person might see software development mainly as building something up.
Reality shows that building the software the first time is only a small part
of the overall work.
Bug fixing, extending, rebuilding of parts \(en in short: maintenance work \(en
soon takes up the major part of the time spent on software.
Not to forget the use of the software.
These processes are highly influenced by the software's quality.
Thus, quality should never be neglected.
The problem is that you hardly ``stumble over'' bad quality during the first build,
although this is the time when you should care about good quality most.
.PP
Software design is not about the basic function of software;
this requirement will get satisfied anyway, as it is the main driving force behind the development.
Software design is about quality aspects of the software.
Good design will lead to good quality, bad design to bad quality.
The primary functions of the software will be only modestly affected by bad quality,
but good quality can provide a lot of additional gain from the software,
even at places where one never expected it.
.PP
Part 1 of the ISO/IEC 9126 standard
.[
%I International Organization for Standardization
%T ISO Standard 9126: Software Engineering \(en Product Quality, part 1
%C Geneva
%D 2001
.]
defines the quality model as consisting of:
.IP \(bu
.I Functionality
(suitability, accuracy, inter\%operability, security)
.IP \(bu
.I Reliability
(maturity, fault tolerance, recoverability)
.IP \(bu
.I Usability
(understandability, learnability, operability, attractiveness)
.IP \(bu
.I Efficiency
(time behavior, resource utilization)
.IP \(bu
.I Maintainability
(analyzability, changeability, stability, testability)
.IP \(bu
.I Portability
(adaptability, installability, co-existence, replaceability)
.LP
These goals are part of software design.
Good design can give these properties to software;
badly designed software will lack them.
.PP
One further goal of software design is consistency.
Consistency eases understanding, working on, and using things.
Consistent internals and consistent interfaces to the outside can be provided by good design.
.PP
We should design software because good design avoids many problems during the software's lifetime.
And we should design software because good design can offer much gain
that can be unrelated to the software's main intent.
Indeed, we should put much effort into good design to make the software more valuable.
The Unix Philosophy shows how to design software well.
It offers guidelines to achieve good quality and high gain for the effort spent.


.NH 1
The Unix Philosophy
.LP
The origins of the Unix Philosophy were already introduced.
This chapter explains the philosophy, following Gancarz,
and shows concrete examples of its application.

.NH 2
Pipes
.LP
The following examples demonstrate what applied Unix Philosophy feels like.
Knowledge of the Unix shell is assumed.
.PP
Counting the number of files in the current directory:
.DS I 2n
.CW
.ps -1
ls | wc -l
.DE
The
.CW ls
command lists all files in the current directory, one per line,
and
.CW "wc -l
counts the number of lines.
.PP
Counting the number of files that do not contain ``foo'' in their name:
.DS I 2n
.CW
.ps -1
ls | grep -v foo | wc -l
.DE
Here, the list of files is filtered by
.CW grep
to remove all that contain ``foo''.
The rest is the same as in the previous example.
.PP
Finding the five largest entries in the current directory:
.DS I 2n
.CW
.ps -1
du -s * | sort -nr | sed 5q
.DE
.CW "du -s *
returns the recursively summed sizes of all files
\(en no matter if they are regular files or directories.
.CW "sort -nr
sorts the list numerically in reverse order.
Finally,
.CW "sed 5q
quits after it has printed the fifth line.
.PP
The presented command lines are examples of what Unix people would use
to get the desired output.
There are also other ways to get the same output.
It is the user's decision which way to go.
.PP
The examples show that many tasks on a Unix system
are accomplished by combining several small programs.
The connection between the individual programs is denoted by the pipe operator `|'.
.PP
Pipes, and their extensive and easy use, are one of the great
achievements of the Unix system.
Pipes between programs had been possible in earlier operating systems,
but they had never been such a central part of the concept.
When, in the early seventies, Doug McIlroy introduced pipes for the
Unix system,
``it was this concept and notation for linking several programs together
that transformed Unix from a basic file-sharing system to an entirely new way of computing.''
.[
%T Unix: An Oral History
%O .CW \s-1http://www.princeton.edu/~hos/frs122/unixhist/finalhis.htm
.]
.PP
Being able to specify pipelines in an easy way is,
however, not enough by itself.
It is only one half.
The other is the design of the programs that are used in the pipeline.
They have to provide interfaces that allow them to be used in such a way.

.NH 2
Interface design
.LP
Unix is, first of all, simple \(en everything is a file.
Files are sequences of bytes, without any special structure.
Programs should be filters, which read a stream of bytes from ``standard input'' (stdin)
and write a stream of bytes to ``standard output'' (stdout).
.PP
If the files \fIare\fP sequences of bytes,
and the programs \fIare\fP filters on byte streams,
then there is exactly one standardized data interface.
Thus it is possible to combine them in any desired way.
.PP
Even a handful of small programs will yield a large set of combinations,
and thus a large set of different functions.
This is leverage!
If the programs are orthogonal to each other \(en the best case \(en
then the set of different functions is greatest.
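.PP
As a minimal sketch of this idea, the following shell functions are filters
in the sense described above.
The names \f(CWupcase\fP and \f(CWsqueeze\fP are made up for this illustration;
they only wrap standard tools.
.DS I 2n
.CW
.ps -1
```shell
#!/bin/sh
# upcase: hypothetical filter; reads stdin, writes upper-cased text to stdout.
upcase() {
    tr '[:lower:]' '[:upper:]'
}

# squeeze: hypothetical filter; drops empty lines.
squeeze() {
    grep -v '^$'
}

# Both share the one byte-stream interface, so either order works.
echo "hello world" | squeeze | upcase
```
.DE
Because neither filter knows anything about the other,
each can also be combined with any other program that reads or writes a byte stream.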
.PP
Programs might also have a separate control interface,
besides their data interface.
The control interface is often called ``user interface'',
because it is usually designed to be used by humans.
The Unix Philosophy discourages assuming that the user is human.
Interactive use of software is slow use of software,
because the program waits for user input most of the time.
Interactive software requires the user to be in front of the computer
all the time.
Interactive software occupies the user's attention while it is running.
.PP
Now we come back to the idea of using several small programs, combined,
to have a more specific function.
If these individual tools were all interactive,
how would the user control them?
Not only is it a problem to control several programs at once if they run at the same time,
it is also very inefficient to have to control each of the individual programs
that are intended to work as one large program.
Hence, the Unix Philosophy discourages programs from demanding interactive use.
The behavior of programs should be defined at invocation.
This is done by specifying arguments (``command line switches'') to the program call.
Gancarz discusses this topic as ``avoid captive user interfaces''.
.[
%A Mike Gancarz
%T The UNIX Philosophy
%I Digital Press
%D 1995
%P 88 ff.
.]
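.PP
A small sketch may illustrate behavior that is defined at invocation.
The filter \f(CWfield\fP and its switch \f(CW-f\fP are hypothetical names
chosen for this example; they are not part of any standard toolchest.
.DS I 2n
.CW
.ps -1
```shell
#!/bin/sh
# field: hypothetical filter that prints one whitespace-separated column.
# Its behavior is fixed by the -f switch at invocation; it never prompts.
field() {
    col=1
    OPTIND=1
    while getopts f: opt; do
        case $opt in
        f) col=$OPTARG ;;
        *) echo "usage: field [-f column]" >&2; return 2 ;;
        esac
    done
    awk -v col="$col" '{ print $col }'
}

echo "alpha beta gamma" | field -f 2
```
.DE
Since nothing is asked at run time, such a program can sit in the middle
of a pipeline or in a script without a human attending it.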
.PP
Non-interactive use is also an advantage for testing during development.
Testing interactive programs is much more complicated
than testing non-interactive programs.
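.PP
How little is needed to test a non-interactive filter can be sketched as follows.
The helper \f(CWcheck\fP is a hypothetical name for this illustration,
not an existing tool;
it feeds fixed input to a command and compares the output with the expected text.
.DS I 2n
.CW
.ps -1
```shell
#!/bin/sh
# check: hypothetical test helper.
# $1 = input text, $2 = expected output, rest = command under test
check() {
    input=$1; expected=$2; shift 2
    actual=$(echo "$input" | "$@")
    if [ "$actual" = "$expected" ]; then
        echo "ok: $*"
    else
        echo "FAIL: $*" >&2
        return 1
    fi
}

check "hello world" "HELLO WORLD" tr '[:lower:]' '[:upper:]'
check "hello world" "hll wrld" tr -d aeiou
```
.DE
An interactive program would need a simulated user to be tested in the same way.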

.NH 2
The toolchest approach
.LP
A toolchest is a set of tools.
Instead of having one big tool for all tasks, one has many small tools,
each for one task.
Difficult tasks are solved by combining several of the small, simple tools.
.PP
The Unix toolchest \fIis\fP a set of small, (mostly) non-interactive programs
that are filters on byte streams.
They are, to a large extent, unrelated in their function.
Hence, the Unix toolchest provides a large set of functions
that can be accessed by combining the programs in the desired way.
.PP
There are also advantages for developing small toolchest programs.
It is easier and less error-prone to write small programs.
It is also easier and less error-prone to write a large set of small programs
than to write one large program with all the functionality included.
If the small programs are combinable, then they offer an even larger set
of functions than the single large program.
Hence, one gets two advantages out of writing small, combinable programs.
.PP
There are two drawbacks of the toolchest approach.
First, one simple, standardized, unidirectional interface has to be sufficient.
If one feels the need for more ``logic'' than a stream of bytes,
then a different approach might be needed.
But it is also possible that one simply cannot imagine a design where
a stream of bytes is sufficient.
By becoming more familiar with the ``Unix style of thinking'',
developers will more often and more easily find simple designs where
a stream of bytes is a sufficient interface.
.PP
The second drawback of a toolchest affects the users.
A toolchest is often more difficult to use for novices.
It is necessary to become familiar with each of the tools,
to be able to use the right one in a given situation.
Additionally, one needs to combine the tools in a sensible way on one's own.
This is like a sharp knife \(en it is a powerful tool in the hands of a master,
but of little value in the hands of an unskilled person.
.PP
However, learning a single, small tool of the toolchest is easier than
learning a complex tool.
The user will have a basic understanding of a yet unknown tool
if the tools of the toolchest have a common style.
He will be able to transfer knowledge about one tool to another.
.PP
Moreover, the second drawback can be removed easily by adding wrappers
around the single tools.
Novice users do not need to learn several tools if a professional wraps
the single commands into a more high-level script.
Note that the wrapper script still calls the small tools;
the wrapper script is just a skin around them.
No complexity is added this way,
but new programs can be created out of existing ones with very little effort.
.PP
A wrapper script for finding the five largest entries in the current directory
could look like this:
.DS I 2n
.CW
.ps -1
#!/bin/sh
du -s * | sort -nr | sed 5q
.DE
The script itself is just a text file that calls the command line
a professional user would type in directly.
Making the program flexible in the number of entries it prints
is easily possible:
.DS I 2n
.CW
.ps -1
#!/bin/sh
num=5
[ $# -eq 1 ] && num="$1"
du -s * | sort -nr | sed "${num}q"
.DE
This script acts like the one before, when called without an argument.
But one can also specify a numerical argument to define the number of lines to print.

.NH 2
A powerful shell
.LP
It has already been said that the Unix shell provides the possibility to
combine small programs into large ones easily.
A powerful shell is a great feature in other ways, too.
.PP
One example is the scripting language it includes.
The control statements are built into the shell.
The functions, however, are the normal programs that everyone can use on the system.
Thus, the programs are known, so learning to program in the shell is easy.
Using normal programs as functions in the shell programming language
is only possible because they are small and combinable tools in a toolchest style.
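.PP
The following sketch shows this division of labor.
The function name \f(CWlongfiles\fP is made up for this illustration;
the loop and the condition come from the shell,
while the counting is done by the ordinary program \f(CWwc\fP.
.DS I 2n
.CW
.ps -1
```shell
#!/bin/sh
# longfiles: hypothetical example; prints the names of all given files
# that have at least a given number of lines.
# $1 = minimum number of lines, rest = file names
longfiles() {
    min=$1; shift
    for f in "$@"; do
        # wc is a normal program, used here like a function
        lines=$(wc -l <"$f")
        if [ "$((lines))" -ge "$min" ]; then
            echo "$f"
        fi
    done
}
```
.DE
A call like \f(CWlongfiles 100 *.c\fP would then list the C source files
with one hundred or more lines.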
.PP
The Unix shell encourages writing small scripts out of other programs,
because it is so easy to do.
This is a great step towards automation.
It is wonderful if the effort to automate a task equals the effort
it takes to do it the second time by hand.
If it is so, then the user will be happy to automate everything he does more than once.
.PP
Small programs that do one job well, standardized interfaces between them,
a mechanism to combine parts into larger parts, and an easy way to automate tasks:
this will inevitably produce software leverage.
Getting multiple times the benefit of an investment is a great offer.
.PP
The shell also encourages rapid prototyping.
Many well known programs started as quickly hacked shell scripts,
and turned into ``real'' programs, written in C, later.
Building a prototype first is a way to avoid the biggest problems
in application development.
Fred Brooks writes in ``No Silver Bullet'':
.[
%A Frederick P. Brooks, Jr.
%T No Silver Bullet: Essence and Accidents of Software Engineering
%B Information Processing 1986, the Proceedings of the IFIP Tenth World Computing Conference
%E H.-J. Kugler
%D 1986
%P 1069\(en1076
%I Elsevier Science B.V.
%C Amsterdam, The Netherlands
.]
.QP
The hardest single part of building a software system is deciding precisely what to build.
No other part of the conceptual work is so difficult as establishing the detailed
technical requirements, [...].
No other part of the work so cripples the resulting system if done wrong.
No other part is more difficult to rectify later.
.PP
Writing a prototype is a great method to become familiar with the requirements
and to actually run into real problems.
Today, prototyping is often seen as a first step in building software.
This is, of course, good.
However, the Unix Philosophy has an \fIadditional\fP perspective on prototyping:
After having built the prototype, one might notice that the prototype is already
\fIgood enough\fP.
Hence, a reimplementation in a more sophisticated programming language might not be needed
for the moment.
Maybe later it will be necessary to rewrite the software, but not now.
.PP
By delaying further work, one keeps the flexibility to react easily to
changing requirements.
Software parts that are not written will not miss the requirements.

.NH 2
Worse is better
.LP
The Unix Philosophy aims for the 80% solution;
others call it the ``Worse is better'' approach.
.PP
First, practical experience shows that it is almost never possible to define the
requirements completely and correctly the first time.
Hence one should not try to; it will fail anyway.
Second, practical experience shows that requirements change over time.
Hence it is best to delay requirement-based design decisions as long as possible.
Also, the software should be kept small and flexible as long as possible
to react to changing requirements.
Shell scripts, for example, are more easily adjusted than C programs.
Third, practical experience shows that maintenance is hard work.
Hence, one should keep the amount of software as small as possible;
it should just fulfill the \fIcurrent\fP requirements.
Software parts that will be written later do not need maintenance now.
.PP
Starting with a prototype in a scripting language has several advantages:
.IP \(bu
As the initial effort is low, one will likely start right away.
.IP \(bu
As working parts are available soon, the real requirements can get identified soon.
.IP \(bu
When software is usable, it gets used, and thus tested.
Hence problems will be found at early stages of the development.
.IP \(bu
The prototype might be enough for the moment,
thus further work on the software can be delayed to a time
when one knows more about the requirements and problems
than now.
.IP \(bu
Implementing now only the parts that are actually needed now
requires less maintenance work.
.IP \(bu
If the global situation changes so that the software is not needed anymore,
then less effort has been spent on the project than would have been
if a different approach had been used.

.NH 2
Upgrowth and survival of software
.LP
So far, we have talked about \fIwriting\fP or \fIbuilding\fP software.
Although these are just verbs, they do imply a specific view on the work process
they describe.
The better verb, however, is to \fIgrow\fP.
.PP
Creating software in the sense of the Unix Philosophy is an incremental process.
It starts with a first prototype, which evolves as requirements change.
A quickly hacked shell script might become a large, sophisticated,
compiled program this way.
Its lifetime begins with the initial prototype and ends when the software is not used anymore.
While alive, it will be extended, rearranged, and rebuilt (from scratch).
Growing software matches the view that ``software is never finished. It is only released.''
.[
%O FIXME
%A Mike Gancarz
%T The UNIX Philosophy
%P 26
.]
.PP
Software can be seen as being controlled by evolutionary processes.
Successful software is software that is used by many for a long time.
This implies that the software is needed, useful, and better than alternatives.
Darwin speaks of ``the survival of the fittest.''
.[
%O FIXME
%A Charles Darwin
.]
Transferred to software: the most successful software is the fittest,
the one that survives.
(This may be at the level of one creature, or at the level of one species.)
The fitness of software is affected mainly by four properties:
portability of code, portability of data, range of usability, and reusability of parts.
.\" .IP \(bu
.\" portability of code
.\" .IP \(bu
.\" portability of data
.\" .IP \(bu
.\" range of usability
.\" .IP \(bu
.\" reuseability of parts
.PP
(1)
.I "Portability of code
means using high-level programming languages,
sticking to the standard,
and avoiding optimizations that introduce dependencies on specific hardware.
Hardware has a much shorter lifetime than software.
By chaining software to specific hardware,
the software's lifetime gets shortened to that of this hardware.
In contrast, software should be easy to port \(en
adaptation is the key to success.
.\" cf. practice of prog: ch08
.PP
(2)
.I "Portability of data
is best achieved by avoiding binary representations
to store data, because binary representations differ from machine to machine.
Textual representation is favored.
Historically, ASCII was the charset of choice.
In the future, however, UTF-8 might be the better choice.
What matters is that it is a plain text representation in a
very common charset encoding.
Apart from being able to transfer data between machines,
readable data has the great advantage that humans are able
to edit it directly with text editors and other tools from the Unix toolchest.
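.PP
A brief sketch of what textual data makes possible:
the address book below is a made-up example
(the file format, the names, and the addresses are assumptions of this illustration),
stored with one record per line and colon-separated fields.
.DS I 2n
.CW
.ps -1
```shell
#!/bin/sh
# Hypothetical plain-text address book: login:full name:mail address
db=$(mktemp) || exit 1
trap 'rm -f "$db"' EXIT
cat >"$db" <<EOF
alice:Alice Example:alice@example.org
bob:Bob Example:bob@example.org
EOF

# lookup: print the mail address stored for a login name.
lookup() {
    grep "^$1:" "$db" | cut -d: -f3
}

lookup bob
```
.DE
No special program is needed to read or repair such a file;
\f(CWgrep\fP, \f(CWcut\fP, and any text editor are sufficient.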
.\" gancarz tenet 5
.PP
(3)
A large
.I "range of usability
ensures good adaptation, and thus good survival.
It is a special distinction if software comes to be used in fields
the original authors never imagined.
Software that solves problems in a general way will likely be used
for all kinds of similar problems.
Being too specific limits the range of uses.
Requirements change through time, thus use cases change or even vanish.
A good example of this is Allman's sendmail.
Allman identifies flexibility as one major reason for sendmail's success:
.[
%O FIXME
%A Allman
%T sendmail
.]
.QP
Second, I limited myself to the routing function [...].
This was a departure from the dominant thought of the time, [...].
.QP
Third, the sendmail configuration file was flexible enough to adapt
to a rapidly changing world [...].
.LP
Successful software adapts itself to the changing world.
.PP
(4)
.I "Reuse of parts
goes even one step further.
Software may completely lose its field of action,
but the parts of which the software is built may be general and independent enough
to survive this death.
If software is built by combining small independent programs,
then there are parts readily available for reuse.
Who cares if the large program is a failure,
but parts of it become successful instead?

.NH 2
Summary
.LP
This chapter explained the central ideas of the Unix Philosophy.
For each of the ideas, it was shown which advantages it introduces.
The Unix Philosophy is a set of guidelines that help to write valuable software.
From the viewpoint of a software developer or software designer,
the Unix Philosophy provides answers to many software design problems.
.PP
The various ideas of the Unix Philosophy are closely interwoven
and can hardly be applied independently.
However, probably the most important messages are:
.I "``Do one thing well!''" ,
.I "``Keep it simple!''" ,
and
.I "``Use software leverage!''


.NH 1
Case study: \s-1MH\s0
.LP
The previous chapter introduced and explained the Unix Philosophy
from a general point of view.
The guidelines were the driving force; references to
existing software were given only sparsely.
In this and the next chapter, concrete software will be
the driving force in the discussion.
.PP
This first case study is about the mail user agent (\s-1MUA\s0)
\s-1MH\s0 (``mail handler'') and its descendant \fInmh\fP
(``new mail handler'').
\s-1MUA\s0s provide functions to read, compose, and organize mail,
but (ideally) not to transfer.
In this document, the name \s-1MH\s0 will be used for both of them.
A distinction will only be made if differences between
them are described.


.NH 2
Historical background
.LP
Electronic mail was available in Unix very early.
The first \s-1MUA\s0 on Unix was \f(CWmail\fP,
which was already present in the First Edition.
.[
%A Peter H. Salus
%T A Quarter Century of UNIX
%D 1994
%I Addison-Wesley
%P 41 f.
.]
It was a small program that either prints the user's mailbox file
or appends text to someone else's mailbox file,
depending on the command line arguments.
.[
%O http://cm.bell-labs.com/cm/cs/who/dmr/pdfs/man12.pdf
.]
It was a program that did one job well.
This job was emailing, which was very simple then.
.PP
Later, emailing became more powerful, and thus more complex.
The simple \f(CWmail\fP, which knew nothing of subjects,
independent handling of single messages,
and long-time storage of them, was not powerful enough anymore.
At Berkeley, Kurt Shoens wrote \fIMail\fP (with capital `M')
in 1978 to provide additional functions for emailing.
Mail was still one program, but now it was large and did
several jobs.
Its user interface is modeled after the one of \fIed\fP.
It is designed for humans, but is still scriptable.
\fImailx\fP is the adaptation of Berkeley Mail into System V.
.[
%A Gunnar Ritter
%O http://heirloom.sourceforge.net/mailx_history.html
.]
Elm, pine, mutt, and a whole bunch of graphical \s-1MUA\s0s
followed Mail's direction.
They are large, monolithic programs which include all emailing functions.
.PP
A different way was taken by the people at the \s-1RAND\s0 Corporation.
In the beginning, they had also used a monolithic mail system,
called \s-1MS\s0 (for ``mail system'').
But in 1977, Stockton Gaines and Norman Shapiro
came up with a proposal of a new email system concept \(en
one that honors the Unix Philosophy.
The concept was implemented by Bruce Borden in 1978 and 1979.
This was the birth of \s-1MH\s0 \(en the ``mail handler''.
.PP
Since then, \s-1RAND\s0, the University of California at Irvine and
at Berkeley, and several others have contributed to the software.
However, its core concepts remained the same.
In the late 90s, when development of \s-1MH\s0 slowed down,
Richard Coleman started with \fInmh\fP, the new mail handler.
His goal was to improve \s-1MH\s0, especially with regard to
the requirements of modern emailing.
Today, nmh is developed by various people on the Internet.
.[
%T RAND and the Information Evolution: A History in Essays and Vignettes
%A Willis H. Ware
%D 2008
%I The RAND Corporation
%P 128\(en137
%O .CW \s-1http://www.rand.org/pubs/corporate_pubs/CP537/
.]
.[
%T MH & xmh: Email for Users & Programmers
%A Jerry Peek
%D 1991, 1992, 1995
%I O'Reilly & Associates, Inc.
%P Appendix B
%O Also available online: \f(CW\s-2http://rand-mh.sourceforge.net/book/\fP
.]

.NH 2
Contrasts to monolithic mail systems
.LP
All \s-1MUA\s0s are monolithic, except \s-1MH\s0.
Although further, little-known toolchest \s-1MUA\s0s might actually exist,
this statement reflects the situation pretty well.
.PP
Monolithic \s-1MUA\s0s gather all their functions in one program.
In contrast, \s-1MH\s0 is a toolchest of many small tools \(en one for each job.
The following list shows important programs of \s-1MH\s0's toolchest
and their functions.
It gives a feeling of what the toolchest looks like.
.IP \(bu
.CW inc :
incorporate new mail (this is how mail enters the system)
.IP \(bu
.CW scan :
list messages in folder
.IP \(bu
.CW show :
show message
.IP \(bu
.CW next\fR/\fPprev :
show next/previous message
.IP \(bu
.CW folder :
change current folder
.IP \(bu
.CW refile :
refile message into folder
.IP \(bu
.CW rmm :
remove message
.IP \(bu
.CW comp :
compose a new message
.IP \(bu
.CW repl :
reply to a message
.IP \(bu
.CW forw :
forward a message
.IP \(bu
.CW send :
send a prepared message (this is how mail leaves the system)
.LP
\s-1MH\s0 has no special user interface like monolithic \s-1MUA\s0s have.
The user does not leave the shell to run \s-1MH\s0,
but he uses the various \s-1MH\s0 programs within the shell.
Using a monolithic program with a captive user interface
means ``entering'' the program, using it, and ``exiting'' the program.
Using toolchests like \s-1MH\s0 means running programs,
alone or in combination with others, even from other toolchests,
without leaving the shell.

.NH 2
Data storage
.LP
\s-1MH\s0's mail storage is (little more than) a directory tree
where mail folders are directories and mail messages are text files.
Working with \s-1MH\s0's toolchest is much like working
with Unix' toolchest:
\f(CWscan\fP is like \f(CWls\fP,
\f(CWshow\fP is like \f(CWcat\fP,
\f(CWfolder\fP is like \f(CWcd\fP/\f(CWpwd\fP,
\f(CWrefile\fP is like \f(CWmv\fP,
and \f(CWrmm\fP is like \f(CWrm\fP.
.PP
The context of tools in Unix is mainly the current working directory,
the user identification, and the environment variables.
\s-1MH\s0 extends this context by two more items:
.IP \(bu
The current mail folder, which is similar to the current working directory.
For mail folders, \f(CWfolder\fP provides the corresponding functionality
of \f(CWpwd\fP and \f(CWcd\fP for directories.
.IP \(bu
The current message, relative to the current mail folder,
which enables commands like \f(CWnext\fP and \f(CWprev\fP.
.LP
In contrast to Unix' context, which is chained to the shell session,
\s-1MH\s0's context is meant to be chained to a mail account.
But actually, the current message is a property of the mail folder,
which appears to be a legacy.
This will cause problems when multiple users work
in one mail folder simultaneously.
.PP
.I "Data storage.
How \s-1MH\s0 stores data was already mentioned.
Mail folders are directories (which contain a file
\&\f(CW.mh_sequences\fP) under the user's \s-1MH\s0 directory
(usually \f(CW$HOME/Mail\fP).
Mail messages are text files located in mail folders.
The files contain the messages as they were received.
The messages are numbered in ascending order in each folder.
This mailbox format is called ``\s-1MH\s0'' after the \s-1MUA\s0.
Alternatives are \fImbox\fP and \fImaildir\fP.
In the mbox format all messages are stored within one file.
This was a good solution in the early days, when messages
were only a few lines of text and were deleted soon.
Today, when single messages often include several megabytes
of attachments, it is a bad solution.
Another disadvantage of the mbox format is that it is
more difficult to write tools that work on mail messages,
because it is always necessary to first find and extract
the relevant message in the mbox file.
With the \s-1MH\s0 mailbox format,
each message is a self-contained item, by definition.
Also, the problem of concurrent access to one mailbox is
reduced to the problem of concurrent access to one message.
However, the issue of the shared parts of the context,
as mentioned above, remains.
Maildir is generally similar to \s-1MH\s0's format,
but modified towards guaranteed reliability.
This involves some complexity, unfortunately.
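.PP
Because every message is an ordinary file, standard Unix tools can
work on a mail folder directly.
The following sketch illustrates this with a mock folder;
the folder location and the two messages are invented for the example,
and no \s-1MH\s0 program is involved:
.DS
.CW
# Mock MH-style folder: a directory holding numbered message files.
# The location and both messages are made up for this sketch.
folder=$(mktemp -d)/inbox
mkdir -p "$folder"
cat >"$folder/1" <<EOF
From: alice@example.org
Subject: lunch

Noon?
EOF
cat >"$folder/2" <<EOF
From: bob@example.org
Subject: report

Attached.
EOF
# Plain Unix tools now operate on single messages:
ls "$folder" | wc -l                      # count the messages
grep -l "^From: alice" "$folder"/[0-9]*   # list messages sent by alice
grep -h "^Subject:" "$folder"/[0-9]*      # print all subject lines
.DE
With an mbox file, each of these steps would first require
splitting the file into its messages.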


.NH 2
Discussion of the design
.LP
The following paragraphs discuss \s-1MH\s0 in regard to the tenets
of the Unix Philosophy which Gancarz identified.

.PP
.I "``Small is beautiful''
and
.I "``do one thing well''
are two design goals that are directly visible in \s-1MH\s0.
Gancarz actually presents \s-1MH\s0 as an example under the headline
``Making UNIX Do One Thing Well'':
.QP
[\s-1MH\s0] consists of a series of programs which
when combined give the user an enormous ability
to manipulate electronic mail messages.
A complex application, it shows that not only is it
possible to build large applications from smaller
components, but also that such designs are actually preferable.
.[
%A Mike Gancarz
%T unix-phil
%P 125
.]
.LP
The various small programs of \s-1MH\s0 were relatively easy
to write, because each of them is small, limited to one function,
and has clear boundaries.
For the same reasons, they are also easy to maintain.
Furthermore, the system can easily be extended.
One only needs to put a new program into the toolchest.
This was done, for instance, when \s-1MIME\s0 support was added
(e.g. \f(CWmhbuild\fP).
Also, different programs can exist to do basically the same job
in different ways (e.g. in nmh: \f(CWshow\fP and \f(CWmhshow\fP).
If someone needs a mail system with some additional
functions that are not available anywhere yet, he is best served by a
toolchest system like \s-1MH\s0, where he can add the
functionality with little work.

.PP
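.I "``Store data in flat text files''
is followed by \s-1MH\s0.
As described above, mail messages are text files,
stored as they were received,
and mail folders are directories.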

.PP
.I "``Avoid captive user interfaces.''
\s-1MH\s0 is perfectly suited for non-interactive use.
It offers all functions directly and without captive user interfaces.
If, nonetheless, users want a graphical user interface,
they can have it with \fIxmh\fP or \fIexmh\fP, too.
These are graphical frontends for the \s-1MH\s0 toolchest.
This means that all email-related work is still done by \s-1MH\s0 tools,
but the frontend issues the appropriate calls when the user
clicks on buttons.
Providing easy-to-use user interfaces in the form of frontends is a good
approach, because it does not limit the power of the backend itself.
A frontend will, in any case, only be able to make a subset of the
backend's power and flexibility available to the user.
But if it is a separate program,
then the missing parts can still be accessed at the backend directly.
If it is integrated, then this will hardly be possible.
Furthermore, it is possible to have different frontends to the same
backend.
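.PP
As a sketch of non-interactive use, the following shell function
combines two \s-1MH\s0 tools to move all messages of one sender
out of the current folder.
The function name, the sequence name \f(CWhits\fP, and the folder
\f(CW+archive\fP are made up for this example;
it assumes that nmh is installed and that the folder already exists:
.DS
.CW
# Hypothetical helper: refile all messages from one sender.
archive_from() {
    # pick stores the matching messages in the sequence "hits";
    # refile then moves that sequence to the folder +archive.
    pick -from "$1" -sequence hits && refile hits +archive
}
.DE
Such a function can be called from other scripts,
without a user being present.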

.PP
.I "``Choose portability over efficiency''
and
.I "``use shell scripts to increase leverage and portability'' .
These two tenets are indirectly, but nicely, demonstrated by
Bolsky and Korn in their book about the Korn Shell.
.[
%T The KornShell: command and programming language
%A Morris I. Bolsky
%A David G. Korn
%I Prentice Hall
%D 1989
%P 254\(en290
%O \s-1ISBN\s0: 0-13-516972-0
.]
They demonstrated, in chapter 18 of the book, a basic implementation
of a subset of \s-1MH\s0 in ksh scripts.
Of course, this was just a demonstration, but a brilliant one.
It shows how quickly one can implement such a prototype with shell scripts,
and how readable they are.
The implementation in the scripting language may not be very fast,
but it can be fast enough, and this is all that matters.
By having the code in an interpreted language, like the shell,
portability becomes a minor issue, if we assume the interpreter
to be widespread.
This demonstration also shows how easy it is to create the individual programs
of toolchest software.
There are eight tools (two of them have multiple names) and 16 functions
with supporting code.
Each tool comprises between 12 and 38 lines of ksh,
in total about 200 lines.
The functions comprise between 3 and 78 lines of ksh,
in total about 450 lines.
Such small software is easy to write, easy to understand,
and thus easy to maintain.
A toolchest makes it possible to write only some of the parts
and still obtain a working result.
Expanding the toolchest without global changes will likely be
possible, too.

.PP
.I "``Use software leverage to your advantage''
and the lesser tenet
.I "``allow the user to tailor the environment''
are ideally followed in the design of \s-1MH\s0.
Tailoring the environment is heavily encouraged by the ability to
directly define default options to programs.
It is even possible to define different default options
depending on the name under which the program was called.
Software leverage is heavily encouraged by how easy it is to
create shell scripts that run a specific command line,
built of several \s-1MH\s0 programs.
Few software packages encourage users to tailor their
environment and to leverage the software as much as \s-1MH\s0 does.
To give just one example:
One might prefer a different listing format for the \f(CWscan\fP
program.
It is possible to take one of the distributed format files
or to write one yourself.
To use the format as default for \f(CWscan\fP, a single line,
reading
.DS
.CW
scan: -form FORMATFILE
.DE
must be added to \f(CW.mh_profile\fP.
If one wants this different format as an additional command,
instead of changing the default, he needs to create a link to
\f(CWscan\fP, for instance named \f(CWscan2\fP.
The line in \f(CW.mh_profile\fP would then start with \f(CWscan2\fP,
as the option should only be in effect when \f(CWscan\fP is called as
\f(CWscan2\fP.
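.PP
The two steps for the additional command can be sketched as follows,
assuming that \f(CWscan\fP is found in the search path,
that \f(CW$HOME/bin\fP is a directory in the search path, too,
and that the profile is located at \f(CW$HOME/.mh_profile\fP
(these locations are assumptions of the example):
.DS
.CW
ln -s "$(command -v scan)" "$HOME/bin/scan2"
echo "scan2: -form FORMATFILE" >>"$HOME/.mh_profile"
.DE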

.PP
.I "``Make every program a filter''
is hard to find in \s-1MH\s0.
The reason is that most of \s-1MH\s0's tools provide
basic file system operations for the mailboxes.
For the same reason,
\f(CWls\fP, \f(CWcp\fP, \f(CWmv\fP, and \f(CWrm\fP
are not filters either.
However, they build a basis on which filters can operate.
\s-1MH\s0 does not provide many filters itself, but it is a basis
to write filters for.
An example would be a mail message text highlighter,
that is, a program that makes use of a color terminal to display
header lines, quotations, and signatures in distinct colors.
The author's version of this program, for instance,
is a 25-line awk script.
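.PP
The idea of such a highlighter can be sketched as follows.
This is not the author's script but a minimal illustration;
the function name \f(CWhighlight\fP and the colors are chosen
arbitrarily, and a terminal that understands \s-1ANSI\s0 escape
sequences is assumed:
.DS
.CW
# Hypothetical filter: reads a mail message on standard input and
# writes it, with ANSI color codes added, to standard output.
highlight() {
    awk '
    BEGIN           { esc = sprintf("%c", 27); off = esc "[0m"; hdr = 1 }
    hdr && /^$/     { hdr = 0 }                  # empty line ends the header
    !hdr && /^-- $/ { sig = 1 }                  # signature delimiter
    hdr             { print esc "[1;34m" $0 off; next }
    sig             { print esc "[35m" $0 off; next }
    /^>/            { print esc "[32m" $0 off; next }
                    { print }'
}
.DE
Being a filter, it reads a message file on standard input,
for instance \f(CWhighlight <$HOME/Mail/inbox/1\fP
for message 1 of a folder named \f(CWinbox\fP.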

.PP
.I "``Build a prototype as soon as possible''
was again well followed by \s-1MH\s0.
This tenet, of course, focuses on early development, which was
a long time ago for \s-1MH\s0.
But without following this guideline at the very beginning,
Bruce Borden might not have convinced the management of \s-1RAND\s0
to ever create \s-1MH\s0.
In Bruce's own words:
.QP
[...] but they [Stockton Gaines and Norm Shapiro] were not able
to convince anyone that such a system would be fast enough to be usable.
I proposed a very short project to prove the basic concepts,
and my management agreed.
Looking back, I realize that I had been very lucky with my first design.
Without nearly enough design work,
I built a working environment and some header files
with key structures and wrote the first few \s-1MH\s0 commands:
inc, show/next/prev, and comp.
[...]
With these three, I was able to convince people that the structure was viable.
This took about three weeks.
.[
%O FIXME
.]

.NH 2
Problems
.LP
\s-1MH\s0 is certainly not without problems.
There are two main problems: one is technical, the other is about human behavior.
.PP
\s-1MH\s0 is old, and email today is very different from email at the time
when \s-1MH\s0 was designed.
\s-1MH\s0 adapted to the changes pretty well, but it is limited,
for example in development resources.
\s-1MIME\s0 support and support for different character encodings
are available, but only on a moderate level.
More active developers could quickly improve this.
It is also limited by design, which is the larger problem.
\s-1IMAP\s0, for example, conflicts with \s-1MH\s0's design to a large extent.
These design conflicts are not easily solvable.
Possibly, they require a redesign.
Maybe \s-1IMAP\s0 is too different from the classic mail model which \s-1MH\s0 covers;
hence \s-1MH\s0 may never work well with \s-1IMAP\s0.
.PP
The other kind of problem is human habits.
When in this world almost all \s-1MUA\s0s are monolithic,
it is very difficult to convince people to use a toolbox style \s-1MUA\s0
like \s-1MH\s0.
The habits are so strong that even people who understand the concept
and advantages of \s-1MH\s0 do not like to switch,
simply because \s-1MH\s0 is different.
Unfortunately, the frontends to \s-1MH\s0, which could provide a familiar look and feel,
are quite outdated and thus not very appealing compared to the modern interfaces
which monolithic \s-1MUA\s0s offer.

.NH 2
Summary \s-1MH\s0
.LP
\s-1MH\s0 is an \s-1MUA\s0 that follows the Unix Philosophy in its design
and implementation.
It consists of a toolchest of small tools, each of which does one job well.
The tools are orthogonal to each other, to a large extent.
However, for historical reasons, there also exist distinct tools
that cover the same task.
.PP
The toolchest approach offers great flexibility to the user.
He can use the complete power of the Unix shell with \s-1MH\s0.
This makes \s-1MH\s0 a very powerful mail system.
Extending and customizing \s-1MH\s0 is easy and encouraged, too.
.PP
Apart from the user's perspective, \s-1MH\s0 is development-friendly.
Its overall design follows clear rules.
The single tools do only one job, thus they are easy to understand,
easy to write, and easy to maintain.
They are all independent and do not interfere with the others.
Automated testing of their function is a straightforward task.
.PP
It is sad that \s-1MH\s0's differentness is its largest problem,
as its differentness is also its largest advantage.
Unfortunately, for most people their habits are stronger
than the attraction of the clear design and the power that \s-1MH\s0 offers.


.NH 1
Case study: uzbl

.NH 2
History
.LP
uzbl is young

.NH 2
Contrasts to similar software
.LP
like with nmh
.LP
addons, plugins, modules

.NH 2
Gains of the design
.LP

.NH 2
Problems
.LP
broken web


.NH 1
Final thoughts

.NH 2
Quick summary
.LP
good design
.LP
unix phil
.LP
case studies

.NH 2
Why people should choose
.LP
Make the right choice!

.nr PI .5i
.rm ]<
.de ]<
.LP
.de FP
.IP \\\\$1.
\\..
.rm FS FE
..
.SH
References
.[
$LIST$
.]
.wh -1p