changeset 122:c234656329e0

Wrote about modularization.
author markus schnalke <meillo@marmaro.de>
date Fri, 29 Jun 2012 22:51:25 +0200
parents edbc6e1dc636
children 740f4128dea7
files discussion.roff
diffstat 1 files changed, 243 insertions(+), 7 deletions(-)
--- a/discussion.roff	Tue Jun 26 22:06:20 2012 +0200
+++ b/discussion.roff	Fri Jun 29 22:51:25 2012 +0200
@@ -2538,7 +2538,7 @@
 kernighan pike practice of programming
 .], p. 23]
 demands: ``Don't belabor the obvious.''
-Hence, I simply removed comments like the following:
+Hence, I simply removed all the comments in the following code excerpt:
 .VS
 context_replace(curfolder, folder);  /* update current folder  */
 seq_setcur(mp, mp->lowsel);  /* update current message */
@@ -2954,13 +2954,249 @@
 
 .H2 "Modularization
 .P
-The \fIMH library\fP
-.Fn libmh.a
-collects a bunch of standard functions that many of the MH tools need,
-like reading the profile or context files.
-This doesn't hurt the separation.
+Mmh's code base is split into two directories,
+.Fn sbr
+(``subroutines'')
+and
+.Fn uip
+(``user interface programs'').
+The directory
+.Fn sbr
+contains the sources of the \fIMH library\fP
+.Fn libmh.a .
+It includes functions that mmh tools usually need.
+Among them are MH-specific functions for profile, context, sequence,
+and folder handling, but also
+MH-independent functions, such as advanced string processing functions,
+portability interfaces and error-checking wrappers for critical
+functions of the standard library.
+.P
+The MH library is a standard library for the source files in the
+.Fn uip
+directory.
+There reside the sources of the programs of the mmh toolchest.
+Each tool has a source file with the same name.
+For example,
+.Pn rmm
+is built from
+.Fn uip/rmm.c .
+Some source files are used by multiple programs.
+For example
+.Fn uip/scansbr.c
+is used by both
+.Pn scan
+and
+.Pn inc .
+In nmh, 49 tools were built from 76 source files.
+That is a ratio of 1.6 source files per program.
+17 programs depended on the equally named source file only.
+32 programs depended on multiple source files.
+In mmh, 39 tools are built from 51 source files.
+That is a ratio of 1.3 source files per program.
+21 programs depend on the equally named source file only.
+18 programs depend on multiple source files.
+The MH library as well as shell scripts and multiple names for the
+same program were ignored.
+.P
+Splitting the source code of one program into multiple files can
+increase the readability of its source code.
+This applies primarily to complex programs.
+Most of the mmh tools, however, are simple and straightforward programs.
+With the exception of the MIME handling tools,
+.Pn pick
+is the largest tool.
+It contains 1\|037 lines of source code (measured with
+.Pn sloccount ), excluding the MH library.
+Only the MIME handling tools (\c
+.Pn mhbuild ,
+.Pn mhstore ,
+.Pn show ,
+etc.)
+are larger.
+Splitting programs with fewer than 1\|000 lines of code into multiple
+source files seldom leads to better readability.
+For such tools, splitting makes sense
+when parts of the code are reused in other programs,
+and the reused code fragment is not general enough
+for including it in the MH library,
+or if it has dependencies on a library that only a few programs need.
+.Fn uip/packsbr.c ,
+for instance, provides the core program logic for the
+.Pn packf
+and
+.Pn rcvpack
+programs.
+.Fn uip/packf.c
+and
+.Fn uip/rcvpack.c
+mainly wrap the core function appropriately.
+No other tools use the folder packing functions.
+.P
+The task of MIME handling is complex enough that splitting its code
+into multiple source files improves the readability.
+The program
+.Pn mhstore ,
+for instance, is compiled out of seven source files with 2\|500
+lines of code in total.
+The main code file
+.Fn uip/mhstore.c
+consists of 800 lines; the rest is reused in the other MIME handling tools.
+It might be worthwhile to bundle the generic MIME handling code into
+an MH-MIME library, in resemblance to the MH standard library.
+This is left open for the future.
 .P
-whatnowproc
+The work already done focussed on the non-MIME tools.
+The amount of code compiled into each program was reduced.
+This eased the understanding of the code base.
+In nmh,
+.Pn comp
+was built from six source files:
+.Fn comp.c ,
+.Fn whatnowproc.c ,
+.Fn whatnowsbr.c ,
+.Fn sendsbr.c ,
+.Fn annosbr.c ,
+and
+.Fn distsbr.c .
+In mmh, it builds from only two:
+.Fn comp.c
+and
+.Fn whatnowproc.c .
+Instead of invoking the
+.Pn whatnow ,
+.Pn send ,
+and
+.Pn anno
+programs,
+their core functions were compiled into nmh's
+.Pn comp .
+This saved the need to
+.Fu fork()
+and
+.Fu exec() ,
+two expensive system calls.
+Whereas this approach improved the time performance,
+it interweaved the source code.
+Core functionalities were not encapsulated into programs but into
+functions, which were then wrapped by programs.
+For example,
+.Fn uip/annosbr.c
+included the function
+.Fu annotate() .
+Each program that wanted to annotate messages included the source file
+.Fn uip/annosbr.c .
+The programs called
+.Fu annotate() ,
+which required seven parameters, reflecting the command line switches of
+.Pn anno .
+When another pair of command line switches was added to
+.Pn anno ,
+a rather ugly hack was implemented to avoid adding another parameter
+to the function.
+.Ci d9b1d57351d104d7ec1a5621f090657dcce8cb7f
+.P
+Separation simplifies the understanding of program code
+because the area influenced by any particular statement is smaller.
+The separation on the program level is stricter than the separation
+on the function level.
+In mmh, the relevant code of
+.Pn comp
+comprises the two files
+.Fn uip/comp.c
+and
+.Fn uip/whatnowproc.c ,
+together 210 lines of code,
+the standard libraries excluded.
+In nmh,
+.Pn comp
+comprises six files with 2\|450 lines.
+Of course, not all of the code in these six files was actually used by
+.Pn comp ,
+but the code reader needs to understand the code first to know which.
+.P
+As I have read a lot in the code base during the last two years to
+understand it, I learned about the easy and the difficult parts.
+The smaller the influenced code area is, the stricter the boundaries
+are defined, and the more straightforward the code is written,
+the easier it is to understand.
+Reading
+.Pn rmm 's
+source code in
+.Fn uip/rmm.c
+is my recommendation for a beginner's entry point into the code base of nmh.
+The reasons are that the task of
+.Pn rmm
+is straightforward and it consists of one small source code file only,
+yet its source includes code constructs typical for MH tools.
+With the introduction of the trash folder in mmh,
+.Pn rmm
+became a bit more complex, because it invokes
+.Pn refile .
+Still, it is a good example for a simple tool with clear sources.
+.P
+Understanding
+.Pn comp
+requires reading 210 lines of code in mmh, but more than ten times as many in nmh.
+In the aforementioned hack in
+.Pn anno
+to save the additional parameter, information was passed through the program's
+source base in obscure ways.
+To understand
+.Pn comp ,
+one needed to understand the inner workings of
+.Fn uip/annosbr.c
+first.
+To be sure, to fully understand a program, its whole source code needs
+to be examined.
+Otherwise it would be a leap of faith, assuming that the developers
+have avoided obscure programming techniques.
+By separating the tools on the program-level, the boundaries are
+clearly visible and technically enforced.
+The interfaces are calls to
+.Fu exec()
+rather than arbitrary function calls.
+In order to understand
+.Pn comp ,
+it is no longer necessary to read
+.Fn uip/sendsbr.c .
+In mmh,
+.Pn comp
+no longer sends messages.
+In nmh, there surely is
+.Pn send ,
+but
+.Pn comp
+\&... and
+.Pn repl
+and
+.Pn forw
+and
+.Pn dist
+and
+.Pn whatnow
+and
+.Pn viamail
+(!) ... all have the same message sending function included.
+The clear separation on the surface \(en the toolchest approach \(en
+is violated on the level below.
+This violation is for the sake of time performance.
+On systems where
+.Fu fork()
+and
+.Fu exec()
+are expensive, the quicker response might be noticeable.
+In the old times, sacrificing readability and conceptional beauty for speed
+might even have been necessary to prevent MH from being unusably slow.
+Whatever the reasons had been, today they are gone.
+No longer should we sacrifice readability and conceptional beauty.
+No longer should we violate the Unix philosophy's ``one tool, one job''
+guideline.
+No longer should we keep speed improvements that are unnecessary today.
+.P
+In mmh, the different jobs are divided among separate programs that
+invoke each other as needed.
+The clear separation on the surface is still visible on the level below.
+