# HG changeset patch
# User markus schnalke
# Date 1341061204 -7200
# Node ID 740f4128dea7d803a76cd56b15073f5f7569d883
# Parent c234656329e0bb24d35bc6ada049e001b9e9b2ea
Reworked and extended the text about Modularization.

diff -r c234656329e0 -r 740f4128dea7 discussion.roff
--- a/discussion.roff Fri Jun 29 22:51:25 2012 +0200
+++ b/discussion.roff Sat Jun 30 15:00:04 2012 +0200
@@ -2820,15 +2820,17 @@
 .Fu snprintf()
 function, for instance, was standardized with C99 and is available
 almost everywhere because of its high usefulness.
-In mmh, the included implementation of
+In project's own implementation of
 .Fu snprintf()
-was dropped in favor for using the one of the standard library.
-Such decisions could limit the portability of mmh,
+was dropped in March 2012 in favor for using the one of the
+standard library.
+.Ci 0052f1024deb0a0a2fc2e5bacf93d45a5a9c9b32
+Such decisions limit the portability of mmh
 if systems don't support these standardized and widespread functions.
-This compromise is made.
+This compromise is made because mmh focuses on the future.
 .P
-I am not even thirty years old and my C and Unix experience comprises
-only five to seven years.
+I am not yet thirty years old and my C and Unix experience comprises
+only half a dozen years.
 Hence, I need to learn about the history in retrospective.
 I have not used those ancient constructs myself.
 I have not suffered from their incompatibilities.
@@ -2837,29 +2839,40 @@
 were well established already.
 I have only read a lot of books about the (good) old times.
 This puts me in a difficult positions when working with old code.
-I needed to acquire knowledge about old code constructs and ancient
-programming styles.
-Other know these things by heart from their own experience.
+I need to freshly acquire knowledge about old code constructs and ancient
+programming styles, whereas older programmers know these things by
+heart from their own experience.
 .P
-Being aware of these facts, I rather let people with more historic
-experience solve the task of replacing ancient code constructs
-with standardized ones.
+Being aware of the situation, I rather let people with more historic
+experience replace ancient code constructs with standardized ones.
 Lyndon Nerenberg covered large parts of this task for the nmh project.
 He converted project-specific functions to POSIX replacements,
 also removing the conditionals compilation of now standardized features.
+Ken Hornstein and David Levine had their part in the work, too.
 Often, I only needed to pull over changes from nmh into mmh.
 These changes include many commits; these are among them:
 .Ci 768b5edd9623b7238e12ec8dfc409b82a1ed9e2d
 .Ci 0052f1024deb0a0a2fc2e5bacf93d45a5a9c9b32 .
-
 .P
-During my own work, I have replaced the
+During my own work, I tidied up the \fIMH standard library\fP,
+.Fn libmh.a ,
+which is located in the
+.Fn sbr
+(``subroutines'') directory in the source tree.
+The MH library includes functions that mmh tools usually need.
+Among them are MH-specific functions for profile, context, sequence,
+and folder handling, but as well
+MH-independent functions, such as auxiliary string functions,
+portability interfaces and error-checking wrappers for critical
+functions of the standard library.
+.P
+I have replaced the
 .Fu atooi()
 function with calls to
-.Fu strtoul() ,
+.Fu strtoul()
 with the third parameter \(en the base \(en set to eight.
 .Fu strtoul()
-is part of C89 and thus safe to use.
+is part of C89 and thus considered safe to use.
 .Ci c490c51b3c0f8871b6953bd0c74551404f840a74
 .P
 I did remove project-included fallback implementations of
@@ -2879,19 +2892,20 @@
 In contrast to
 .Fu strcpy() ,
 it returns a pointer to the terminating null-byte in the destination area.
+The code was adjusted to replace
 .Fu copy()
-was replaced with
+with
 .Fu strcpy() ,
 except within
 .Fu concat() ,
 where
 .Fu copy()
-is more convenient.
-Of course,
+was more convenient.
+Therefore, the definition of
 .Fu copy()
-is locally defined in the source file of
+was moved into the source file of
 .Fu concat()
-now.
+and its visibility is now limited to it.
 .Ci 552fd7253e5ee9e554c5c7a8248a6322aa4363bb
 .P
 The function
@@ -2909,14 +2923,14 @@
 became desirable.
 Unfortunately, many of the 54 calls to
 .Fu r1bindex()
-depended on its special behavior,
+depended on a special behavior,
 which differed from the POSIX specification for
 .Fu basename() .
 Hence,
 .Fu r1bindex()
 was kept but renamed to
-.Fu mhbasename()
-and its second argument, the delimiter, was fixed to the slash.
+.Fu mhbasename() ,
+fixing the delimiter to the slash.
 .Ci 240013872c392fe644bd4f79382d9f5314b4ea60
 For possible uses of
 .Fu r1bindex()
@@ -2947,6 +2961,9 @@
  * If yes, then return 1, else return 0.
  */
 VE
+Two months later, it was completely removed by replacing it with
+.Fu strncmp() .
+.Ci b0b1dd37ff515578cf7cba51625189eb34a196cb
 
 
 
@@ -2954,53 +2971,34 @@
 
 .H2 "Modularization
 .P
-Mmh's code base is split into two directories,
-.Fn sbr
-(``subroutines'')
-and
+The source code of the mmh tools is located in the
 .Fn uip
-(``user interface programs'').
-The directory
-.Fn sbr
-contains the sources of the \fIMH library\fP
-.Fn libmh.a .
-It includes functions that mmh tools usually need.
-Among them are MH-specific functions for profile, context, sequence,
-and folder handling, but as well
-MH-independent functions, such as advanced string processing functions,
-portability interfaces and error-checking wrappers for critical
-functions of the standard library.
-.P
-The MH library is a standard library for the source files in the
-.Fn uip
-directory.
-There reside the sources of the programs of the mmh toolchest.
-Each tools has a source file with the name name.
+(``user interface programs'') directory.
+Each tools has a source file with the same name.
 For example,
 .Pn rmm
 is built from
 .Fn uip/rmm.c .
-Some source files are used by multiple programs.
+Some source files are used for multiple programs.
 For example
 .Fn uip/scansbr.c
-is used by both,
+is used for both,
 .Pn scan
 and
 .Pn inc .
 In nmh, 49 tools were built from 76 source files.
-That is a ratio of 1.6 source files per program.
-17 programs depended on the equally named source file only.
-32 programs depended on multiple source files.
+This is a ratio of 1.6 source files per program.
+32 programs depended on multiple source files;
+17 programs depended on one source file only.
 In mmh, 39 tools are built from 51 source files.
-That is a ratio of 1.3 source files per program.
-21 programs depended on the equally named source file only.
-18 programs depended on multiple source files.
-The MH library as well as shell scripts and multiple names to the
-same program were ignored.
+This is a ratio of 1.3 source files per program.
+18 programs depend on multiple source files;
+21 programs depend on one source file only.
+(These numbers and the ones in the following text ignore the MH library
+as well as shell scripts and multiple names for the same program.)
 .P
-Splitting the source code of one program into multiple files can
+Splitting the source code of a large program into multiple files can
 increase the readability of its source code.
-This applies primary to complex programs.
 Most of the mmh tools, however, are simple and staight-forward programs.
 With the exception of the MIME handling tools,
 .Pn pick
@@ -3014,12 +3012,12 @@
 etc.)
 are larger.
 Splitting programs with less than 1\|000 lines of code into multiple
-source files leads seldom to better readability.
-The such tools, splitting makes sense,
+source files seldom leads to better readability.
+For such tools, splitting makes sense
 when parts of the code are reused in other programs,
 and the reused code fragment is not general enough
 for including it in the MH library,
-or, if has depencencies on a library that only few programs need.
+or, if the code has depencencies on a library that only few programs need.
 .Fn uip/packsbr.c ,
 for instance, provides the core program logic for the
 .Pn packf
@@ -3031,6 +3029,14 @@
 .Fn uip/rcvpack.c
 mainly wrap the core function appropriately.
 No other tools use the folder packing functions.
+As another example,
+.Fn uip/termsbr.c
+provides termcap support, which requires linking with a termcap or
+curses library.
+Including
+.Fn uip/termsbr.c
+into the MH library would require every program to be linked with
+termcap or curses, although only few of the programs require it.
 .P
 The task of MIME handling is complex enough that splitting its code
 into multiple source files improves the readability.
@@ -3040,14 +3046,15 @@
 lines of code in summary.
 The main code file
 .Fn uip/mhstore.c
-consists of 800 lines; the rest is reused in the other MIME handling tools.
-It might be worthwhile to bundle the generic MIME handling code into
-a MH-MIME library, in resemblence of the MH standard library.
+consists of 800 lines; the other 1\|700 lines of code are reused in
+other MIME handling tools.
+It seems to be worthwhile to bundle the generic MIME handling code into
+a MH-MIME library, as a companion to the MH standard library.
 This is left open for the future.
 .P
-The work already done focussed on the non-MIME tools.
+The work already done, focussed on the non-MIME tools.
 The amount of code compiled into each program was reduced.
-This eased the understanding of the code base.
+This eases the understanding of the code base.
 In nmh,
 .Pn comp
 was built from six source files:
@@ -3062,15 +3069,16 @@
 .Fn comp.c
 and
 .Fn whatnowproc.c .
-Instead of invoking the
+In nmh's
+.Pn comp ,
+the core function of
 .Pn whatnow ,
 .Pn send ,
 and
 .Pn anno
-programs
-their core function was compiled into nmh's
+were compiled into
 .Pn comp .
-This saved the need to
+This saved the need to execute these programs with
 .Fu fork()
 and
 .Fu exec() ,
@@ -3084,11 +3092,14 @@
 included the function
 .Fu annotate() .
 Each program that wanted to annotate messages, included the source file
-.Fn uip/annosbr.c .
-The programs called
-.Fu annotate() ,
-which required seven parameters, reflecting the command line switches of
-.Pn anno .
+.Fn uip/annosbr.c
+and called
+.Fu annotate() .
+Because the function
+.Fu annotate()
+was used like the tool
+.Pn anno ,
+it had seven parameters, reflecting the command line switches of the tool.
 When another pair of command line switches was added to
 .Pn anno ,
 a rather ugly hack was implemented to avoid adding another parameter
@@ -3105,21 +3116,27 @@
 .Fn uip/comp.c
 and
 .Fn uip/whatnowproc.c ,
-together 210 lines of code,
-the standard libraries excluded.
+together 210 lines of code.
 In nmh,
 .Pn comp
 comprises six files with 2\|450 lines.
-Of course, not all of the code in these six files was actually used by
+Not all of the code in these six files was actually used by
 .Pn comp ,
-but the code reader needs to understand the code first to know which.
+but the code reader needed to read all of the code first to know which
+parts were used.
 .P
-As I have read a lot in the code base during the last two years to
-understand it, I learned about the easy and the difficult parts.
-The smaller the influenced code area is, the stricter the boundaries
-are defined, and the more straight-forward the code is written,
-the easier is it to be understood.
-Reading the
+As I have read a lot in the code base during the last two years,
+I learned about the easy and the difficult parts.
+Code is easy to understand if:
+.BU
+The influenced code area is small
+.BU
+The boundaries are stictly defined
+.BU
+The code is written straight-forward
+.P
+.\" XXX move this paragraph somewhere else?
+Reading
 .Pn rmm
 's source code in
 .Fn uip/rmm.c
@@ -3137,36 +3154,42 @@
 Understanding
 .Pn comp
 requires to read 210 lines of code in mmh, but ten times as much in nmh.
-In the aforementioned hack in
+Due to the aforementioned hack in
 .Pn anno
 to save the additional parameter,
 information passed through the program's source base in obscure ways.
-To understand
+Thus, understanding
 .Pn comp ,
-one needed to understand the inner workings of
+required understanding the inner workings of
 .Fn uip/annosbr.c
 first.
-To be sure, to fully understand a program, its whole source code needs
+To be sure to fully understand a program, its whole source code needs
 to be examined.
-Otherwise it would be a leap of faith, assuming that the developers
+Not doing so is a leap of faith, assuming that the developers
 have avoided obscure programming techniques.
 By separating the tools on the program-level, the boundaries are
 clearly visible and technically enforced.
 The interfaces are calls to
 .Fu exec()
 rather than arbitrary function calls.
-In order to understand
-.Pn comp ,
-it is no more necessary to read
-.Fn uip/sendsbr.c .
-In mmh,
+.P
+But the real problem is another:
+Nmh violates the golden ``one tool, one job'' rule of the Unix philosophy.
+Understanding
 .Pn comp
-does no longer send messages.
-In nmh, there surely is
+requires understanding
+.Fn uip/annosbr.c
+and
+.Fn uip/sendsbr.c
+because
+.Pn comp
+does annotate and send messages.
+In nmh, there surely exists the tool
 .Pn send ,
-but
+which does (almost) only send messages.
+But
 .Pn comp
-\&... and
+and
 .Pn repl
 and
 .Pn forw
@@ -3175,10 +3198,20 @@
 and
 .Pn whatnow
 and
-.Pn viamail
-(!) ... all have the same message sending function included.
+.Pn viamail ,
+they all (!) have the same message sending function included, too.
+In result,
+.Pn comp
+sends messages without using
+.Pn send .
+The situation is the same as if
+.Pn grep
+would page without
+.Pn more
+just because both programs are part of the same code base.
+.P
 The clear separation on the surface \(en the toolchest approach \(en
-it is violated on the level below.
+is violated on the level below.
 This violation is for the sake of time performance.
 On systems where
 .Fu fork()
@@ -3186,16 +3219,44 @@
 .Fu exec()
 are expensive, the quicker response might be noticable.
 In the old times, sacrifying readability and conceptional beauty for speed
-might even have been necessary to prevent MH from being unusably slow.
+might even have been a must to prevent MH from being unusably slow.
 Whatever the reasons had been, today they are gone.
-No longer should we sacrifice readability and conceptional beauty.
+No longer should we sacrifice readability or conceptional beauty.
 No longer should we violate the Unix philosophy's ``one tool, one job''
 guideline.
-No longer should we keep speed improvements that are unnecessary today.
+No longer should we keep speed improvements that became unnecessary.
 .P
-In mmh, the different jobs are divided among separate programs that
+Therefore, mmh's
+.Pn comp
+does no longer send messages.
+In mmh, different jobs are divided among separate programs that
 invoke each other as needed.
-The clear separation on the surface is still visible on the level below.
+In consequence,
+.Pn comp
+invokes
+.Pn whatnow
+which thereafter invokes
+.Pn send .
+The clear separation on the surface is maintained on the level below.
+Human users and the tools use the same interface \(en
+annotations, for example, are made by invoking
+.Pn anno ,
+no matter if requested by programs or by human beings.
+The decrease of tools built from multiple source files and thus
+the decrease of
+.Fn uip/*sbr.c
+files confirm the improvement.
+.P
+One disadvantage needs to be taken with this change:
+The compiler can no longer check the integrity of the interfaces.
+By changing the command line interfaces of tools, it is
+the developer's job to adjust the invocations of these tools as well.
+As this is a manual task and regression tests, which could detect such
+problems, are not availabe yet, it is prone to errors.
+These errors will not be detected at compile time but at run time.
+Installing regression tests is a task left to do.
+In the best case, a uniform way of invoking tools from other tools
+can be developed to allow automated testing at compile time.