# HG changeset patch # User markus schnalke # Date 1342300767 -7200 # Node ID 8c537982d718c61f65873c78c4d7f8fbcccc4946 # Parent 8be9e84be526108dabfd87c781be7f55589e96d8 Further improvements of the discussion chapter. There's a bit more to come ... tomorrow. diff -r 8be9e84be526 -r 8c537982d718 discussion.roff --- a/discussion.roff Sat Jul 14 18:05:19 2012 +0200 +++ b/discussion.roff Sat Jul 14 23:19:27 2012 +0200 @@ -2879,21 +2879,18 @@ .H1 "Styling .P Kernighan and Pike have emphasized the importance of style in the -preface of their book: +preface of \fPThe Practice of Programming\fP: .[ [ kernighan pike practice of programming .], p. x] .QS Chapter 1 discusses programming style. -Good style is so important to good programming that we have chose +Good style is so important to good programming that we have chosen to cover it first. .QE This section covers changes in mmh that were guided by the desire to improve on style. -Many of them follow the rules given in the quoted book. -.[ -kernighan pike practice of programming -.] +Many of them follow the advice given in the quoted book. @@ -2903,12 +2900,11 @@ .P .U3 "Indentation Style .P -Indentation styles are the holy cow of programmers. -Kernighan and Pike +Indentation styles are the holy cow of programming. +Kernighan and Pike write: .[ [ kernighan pike practice of programming .], p. 10] -wrote: .QS Programmers have always argued about the layout of programs, but the specific style is much less important than its consistent @@ -2920,30 +2916,32 @@ I agree that the constant application is most important, but I believe that some styles have advantages over others. For instance the indentation with tab characters only. -Tab characters directly map to the nesting level \(en +The number of tabs corresponds to the nesting level \(en one tab, one level. -Tab characters are flexible because developers can adjust them to -whatever width they like to have. -There is no more need to run -.Pn unexpand -or -.Pn entab -programs to ensure the correct mixture of leading tabs and spaces. -The simple rules are: (1) Leading whitespace must consist of tabs only. -(2) Any other whitespace should consist of spaces. -These two rules ensure the integrity of the visual appearance. +Tab characters provide flexible visual appearance because developers +can adjust their width as prefered. +There is no more need to check for the correct mixture of +tabs and spaces. +Two simple rules ensure the integrity and flexibility of the visual +appearance: +.LI 1 +Leading whitespace must consist of tabs only. +.LI 2 +All other whitespace should be spaces. +.LP Although reformatting existing code should be avoided, I did it. I did not waste time arguing; I just reformatted the code. .Ci a485ed478abbd599d8c9aab48934e7a26733ecb1 .U3 "Comments .P -Section 1.6 of +Kernighan and Pike demand: ``Don't belabor the obvious''. .[ [ kernighan pike practice of programming .], p. 23] -demands: ``Don't belabor the obvious.'' -Hence, I simply removed all the comments in the following code excerpt: +Following the advice, I removed unnecessary comments. +For instance, I removed all comments in the following code excerpt +.Ci 426543622b377fc5d091455cba685e114b6df674 : .VS context_replace(curfolder, folder); /* update current folder */ seq_setcur(mp, mp->lowsel); /* update current message */ @@ -2961,14 +2959,15 @@ /* NUL-terminate the field */ *cp = '\0'; VE -.Ci 426543622b377fc5d091455cba685e114b6df674 .P -The program code explains enough itself, already. +The information in each of the comments was present in the code +statements already, except for the NUL-termination, which became +obvious from the context. .U3 "Names .P -Kernighan and Pike suggest: +Regarding this topic, Kernighan and Pike suggest: ``Use active names for functions''. .[ [ kernighan pike practice of programming @@ -2976,16 +2975,20 @@ One application of this rule was the rename of .Fu check_charset() to -.Fu is_native_charset() . -.Ci 8d77b48284c58c135a6b2787e721597346ab056d -The same change fixed a violation of ``Be accurate'' +.Fu is_native_charset() +.Ci 8d77b48284c58c135a6b2787e721597346ab056d . +The same change additionally fixed a violation of ``Be accurate'', .[ [ kernighan pike practice of programming .], p. 4] -as well. -The code did not match the expectation the function suggested, -as it, for whatever reason, only compared the first ten characters -of the charset name. +as the code did not match the expectation the function suggested. +It did not compare charset names but prefixes of them only. +In case the native charset was `ISO-8859-1', then +.VS +check_charset("ISO-8859-11", strlen("ISO-8859-11")) +VE +had returned true although the upper halves of the code pages +are different. .P More important than using active names is using descriptive names. .VS @@ -2993,8 +2996,8 @@ VE Renaming the obscure .Fu m_unknown() -function was a delightful event, although it made the code less funny. -.Ci 611d68d19204d7cbf5bd585391249cb5bafca846 +function was a delightful event, although it made the code less funny +.Ci 611d68d19204d7cbf5bd585391249cb5bafca846 . .P Magic numbers are generally considered bad style. Obviously, Kernighan and Pike agree: @@ -3002,57 +3005,70 @@ .[ [ kernighan pike practice of programming .], p. 19] -One such change was naming the type of input \(en mbox or mail folder \(en -to be scanned: +.P +The argument +.CW outnum +of the function +.Fu scan() +in +.Fn uip/scansbr.c +holds the number of the message to be created. +As well it encodes program logic with negative numbers and zero. +This led to obscure code. +I clarified the code by introducing two variables that extracted +the hidden information: +.VS +int incing = (outnum > 0); +int ismbox = (outnum != 0); +VE +The readable names are thus used in conditions; +the variable +.CW outnum +is used only to extract ordinary message numbers +.Ci b8b075c77be7794f3ae9ff0e8cedb12b48fd139f . +.P +Through the clarity improvement of the change detours in the program +logic of related code parts became apparent. +The implementation was simplified. +This possibility to improve had been invisible before +.Ci aa60b0ab5e804f8befa890c0a6df0e3143ce0723 . +.P +The names just described were a first step, yet the situation +was further improved by giving names to the magic values of +.CW outnum : .VS #define SCN_MBOX (-1) #define SCN_FOLD 0 VE +The two variables were updated thereafter as well: +.VS +int incing = (outnum != SCN_MBOX && outnum != SCN_FOLD); +int scanfolder = (outnum == SCN_FOLD); +VE +Furthermore, +.CW ismbox +was replaced by +.CW scanfolder +because that matched better to the program logic. .Ci 7ffb36d28e517a6f3a10272056fc127592ab1c19 -.P -The argument -.Ar outnum -of the function -.Fu scan() -in -.Fn uip/scansbr.c -defines the number of the message to be created. -If no message is to be created, the argument is misused to transport -program logic. -This lead to obscure code. -I improved the clarity of the code by introducing two variables: -.VS -int incing = (outnum > 0); -int ismbox = (outnum != 0); -VE -They cover the magic values and are used for conditions. -The variable -.Ar outnum -is only used when it holds an ordinary message number. -.Ci b8b075c77be7794f3ae9ff0e8cedb12b48fd139f -The clarity improvement of the change showed detours in the program logic -of related code parts. -Having the new variables with descriptive names, a more -straight forward implementation became apparent. -Before the code was clarified, the possibility to improve had not be seen. -.Ci aa60b0ab5e804f8befa890c0a6df0e3143ce0723 + .H2 "Structural Rework .P -Although the stylistic changes described up to here improve the -readability of the source code, all of them are changes ``in the small''. -Structural changes affect a much larger area. -They are more difficult to do but lead to larger improvements, -especially as they influence the outer shape of the tools as well. +Although the stylistic changes described already improve the +readability of the source code, all of them were changes ``in the small''. +Structural changes, in contrast, affect much larger code areas. +They are more difficult to accomplish but lead to larger improvements, +especially as they often influence the outer shape of the tools as well. .P At the end of their chapter on style, Kernighan and Pike ask: ``But why worry about style?'' .[ [ kernighan pike practice of programming -.], p. 28] -Following are two examples of structural rework that show +.], p. 28]. +Following are two examples of structural rework that demonstrate why style is important in the first place. @@ -3060,11 +3076,11 @@ .P Until 2002, .Pn anno -had six functional command line switches, +had six functional command line switches: .Sw -component and .Sw -text , -which have an argument each, +each with an argument, and the two pairs of flags, .Sw -[no]date and @@ -3079,26 +3095,26 @@ .Sw -append , and .Sw -number , -the last one taking an argument. -.Ci 7480dbc14bc90f2d872d434205c0784704213252 +the last one taking an argument +.Ci 7480dbc14bc90f2d872d434205c0784704213252 . Later, .Sw -[no]preserve -was added. -.Ci d9b1d57351d104d7ec1a5621f090657dcce8cb7f +was added as well +.Ci d9b1d57351d104d7ec1a5621f090657dcce8cb7f . Then, the Synopsis section of the man page .Mp anno (1) read: .VS -anno [+folder] [msgs] [-component field] [-inplace | -noinplace] +anno [+folder] [msgs] [-component f(CIfieldfP] [-inplace | -noinplace] [-date | -nodate] [-draft] [-append] [-list] [-delete] - [-number [num|all]] [-preserve | -nopreserve] [-version] - [-help] [-text body] + [-number [f(CInumfP|fPallfP]] [-preserve | -nopreserve] [-version] + [-help] [-text f(CIbodyfP] VE .LP The implementation followed the same structure. Problems became visible when .Cl "anno -list -number 42 -worked on the current message instead on message number 42, +worked on the current message instead of on message number 42, and .Cl "anno -list -number l:5 did not work on the last five messages but failed with the mysterious @@ -3122,59 +3138,51 @@ .QE .LP The problem was manifold. -The code required a numeric argument to the -.Sw -number -switch. -If it was missing or non-numeric, -.Pn anno -aborted with an error message that had an off-by-one error, -printing the switch one before the failing one. Semantically, the argument to the .Sw -number switch is only necessary in combination with .Sw -delete , but not with .Sw -list . +The code, however, required a numeric argument in any case. +If the argument was missing or non-numeric, +.Pn anno +aborted with an error message that additionally had an off-by-one error. +It printed the name of the switch one before the concerned one. .P -Trying to fix these problems on the surface would not have solved -them truly, as they originate from a discrepance between the +Trying to fix these problems on the surface would not have solved them. +They originate from a discrepance between the structure of the problem and the structure implemented in the program. -Such structural differences can not be cured on the surface. -They need to be solved by adjusting the structure of the implementation -to the structure of the problem. +Such structural differences can only be solved by adjusting the +structure of the implementation to the structure of the problem. .P -In 2002, the new switches +Steinhart had added the new .Sw -list and .Sw -delete -were added in the same way, the -.Sw -number -switch for instance had been added. -Yet, they are of structural different type. +switches in a style similar to the other switches though +they are of structural different type. Semantically, .Sw -list and .Sw -delete -introduce modes of operation. +introduce operation modes. Historically, .Pn anno had only one operation mode: adding header fields. -With the extension it got two more modes: -.\" XXX got +With the extension, two more modes were added: listing and deleting header fields. The structure of the code changes did not pay respect to this -fundamental change to -.Pn anno 's -behavior. +fundamental change. Neither the implementation nor the documentation did clearly -define them as being exclusive modes of operation. +declare the exclusive operation modes as such. Having identified the problem, I solved it by putting structure into .Pn anno -and its documentation. -.Ci d54c8db8bdf01e8381890f7729bc0ef4a055ea11 +and its documentation +.Ci d54c8db8bdf01e8381890f7729bc0ef4a055ea11 . .P The difference is visible in both the code and the documentation. -The following code excerpt: +For instance in the following code excerpt: .VS int delete = -2; /* delete header element if set */ int list = 0; /* list header elements if set */ @@ -3187,7 +3195,7 @@ continue; VE .LP -was replaced by: +which was replaced by: .VS static enum { MODE_ADD, MODE_DEL, MODE_LIST } mode = MODE_ADD; [...] @@ -3203,20 +3211,19 @@ it is easier to understand as well. The same applies to the documentation. The man page was completely reorganized to propagate the same structure. -This is visible in the Synopsis section: +This is already visible in the Synopsis section: .VS -anno [+folder] [msgs] [-component field] [-text body] +anno [+folder] [msgs] [-component f(CIfieldfP] [-text fPbodyfP] [-append] [-date | -nodate] [-preserve | -nopreserve] [-Version] [-help] -anno -delete [+folder] [msgs] [-component field] [-text - body] [-number num | all ] [-preserve | -nopreserve] +anno -delete [+folder] [msgs] [-component fPfieldfP] [-text + fPbodyfP] [-number fPnum fP| fPall fP] [-preserve | -nopreserve] [-Version] [-help] -anno -list [+folder] [msgs] [-component field] [-number] +anno -list [+folder] [msgs] [-component fPfieldfP] [-number] [-Version] [-help] VE -.\" XXX think about explaining the -preserve rework? @@ -3231,21 +3238,20 @@ .Fn ./foo/bar . .LI 3 Absolute MH folder paths, like -.Fn +friends/phil . +.Fn +projects/mmh . .LI 4 Relative MH folder paths, like .Fn @subfolder . .LP -The last type, relative MH folder paths, are hardly documented. -Nonetheless, they are useful for large mail storages. +Relative MH folder paths, are hardly documented +although they are useful for large mail storages. The current mail folder is specified as `\c .Fn @ ', just like the current directory is specified as `\c .Fn . '. .P To allow MH tools to understand all four notations, -they need to convert between them. -.\" XXX between? +they need to be able to convert between them. In nmh, these path name conversion functions were located in the files .Fn sbr/path.c (``return a pathname'') and @@ -3253,19 +3259,23 @@ (``get the path for the mail directory''). The seven functions in the two files were documented with no more than two comments, which described obvious information. -The function signatures were neither explaining: -.VS -char *path(char *, int); -char *pluspath(char *); -char *m_mailpath(char *); -char *m_maildir(char *); -VE +The signatures of the four exported functions did not explain their +semantics: +.LI 1 +.CW "char *path(char *, int); +.LI 2 +.CW "char *pluspath(char *); +.LI 3 +.CW "char *m_mailpath(char *); +.LI 4 +.CW "char *m_maildir(char *); .P -My investigation provides the following description: +My investigations provided the following descriptions: .LI 1 The second parameter of .Fu path() -defines the type of path given as first parameter. +defines the type as which the path given in the first parameter should +be treated. Directory paths are converted to absolute directory paths. Folder paths are converted to absolute folder paths. Folder paths must not include a leading `\fL@\fP' character. @@ -3284,41 +3294,41 @@ The characters `\fL+\fP' or `\fL@\fP' at the beginning of the path name are treated literal, i.e. as the first character of a relative directory path. Hence, this function can not be used for folder paths. -In any case, the result is an absolute directory path. -The result is a pointer to newly allocated memory. +In any case, the result is an absolute directory path, +returned as a pointer to newly allocated memory. .LI 4 .Fu m_maildir() returns the parameter unchanged if it is an absolute directory path or begins with the entry `\fL.\fP' or `\fL..\fP'. All other strings are prepended with the current working directory. -Hence, this functions can not be used for folder paths. +Hence, this function can not be used for folder paths. The result is either an absolute directory path or a relative -directory path, starting with a dot. +directory path, starting with dot or dot-dot. In contrast to the other functions, the result is a pointer to static memory. .P The situation was obscure, irritating, error-prone, and non-orthogonal. -No clear terminology was used to name the different kinds of path names. -The first argument of +Additionally, no clear terminology was used to name the different +kinds of path names. +Sometimes, the names were even misleading, much as the first argument of .Fu m_mailpath() , -for instance, was named -.Ar folder , -though +which was named +.CW folder , +although .Fu m_mailpath() -can not be used for MH folders. +could not be used with MH folder arguments. .P -I reworked the path name conversion completely, introducing clarity. +I clarified the path name conversion by complete rework. First of all, the terminology needed to be defined. A path name is either in the Unix domain, then it is called -\fIdirectory path\fP, `dirpath' for short, or it is in the MH domain, -then it is called \fIfolder path\fP, `folpath' for short. +\fIdirectory path\fP (\fIdirpath\fP for short) or it is in the MH domain, +then it is called \fIfolder path\fP (\fIfolpath\fP for short). The two terms need to be used with strict distinction. -Having a clear terminology is often an indicator of having understood -the problem itself. +Often a clear terminology indicates that the problem is understood. Second, I exploited the concept of path type indicators. -By requesting every path name to start with a clear type identifier, -conversion between the types can be fully automated. -Thus the tools can accept paths of any type from the user. +By requiring every path name to start with a distinct type identifier, +the conversion between the types could be fully automated. +This allows the tools to accept paths of any type from the user. Therefore, it was necessary to require relative directory paths to be prefixed with a dot character. In consequence, the dot character could no longer be an alias for the @@ -3327,8 +3337,7 @@ Third, I created three new functions to replace the previous mess: .LI 1 .Fu expandfol() -converts folder paths to absolute folder paths, -without the leading plus character. +converts folder paths to absolute folder paths. Directory paths are simply passed through. This function is to be used for folder paths only, thus the name. The result is a pointer to static memory. @@ -3353,10 +3362,11 @@ The third function converts any path name type to the most general one, the absolute directory path. All of the functions return pointers to static memory. -All three functions are implemented in -.Fn sbr/path.c . +The file +.Fn sbr/path.c +contains the implementation of the functions; .Fn sbr/m_maildir.c -is removed. +was removed. .Ci d39e2c447b0d163a5a63f480b23d06edb7a73aa0 .P Along with the path conversion rework, I also replaced @@ -3367,31 +3377,30 @@ .Fu getfolder(FCUR) with .Fu getcurfol() , -which is only a convenience wrapper for -.Fu expandfol("@") . +which only wraps +.Fu expandfol(""@"") +for convenience. This code was moved from .Fn sbr/getfolder.c -to -.Fn sbr/path.c . +into +.Fn sbr/path.c +as well. .Ci d39e2c447b0d163a5a63f480b23d06edb7a73aa0 .P The related function .Fu etcpath() -was moved to +is now included in .Fn sbr/path.c , too .Ci b4c29794c12099556151d93a860ee51badae2e35 . Previously, it had been located in -.Fn config/config.c , -for whatever reasons. +.Fn config/config.c . .P +Now, .Fn sbr/path.c -now contains all path handling code. -.\" XXX naechste zeile weg? -Only 173 lines of code were needed to replace the previous 252 lines. -The readability of the code is highly improved. -Additionally, each of the six exported and one static functions -is introduced by an explaining comment. +contains all path handling code. +Besides being less code, its readability is highly improved. +The functions follow a common style and are well documented.