comparison discussion.roff @ 131:7c741bc8f719

Reorganized: Converted 4-parted discussion into 3-parted discussion.
author markus schnalke <meillo@marmaro.de>
date Tue, 03 Jul 2012 11:11:12 +0200
parents 0b9aa74ced4d
children 02660c14f6a8
130:0b9aa74ced4d 131:7c741bc8f719
422 That does not hurt because 422 That does not hurt because
423 .Pn slocal 423 .Pn slocal
424 is unrelated to the rest of the project. 424 is unrelated to the rest of the project.
425 425
426 426
427 .H2 "\fLshow\fP and \fPmhshow\fP 427 .H3 "Profile Reading
428 .P
429 FIXME XXX
430
431 commit 3e017a7abbdf69bf0dff7a4073275961eda1ded8
432 Author: markus schnalke <meillo@marmaro.de>
433 Date: Wed Jun 27 14:23:35 2012 +0200
434
435 spost: Read profile and context now. Removed -library switch.
436 spost is a full part of the mmh toolchest, hence, it shall read the
437 profile/context. This will remove the need to pass profile information
438 from send to spost via command line switches.
439 In January 2012, there had been a discussion on the nmh-workers ML
440 whether post should read the profile/context. There wasn't a clear
441 answer. Its behavior was mainly motivated by the historic situation,
442 it seems. My opinion on the topic goes into the direction that every
443 tool that is part of the mmh toolchest should read the profile. That
444 is a clear and simple concept. Using MH tools without wanting to
445 interact with MH (like mhmail had been) is no more a practical problem.
446
447 commit 32d4f9daaa70519be3072479232ff7be0500d009
448 Author: markus schnalke <meillo@marmaro.de>
449 Date: Wed Jun 27 13:15:47 2012 +0200
450
451 mhmail: Read the context!
452 mhmail will change from a mailx-replacement to an alternative to
453 `comp -ed prompter', thus being a send front-end. Hence, mhmail
454 should not stay outside the profile/context respecting mmh toolchest.
455
456
457 slocal
458
459
460
461
462 .H2 "Displaying Messages
463 .P
464 FIXME XXX
465
466 .U3 "\fLshow\fP and \fPmhshow\fP
428 .P 467 .P
429 Since the very beginning \(en already in the first concept paper \(en 468 Since the very beginning \(en already in the first concept paper \(en
430 .Pn show 469 .Pn show
431 had been MH's message display program. 470 had been MH's message display program.
432 .Pn show 471 .Pn show
559 hurts in one regard: It had been such a simple program. 598 hurts in one regard: It had been such a simple program.
560 Its lean elegance is missing to the new 599 Its lean elegance is missing to the new
561 .Pn show . 600 .Pn show .
562 But there is no chance; 601 But there is no chance;
563 supporting MIME demands for higher essential complexity. 602 supporting MIME demands for higher essential complexity.
603
604
605 .U3 "Scan Listings
606 .P
607 FIXME XXX
608
609 .P
610
611 commit c20e315f9fb9f0f0955749726dbf4fd897cd9f48
612 Author: markus schnalke <meillo@marmaro.de>
613 Date: Fri Dec 9 21:56:44 2011 +0100
614
615 Adjusted the default scan listing: remove the body preview
616 The original listing is still available as etc/scan.nmh
617
618 commit 70b2643e0da8485174480c644ad9785c84f5bff4
619 Author: markus schnalke <meillo@marmaro.de>
620 Date: Mon Jan 30 16:16:26 2012 +0100
621
622 Scan listings shall not contain body content. Hence, removed this feature.
623 Scan listings shall operate on message headers and non-message information
624 only. Displaying the beginning of the body complicates everything too much.
625 That's no surprise, because it's something completely different. If you
626 want to examine the body, then use show(1)/mhshow(1).
627 Changed the default scan formats accordingly.
628
564 629
565 630
566 .H2 "Configure Options 631 .H2 "Configure Options
567 .P 632 .P
568 Customization is a double-edged sword. 633 Customization is a double-edged sword.
2472 Forwarding messages using MIME. 2537 Forwarding messages using MIME.
2473 .Ci 6e271608b7b9c23771523f88d23a4d3593010cf1 2538 .Ci 6e271608b7b9c23771523f88d23a4d3593010cf1
2474 2539
2475 2540
2476 2541
2477 2542 .H2 "Drafts and Trash Folder
2478 .H1 "Style 2543 .P
2479 .P 2544
2480 Kernighan and Pike have emphasized the importance of style in the 2545 .U3 "Draft Folder
2481 preface of their book:
2482 .[ [
2483 kernighan pike practice of programming
2484 .], p. x]
2485 .QS
2486 Chapter 1 discusses programming style.
2487 Good style is so important to good programming that we have chosen
2488 to cover it first.
2489 .QE
2490 This section covers changes in mmh that were motivated by the desire
2491 to improve on style.
2492 Many of them follow the rules given in the quoted book.
2493 .[
2494 kernighan pike practice of programming
2495 .]
2496
2497
2498
2499
2500 .H2 "Code Style
2501 .P
2502 .U3 "Indentation Style
2503 .P
2504 Indentation styles are the sacred cow of programmers.
2505 Again Kernighan and Pike:
2506 .[ [
2507 kernighan pike practice of programming
2508 .], p. 10]
2509 .QS
2510 Programmers have always argued about the layout of programs,
2511 but the specific style is much less important than its consistent
2512 application.
2513 Pick one style, preferably ours, use it consistently, and don't waste
2514 time arguing.
2515 .QE
2516 .P
2517 I agree that the consistent application is most important,
2518 but I believe that some styles have advantages over others.
2519 One example is indentation with tab characters only.
2520 Tab characters directly map to the nesting level \(en
2521 one tab, one level.
2522 Tab characters are flexible because developers can adjust them to
2523 whatever width they like to have.
2524 There is no more need to run
2525 .Pn unexpand
2526 or
2527 .Pn entab
2528 programs to ensure the correct mixture of leading tabs and spaces.
2529 The simple rules are: (1) Leading whitespace must consist of tabs only.
2530 (2) Any other whitespace should consist of spaces.
2531 These two rules ensure the integrity of the visual appearance.
2532 Although reformatting existing code should be avoided, I did it.
2533 I did not waste time arguing; I just did it.
2534 .Ci a485ed478abbd599d8c9aab48934e7a26733ecb1
2535
2536 .U3 "Comments
2537 .P
2538 Section 1.6 of
2539 .[ [
2540 kernighan pike practice of programming
2541 .], p. 23]
2542 demands: ``Don't belabor the obvious.''
2543 Hence, I simply removed all the comments in the following code excerpt:
2544 .VS
2545 context_replace(curfolder, folder); /* update current folder */
2546 seq_setcur(mp, mp->lowsel); /* update current message */
2547 seq_save(mp); /* synchronize message sequences */
2548 folder_free(mp); /* free folder/message structure */
2549 context_save(); /* save the context file */
2550
2551 [...]
2552
2553 int c; /* current character */
2554 char *cp; /* miscellaneous character pointer */
2555
2556 [...]
2557
2558 /* NUL-terminate the field */
2559 *cp = '\0';
2560 VE
2561 .Ci 426543622b377fc5d091455cba685e114b6df674
2562 .P
2563 The names of the functions explain enough already.
2564
2565 .U3 "Names
2566 .P
2567 Kernighan and Pike suggest:
2568 ``Use active names for functions''.
2569 .[ [
2570 kernighan pike practice of programming
2571 .], p. 4]
2572 One application of this rule was the rename of
2573 .Fu check_charset()
2574 to
2575 .Fu is_native_charset() .
2576 .Ci 8d77b48284c58c135a6b2787e721597346ab056d
2577 The same change fixed a violation of ``Be accurate'' as well.
2578 The code did not match the expectation the function suggested,
2579 as it, for whatever reason, only compared the first ten characters
2580 of the charset name.
2581 .P
2582 More important than using active names is using descriptive names.
2583 Renaming the obscure function
2584 .Fu m_unknown()
2585 was a delightful event.
2586 .Ci 611d68d19204d7cbf5bd585391249cb5bafca846
2587 .P
2588 Magic numbers are generally considered bad style.
2589 Obviously, Kernighan and Pike agree:
2590 ``Give names to magic numbers''.
2591 .[ [
2592 kernighan pike practice of programming
2593 .], p. 19]
2594 One such change was naming the type of input \(en mbox or mail folder \(en
2595 to be scanned:
2596 .VS
2597 #define SCN_MBOX (-1)
2598 #define SCN_FOLD 0
2599 VE
2600 .Ci 7ffb36d28e517a6f3a10272056fc127592ab1c19
2601 .P
2602 The argument
2603 .Ar outnum
2604 of the function
2605 .Fu scan()
2606 in
2607 .Fn uip/scansbr.c
2608 defines the number of the message to be created.
2609 If no message is to be created, the argument is misused to transport
2610 program logic.
2611 This led to obscure code.
2612 I improved the clarity of the code by introducing two variables:
2613 .VS
2614 int incing = (outnum > 0);
2615 int ismbox = (outnum != 0);
2616 VE
2617 They cover the magic values and are used for conditions.
2618 The variable
2619 .Ar outnum
2620 is only used when it holds an ordinary message number.
2621 .Ci b8b075c77be7794f3ae9ff0e8cedb12b48fd139f
2622 The clarity improvement of the change showed detours in the program logic
2623 of related code parts.
2624 Having the new variables with descriptive names, a more
2625 straight forward implementation became apparent.
2626 Before the clarification was done,
2627 the possibility to improve had not been seen.
2628 .Ci aa60b0ab5e804f8befa890c0a6df0e3143ce0723
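The effect of the two named variables can be illustrated with a minimal sketch. It is not the code from uip/scansbr.c; the function describe_outnum() and its return strings are hypothetical and only show how conditions read once the magic values have names.

```c
#define SCN_MBOX (-1)	/* input is an mbox file */
#define SCN_FOLD 0	/* input is a mail folder */

/*
 * Hypothetical helper (not part of mmh): describe what a given
 * outnum value stands for, using the two named conditions.
 */
const char *
describe_outnum(int outnum)
{
	int incing = (outnum > 0);	/* a message is to be created */
	int ismbox = (outnum != 0);	/* the input is an mbox */

	if (incing) {
		return "creating a numbered message";
	}
	if (ismbox) {
		return "scanning an mbox";
	}
	return "scanning a folder";
}
```

The conditions no longer compare against bare numbers, so a reader does not need to know that a negative outnum carries program logic.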
2629
2630 .U3 "Rework of \f(CWanno\fP
2631 .P
2632 At the end of their chapter on style,
2633 Kernighan and Pike ask: ``But why worry about style?''
2634 The following example of my rework of
2635 .Pn anno
2636 provides an answer why style is important in the first place.
2637 .P
2638 Until 2002,
2639 .Pn anno
2640 had six functional command line switches,
2641 .Sw -component
2642 and
2643 .Sw -text ,
2644 which took an argument each,
2645 and the two pairs of flags,
2646 .Sw -[no]date
2647 and
2648 .Sw -[no]inplace .
2657 Then Jon Steinhart introduced his attachment system.
2658 In need for more advanced annotation handling, he extended
2659 .Pn anno .
2660 He added five more switches:
2661 .Sw -draft ,
2662 .Sw -list ,
2663 .Sw -delete ,
2664 .Sw -append ,
2665 and
2666 .Sw -number ,
2667 the last one taking an argument.
2668 .Ci 7480dbc14bc90f2d872d434205c0784704213252
2669 Later,
2670 .Sw -[no]preserve
2671 was added.
2672 .Ci d9b1d57351d104d7ec1a5621f090657dcce8cb7f
2673 Then, the Synopsis section of the man page
2674 .Mp anno (1)
2675 read:
2676 .VS
2677 anno [+folder] [msgs] [-component field] [-inplace | -noinplace]
2678 [-date | -nodate] [-draft] [-append] [-list] [-delete]
2679 [-number [num|all]] [-preserve | -nopreserve] [-version]
2680 [-help] [-text body]
2681 VE
2682 .LP
2683 The implementation followed the same structure.
2684 Problems became visible when
2685 .Cl "anno -list -number 42
2686 worked on the current message instead of on message number 42,
2687 and
2688 .Cl "anno -list -number l:5
2689 did not work on the last five messages but failed with the mysterious
2690 error message: ``anno: missing argument to -list''.
2691 Yet, the invocation matched the specification in the man page.
2692 There, the correct use of
2693 .Sw -number
2694 was defined as being
2695 .Cl "[-number [num|all]]
2696 and the textual description for the combination with
2697 .Sw -list
2698 read:
2699 .QS
2700 The -list option produces a listing of the field bodies for
2701 header fields with names matching the specified component,
2702 one per line. The listing is numbered, starting at 1, if
2703 the -number option is also used.
2704 .QE
2705 .LP
2706 The problem was manifold.
2707 The code required a numeric argument to the
2708 .Sw -number
2709 switch.
2710 If it was missing or non-numeric,
2711 .Pn anno
2712 aborted with an error message that had an off-by-one error,
2713 printing the switch one before the failing one.
2714 Semantically, the argument to the
2715 .Sw -number
2716 switch is only meaningful in combination with
2717 .Sw -delete ,
2718 but not with
2719 .Sw -list .
2720 In the former case it is even necessary.
2721 .P
2722 Trying to fix these problems on the surface would not have truly solved them.
2723 The problems discovered originate from a discrepancy between the semantic
2724 structure of the problem and the structure implemented in the program.
2725 Such structural differences cannot be cured on the surface.
2726 They need to be solved by adjusting the structure of the implementation
2727 to the structure of the problem.
2728 .P
2729 In 2002, the new switches
2730 .Sw -list
2731 and
2732 .Sw -delete
2733 were added in the same way as the
2734 .Sw -number
2735 switch for instance had been added.
2736 Yet, they are of a structurally different type.
2737 Semantically,
2738 .Sw -list
2739 and
2740 .Sw -delete
2741 introduce modes of operation.
2742 Historically,
2743 .Pn anno
2744 had only one operation mode: adding header fields.
2745 With the extension, it got two more modes:
2746 listing and deleting header fields.
2747 The structure of the code changes did not pay respect to this
2748 fundamental change to
2749 .Pn anno 's
2750 behavior.
2751 Neither the implementation nor the documentation clearly
2752 defined them as being exclusive modes of operation.
2753 Having identified the problem, I solved it by putting structure into
2754 .Pn anno
2755 and its documentation.
2756 .Ci d54c8db8bdf01e8381890f7729bc0ef4a055ea11
2757 .P
2758 The difference is visible in both the code and the documentation.
2759 The following code excerpt:
2760 .VS
2761 int delete = -2; /* delete header element if set */
2762 int list = 0; /* list header elements if set */
2763 [...]
2764 case DELETESW: /* delete annotations */
2765 delete = 0;
2766 continue;
2767 case LISTSW: /* produce a listing */
2768 list = 1;
2769 continue;
2770 VE
2771 .LP
2772 was replaced by:
2773 .VS
2774 static enum { MODE_ADD, MODE_DEL, MODE_LIST } mode = MODE_ADD;
2775 [...]
2776 case DELETESW: /* delete annotations */
2777 mode = MODE_DEL;
2778 continue;
2779 case LISTSW: /* produce a listing */
2780 mode = MODE_LIST;
2781 continue;
2782 VE
2783 .LP
2784 The replacement code not only reflects the problem's structure better,
2785 it is also easier to understand.
2786 The same applies to the documentation.
2787 The man page was completely reorganized to propagate the same structure.
2788 This is visible in the Synopsis section:
2789 .VS
2790 anno [+folder] [msgs] [-component field] [-text body]
2791 [-append] [-date | -nodate] [-preserve | -nopreserve]
2792 [-Version] [-help]
2793
2794 anno -delete [+folder] [msgs] [-component field] [-text
2795 body] [-number num | all ] [-preserve | -nopreserve]
2796 [-Version] [-help]
2797
2798 anno -list [+folder] [msgs] [-component field] [-number]
2799 [-Version] [-help]
2800 VE
2801 .\" XXX think about explaining the -preserve rework?
2802
2803
2804
2805
2806 .H2 "Standard Libraries
2807 .P
2808 MH is one decade older than the POSIX and ANSI C standards.
2809 Hence, MH included its own implementations of functions
2810 that are standardized and thus widely available today,
2811 but were not back then.
2812 Today, twenty years after POSIX and ANSI C were published,
2813 developers can expect systems to comply with these standards.
2814 In consequence, MH-specific replacements for standard functions
2815 can and should be dropped.
2816 Kernighan and Pike advise: ``Use standard libraries.''
2817 .[ [
2818 kernighan pike practice of programming
2819 .], p. 196]
2820 Actually, MH had followed this advice in history,
2821 but it had not adjusted to the changes in this field.
2822 The
2823 .Fu snprintf()
2824 function, for instance, was standardized with C99 and is available
2825 almost everywhere because of its high usefulness.
2826 The project's own implementation of
2827 .Fu snprintf()
2828 was dropped in March 2012 in favor of using the one of the
2829 standard library.
2830 .Ci 0052f1024deb0a0a2fc2e5bacf93d45a5a9c9b32
2831 Such decisions limit the portability of mmh
2832 if systems don't support these standardized and widespread functions.
2833 This compromise is made because mmh focuses on the future.
2834 .P
2835 I am not yet thirty years old and my C and Unix experience comprises
2836 only half a dozen years.
2837 Hence, I need to learn about the history in retrospect.
2838 I have not used those ancient constructs myself.
2839 I have not suffered from their incompatibilities.
2840 I have not longed for standardization.
2841 All my programming experience is from a time when ANSI C and POSIX
2842 were well established already.
2843 I have only read a lot of books about the (good) old times.
2844 This puts me in a difficult position when working with old code.
2845 I need to freshly acquire knowledge about old code constructs and ancient
2846 programming styles, whereas older programmers know these things by
2847 heart from their own experience.
2848 .P
2849 Being aware of the situation, I rather let people with more historic
2850 experience replace ancient code constructs with standardized ones.
2851 Lyndon Nerenberg covered large parts of this task for the nmh project.
2852 He converted project-specific functions to POSIX replacements,
2853 also removing the conditional compilation of now standardized features.
2854 Ken Hornstein and David Levine had their part in the work, too.
2855 Often, I only needed to pull over changes from nmh into mmh.
2856 These changes include many commits; these are among them:
2857 .Ci 768b5edd9623b7238e12ec8dfc409b82a1ed9e2d
2858 .Ci 0052f1024deb0a0a2fc2e5bacf93d45a5a9c9b32 .
2859 .P
2860 During my own work, I tidied up the \fIMH standard library\fP,
2861 .Fn libmh.a ,
2862 which is located in the
2863 .Fn sbr
2864 (``subroutines'') directory in the source tree.
2865 The MH library includes functions that mmh tools usually need.
2866 Among them are MH-specific functions for profile, context, sequence,
2867 and folder handling, but as well
2868 MH-independent functions, such as auxiliary string functions,
2869 portability interfaces and error-checking wrappers for critical
2870 functions of the standard library.
2871 .P
2872 I have replaced the
2873 .Fu atooi()
2874 function with calls to
2875 .Fu strtoul()
2876 with the third parameter \(en the base \(en set to eight.
2877 .Fu strtoul()
2878 is part of C89 and thus considered safe to use.
2879 .Ci c490c51b3c0f8871b6953bd0c74551404f840a74
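As an illustration of the replacement, the following minimal sketch parses an octal string with strtoul() and base eight. The wrapper parse_octal() and its error convention are hypothetical and not taken from the mmh sources.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical wrapper (not part of mmh): parse an octal string
 * such as "0644". Returns 0 on success and stores the value in
 * *out; returns -1 if no digits were found, if characters remain
 * after the number, or if the value is out of range.
 */
int
parse_octal(const char *s, unsigned long *out)
{
	char *end;
	unsigned long val;

	errno = 0;
	val = strtoul(s, &end, 8);	/* third parameter: the base */
	if (end == s || *end != '\0' || errno == ERANGE) {
		return -1;
	}
	*out = val;
	return 0;
}
```

The end pointer lets the caller reject input such as "648", where strtoul() would otherwise stop silently at the first non-octal digit.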
2880 .P
2881 I did remove project-included fallback implementations of
2882 .Fu memmove()
2883 and
2884 .Fu strerror() ,
2885 although Peter Maydell had re-included them into nmh in 2008
2886 to support SunOS 4.
2887 Nevertheless, these functions are part of ANSI C.
2888 Systems that do not even provide full ANSI C support should not
2889 put a load on mmh.
2890 .Ci b067ff5c465a5d243ce5a19e562085a9a1a97215
2891 .P
2892 The
2893 .Fu copy()
2894 function copies the string in argument one to the location in two.
2895 In contrast to
2896 .Fu strcpy() ,
2897 it returns a pointer to the terminating null-byte in the destination area.
2898 The code was adjusted to replace
2899 .Fu copy()
2900 with
2901 .Fu strcpy() ,
2902 except within
2903 .Fu concat() ,
2904 where
2905 .Fu copy()
2906 was more convenient.
2907 Therefore, the definition of
2908 .Fu copy()
2909 was moved into the source file of
2910 .Fu concat()
2911 and its visibility is now limited to it.
2912 .Ci 552fd7253e5ee9e554c5c7a8248a6322aa4363bb
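Why a pointer to the terminating null-byte is convenient for concatenation can be seen in the following sketch. It assumes the argument order described above (source first, destination second); the two-argument concat2() is a hypothetical simplification and not the actual concat() from the MH library.

```c
#include <stdlib.h>
#include <string.h>

/* Copy src to dst; return a pointer to the terminating NUL in dst. */
static char *
copy(const char *src, char *dst)
{
	while ((*dst = *src++) != '\0') {
		dst++;
	}
	return dst;
}

/*
 * Hypothetical two-string variant of concat(): return a newly
 * allocated string holding a followed by b, or NULL if out of
 * memory. The caller frees the result.
 */
char *
concat2(const char *a, const char *b)
{
	char *buf = malloc(strlen(a) + strlen(b) + 1);
	char *cp;

	if (buf == NULL) {
		return NULL;
	}
	cp = copy(a, buf);	/* cp points to the NUL after a */
	copy(b, cp);		/* append b right there */
	return buf;
}
```

With strcpy(), each append would have to search for the end of the destination again, because strcpy() returns the start of the destination.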
2913 .P
2914 The function
2915 .Fu r1bindex()
2916 had been a generalized version of
2917 .Fu basename()
2918 with minor differences.
2919 As all calls to
2920 .Fu r1bindex()
2921 had the slash (`/') as delimiter anyway,
2922 replacing
2923 .Fu r1bindex()
2924 with the more specific and better-named function
2925 .Fu basename()
2926 became desirable.
2927 Unfortunately, many of the 54 calls to
2928 .Fu r1bindex()
2929 depended on a special behavior,
2930 which differed from the POSIX specification for
2931 .Fu basename() .
2932 Hence,
2933 .Fu r1bindex()
2934 was kept but renamed to
2935 .Fu mhbasename() ,
2936 fixing the delimiter to the slash.
2937 .Ci 240013872c392fe644bd4f79382d9f5314b4ea60
2938 For possible uses of
2939 .Fu r1bindex()
2940 with a different delimiter,
2941 the ANSI C function
2942 .Fu strrchr()
2943 provides the core functionality.
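A sketch of such a use of strrchr() follows. The helper last_component() is hypothetical; it does not reproduce the special behavior that the callers of r1bindex() depended on.

```c
#include <string.h>

/*
 * Hypothetical helper (not part of mmh): return the part of path
 * after the last occurrence of delim, or the whole string if
 * delim does not occur in it.
 */
const char *
last_component(const char *path, int delim)
{
	const char *cp = strrchr(path, delim);

	return (cp != NULL) ? cp + 1 : path;
}
```

Note that a trailing delimiter yields an empty string here, whereas POSIX basename() ignores trailing slashes.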
2944 .P
2945 The
2946 .Fu ssequal()
2947 function \(en apparently for ``substring equal'' \(en
2948 was renamed to
2949 .Fu isprefix() ,
2950 because this is what it actually checks.
2951 .Ci c20b4fa14515c7ab388ce35411d89a7a92300711
2952 Its source file had included the following comments, no joke.
2953 .VS
2954 /*
2955 * THIS CODE DOES NOT WORK AS ADVERTISED.
2956 * It is actually checking if s1 is a PREFIX of s2.
2957 * All calls to this function need to be checked to see
2958 * if that needs to be changed. Prefix checking is cheaper, so
2959 * should be kept if it's sufficient.
2960 */
2961
2962 /*
2963 * Check if s1 is a substring of s2.
2964 * If yes, then return 1, else return 0.
2965 */
2966 VE
2967 Two months later, it was completely removed by replacing it with
2968 .Fu strncmp() .
2969 .Ci b0b1dd37ff515578cf7cba51625189eb34a196cb
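The prefix check can be expressed with strncmp() in one line, as the following sketch shows. The wrapper only illustrates the idiom; the argument order (prefix first) follows the quoted comment and is otherwise an assumption.

```c
#include <string.h>

/* Return 1 if prefix is a prefix of s, else 0. */
int
isprefix(const char *prefix, const char *s)
{
	return strncmp(prefix, s, strlen(prefix)) == 0;
}
```

Because the idiom is this short, call sites can use strncmp() directly, and no project-specific function is needed.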
2970
2971
2972
2973
2974 .H2 "Modularization
2975 .P
2976 The source code of the mmh tools is located in the
2977 .Fn uip
2978 (``user interface programs'') directory.
2979 Each tool has a source file with the same name.
2980 For example,
2981 .Pn rmm
2982 is built from
2983 .Fn uip/rmm.c .
2984 Some source files are used for multiple programs.
2985 For example
2986 .Fn uip/scansbr.c
2987 is used for both
2988 .Pn scan
2989 and
2990 .Pn inc .
2991 In nmh, 49 tools were built from 76 source files.
2992 This is a ratio of 1.6 source files per program.
2993 32 programs depended on multiple source files;
2994 17 programs depended on one source file only.
2995 In mmh, 39 tools are built from 51 source files.
2996 This is a ratio of 1.3 source files per program.
2997 18 programs depend on multiple source files;
2998 21 programs depend on one source file only.
2999 (These numbers and the ones in the following text ignore the MH library
3000 as well as shell scripts and multiple names for the same program.)
3001 .P
3002 Splitting the source code of a large program into multiple files can
3003 increase the readability of its source code.
3004 Most of the mmh tools, however, are simple and straight-forward programs.
3005 With the exception of the MIME handling tools,
3006 .Pn pick
3007 is the largest tool.
3008 It contains 1\|037 lines of source code (measured with
3009 .Pn sloccount ), excluding the MH library.
3010 Only the MIME handling tools (\c
3011 .Pn mhbuild ,
3012 .Pn mhstore ,
3013 .Pn show ,
3014 etc.)
3015 are larger.
3016 Splitting programs with fewer than 1\|000 lines of code into multiple
3017 source files seldom leads to better readability.
3018 For such tools, splitting makes sense
3019 when parts of the code are reused in other programs,
3020 and the reused code fragment is not general enough
3021 for including it in the MH library,
3022 or, if the code has dependencies on a library that only few programs need.
3023 .Fn uip/packsbr.c ,
3024 for instance, provides the core program logic for the
3025 .Pn packf
3026 and
3027 .Pn rcvpack
3028 programs.
3029 .Fn uip/packf.c
3030 and
3031 .Fn uip/rcvpack.c
3032 mainly wrap the core function appropriately.
3033 No other tools use the folder packing functions.
3034 As another example,
3035 .Fn uip/termsbr.c
3036 provides termcap support, which requires linking with a termcap or
3037 curses library.
3038 Including
3039 .Fn uip/termsbr.c
3040 into the MH library would require every program to be linked with
3041 termcap or curses, although only few of the programs require it.
3042 .P
3043 The task of MIME handling is complex enough that splitting its code
3044 into multiple source files improves the readability.
3045 The program
3046 .Pn mhstore ,
3047 for instance, is compiled out of seven source files with 2\|500
3048 lines of code in total.
3049 The main code file
3050 .Fn uip/mhstore.c
3051 consists of 800 lines; the other 1\|700 lines of code are reused in
3052 other MIME handling tools.
3053 It seems to be worthwhile to bundle the generic MIME handling code into
3054 an MH-MIME library, as a companion to the MH standard library.
3055 This is left open for the future.
3056 .P
3057 The work already done focused on the non-MIME tools.
3058 The amount of code compiled into each program was reduced.
3059 This eases the understanding of the code base.
3060 In nmh,
3061 .Pn comp
3062 was built from six source files:
3063 .Fn comp.c ,
3064 .Fn whatnowproc.c ,
3065 .Fn whatnowsbr.c ,
3066 .Fn sendsbr.c ,
3067 .Fn annosbr.c ,
3068 and
3069 .Fn distsbr.c .
3070 In mmh, it builds from only two:
3071 .Fn comp.c
3072 and
3073 .Fn whatnowproc.c .
3074 In nmh's
3075 .Pn comp ,
3076 the core function of
3077 .Pn whatnow ,
3078 .Pn send ,
3079 and
3080 .Pn anno
3081 were compiled into
3082 .Pn comp .
3083 This saved the need to execute these programs with
3084 .Fu fork()
3085 and
3086 .Fu exec() ,
3087 two expensive system calls.
3088 Whereas this approach improved the time performance,
3089 it interwove the source code.
3090 Core functionalities were not encapsulated into programs but into
3091 functions, which were then wrapped by programs.
3092 For example,
3093 .Fn uip/annosbr.c
3094 included the function
3095 .Fu annotate() .
3096 Each program that wanted to annotate messages included the source file
3097 .Fn uip/annosbr.c
3098 and called
3099 .Fu annotate() .
3100 Because the function
3101 .Fu annotate()
3102 was used like the tool
3103 .Pn anno ,
3104 it had seven parameters, reflecting the command line switches of the tool.
3105 When another pair of command line switches was added to
3106 .Pn anno ,
3107 a rather ugly hack was implemented to avoid adding another parameter
3108 to the function.
3109 .Ci d9b1d57351d104d7ec1a5621f090657dcce8cb7f
3110 .P
3111 Separation simplifies the understanding of program code
3112 because the area influenced by any particular statement is smaller.
3113 The separation on the program level is stricter than the separation
3114 on the function level.
3115 In mmh, the relevant code of
3116 .Pn comp
3117 comprises the two files
3118 .Fn uip/comp.c
3119 and
3120 .Fn uip/whatnowproc.c ,
3121 together 210 lines of code.
3122 In nmh,
3123 .Pn comp
3124 comprises six files with 2\|450 lines.
3125 Not all of the code in these six files was actually used by
3126 .Pn comp ,
3127 but the code reader needed to read all of the code first to know which
3128 parts were used.
3129 .P
3130 As I have read a lot in the code base during the last two years,
3131 I learned about the easy and the difficult parts.
3132 Code is easy to understand if:
3133 .BU
3134 The influenced code area is small
3135 .BU
3136 The boundaries are strictly defined
3137 .BU
3138 The code is written straight-forward
3139 .P
3140 .\" XXX move this paragraph somewhere else?
3141 Reading
3142 .Pn rmm 's
3143 source code in
3144 .Fn uip/rmm.c
3145 is my recommendation for a beginner's entry point into the code base of nmh.
3146 The reasons are that the task of
3147 .Pn rmm
3148 is straight forward and it consists of one small source code file only,
3149 yet its source includes code constructs typical for MH tools.
3150 With the introduction of the trash folder in mmh,
3151 .Pn rmm
3152 became a bit more complex, because it invokes
3153 .Pn refile .
3154 Still, it is a good example for a simple tool with clear sources.
3155 .P
3156 Understanding
3157 .Pn comp
3158 requires reading 210 lines of code in mmh, but ten times as much in nmh.
3159 Due to the aforementioned hack in
3160 .Pn anno
3161 to save the additional parameter, information passed through the program's
3162 source base in obscure ways.
3163 Thus, understanding
3164 .Pn comp
3165 required understanding the inner workings of
3166 .Fn uip/annosbr.c
3167 first.
3168 To be sure to fully understand a program, its whole source code needs
3169 to be examined.
3170 Not doing so is a leap of faith, assuming that the developers
3171 have avoided obscure programming techniques.
3172 By separating the tools on the program-level, the boundaries are
3173 clearly visible and technically enforced.
3174 The interfaces are calls to
3175 .Fu exec()
3176 rather than arbitrary function calls.
3177 .P
3178 But the real problem is a different one:
3179 Nmh violates the golden ``one tool, one job'' rule of the Unix philosophy.
3180 Understanding
3181 .Pn comp
3182 requires understanding
3183 .Fn uip/annosbr.c
3184 and
3185 .Fn uip/sendsbr.c
3186 because
3187 .Pn comp
3188 does annotate and send messages.
3189 In nmh, there surely exists the tool
3190 .Pn send ,
3191 which does (almost) only send messages.
3192 But
3193 .Pn comp
3194 and
3195 .Pn repl
3196 and
3197 .Pn forw
3198 and
3199 .Pn dist
3200 and
3201 .Pn whatnow
3202 and
3203 .Pn viamail ,
3204 they all (!) have the same message sending function included, too.
3205 As a result,
3206 .Pn comp
3207 sends messages without using
3208 .Pn send .
3209 The situation is the same as if
3210 .Pn grep
3211 would page without
3212 .Pn more
3213 just because both programs are part of the same code base.
3214 .P
3215 The clear separation on the surface \(en the toolchest approach \(en
3216 is violated on the level below.
3217 This violation is for the sake of time performance.
3218 On systems where
3219 .Fu fork()
3220 and
3221 .Fu exec()
3222 are expensive, the quicker response might be noticeable.
3223 In the old times, sacrificing readability and conceptual beauty for
3224 speed might even have been a must to prevent MH from being unusably slow.
3225 Whatever the reasons had been, today they are gone.
3226 No longer should we sacrifice readability or conceptual beauty.
3227 No longer should we violate the Unix philosophy's ``one tool, one job''
3228 guideline.
3229 No longer should we keep speed improvements that became unnecessary.
3230 .P
3231 Therefore, mmh's
3232 .Pn comp
3233 no longer sends messages.
3234 In mmh, different jobs are divided among separate programs that
3235 invoke each other as needed.
3236 In consequence,
3237 .Pn comp
3238 invokes
3239 .Pn whatnow
3240 which thereafter invokes
3241 .Pn send .
3242 The clear separation on the surface is maintained on the level below.
3243 Human users and the tools use the same interface \(en
3244 annotations, for example, are made by invoking
3245 .Pn anno ,
3246 no matter if requested by programs or by human beings.
3247 The decrease of tools built from multiple source files and thus
3248 the decrease of
3249 .Fn uip/*sbr.c
3250 files confirm the improvement.
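What invoking another tool through its command line interface can look like in C is shown in the following minimal sketch. The helper run_tool() is hypothetical and not taken from the mmh sources; it uses only the POSIX calls fork(), execvp(), and waitpid().

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of mmh): run a tool and wait for it.
 * argv[0] is looked up in PATH; argv must end with a NULL pointer.
 * Returns the tool's exit status (127 if it could not be executed),
 * or -1 if fork() or waitpid() failed or the tool was killed by a
 * signal.
 */
int
run_tool(char *const argv[])
{
	int status;
	pid_t pid = fork();

	if (pid == -1) {
		return -1;
	}
	if (pid == 0) {
		execvp(argv[0], argv);
		_exit(127);	/* only reached if execvp() failed */
	}
	if (waitpid(pid, &status, 0) == -1) {
		return -1;
	}
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would pass an argument vector that names the tool and its switches, exactly as a human user would type them. The only things crossing the boundary are the argument strings and the exit status.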
3251 .P
3252 One disadvantage has to be accepted with this change:
3253 The compiler can no longer check the integrity of the interfaces.
3254 When the command line interface of a tool changes, it is
3255 the developer's job to adjust the invocations of that tool as well.
3256 As this is a manual task and regression tests, which could detect such
3257 problems, are not available yet, it is prone to errors.
3258 These errors will not be detected at compile time but at run time.
3259 Adding regression tests remains to be done.
3260 In the best case, a uniform way of invoking tools from other tools
3261 can be developed to allow automated testing at compile time.
3262
3263
3264
3265
3266 .H2 "User Data Locations
3267 .P
3268 In nmh, a personal setup consists of the MH profile and the MH directory.
3269 The profile is a file named
3270 .Fn \&.mh_profile
3271 in the user's home directory.
3272 It contains the static configuration.
3273 It also contains the location of the MH directory in the profile entry
3274 .Pe Path .
3275 The MH directory contains the mail storage and is the first
3276 place to search for personal forms, scan formats, and similar
3277 configuration files.
3278 The location of the MH directory can be chosen freely by the user.
3279 The default and usual name is a directory named
3280 .Fn Mail
3281 in the home directory.
3282 .P
3283 The way MH data is split between profile and MH directory is a legacy.
3284 It is only sensible in a situation where the profile is the only
3285 configuration file.
3286 Why else should the mail storage and the configuration files be intermixed?
3287 They are different kinds of data:
3288 The data to be operated on and the configuration to change how
3289 tools operate.
3290 Splitting the configuration between the profile and the MH directory
3291 is bad.
3292 Merging the mail storage and the configuration in one directory is bad
3293 as well.
3294 As the mail storage and the configuration were not separated sensibly
3295 in the first place, I did it now.
3296 .P
3297 Personal mmh data is grouped by type, resulting in two distinct parts:
3298 The mail storage and the configuration.
3299 In mmh, the mail storage directory still contains all the messages,
3300 but, with the exception of public sequence files, nothing else.
3301 In contrast to nmh, the auxiliary configuration files are no longer
3302 located there.
3303 Therefore, the directory is no longer called the user's \fIMH directory\fP
3304 but his \fImail storage\fP.
3305 Its location is still user-chosen, with the default name
3306 .Fn Mail ,
3307 in the user's home directory.
3308 In mmh, the configuration is grouped together in
3309 the hidden directory
3310 .Fn \&.mmh
3311 in the user's home directory.
3312 This \fImmh directory\fP contains the context file, personal forms,
3313 scan formats, and the like, but also the user's profile, now named
3314 .Fn profile .
3315 The location of the profile is no longer fixed to
3316 .Fn $HOME/.mh_profile
3317 but to
3318 .Fn $HOME/.mmh/profile .
3319 Having both the file
3320 .Fn $HOME/.mh_profile
3321 and the configuration directory
3322 .Fn $HOME/.mmh
3323 appeared to be inconsistent.
3324 The approach chosen for mmh is consistent, simple, and familiar to
3325 Unix users.
3326 .P
3327 MH allows users to have multiple MH setups.
3328 Therefore, it must be possible to select a different profile.
3329 The profile is the single entry point to access the rest of a
3330 personal MH setup.
3331 In nmh, the environment variable
3332 .Ev MH
3333 could be used to specify a different profile.
3334 To operate in the same MH setup with a separate context,
3335 the
3336 .Ev MHCONTEXT
3337 environment variable could be used.
3338 This allows each terminal to have its own current folder and
3339 current message, for instance.
3340 In mmh, three environment variables are used.
3341 .Ev MMH
3342 overrides the default location of the mmh directory (\c
3343 .Fn .mmh ).
3344 .Ev MMHP
3345 and
3346 .Ev MMHC
3347 override the paths to the profile and context files, respectively.
3348 This approach allows the set of personal configuration files to be chosen
3349 independently from the profile, context, and mail storage.
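.P
The lookup order described above can be sketched in C.
This is a minimal illustration, not code from mmh:
the function names profile_path() and env_nonempty() are hypothetical,
and the sketch assumes that an explicit profile path (MMHP) wins over
the mmh directory (MMH), which in turn wins over the default
directory in the home directory.
How relative paths are resolved is not covered by the text and is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the value of an environment variable, or NULL if unset or empty. */
static const char *
env_nonempty(const char *name)
{
	const char *v = getenv(name);

	return (v != NULL && strlen(v) > 0) ? v : NULL;
}

/*
 * Hypothetical helper: write the path of the profile into buf.
 * Assumed precedence: $MMHP, then $MMH/profile, then $HOME/.mmh/profile.
 * Returns 0 on success, -1 if no location is known or buf is too small.
 */
int
profile_path(char *buf, size_t size)
{
	const char *p;
	int n;

	assert(buf != NULL && size > 0);
	if ((p = env_nonempty("MMHP")) != NULL) {
		n = snprintf(buf, size, "%s", p);
	} else if ((p = env_nonempty("MMH")) != NULL) {
		n = snprintf(buf, size, "%s/profile", p);
	} else if ((p = env_nonempty("HOME")) != NULL) {
		n = snprintf(buf, size, "%s/.mmh/profile", p);
	} else {
		return -1;
	}
	/* snprintf() reports truncation by returning n >= size */
	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

With none of the variables set except HOME, the sketch yields the
default location named in the text; setting MMH moves the whole
configuration directory, and setting MMHP redirects the profile alone.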
3350 .P
3351 The separation of the files by type is sensible and convenient.
3352 The new approach has no functional disadvantages,
3353 as every setup I can imagine can be implemented with both approaches,
3354 possibly even easier with the new approach.
3355 The main achievement of the change is the clear and sensible split
3356 between mail storage and configuration.
3357
3358
3359
3360
3361
3362
3363 .H1 "Concept Exploitation \"Homogeneity
3364
3365
3366 .H2 "Draft Folder
3367 .P 2546 .P
3368 In the beginning, MH had the concept of a draft message. 2547 In the beginning, MH had the concept of a draft message.
3369 This is the file 2548 This is the file
3370 .Fn draft 2549 .Fn draft
3371 in the MH directory, which is treated special. 2550 in the MH directory, which is treated special.
3471 system as a whole. 2650 system as a whole.
3472 Although my part in the draft handling improvement was small, 2651 Although my part in the draft handling improvement was small,
3473 it was important. 2652 it was important.
3474 2653
3475 2654
3476 2655 .U3 "Trash Folder
3477 .H2 "Trash Folder
3478 .P 2656 .P
3479 Similar to the situation for drafts is the situation for removed messages. 2657 Similar to the situation for drafts is the situation for removed messages.
3480 Historically, a message was ``deleted'' by prepending a specific 2658 Historically, a message was ``deleted'' by prepending a specific
3481 \fIbackup prefix\fP, usually the comma character, 2659 \fIbackup prefix\fP, usually the comma character,
3482 to the file name. 2660 to the file name.
3626 By generalizing the message removal in a way that it becomes covered 2804 By generalizing the message removal in a way that it becomes covered
3627 by the MH concepts makes the whole system more powerful. 2805 by the MH concepts makes the whole system more powerful.
3628 2806
3629 2807
3630 2808
3631 .H2 "Path Notations 2809
3632 .P 2810
3633 FIXME! TODO 2811 .H1 "Styling
3634 2812 .P
3635 2813 Kernighan and Pike have emphasized the importance of style in the
3636 2814 preface of their book:
3637 .H2 "Of One Cast 2815 .[ [
3638 .P 2816 kernighan pike practice of programming
2817 .], p. x]
2818 .QS
2819 Chapter 1 discusses programming style.
2820 Good style is so important to good programming that we have chosen
2821 to cover it first.
2822 .QE
2823 This section covers changes in mmh that were motivated by the desire
2824 to improve on style.
2825 Many of them follow the rules given in the quoted book.
2826 .[
2827 kernighan pike practice of programming
2828 .]
2829
2830
2831
2832
2833 .H2 "Code Style
2834 .P
2835 .U3 "Indentation Style
2836 .P
2837 Indentation styles are the sacred cow of programmers.
2838 Again Kernighan and Pike:
2839 .[ [
2840 kernighan pike practice of programming
2841 .], p. 10]
2842 .QS
2843 Programmers have always argued about the layout of programs,
2844 but the specific style is much less important than its consistent
2845 application.
2846 Pick one style, preferably ours, use it consistently, and don't waste
2847 time arguing.
2848 .QE
2849 .P
2850 I agree that consistent application is most important,
2851 but I believe that some styles have advantages over others.
2852 One example is indentation with tab characters only.
2853 Tab characters directly map to the nesting level \(en
2854 one tab, one level.
2855 Tab characters are flexible because developers can adjust them to
2856 whatever width they like to have.
2857 There is no more need to run
2858 .Pn unexpand
2859 or
2860 .Pn entab
2861 programs to ensure the correct mixture of leading tabs and spaces.
2862 The simple rules are: (1) Leading whitespace must consist of tabs only.
2863 (2) Any other whitespace should consist of spaces.
2864 These two rules ensure the integrity of the visual appearance.
2865 Although reformatting existing code should be avoided, I did it.
2866 I did not waste time arguing; I just did it.
2867 .Ci a485ed478abbd599d8c9aab48934e7a26733ecb1
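.P
Rule (1) can be checked mechanically.
The following is a minimal sketch in C, not code from mmh;
the function name leading_tabs_only() is hypothetical.
It inspects a single line and reports whether its leading
whitespace consists of tabs only.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: return 1 if the leading whitespace of `line'
 * consists of tab characters only, else 0.
 * A space in front of the first non-blank character violates rule (1).
 * Spaces after the first non-blank character are not examined.
 */
int
leading_tabs_only(const char *line)
{
	assert(line != NULL);
	while (*line == '\t') {
		line++;
	}
	/* directly after the leading tabs there must not be a space */
	return *line != ' ';
}
```

A line indented with one tab followed by two spaces is rejected,
whereas spaces used for alignment further right in the line pass,
which matches rule (2).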
2868
2869 .U3 "Comments
2870 .P
2871 Section 1.6 of
2872 .[ [
2873 kernighan pike practice of programming
2874 .], p. 23]
2875 demands: ``Don't belabor the obvious.''
2876 Hence, I simply removed all the comments in the following code excerpt:
2877 .VS
2878 context_replace(curfolder, folder); /* update current folder */
2879 seq_setcur(mp, mp->lowsel); /* update current message */
2880 seq_save(mp); /* synchronize message sequences */
2881 folder_free(mp); /* free folder/message structure */
2882 context_save(); /* save the context file */
2883
2884 [...]
2885
2886 int c; /* current character */
2887 char *cp; /* miscellaneous character pointer */
2888
2889 [...]
2890
2891 /* NUL-terminate the field */
2892 *cp = '\0';
2893 VE
2894 .Ci 426543622b377fc5d091455cba685e114b6df674
2895 .P
2896 The names of the functions explain enough already.
2897
2898 .U3 "Names
2899 .P
2900 Kernighan and Pike suggest:
2901 ``Use active names for functions''.
2902 .[ [
2903 kernighan pike practice of programming
2904 .], p. 4]
2905 One application of this rule was the rename of
2906 .Fu check_charset()
2907 to
2908 .Fu is_native_charset() .
2909 .Ci 8d77b48284c58c135a6b2787e721597346ab056d
2910 The same change fixed a violation of ``Be accurate'' as well.
2911 The code did not match the expectation the function suggested,
2912 as it, for whatever reason, only compared the first ten characters
2913 of the charset name.
2914 .P
2915 More important than using active names is using descriptive names.
2916 Renaming the obscure function
2917 .Fu m_unknown()
2918 was a delightful event.
2919 .Ci 611d68d19204d7cbf5bd585391249cb5bafca846
2920 .P
2921 Magic numbers are generally considered bad style.
2922 Obviously, Kernighan and Pike agree:
2923 ``Give names to magic numbers''.
2924 .[ [
2925 kernighan pike practice of programming
2926 .], p. 19]
2927 One such change was naming the type of input \(en mbox or mail folder \(en
2928 to be scanned:
2929 .VS
2930 #define SCN_MBOX (-1)
2931 #define SCN_FOLD 0
2932 VE
2933 .Ci 7ffb36d28e517a6f3a10272056fc127592ab1c19
2934 .P
2935 The argument
2936 .Ar outnum
2937 of the function
2938 .Fu scan()
2939 in
2940 .Fn uip/scansbr.c
2941 defines the number of the message to be created.
2942 If no message is to be created, the argument is misused to transport
2943 program logic.
2944 This led to obscure code.
2945 I improved the clarity of the code by introducing two variables:
2946 .VS
2947 int incing = (outnum > 0);
2948 int ismbox = (outnum != 0);
2949 VE
2950 They cover the magic values and are used for conditions.
2951 The variable
2952 .Ar outnum
2953 is only used when it holds an ordinary message number.
2954 .Ci b8b075c77be7794f3ae9ff0e8cedb12b48fd139f
2955 The clarity improvement of the change showed detours in the program logic
2956 of related code parts.
2957 With the new, descriptively named variables in place, a more
2958 straightforward implementation became apparent.
2959 Before the clarification was done,
2960 the possibility to improve had not been seen.
2961 .Ci aa60b0ab5e804f8befa890c0a6df0e3143ce0723
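.P
How the two flags follow from the possible values of the argument
can be sketched as below.
The macros and the two expressions are taken from the excerpts above;
the struct and the function scan_flags_from() are hypothetical and
only serve to make the mapping testable in isolation.

```c
#include <assert.h>

#define SCN_MBOX (-1)	/* input is an mbox file, no message is created */
#define SCN_FOLD 0	/* input is a mail folder, no message is created */

/* Hypothetical container for the two derived conditions. */
struct scan_flags {
	int incing;	/* a message with number outnum is to be created */
	int ismbox;	/* the input is in mbox format */
};

/*
 * Derive the named conditions from outnum, which is either one of the
 * two magic values or an ordinary (positive) message number.
 */
struct scan_flags
scan_flags_from(int outnum)
{
	struct scan_flags f;

	assert(outnum >= SCN_MBOX);
	f.incing = (outnum > 0);
	f.ismbox = (outnum != 0);
	return f;
}
```

The conditions in the rest of the function can then test the flags
by name, and outnum itself is read only when incing is set.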
2962
2963 .U3 "Rework of \f(CWanno\fP
2964 .P
2965 At the end of their chapter on style,
2966 Kernighan and Pike ask: ``But why worry about style?''
2967 The following example of my rework of
2968 .Pn anno
2969 provides an answer why style is important in the first place.
2970 .P
2971 Until 2002,
2972 .Pn anno
2973 had six functional command line switches,
2974 .Sw -component
2975 and
2976 .Sw -text ,
2977 which took an argument each,
2978 and the two pairs of flags,
2979 .Sw -[no]date
2980 and
2981 .Sw -[no]inplace .
2990 Then Jon Steinhart introduced his attachment system.
2991 In need for more advanced annotation handling, he extended
2992 .Pn anno .
2993 He added five more switches:
2994 .Sw -draft ,
2995 .Sw -list ,
2996 .Sw -delete ,
2997 .Sw -append ,
2998 and
2999 .Sw -number ,
3000 the last one taking an argument.
3001 .Ci 7480dbc14bc90f2d872d434205c0784704213252
3002 Later,
3003 .Sw -[no]preserve
3004 was added.
3005 .Ci d9b1d57351d104d7ec1a5621f090657dcce8cb7f
3006 Then, the Synopsis section of the man page
3007 .Mp anno (1)
3008 read:
3009 .VS
3010 anno [+folder] [msgs] [-component field] [-inplace | -noinplace]
3011 [-date | -nodate] [-draft] [-append] [-list] [-delete]
3012 [-number [num|all]] [-preserve | -nopreserve] [-version]
3013 [-help] [-text body]
3014 VE
3015 .LP
3016 The implementation followed the same structure.
3017 Problems became visible when
3018 .Cl "anno -list -number 42
3019 worked on the current message instead of on message number 42,
3020 and
3021 .Cl "anno -list -number l:5
3022 did not work on the last five messages but failed with the mysterious
3023 error message: ``anno: missing argument to -list''.
3024 Yet, the invocation matched the specification in the man page.
3025 There, the correct use of
3026 .Sw -number
3027 was defined as being
3028 .Cl "[-number [num|all]]
3029 and the textual description for the combination with
3030 .Sw -list
3031 read:
3032 .QS
3033 The -list option produces a listing of the field bodies for
3034 header fields with names matching the specified component,
3035 one per line. The listing is numbered, starting at 1, if
3036 the -number option is also used.
3037 .QE
3038 .LP
3039 The problem was manifold.
3040 The code required a numeric argument to the
3041 .Sw -number
3042 switch.
3043 If it was missing or non-numeric,
3044 .Pn anno
3045 aborted with an error message that had an off-by-one error,
3046 printing the switch one before the failing one.
3047 Semantically, the argument to the
3048 .Sw -number
3049 switch is only necessary in combination with
3050 .Sw -delete ,
3051 but not with
3052 .Sw -list .
3053 In the former case it is even necessary.
3054 .P
3055 Trying to fix these problems on the surface would not have truly solved them.
3056 The problems discovered originate from a discrepancy between the semantic
3057 structure of the problem and the structure implemented in the program.
3058 Such structural differences cannot be cured on the surface.
3059 They need to be solved by adjusting the structure of the implementation
3060 to the structure of the problem.
3061 .P
3062 In 2002, the new switches
3063 .Sw -list
3064 and
3065 .Sw -delete
3066 were added in the same way as, for instance, the
3067 .Sw -number
3068 switch had been added.
3069 Yet, they are of a structurally different type.
3070 Semantically,
3071 .Sw -list
3072 and
3073 .Sw -delete
3074 introduce modes of operation.
3075 Historically,
3076 .Pn anno
3077 had only one operation mode: adding header fields.
3078 With the extension, it got two more modes:
3079 listing and deleting header fields.
3080 The structure of the code changes did not pay respect to this
3081 fundamental change to
3082 .Pn anno 's
3083 behavior.
3084 Neither the implementation nor the documentation clearly
3085 defined them as being exclusive modes of operation.
3086 Having identified the problem, I solved it by putting structure into
3087 .Pn anno
3088 and its documentation.
3089 .Ci d54c8db8bdf01e8381890f7729bc0ef4a055ea11
3090 .P
3091 The difference is visible in both the code and the documentation.
3092 The following code excerpt:
3093 .VS
3094 int delete = -2; /* delete header element if set */
3095 int list = 0; /* list header elements if set */
3096 [...]
3097 case DELETESW: /* delete annotations */
3098 delete = 0;
3099 continue;
3100 case LISTSW: /* produce a listing */
3101 list = 1;
3102 continue;
3103 VE
3104 .LP
3105 was replaced by:
3106 .VS
3107 static enum { MODE_ADD, MODE_DEL, MODE_LIST } mode = MODE_ADD;
3108 [...]
3109 case DELETESW: /* delete annotations */
3110 mode = MODE_DEL;
3111 continue;
3112 case LISTSW: /* produce a listing */
3113 mode = MODE_LIST;
3114 continue;
3115 VE
3116 .LP
3117 The replacement code does not only reflect the problem's structure better,
3118 it is easier to understand as well.
3119 The same applies to the documentation.
3120 The man page was completely reorganized to propagate the same structure.
3121 This is visible in the Synopsis section:
3122 .VS
3123 anno [+folder] [msgs] [-component field] [-text body]
3124 [-append] [-date | -nodate] [-preserve | -nopreserve]
3125 [-Version] [-help]
3126
3127 anno -delete [+folder] [msgs] [-component field] [-text
3128 body] [-number num | all ] [-preserve | -nopreserve]
3129 [-Version] [-help]
3130
3131 anno -list [+folder] [msgs] [-component field] [-number]
3132 [-Version] [-help]
3133 VE
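.P
Once the modes are explicit, mode-dependent rules can be stated
in one place.
The following sketch illustrates this for the -number switch,
following the three synopsis forms above: add mode does not accept it,
delete mode requires an argument, list mode takes none.
The function number_switch_ok() and the enum tag are hypothetical;
this is not the code of the actual commit.

```c
#include <assert.h>

enum anno_mode { MODE_ADD, MODE_DEL, MODE_LIST };

/*
 * Hypothetical check: is the use of -number valid in the given mode?
 * `given' tells whether -number appeared on the command line,
 * `has_arg' whether it was followed by an argument (num or all).
 */
int
number_switch_ok(enum anno_mode mode, int given, int has_arg)
{
	assert(given || !has_arg);	/* no argument without the switch */
	if (!given) {
		return 1;
	}
	switch (mode) {
	case MODE_DEL:
		return has_arg;		/* -number num | all */
	case MODE_LIST:
		return !has_arg;	/* plain -number: numbered listing */
	case MODE_ADD:
	default:
		return 0;		/* not part of add mode */
	}
}
```

With the former pair of independent flag variables, such a rule had
to be reconstructed from combinations of values; with the enum it is
a single switch statement.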
3134 .\" XXX think about explaining the -preserve rework?
3135
3136
3137
3138
3139 .H2 "Standard Libraries
3140 .P
3141 MH is one decade older than the POSIX and ANSI C standards.
3142 Hence, MH included own implementations of functions
3143 that are standardized and thus widely available today,
3144 but were not back then.
3145 Today, twenty years after POSIX and ANSI C were published,
3146 developers can expect systems to comply with these standards.
3147 In consequence, MH-specific replacements for standard functions
3148 can and should be dropped.
3149 Kernighan and Pike advise: ``Use standard libraries.''
3150 .[ [
3151 kernighan pike practice of programming
3152 .], p. 196]
3153 Actually, MH had followed this advice in history,
3154 but it had not adjusted to the changes in this field.
3155 The
3156 .Fu snprintf()
3157 function, for instance, was standardized with C99 and is available
3158 almost everywhere because of its high usefulness.
3159 The project's own implementation of
3160 .Fu snprintf()
3161 was dropped in March 2012 in favor of the one in the
3162 standard library.
3163 .Ci 0052f1024deb0a0a2fc2e5bacf93d45a5a9c9b32
3164 Such decisions limit the portability of mmh
3165 if systems don't support these standardized and widespread functions.
3166 This compromise is made because mmh focuses on the future.
3167 .P
3168 I am not yet thirty years old and my C and Unix experience comprises
3169 only half a dozen years.
3170 Hence, I need to learn about the history in retrospect.
3171 I have not used those ancient constructs myself.
3172 I have not suffered from their incompatibilities.
3173 I have not longed for standardization.
3174 All my programming experience is from a time when ANSI C and POSIX
3175 were well established already.
3176 I have only read a lot of books about the (good) old times.
3177 This puts me in a difficult position when working with old code.
3178 I need to freshly acquire knowledge about old code constructs and ancient
3179 programming styles, whereas older programmers know these things by
3180 heart from their own experience.
3181 .P
3182 Being aware of the situation, I prefer to let people with more historic
3183 experience replace ancient code constructs with standardized ones.
3184 Lyndon Nerenberg covered large parts of this task for the nmh project.
3185 He converted project-specific functions to POSIX replacements,
3186 also removing the conditional compilation of now standardized features.
3187 Ken Hornstein and David Levine had their part in the work, too.
3188 Often, I only needed to pull over changes from nmh into mmh.
3189 These changes include many commits; these are among them:
3190 .Ci 768b5edd9623b7238e12ec8dfc409b82a1ed9e2d
3191 .Ci 0052f1024deb0a0a2fc2e5bacf93d45a5a9c9b32 .
3192 .P
3193 During my own work, I tidied up the \fIMH standard library\fP,
3194 .Fn libmh.a ,
3195 which is located in the
3196 .Fn sbr
3197 (``subroutines'') directory in the source tree.
3198 The MH library includes functions that mmh tools usually need.
3199 Among them are MH-specific functions for profile, context, sequence,
3200 and folder handling, but as well
3201 MH-independent functions, such as auxiliary string functions,
3202 portability interfaces and error-checking wrappers for critical
3203 functions of the standard library.
3204 .P
3205 I have replaced the
3206 .Fu atooi()
3207 function with calls to
3208 .Fu strtoul()
3209 with the third parameter \(en the base \(en set to eight.
3210 .Fu strtoul()
3211 is part of C89 and thus considered safe to use.
3212 .Ci c490c51b3c0f8871b6953bd0c74551404f840a74
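.P
The replacement can be illustrated with a small wrapper.
The name parse_octal() is hypothetical and exists only for this sketch;
the call to strtoul() with base eight is the standard C89 function.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical wrapper: convert a string of octal digits to a number.
 * strtoul() stops at the first character that is not an octal digit.
 */
unsigned long
parse_octal(const char *s)
{
	assert(s != NULL);
	return strtoul(s, NULL, 8);
}
```

A typical use is the conversion of file permission modes given in a
profile entry, such as the string 644, into their numeric value.
Passing a pointer instead of NULL as second argument allows the caller
to detect trailing garbage, which the sketch ignores.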
3213 .P
3214 I did remove project-included fallback implementations of
3215 .Fu memmove()
3216 and
3217 .Fu strerror() ,
3218 although Peter Maydell had re-included them into nmh in 2008
3219 to support SunOS 4.
3220 Nevertheless, these functions are part of ANSI C.
3221 Systems that do not even provide full ANSI C support should not
3222 put a load on mmh.
3223 .Ci b067ff5c465a5d243ce5a19e562085a9a1a97215
3224 .P
3225 The
3226 .Fu copy()
3227 function copies the string in argument one to the location in two.
3228 In contrast to
3229 .Fu strcpy() ,
3230 it returns a pointer to the terminating null-byte in the destination area.
3231 The code was adjusted to replace
3232 .Fu copy()
3233 with
3234 .Fu strcpy() ,
3235 except within
3236 .Fu concat() ,
3237 where
3238 .Fu copy()
3239 was more convenient.
3240 Therefore, the definition of
3241 .Fu copy()
3242 was moved into the source file of
3243 .Fu concat()
3244 and its visibility is now limited to it.
3245 .Ci 552fd7253e5ee9e554c5c7a8248a6322aa4363bb
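.P
Why the return value is convenient for concatenation can be seen in
the following sketch.
It is a re-creation from the description above, not the mmh source:
the argument order of copy() follows the text, whereas join2() is a
hypothetical, simplified stand-in for a concatenation function that
writes into a caller-supplied buffer.

```c
#include <assert.h>
#include <string.h>

/*
 * Copy the string `from' to `to' and return a pointer to the
 * terminating null byte in the destination.
 */
static char *
copy(const char *from, char *to)
{
	while ((*to = *from) != '\0') {
		to++;
		from++;
	}
	return to;
}

/*
 * Hypothetical example: join two strings in dst, which must be large
 * enough. The pointer returned by copy() is the position where the
 * next string has to start, so no strlen() call is needed.
 */
char *
join2(char *dst, const char *a, const char *b)
{
	char *end;

	assert(dst != NULL && a != NULL && b != NULL);
	end = copy(a, dst);
	strcpy(end, b);		/* end position not needed: strcpy() suffices */
	return dst;
}
```

Where the end position is not needed, as for the last string,
the standard function is sufficient.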
3246 .P
3247 The function
3248 .Fu r1bindex()
3249 had been a generalized version of
3250 .Fu basename()
3251 with minor differences.
3252 As all calls to
3253 .Fu r1bindex()
3254 had the slash (`/') as delimiter anyway,
3255 replacing
3256 .Fu r1bindex()
3257 with the more specific and better-named function
3258 .Fu basename()
3259 became desirable.
3260 Unfortunately, many of the 54 calls to
3261 .Fu r1bindex()
3262 depended on a special behavior,
3263 which differed from the POSIX specification for
3264 .Fu basename() .
3265 Hence,
3266 .Fu r1bindex()
3267 was kept but renamed to
3268 .Fu mhbasename() ,
3269 fixing the delimiter to the slash.
3270 .Ci 240013872c392fe644bd4f79382d9f5314b4ea60
3271 For possible uses of
3272 .Fu r1bindex()
3273 with a different delimiter,
3274 the ANSI C function
3275 .Fu strrchr()
3276 provides the core functionality.
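.P
A minimal sketch of such a use of strrchr() is given below.
The function last_component() is hypothetical.
It does not reproduce the special behavior that the callers of
.Fu r1bindex()
depended on, as that behavior is not specified here;
it only shows the core operation for an arbitrary delimiter.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: return a pointer to the part of `s' after the
 * last occurrence of `delim', or `s' itself if `delim' does not occur.
 * The result points into `s'; nothing is copied or modified.
 */
const char *
last_component(const char *s, int delim)
{
	const char *p;

	assert(s != NULL);
	p = strrchr(s, delim);
	return (p != NULL) ? p + 1 : s;
}
```

Note that a string ending in the delimiter yields the empty string
in this sketch.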
3277 .P
3278 The
3279 .Fu ssequal()
3280 function \(en apparently for ``substring equal'' \(en
3281 was renamed to
3282 .Fu isprefix() ,
3283 because this is what it actually checks.
3284 .Ci c20b4fa14515c7ab388ce35411d89a7a92300711
3285 Its source file had included the following comments, no joke.
3286 .VS
3287 /*
3288 * THIS CODE DOES NOT WORK AS ADVERTISED.
3289 * It is actually checking if s1 is a PREFIX of s2.
3290 * All calls to this function need to be checked to see
3291 * if that needs to be changed. Prefix checking is cheaper, so
3292 * should be kept if it's sufficient.
3293 */
3294
3295 /*
3296 * Check if s1 is a substring of s2.
3297 * If yes, then return 1, else return 0.
3298 */
3299 VE
3300 Two months later, it was completely removed by replacing it with
3301 .Fu strncmp() .
3302 .Ci b0b1dd37ff515578cf7cba51625189eb34a196cb
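.P
The prefix check expressed with strncmp() can be sketched as follows.
The wrapper is_prefix() is hypothetical and only present to make the
idiom testable; the text above says the function was replaced by
strncmp(), it does not show the exact form of the calls.

```c
#include <assert.h>
#include <string.h>

/*
 * Return 1 if s1 is a prefix of s2, else 0.
 * Only the first strlen(s1) characters are compared.
 */
int
is_prefix(const char *s1, const char *s2)
{
	assert(s1 != NULL && s2 != NULL);
	return strncmp(s1, s2, strlen(s1)) == 0;
}
```

If s2 is shorter than s1, the comparison stops at the null byte of s2
and reports a mismatch, so no bytes beyond the end of s2 are read.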
3303
3304
3305
3306
3307 .H2 "Modularization
3308 .P
3309 The source code of the mmh tools is located in the
3310 .Fn uip
3311 (``user interface programs'') directory.
3312 Each tool has a source file with the same name.
3313 For example,
3314 .Pn rmm
3315 is built from
3316 .Fn uip/rmm.c .
3317 Some source files are used for multiple programs.
3318 For example
3319 .Fn uip/scansbr.c
3320 is used for both,
3321 .Pn scan
3322 and
3323 .Pn inc .
3324 In nmh, 49 tools were built from 76 source files.
3325 This is a ratio of 1.6 source files per program.
3326 32 programs depended on multiple source files;
3327 17 programs depended on one source file only.
3328 In mmh, 39 tools are built from 51 source files.
3329 This is a ratio of 1.3 source files per program.
3330 18 programs depend on multiple source files;
3331 21 programs depend on one source file only.
3332 (These numbers and the ones in the following text ignore the MH library
3333 as well as shell scripts and multiple names for the same program.)
3334 .P
3335 Splitting the source code of a large program into multiple files can
3336 increase the readability of its source code.
3337 Most of the mmh tools, however, are simple and straightforward programs.
3338 With the exception of the MIME handling tools,
3339 .Pn pick
3340 is the largest tool.
3341 It contains 1\|037 lines of source code (measured with
3342 .Pn sloccount ), excluding the MH library.
3343 Only the MIME handling tools (\c
3344 .Pn mhbuild ,
3345 .Pn mhstore ,
3346 .Pn show ,
3347 etc.)
3348 are larger.
3349 Splitting programs with less than 1\|000 lines of code into multiple
3350 source files seldom leads to better readability.
3351 For such tools, splitting makes sense
3352 when parts of the code are reused in other programs,
3353 and the reused code fragment is not general enough
3354 for including it in the MH library,
3355 or, if the code has dependencies on a library that only few programs need.
3356 .Fn uip/packsbr.c ,
3357 for instance, provides the core program logic for the
3358 .Pn packf
3359 and
3360 .Pn rcvpack
3361 programs.
3362 .Fn uip/packf.c
3363 and
3364 .Fn uip/rcvpack.c
3365 mainly wrap the core function appropriately.
3366 No other tools use the folder packing functions.
3367 As another example,
3368 .Fn uip/termsbr.c
3369 provides termcap support, which requires linking with a termcap or
3370 curses library.
3371 Including
3372 .Fn uip/termsbr.c
3373 into the MH library would require every program to be linked with
3374 termcap or curses, although only few of the programs require it.
3375 .P
3376 The task of MIME handling is complex enough that splitting its code
3377 into multiple source files improves the readability.
3378 The program
3379 .Pn mhstore ,
3380 for instance, is compiled out of seven source files with 2\|500
3381 lines of code in total.
3382 The main code file
3383 .Fn uip/mhstore.c
3384 consists of 800 lines; the other 1\|700 lines of code are reused in
3385 other MIME handling tools.
3386 It seems to be worthwhile to bundle the generic MIME handling code into
3387 an MH-MIME library, as a companion to the MH standard library.
3388 This is left open for the future.
3389 .P
3390 The work already done focused on the non-MIME tools.
3391 The amount of code compiled into each program was reduced.
3392 This eases the understanding of the code base.
3393 In nmh,
3394 .Pn comp
3395 was built from six source files:
3396 .Fn comp.c ,
3397 .Fn whatnowproc.c ,
3398 .Fn whatnowsbr.c ,
3399 .Fn sendsbr.c ,
3400 .Fn annosbr.c ,
3401 and
3402 .Fn distsbr.c .
3403 In mmh, it builds from only two:
3404 .Fn comp.c
3405 and
3406 .Fn whatnowproc.c .
3407 In nmh's
3408 .Pn comp ,
3409 the core function of
3410 .Pn whatnow ,
3411 .Pn send ,
3412 and
3413 .Pn anno
3414 were compiled into
3415 .Pn comp .
3416 This saved the need to execute these programs with
3417 .Fu fork()
3418 and
3419 .Fu exec() ,
3420 two expensive system calls.
3421 Whereas this approach improved the time performance,
3422 it intertwined the source code.
3423 Core functionalities were not encapsulated into programs but into
3424 functions, which were then wrapped by programs.
3425 For example,
3426 .Fn uip/annosbr.c
3427 included the function
3428 .Fu annotate() .
3429 Each program that wanted to annotate messages, included the source file
3430 .Fn uip/annosbr.c
3431 and called
3432 .Fu annotate() .
3433 Because the function
3434 .Fu annotate()
3435 was used like the tool
3436 .Pn anno ,
3437 it had seven parameters, reflecting the command line switches of the tool.
3438 When another pair of command line switches was added to
3439 .Pn anno ,
3440 a rather ugly hack was implemented to avoid adding another parameter
3441 to the function.
3442 .Ci d9b1d57351d104d7ec1a5621f090657dcce8cb7f
3443 .P
3444 Separation simplifies the understanding of program code
3445 because the area influenced by any particular statement is smaller.
3446 The separation on the program level is more strict than the separation
3447 on the function level.
3448 In mmh, the relevant code of
3449 .Pn comp
3450 comprises the two files
3451 .Fn uip/comp.c
3452 and
3453 .Fn uip/whatnowproc.c ,
3454 together 210 lines of code.
3455 In nmh,
3456 .Pn comp
3457 comprises six files with 2\|450 lines.
3458 Not all of the code in these six files was actually used by
3459 .Pn comp ,
3460 but the code reader needed to read all of the code first to know which
3461 parts were used.
3462 .P
3463 As I have read a lot in the code base during the last two years,
3464 I learned about the easy and the difficult parts.
3465 Code is easy to understand if:
3466 .BU
3467 The influenced code area is small
3468 .BU
3469 The boundaries are strictly defined
3470 .BU
3471 The code is written straight-forward
3472 .P
3473 .\" XXX move this paragraph somewhere else?
3474 Reading
3475 .Pn rmm 's
3476 source code in
3477 .Fn uip/rmm.c
3478 is my recommendation for a beginner's entry point into the code base of nmh.
3479 The reasons are that the task of
3480 .Pn rmm
3481 is straightforward and it consists of one small source code file only,
3482 yet its source includes code constructs typical for MH tools.
3483 With the introduction of the trash folder in mmh,
3484 .Pn rmm
3485 became a bit more complex, because it invokes
3486 .Pn refile .
3487 Still, it is a good example for a simple tool with clear sources.
3488 .P
3489 Understanding
3490 .Pn comp
3491 requires reading 210 lines of code in mmh, but ten times as much in nmh.
3492 Due to the aforementioned hack in
3493 .Pn anno
3494 to save the additional parameter, information passed through the program's
3495 source base in obscure ways.
3496 Thus, understanding
3497 .Pn comp
3498 required understanding the inner workings of
3499 .Fn uip/annosbr.c
3500 first.
3501 To be sure to fully understand a program, its whole source code needs
3502 to be examined.
3503 Not doing so is a leap of faith, assuming that the developers
3504 have avoided obscure programming techniques.
3505 By separating the tools on the program-level, the boundaries are
3506 clearly visible and technically enforced.
3507 The interfaces are calls to
3508 .Fu exec()
3509 rather than arbitrary function calls.
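.P
What such an interface looks like from the calling side can be
sketched with the standard POSIX calls.
The function run_tool() below is hypothetical; the helper that mmh
actually uses for this purpose is not shown in this text.
The sketch starts a tool by name, waits for it, and hands back
its exit status.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run the program argv[0] with the arguments in
 * argv (NULL-terminated) and wait for it to finish.
 * Returns the exit status of the program, 127 if it could not be
 * executed, or -1 on other errors or abnormal termination.
 */
int
run_tool(char *const argv[])
{
	pid_t pid;
	int status;

	assert(argv != NULL && argv[0] != NULL);
	pid = fork();
	if (pid == -1) {
		return -1;
	}
	if (pid == 0) {
		/* child: replace the process image, searching $PATH */
		execvp(argv[0], argv);
		_exit(127);	/* only reached if execvp() failed */
	}
	/* parent: wait for the child, retrying if interrupted */
	while (waitpid(pid, &status, 0) == -1) {
		if (errno != EINTR) {
			return -1;
		}
	}
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The only things shared between caller and callee are the argument
vector, the environment, and the exit status.
No variable of the calling program can be changed by the invoked tool,
which is the technical enforcement of the boundary mentioned above.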
3510 .P
3511 But the real problem is a different one:
3512 Nmh violates the golden ``one tool, one job'' rule of the Unix philosophy.
3513 Understanding
3514 .Pn comp
3515 requires understanding
3516 .Fn uip/annosbr.c
3517 and
3518 .Fn uip/sendsbr.c
3519 because
3520 .Pn comp
3521 does annotate and send messages.
3522 In nmh, there surely exists the tool
3523 .Pn send ,
3524 which does (almost) only send messages.
3525 But
3526 .Pn comp
3527 and
3528 .Pn repl
3529 and
3530 .Pn forw
3531 and
3532 .Pn dist
3533 and
3534 .Pn whatnow
3535 and
3536 .Pn viamail ,
3537 they all (!) have the same message sending function included, too.
3538 As a result,
3539 .Pn comp
3540 sends messages without using
3541 .Pn send .
3542 The situation is the same as if
3543 .Pn grep
3544 would page without
3545 .Pn more
3546 just because both programs are part of the same code base.
3547 .P
3548 The clear separation on the surface \(en the toolchest approach \(en
3549 is violated on the level below.
3550 This violation is for the sake of time performance.
3551 On systems where
3552 .Fu fork()
3553 and
3554 .Fu exec()
3555 are expensive, the quicker response might be noticeable.
3556 In the old times, sacrificing readability and conceptual beauty for
3557 speed might even have been a must to prevent MH from being unusably slow.
3558 Whatever the reasons had been, today they are gone.
3559 No longer should we sacrifice readability or conceptual beauty.
3560 No longer should we violate the Unix philosophy's ``one tool, one job''
3561 guideline.
3562 No longer should we keep speed improvements that became unnecessary.
3563 .P
3564 Therefore, mmh's
3565 .Pn comp
3566 no longer sends messages.
3567 In mmh, different jobs are divided among separate programs that
3568 invoke each other as needed.
3569 In consequence,
3570 .Pn comp
3571 invokes
3572 .Pn whatnow
3573 which thereafter invokes
3574 .Pn send .
3575 The clear separation on the surface is maintained on the level below.
3576 Human users and the tools use the same interface \(en
3577 annotations, for example, are made by invoking
3578 .Pn anno ,
3579 no matter if requested by programs or by human beings.
3580 The decrease of tools built from multiple source files and thus
3581 the decrease of
3582 .Fn uip/*sbr.c
3583 files confirm the improvement.
3584 .P
3585 One disadvantage has to be accepted with this change:
3586 The compiler can no longer check the integrity of the interfaces.
3587 When the command line interface of a tool changes, it is
3588 the developer's job to adjust the invocations of this tool as well.
3589 As this is a manual task and regression tests, which could detect such
3590 problems, are not available yet, it is prone to errors.
3591 These errors will not be detected at compile time but at run time.
3592 Installing regression tests is a task left to do.
3593 In the best case, a uniform way of invoking tools from other tools
3594 can be developed to allow automated testing at compile time.
3595
3596
3597
3598
3599 .H2 "User Data Locations
3600 .P
3601 In nmh, a personal setup consists of the MH profile and the MH directory.
3602 The profile is a file named
3603 .Fn \&.mh_profile
3604 in the user's home directory.
3605 It contains the static configuration.
3606 It also contains the location of the MH directory in the profile entry
3607 .Pe Path .
3608 The MH directory contains the mail storage and is the first
3609 place to search for personal forms, scan formats, and similar
3610 configuration files.
3611 The location of the MH directory can be chosen freely by the user.
3612 By default, and in most setups, it is a directory named
3613 .Fn Mail
3614 in the home directory.
3615 .P
3616 The way MH data is split between profile and MH directory is a legacy.
3617 It is only sensible in a situation where the profile is the only
3618 configuration file.
3619 Why else should the mail storage and the configuration files be intermixed?
3620 They are different kinds of data:
3621 The data to be operated on and the configuration to change how
3622 tools operate.
3623 Splitting the configuration between the profile and the MH directory
3624 is bad.
3625 Merging the mail storage and the configuration in one directory is bad
3626 as well.
3627 As the mail storage and the configuration were not separated sensibly
3628 in the first place, I separated them now.
3629 .P
3630 Personal mmh data is grouped by type, resulting in two distinct parts:
3631 The mail storage and the configuration.
3632 In mmh, the mail storage directory still contains all the messages,
3633 but, with the exception of public sequence files, nothing else.
3634 In contrast to nmh, the auxiliary configuration files are no longer
3635 located there.
3636 Therefore, the directory is no longer called the user's \fIMH directory\fP
3637 but his \fImail storage\fP.
3638 Its location is still user-chosen, with the default name
3639 .Fn Mail ,
3640 in the user's home directory.
3641 In mmh, the configuration is grouped together in
3642 the hidden directory
3643 .Fn \&.mmh
3644 in the user's home directory.
3645 This \fImmh directory\fP contains the context file, personal forms,
3646 scan formats, and the like, but also the user's profile, now named
3647 .Fn profile .
3648 The location of the profile is no longer fixed to
3649 .Fn $HOME/.mh_profile
3650 but to
3651 .Fn $HOME/.mmh/profile .
3652 Having both the file
3653 .Fn $HOME/.mh_profile
3654 and the configuration directory
3655 .Fn $HOME/.mmh
3656 appeared to be inconsistent.
3657 The approach chosen for mmh is consistent, simple, and familiar to
3658 Unix users.
3659 .P
3660 MH allows users to have multiple MH setups.
3661 Therefore, it must be possible to select a different profile.
3662 The profile is the single entry point to access the rest of a
3663 personal MH setup.
3664 In nmh, the environment variable
3665 .Ev MH
3666 could be used to specify a different profile.
3667 To operate in the same MH setup with a separate context,
3668 the
3669 .Ev MHCONTEXT
3670 environment variable could be used.
3671 This allows each terminal to have its own current folder and
3672 current message, for instance.
3673 In mmh, three environment variables are used.
3674 .Ev MMH
3675 overrides the default location of the mmh directory (\c
3676 .Fn .mmh ).
3677 .Ev MMHP
3678 and
3679 .Ev MMHC
3680 override the paths to the profile and context files, respectively.
3681 This approach allows the set of personal configuration files to be chosen
3682 independently from the profile, context, and mail storage.
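.P
The lookup order described above can be sketched in C.
This is an illustration under assumptions, not the mmh implementation:
the function names are hypothetical, the file name context inside
the mmh directory is assumed, and the values of the environment
variables are taken verbatim as paths.

```c
#define _POSIX_C_SOURCE 200809L  /* also declares setenv() for trying this out */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PATH_BUF 1024

/* Copy src into buf; returns 0 on success, -1 if it does not fit. */
static int
copy_path(char *buf, size_t len, const char *src)
{
	size_t n = strlen(src);

	if (n >= len) {
		return -1;
	}
	memcpy(buf, src, n + 1);
	return 0;
}

/* Write "dir/name" into buf; returns 0 on success, -1 if it does not fit. */
static int
join_path(char *buf, size_t len, const char *dir, const char *name)
{
	int n = snprintf(buf, len, "%s/%s", dir, name);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* The mmh directory: $MMH if set, else $HOME/.mmh. */
int
mmh_dir(char *buf, size_t len)
{
	const char *dir = getenv("MMH");
	const char *home;

	assert(buf != NULL && len > 0);
	if (dir && *dir) {
		return copy_path(buf, len, dir);
	}
	home = getenv("HOME");
	if (!home || !*home) {
		return -1;
	}
	return join_path(buf, len, home, ".mmh");
}

/* A file in the mmh directory, unless the variable envname overrides it. */
static int
mmh_file(char *buf, size_t len, const char *envname, const char *name)
{
	const char *override = getenv(envname);
	char dir[PATH_BUF];

	assert(buf != NULL && len > 0);
	if (override && *override) {
		return copy_path(buf, len, override);
	}
	if (mmh_dir(dir, sizeof dir) != 0) {
		return -1;
	}
	return join_path(buf, len, dir, name);
}

/* Profile: $MMHP if set, else <mmh directory>/profile. */
int
mmh_profile(char *buf, size_t len)
{
	return mmh_file(buf, len, "MMHP", "profile");
}

/* Context: $MMHC if set, else <mmh directory>/context (assumed name). */
int
mmh_context(char *buf, size_t len)
{
	return mmh_file(buf, len, "MMHC", "context");
}
```

With none of the three variables set, the profile resolves to
$HOME/.mmh/profile.
Setting only MMH moves the whole configuration directory, while
MMHP and MMHC each redirect a single file and leave the other
lookups untouched.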
3683 .P
3684 The separation of the files by type is sensible and convenient.
3685 The new approach has no functional disadvantages,
3686 as every setup I can imagine can be implemented with both approaches,
3687 possibly even more easily with the new approach.
3688 The main achievement of the change is the clear and sensible split
3689 between mail storage and configuration.
3690
3691
3692
3693 .H2 "Path Conversion
3694 .P
3695 FIXME! XXX
3696
3697
3698 commit d39e2c447b0d163a5a63f480b23d06edb7a73aa0
3699 Author: markus schnalke <meillo@marmaro.de>
3700 Date: Fri Dec 9 16:34:57 2011 +0100
3701
3702 Completely reworked the path convertion functions
3703 Moved everything (from sbr/getfolder.c and sbr/m_maildir.c) into
3704 sbr/path.c, but actually replaced the code almost completely.
3705 See h/prototypes.h for the function changes.
3706 sbr/path.c provides explaining comments on the functions.
3707 None of them allocates memory automatically.
3708
3709 Additionally:
3710 - Like for other ``files'', `inc -audit file' places file relative
3711 to the cwd, not relative to the mh-dir. This is for consistency.
3712 - Replaced add(foo, NULL) with getcpy(foo), which ist clearer.