docs/cut

diff cut.en.ms @ 28:0d7329867dd1

Applied most of the corrections by Kate. Again, they are greatly valuable!
author markus schnalke <meillo@marmaro.de>
date Sun, 16 Aug 2015 23:03:04 +0200
parents 5cefcfc72d42
children c0b522e689bc
     1.1 --- a/cut.en.ms	Tue Aug 04 21:04:10 2015 +0200
     1.2 +++ b/cut.en.ms	Sun Aug 16 23:03:04 2015 +0200
     1.3 @@ -17,20 +17,20 @@
     1.4  .LP
     1.5  Cut is a classic program in the Unix toolchest.
     1.6  It is present in most tutorials on shell programming, because it
     1.7 -is such a nice and useful tool which good explanationary value.
     1.8 -This text shall take a look behind its surface.
     1.9 +is such a nice and useful tool with good explanatory value.
    1.10 +This text shall take a look underneath its surface.
    1.11  .SH
    1.12  Usage
    1.13  .LP
    1.14 -Initially, cut had two operation modes, which were amended by a
    1.15 -third one, later. Cut may cut specified characters out of the
    1.16 -input lines or it may cut out specified fields, which are defined
    1.17 -by a delimiting character.
    1.18 +Initially, cut had two operation modes, which were later amended
    1.19 +by a third: The cut program may cut specified characters or bytes
    1.20 +out of the input lines or it may cut out specified fields, which
    1.21 +are defined by a delimiting character.
    1.22  .PP
    1.23  The character mode is well suited to slice fixed-width input
    1.24  formats into parts. One might, for instance, extract the access
    1.25 -rights from the output of \f(CWls -l\fP, here the rights of the
    1.26 -file's owner:
    1.27 +rights from the output of \f(CWls -l\fP, as shown here with the
    1.28 +rights of a file's owner:
    1.29  .CS
    1.30  	$ ls -l foo
    1.31  	-rw-rw-r-- 1 meillo users 0 May 12 07:32 foo
    1.32 @@ -39,7 +39,7 @@
    1.33  	rw-
    1.34  .CE
    1.35  .LP
    1.36 -Or the write permission for the owner, the group and the
    1.37 +Or the write permission for the owner, the group, and the
    1.38  world:
    1.39  .CS
    1.40  	$ ls -l foo | cut -c 3,6,9
    1.41 @@ -56,41 +56,42 @@
    1.42  .LP
    1.43  This command outputs no more than the first 10 characters of
    1.44  \f(CW$long\fP. (Alternatively, one could use \f(CWprintf
    1.45 -"%.10s\\n" "$long"\fP for this job.)
    1.46 +"%.10s\\n" "$long"\fP for this task.)
    1.47  .PP
    1.48 -However, if it's not about displaying characters but about their
    1.49 -storing, then \f(CW-c\fP is only partly suited. In former times,
    1.50 -when US-ASCII had been the omnipresent character encoding, each
    1.51 -character was stored with exactly one byte. Therefore, \f(CWcut
    1.52 --c\fP selected both, output characters and bytes, equally. With
    1.53 +However, if it's not about displaying characters, but rather about
    1.54 +storing them, then \f(CW-c\fP is only partly suited. In former times,
    1.55 +when US-ASCII was the omnipresent character encoding, each
    1.56 +character was stored as exactly one byte. Therefore, \f(CWcut
    1.57 +-c\fP selected both output characters and bytes equally. With
    1.58  the rise of multi-byte encodings (like UTF-8), this assumption
    1.59  became obsolete. Consequently, a byte mode (option \f(CW-b\fP)
    1.60 -was added to cut, with POSIX.2-1992. To select the first up to
    1.61 -500 bytes of each line (and ignore the rest), one can use:
    1.62 +was added to cut, with POSIX.2-1992. To select up to 500 bytes
    1.63 +from the beginning of each line (and ignore the rest), one can use:
    1.64  .CS
    1.65  	$ cut -b -500
    1.66  .CE
    1.67  .LP
    1.68  The remainder can be caught with \f(CWcut -b 501-\fP. This
    1.69 -possibility is important for POSIX, because it allows to create
    1.70 -text files with limited line length
    1.71 +function of cut is important for POSIX, because it provides a
    1.72 +transformation of text files with arbitrary line lengths to text
    1.73 +files with limited line length
    1.74  .[[ http://pubs.opengroup.org/onlinepubs/9699919799/utilities/cut.html#tag_20_28_17 .
    1.75  .PP
    1.76 -Although the byte mode was newly introduced, it was meant to
    1.77 -behave exactly as the old character mode. The character mode,
    1.78 -however, had to be implemented differently. In consequence,
    1.79 -the problem wasn't to support the byte mode, but to support the
    1.80 -new character mode correctly.
    1.81 +The newly introduced byte mode essentially provided the same
    1.82 +functionality as the old character mode. The character mode,
    1.83 +however, required a new, different implementation. In consequence,
    1.84 +the problem was not the support of the byte mode, but rather the
    1.85 +correct support of the new character mode.
    1.86  .PP
    1.87 -Besides the character and byte modes, cut has the field mode,
    1.88 -which is activated by \f(CW-f\fP. It selects fields from the
    1.89 -input. The delimiting character (by default, the tab) may be
    1.90 -changed using \f(CW-d\fP. It applies to the input as well as to
    1.91 -the output.
    1.92 +Besides the character and byte modes, cut also offers a field
    1.93 +mode, which is activated by \f(CW-f\fP. It selects fields from
    1.94 +the input. The field-delimiter character for the input as well
    1.95 +as for the output (by default the tab) may be changed using
    1.96 +\f(CW-d\fP.
    1.97  .PP
    1.98  The typical example for the use of cut's field mode is the
    1.99  selection of information from the passwd file. Here, for
   1.100 -instance, the username and its uid:
   1.101 +instance, the usernames and their uids:
   1.102  .CS
   1.103  	$ cut -d: -f1,3 /etc/passwd
   1.104  	root:0
   1.105 @@ -104,10 +105,10 @@
   1.106  to them or separated by whitespace.)
   1.107  .PP
   1.108  The field mode is suited for simple tabulary data, like the
   1.109 -passwd file. Beyond that, it soon reaches its limits. Especially,
   1.110 -the typical case of whitespace-separated fields is covered poorly
   1.111 -by it. Cut's delimiter is exactly one character,
   1.112 -therefore one may not split at both, space and tab characters.
   1.113 +passwd file. Beyond that, it soon reaches its limits. The typical
   1.114 +case of whitespace-separated fields, in particular, is covered
   1.115 +poorly by it. Cut's delimiter is exactly one character;
   1.116 +therefore, one cannot split at both space and tab characters.
   1.117  Furthermore, multiple adjacent delimiter characters lead to
   1.118  empty fields. This is not the expected behavior for
   1.119  the processing of whitespace-separated fields. Some
   1.120 @@ -115,19 +116,19 @@
   1.121  handle this case in the expected way. Apart from that, i.e.
   1.122  if one likes to stay portable, awk comes to the rescue.
   1.123  .PP
   1.124 -Awk provides another function that cut misses: Changing the order
   1.125 +Awk provides another functionality that cut lacks: Changing the order
   1.126  of the fields in the output. For cut, the order of the field
   1.127  selection specification is irrelevant; it doesn't even matter if
   1.128 -fields are given multiple times. Thus, the invocation
   1.129 +fields occur multiple times. Thus, the invocation
   1.130  \f(CWcut -c 5-8,1,4-6\fP outputs the characters number
   1.131 -1, 4, 5, 6, 7 and 8 in exactly this order. The
   1.132 -selection is like in the mathematical set theory: Each
   1.133 +1, 4, 5, 6, 7, and 8 in exactly this order. The
   1.134 +selection specification resembles mathematical set theory: Each
   1.135  specified field is part of the solution set. The fields in the
   1.136  solution set are always in the same order as in the input. In
   1.137  the words of the man page in Version 8 Unix:
   1.138  ``In data base parlance, it projects a relation.''
   1.139  .[[ http://man.cat-v.org/unix_8th/1/cut
   1.140 -This means, cut applies the database operation \fIprojection\fP
   1.141 +This means that cut applies the \fIprojection\fP database operation
   1.142  to the text input. Wikipedia explains it in the following way:
   1.143  ``In practical terms, it can be roughly thought of as picking a
   1.144  sub-set of all available columns.''
   1.145 @@ -140,23 +141,23 @@
   1.146  III. Browsing through the sources of System III, one finds cut.c
   1.147  with the timestamp 1980-04-11
   1.148  .[[ http://minnie.tuhs.org/cgi-bin/utree.pl?file=SysIII/usr/src/cmd .
   1.149 -This is the oldest implementation of the program, I was able to
   1.150 -discover. However, the SCCS-ID in the source code speaks of
   1.151 -version 1.5. According to Doug McIlroy
   1.152 +This is the oldest implementation of the program I was able to
   1.153 +discover. However, the SCCS-ID in the source code contains the
   1.154 +version number 1.5. According to Doug McIlroy
   1.155  .[[ http://minnie.tuhs.org/pipermail/tuhs/2015-May/004083.html ,
   1.156 -the earlier history likely lays in PWB/UNIX, which was the
   1.157 +the earlier history likely lies in PWB/UNIX, which was the
   1.158  basis for System III. In the available sources of PWB 1.0 (1977)
   1.159  .[[ http://minnie.tuhs.org/Archive/PDP-11/Distributions/usdl/ ,
   1.160  no cut is present. Of PWB 2.0, no sources or useful documentation
   1.161  seem to be available. PWB 3.0 was later renamed to System III
   1.162 -for marketing purposes, hence it is identical to it. A side line
   1.163 -of PWB was CB UNIX, which was only used in the Bell Labs
   1.164 +for marketing purposes only; it is otherwise identical to it. A
   1.165 +branch of PWB was CB UNIX, which was only used at Bell Labs
   1.166  internally. The manual of CB UNIX Edition 2.1 of November 1979
   1.167 -contains the earliest mentioning of cut, that my research brought
   1.168 -to light: A man page for it
   1.169 +contains the earliest mention of cut that my research brought
   1.170 +to light, in the form of a man page
   1.171  .[[ ftp://sunsite.icm.edu.pl/pub/unix/UnixArchive/PDP-11/Distributions/other/CB_Unix/cbunix_man1_02.pdf .
   1.172  .PP
   1.173 -Now a look on BSD: There, my earliest discovery is a cut.c with
   1.174 +A look at BSD: There, my earliest discovery is a cut.c with
   1.175  the file modification date of 1986-11-07
   1.176  .[[ http://minnie.tuhs.org/cgi-bin/utree.pl?file=4.3BSD-UWisc/src/usr.bin/cut
   1.177  as part of the special version 4.3BSD-UWisc
   1.178 @@ -164,7 +165,7 @@
   1.179  which was released in January 1987.
   1.180  This implementation is mostly identical to the one in System
   1.181  III. The better known 4.3BSD-Tahoe (1988) does not contain cut.
   1.182 -The following 4.3BSD-Reno (1990) does include cut. It is a freshly
   1.183 +The subsequent 4.3BSD-Reno (1990) does include cut. It is a freshly
   1.184  written one by Adam S. Moskowitz and Marciano Pitargue, which was
   1.185  included in BSD in 1989
   1.186  .[[ http://minnie.tuhs.org/cgi-bin/utree.pl?file=4.3BSD-Reno/src/usr.bin/cut .
   1.187 @@ -204,93 +205,97 @@
   1.188  	Copyright (C) 1984 David M. Ihnat
   1.189  .CE
   1.190  .LP
   1.191 -The code does have pretty old origins. Further comments show that
   1.192 +The code does have old origins. Further comments show that
   1.193  the source code was reworked by David MacKenzie first and later
   1.194  by Jim Meyering, who put it into the version control system in
   1.195 -1992. It is unclear, why the years until 1997, at least from
   1.196 -1992 on, don't show up in the copyright notice.
   1.197 +1992. It is unclear why the years until 1997, at least from
   1.198 +1992 onwards, don't show up in the copyright notice.
   1.199  .PP
   1.200  Despite all those year numbers from the 80s, cut is a rather
   1.201  young tool, at least in relation to the early Unix. Despite
   1.202 -being a decade older than Linux, the kernel, Unix had been
   1.203 -present for over ten years until cut appeared for the first
   1.204 +being a decade older than Linux (the kernel), Unix had been present
   1.205 +for over ten years by the time cut appeared for the first
   1.206  time. Most notably, cut wasn't part of Version 7 Unix, which
   1.207  became the basis for all modern Unix systems. The more complex
   1.208 -tools sed and awk had been part of it already. Hence, the
   1.209 -question comes to mind, why cut was written at all, as there
   1.210 -existed two programs that were able to cover the use cases of
   1.211 -cut. On reason for cut surely was its compactness and the
   1.212 -resulting speed, in comparison to the then bulky awk. This lean
   1.213 +tools sed and awk were part of it already. Hence, the
   1.214 +question comes to mind why cut was written at all, as two
   1.215 +programs already existed that were able to cover the use cases of
   1.216 +cut. One reason for cut surely was its compactness and the
   1.217 +resulting speed, in comparison to the then-bulky awk. This lean
   1.218  shape goes well with the Unix philosophy: Do one job and do it
   1.219 -well! Cut convinced. It found it's way to other Unix variants,
   1.220 -it became standardized and today it is present everywhere.
   1.221 +well! Cut was sufficiently convincing. It found its way to
   1.222 +other Unix variants, it became standardized, and today it is
   1.223 +present everywhere.
   1.224  .PP
   1.225 -The original variant (without \f(CW-b\fP) was described by the
   1.226 -System V Interface Defintion, an important formal description
   1.227 -of UNIX System V, already in 1985. In the following years, it
   1.228 -appeared in all relevant standards. POSIX.2 in 1992 specified
   1.229 -cut for the first time in its modern form (with \f(CW-b\fP).
   1.230 +The original variant (without \f(CW-b\fP) was already described
   1.231 +in 1985 by the System V Interface Definition, an important
   1.232 +formal description of UNIX System V. In the following years, it
   1.233 +appeared in all relevant standards. POSIX.2 specified cut for
   1.234 +the first time in its modern form (with \f(CW-b\fP) in 1992.
   1.235  
   1.236  .SH
   1.237  Multi-byte support
   1.238  .LP
   1.239 -The byte mode and thus the multi-byte support of
   1.240 -the POSIX character mode are standardized since 1992. But
   1.241 +The byte mode and thus the multi-byte support of the POSIX
   1.242 +character mode have been standardized since 1992. But
   1.243  how about their presence in the available implementations?
   1.244 -Which versions do implement POSIX correctly?
   1.245 +Which versions implement POSIX correctly?
   1.246  .PP
   1.247 -The situation is divided in three parts: There are historic
   1.248 +The situation is divided into three parts: There are historic
   1.249  implementations, which have only \f(CW-c\fP and \f(CW-f\fP.
   1.250 -Then there are implementations, which have \f(CW-b\fP but
   1.251 +Then there are implementations that have \f(CW-b\fP, but
   1.252  treat it as an alias for \f(CW-c\fP only. These
   1.253  implementations work correctly for single-byte encodings
   1.254  (e.g. US-ASCII, Latin1) but for multi-byte encodings (e.g.
   1.255  UTF-8) their \f(CW-c\fP behaves like \f(CW-b\fP (and
   1.256  \f(CW-n\fP is ignored). Finally, there are implementations
   1.257 -that implement \f(CW-b\fP and \f(CW-c\fP POSIX-compliant.
   1.258 +that implement \f(CW-c\fP and \f(CW-b\fP in a POSIX-compliant
   1.259 +way.
   1.260  .PP
   1.261  Historic two-mode implementations are the ones of 
   1.262 -System III, System V and the BSD ones until the mid-90s.
   1.263 +System III, System V, and the BSD ones until the mid-90s.
   1.264  .PP
   1.265 -Pseudo multi-byte implementations are provided by GNU and
   1.266 -modern NetBSD and OpenBSD. The level of POSIX compliance
   1.267 +Pseudo multi-byte implementations are provided by GNU,
   1.268 +modern NetBSD, and modern OpenBSD. The level of POSIX compliance
   1.269  that is presented there is often higher than the level of
   1.270  compliance that is actually provided. Sometimes it takes a
   1.271  close look to discover that \f(CW-c\fP and \f(CW-n\fP don't
   1.272  behave as expected. Some of the implementations take the
   1.273  easy way by simply being ignorant of any multi-byte
   1.274 -encodings, at least they tell that clearly:
   1.275 +encodings; at least they declare that clearly:
   1.276  .QP
   1.277 -Since we don't support multi-byte characters, the \f(CW-c\fP and \f(CW-b\fP
   1.278 -options are equivalent, and the \f(CW-n\fP option is meaningless.
   1.279 +Since we don't support multi-byte characters, the \f(CW-c\fP
   1.280 +and \f(CW-b\fP options are equivalent, and the \f(CW-n\fP
   1.281 +option is meaningless.
   1.282  .[[ http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/usr.bin/cut/cut.c?rev=1.18&content-type=text/x-cvsweb-markup
   1.283  .LP
   1.284  Standard-adhering implementations, ones that treat
   1.285  multi-byte characters correctly, are the one of the modern
   1.286  FreeBSD and the one in the Heirloom toolchest. Tim Robbins
   1.287  reimplemented the character mode of FreeBSD cut,
   1.288 -conforming to POSIX, in summer 2004
   1.289 +conforming to POSIX, in the summer of 2004
   1.290  .[[ https://svnweb.freebsd.org/base?view=revision&revision=131194 .
   1.291 -The question, why the other BSD systems have not
   1.292 -integrated this change, is an open one. Maybe the answer an be
   1.293 +The question why the other BSD systems have not
   1.294 +integrated this change is an open one. Maybe the answer can be
   1.295  found in the above quoted statement.
   1.296  .PP
   1.297 -How does a user find out if the cut on the own system handles
   1.298 -multi-byte characters correclty? First, one needs to check if
   1.299 +How does a user find out if the cut on their own system handles
   1.300 +multi-byte characters correctly? First, one needs to check if
   1.301  the system itself uses multi-byte characters, because otherwise
   1.302  characters and bytes are equivalent and the question
   1.303  is irrelevant. One can check this by looking at the locale
   1.304  settings, but it is easier to print a typical multi-byte
   1.305  character, for instance an Umlaut or the Euro currency
   1.306 -symbol, and check if one or more bytes are output:
   1.307 +symbol, and check if one or more bytes are generated as
   1.308 +output:
   1.309  .CS
   1.310  	$ echo ä | od -c
   1.311  	0000000 303 244  \\n
   1.312  	0000003
   1.313  .CE
   1.314  .LP
   1.315 -In this case it were two bytes: octal 303 and 244. (The
   1.316 -Newline character is added by echo.)
   1.317 +In this case it resulted in two bytes: octal 303 and 244. (The
   1.318 +newline character is added by echo.)
   1.319  .PP
   1.320  The program iconv converts text to specific encodings. This
   1.321  is the output for Latin1 and UTF-8, for comparison:
   1.322 @@ -307,8 +312,8 @@
   1.323  The output (without the iconv conversion) on many European
   1.324  systems equals one of these two.
   1.325  .PP
   1.326 -Now the test of the cut implementation. On a UTF-8 system, a
   1.327 -POSIX compliant implementation behaves as such:
   1.328 +Now for the test of the cut implementation. On a UTF-8 system, a
   1.329 +POSIX-compliant implementation behaves as follows:
   1.330  .CS
   1.331  	$ echo ä | cut -c 1 | od -c
   1.332  	0000000 303 244  \\n
   1.333 @@ -323,9 +328,9 @@
   1.334  	0000001
   1.335  .CE
   1.336  .LP
   1.337 -A pseudo POSIX implementation, in contrast, behaves like the
   1.338 -middle one, for all three invocations: Only the first byte is
   1.339 -output.
   1.340 +A pseudo-POSIX implementation, in contrast, behaves like the
   1.341 +middle one for all three invocations: Only the first byte is
   1.342 +printed as output.
   1.343  
   1.344  .SH
   1.345  Implementations
   1.346 @@ -334,10 +339,10 @@
   1.347  implementations.
   1.348  .PP
   1.349  A comparison of the amount of source code is good to get a first
   1.350 -impression.  Typically, it grows through time. This can be seen
   1.351 -here, in general but not in all cases. A POSIX-compliant
   1.352 +impression. Typically, it grows through time. This can generally
   1.353 +be seen here, but not in all cases. A POSIX-compliant
   1.354  implementation of the character mode requires more code, thus
   1.355 -these implementations are rather the larger ones.
   1.356 +these implementations tend to be the larger ones.
   1.357  .TS
   1.358  center;
   1.359  r r r l l l.
   1.360 @@ -357,30 +362,30 @@
   1.361  .TE
   1.362  .LP
   1.363  Roughly four groups can be seen: (1) The two original
   1.364 -implementaions, which are mostly identical, with about 100
   1.365 +implementations, which are mostly identical, with about 100
   1.366  SLOC. (2) The five BSD versions, with about 200 SLOC. (3) The
   1.367  two POSIX-compliant versions and the old GNU one, with a SLOC
   1.368 -count in the 300s. And finally (4) the modern GNU cut with
   1.369 +count in the 300s. And finally, (4) the modern GNU cut with
   1.370  almost 600 SLOC.
   1.371  .PP
   1.372  The variation between the number of logical code
   1.373 -lines (SLOC, meassured with SLOCcount) and the number of
   1.374 -Newlines in the file (\f(CWwc -l\fP) spans between factor
   1.375 +lines (SLOC, measured with SLOCcount) and the number of
   1.376 +newlines in the file (\f(CWwc -l\fP) ranges from a factor of
   1.377  1.06 for the oldest versions to a factor of 1.5 for GNU. The
   1.378 -largest influence on it are empty lines, pure comment lines
   1.379 +largest influences on it are empty lines, pure comment lines,
   1.380  and the size of the license block at the beginning of the file.
   1.381  .PP
   1.382  Regarding the variation between logical code lines and the
   1.383  file size (\f(CWwc -c\fP), the implementations span between
   1.384  25 and 30 bytes per statement. With only 21 bytes per
   1.385  statement, the Heirloom implementation marks the lower end;
   1.386 -the GNU implementation sets the upper limit at nearly 40. In
   1.387 +the GNU implementation sets the upper limit at nearly 40 bytes. In
   1.388  the case of GNU, the reason is mainly their coding style, with
   1.389 -special indent rules and long identifiers. Whether one finds
   1.390 +special indentation rules and long identifiers. Whether one finds
   1.391  the Heirloom implementation
   1.392  .[[ http://heirloom.cvs.sourceforge.net/viewvc/heirloom/heirloom/cut/cut.c?revision=1.6&view=markup
   1.393 -highly cryptic or exceptionally elegant, shall be left
   1.394 -open to the judgement of the reader. Especially the
   1.395 +highly cryptic or exceptionally elegant shall be left
   1.396 +to the judgement of the reader. Especially the
   1.397  comparison to the GNU implementation
   1.398  .[[ http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blob;f=src/cut.c;hb=e981643
   1.399  is impressive.
   1.400 @@ -388,12 +393,12 @@
   1.401  The internal structure of the source code (in all cases it is
   1.402  written in C) is mainly similar. Besides the mandatory main
   1.403  function, which does the command line argument processing,
   1.404 -there usually exists a function to convert the field
   1.405 +there usually is a function to convert the field
   1.406  selection specification to an internal data structure.
   1.407 -Further more, almost all implementations have separate
   1.408 +Furthermore, almost all implementations have separate
   1.409  functions for each of their operation modes. The POSIX-compliant
   1.410  versions treat the \f(CW-b -n\fP combination as a separate
   1.411 -mode and thus implement it in an own function. Only the early
   1.412 +mode and thus implement it in a separate function. Only the early
   1.413  System III implementation (and its 4.3BSD-UWisc variant) do
   1.414  everything, apart from error handling, in the main function.
   1.415  .PP
   1.416 @@ -402,12 +407,12 @@
   1.417  the other being the maximum line length. On System III, both
   1.418  numbers are limited to 512. 4.3BSD-Reno and the BSDs of the
   1.419  90s have fixed limits as well (\f(CW_BSD_LINE_MAX\fP or
   1.420 -\f(CW_POSIX2_LINE_MAX\fP). Modern FreeBSD, NetBSD, all GNU
   1.421 -implementations and the Heirloom cut is able to handle
   1.422 +\f(CW_POSIX2_LINE_MAX\fP). Modern FreeBSD, modern NetBSD, all GNU
   1.423 +implementations, and the Heirloom cut are able to handle
   1.424  arbitrary numbers of fields and line lengths \(en the memory
   1.425  is allocated dynamically. OpenBSD cut is a hybrid: It has a fixed
   1.426  maximum number of fields, but allows arbitrary line lengths.
   1.427 -The limited number of fields does, however, not appear to be
   1.428 +The limited number of fields does not, however, appear to be
   1.429  any practical problem, because \f(CW_POSIX2_LINE_MAX\fP is
   1.430  guaranteed to be at least 2048 and is thus probably large enough.
   1.431  
   1.432 @@ -417,8 +422,7 @@
   1.433  Interesting, as well, is a comparison of the short descriptions
   1.434  of cut, as can be found in the headlines of the man
   1.435  pages or at the beginning of the source code files.
   1.436 -The following list is roughly sorted by time and grouped by
   1.437 -decent:
   1.438 +The following list is roughly grouped by origin:
   1.439  .TS
   1.440  center;
   1.441  l l.
   1.442 @@ -466,27 +470,27 @@
   1.443  .PP
   1.444  It is noteworthy that the GNU coreutils in all versions
   1.445  describe the performed action as a removal of parts of the
   1.446 -input, although the user clearly selects the parts that are
   1.447 -output. Probably the words ``cut out'' are too misleading.
   1.448 -HP-UX concretized them.
   1.449 +input, although the user clearly selects the parts that then
   1.450 +constitute the output. Probably the words ``cut out'' are too
   1.451 +misleading. HP-UX tried to be clearer.
   1.452  .PP
   1.453 -There are also different terms used for the thing being
   1.454 +Different terms are also used for the part being
   1.455  selected. Some talk about fields (POSIX), some talk
   1.456  about portions (BSD) and some call it columns (Research
   1.457  Unix).
   1.458  .PP
   1.459  The seemingly least adequate description, the one of Version
   1.460  8 Unix (``rearrange columns of data'') is explainable in so
   1.461 -far that the man page covers both, cut and paste, and in
   1.462 +far that the man page covers both cut and paste, and in
   1.463  their combination, columns can be rearranged. The use of
   1.464  ``data'' instead of ``text'' might be a lapse, which McIlroy
   1.465 -corrected in his Unix Reader ... but, on the other hand, on
   1.466 +corrected in his Unix Reader ... but on the other hand, on
   1.467  Unix, the two words are mostly synonymous, because all data
   1.468  is text.
   1.469  
   1.470  
   1.471  .SH
   1.472 -Referenzen
   1.473 +References
   1.474  .LP
   1.475  .nf
   1.476  ._r