docs/cut: cut.en.ms comparison

comparison cut.en.ms @ 31:106609b64dc4

minor corrections and improvements in the text

author	markus schnalke <meillo@marmaro.de>
date	Tue, 15 Sep 2015 17:20:20 +0200
parents	6977e2ee5dc5
children	5f78bcd34eeb

-:6977e2ee5dc5
+:106609b64dc4
 the input. The field-delimiter character for the input as well
 as for the output (by default the tab) may be changed using
 \f(CW-d\fP.
 .PP
 The typical example for the use of cut's field mode is the
-selection of information from the passwd file. Here, for
+selection of information from the password file. Here, for
 instance, the usernames and their uids:
 .CS
 	$ cut -d: -f1,3 /etc/passwd
 	root:0
 	bin:1
 .LP
 (The values to the command line switches may be appended directly
 to them or separated by whitespace.)
 .PP
 The field mode is suited for simple tabular data, like the
-passwd file. Beyond that, it soon reaches its limits. The typical
+password file. Beyond that, it soon reaches its limits. The typical
 case of whitespace-separated fields, in particular, is covered
 poorly by it. Cut's delimiter is exactly one character,
 therefore one cannot split at both space and tab characters.
 Furthermore, multiple adjacent delimiter characters lead to
 empty fields. This is not the expected behavior for
 .CS
 	Copyright (C) 1997-2015 Free Software Foundation, Inc.
 	Copyright (C) 1984 David M. Ihnat
 .CE
 .LP
-The code does have old origins. Further comments show that
+This code does have old origins. Further comments show that
 the source code was reworked by David MacKenzie first and later
 by Jim Meyering, who put it into the version control system in
 1992. It is unclear why the years until 1997, at least from
 1992 onwards, don't show up in the copyright notice.
 .PP
 Despite all those year numbers from the 80s, cut is a rather
 young tool, at least in relation to the early Unix. Despite
 being a decade older than Linux (the kernel), Unix was present
-for over ten years by the time cut appeared for the first
+for over ten years already by the time cut appeared for the first
 time. Most notably, cut wasn't part of Version 7 Unix, which
 became the basis for all modern Unix systems. The more complex
 tools sed and awk were part of it already. Hence, the
 question comes to mind why cut was written at all, as two
-programs already existed that were able to cover the use cases of
+programs already existed that were able to cover its use
-cut. One reason for cut surely was its compactness and the
+cases. One reason for cut surely was its compactness and the
 resulting speed, in comparison to the then-bulky awk. This lean
 shape goes well with the Unix philosophy: Do one job and do it
 well! Cut was sufficiently convincing. It found its way to
 other Unix variants, it became standardized, and today it is
 present everywhere.
 \f(CW-n\fP is ignored). Finally, there are implementations
 that implement \f(CW-c\fP and \f(CW-b\fP in a POSIX-compliant
 way.
 .PP
 Historic two-mode implementations are the ones of
-System III, System V, and the BSD ones from the beginning
+System III, System V, and the BSD ones until the mid-90s.
-until the mid-90s.
 .PP
 Pseudo multi-byte implementations are provided by GNU,
 modern NetBSD, and modern OpenBSD. The level of POSIX compliance
 that is presented there is often higher than the level of
 compliance that is actually provided. Sometimes it takes a
 Since we don't support multi-byte characters, the \f(CW-c\fP
 and \f(CW-b\fP options are equivalent, and the \f(CW-n\fP
 option is meaningless.
 .[[ http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/usr.bin/cut/cut.c?rev=1.18&content-type=text/x-cvsweb-markup
 .LP
-Standard-adhering implementations, ones that treat
+Standard-adhering implementations, i.e. ones that treat
-multi-byte characters correctly, are the one of the modern
+multi-byte characters correctly, are those of the modern
-FreeBSD and the one in the Heirloom toolchest. Tim Robbins
+FreeBSD and the Heirloom toolchest. Tim Robbins
 reimplemented the character mode of FreeBSD cut,
 conforming to POSIX, in the summer of 2004
 .[[ https://svnweb.freebsd.org/base?view=revision&revision=131194 .
 The question why the other BSD systems have not
 integrated this change is an open one. Maybe the answer can be
 found in the above quoted statement.
 .PP
-How does a user find out if the cut on their own system handles
+How do users find out if the cut on their own system handles
 multi-byte characters correctly? First, one needs to check if
 the system itself uses multi-byte characters, because otherwise
 characters and bytes are equivalent and the question
 is irrelevant. One can check this by looking at the locale
 settings, but it is easier to print a typical multi-byte
 implementation of the character mode requires more code, thus
 these implementations tend to be the larger ones.
 .TS
 center;
 r r r l l l.
-SLOC	Lines	Bytes	Belongs to  	File tyime	Category
+SLOC	Lines	Bytes	Belongs to  	File time	Category
 _
 116	123	 2966	System III	1980-04-11	historic
 118	125	 3038	4.3BSD-UWisc	1986-11-07	historic
 200	256	 5715	4.3BSD-Reno	1990-06-25	historic
 200	270	 6545	NetBSD	1993-03-21	historic
 382	586	14175	GNU coreutils	1992-11-08	pseudo-POSIX
 391	479	10961	FreeBSD	2012-11-24	POSIX
 588	830	23167	GNU coreutils	2015-05-01	pseudo-POSIX
 .TE
 .LP
-Roughly four groups can be seen: (1) The two original
+There are four rough groups: (1) The two original
 implementations, which are mostly identical, with about 100
 SLOC. (2) The five BSD versions, with about 200 SLOC. (3) The
 two POSIX-compliant versions and the old GNU one, with a SLOC
 count in the 300s. And finally, (4) the modern GNU cut with
 almost 600 SLOC.
