docs/cut: cut.en.ms annotate

annotate cut.en.ms @ 27:5cefcfc72d42

Added first version of the translation to English

author	markus schnalke <meillo@marmaro.de>
date	Tue, 04 Aug 2015 21:04:10 +0200
parents
children	0d7329867dd1

rev	line source
meillo@27	1 .so macros
meillo@27	2 .lc_ctype en_US.utf8
meillo@27	3 .pl -4v
meillo@27	4
meillo@27	5 .TL
meillo@27	6 Cut out selected fields of each line of a file
meillo@27	7 .AU
meillo@27	8 markus schnalke <meillo@marmaro.de>
meillo@27	9 ..
meillo@27	10 .FS
meillo@27	11 2015-05.
meillo@27	12 This text is in the public domain (CC0).
meillo@27	13 It is available online:
meillo@27	14 .I http://marmaro.de/docs/
meillo@27	15 .FE
meillo@27	16
meillo@27	17 .LP
meillo@27	18 Cut is a classic program in the Unix toolchest.
meillo@27	19 It is present in most tutorials on shell programming, because it
meillo@27	20 is such a nice and useful tool with good explanatory value.
meillo@27	21 This text shall take a look behind its surface.
meillo@27	22 .SH
meillo@27	23 Usage
meillo@27	24 .LP
meillo@27	25 Initially, cut had two operation modes, which were supplemented by a
meillo@27	26 third one later. Cut may cut specified characters out of the
meillo@27	27 input lines or it may cut out specified fields, which are defined
meillo@27	28 by a delimiting character.
meillo@27	29 .PP
meillo@27	30 The character mode is well suited to slice fixed-width input
meillo@27	31 formats into parts. One might, for instance, extract the access
meillo@27	32 rights from the output of \f(CWls -l\fP, here the rights of the
meillo@27	33 file's owner:
meillo@27	34 .CS
meillo@27	35 $ ls -l foo
meillo@27	36 -rw-rw-r-- 1 meillo users 0 May 12 07:32 foo
meillo@27	37 .sp .3
meillo@27	38 $ ls -l foo \| cut -c 2-4
meillo@27	39 rw-
meillo@27	40 .CE
meillo@27	41 .LP
meillo@27	42 Or the write permission for the owner, the group and the
meillo@27	43 world:
meillo@27	44 .CS
meillo@27	45 $ ls -l foo \| cut -c 3,6,9
meillo@27	46 ww-
meillo@27	47 .CE
meillo@27	48 .LP
meillo@27	49 Cut can also be used to shorten strings:
meillo@27	50 .CS
meillo@27	51 $ long=12345678901234567890
meillo@27	52 .sp .3
meillo@27	53 $ echo "$long" \| cut -c -10
meillo@27	54 1234567890
meillo@27	55 .CE
meillo@27	56 .LP
meillo@27	57 This command outputs no more than the first 10 characters of
meillo@27	58 \f(CW$long\fP. (Alternatively, one could use \f(CWprintf
meillo@27	59 "%.10s\\n" "$long"\fP for this job.)
meillo@27	60 .PP
meillo@27	61 However, if the concern is not displaying characters but
meillo@27	62 storing them, then \f(CW-c\fP is only partly suited. In former times,
meillo@27	63 when US-ASCII was the omnipresent character encoding, each
meillo@27	64 character was stored in exactly one byte. Therefore, \f(CWcut
meillo@27	65 -c\fP selected output characters and bytes alike. With
meillo@27	66 the rise of multi-byte encodings (like UTF-8), this assumption
meillo@27	67 became obsolete. Consequently, a byte mode (option \f(CW-b\fP)
meillo@27	68 was added to cut with POSIX.2-1992. To select up to the first
meillo@27	69 500 bytes of each line (and ignore the rest), one can use:
meillo@27	70 .CS
meillo@27	71 $ cut -b -500
meillo@27	72 .CE
meillo@27	73 .LP
meillo@27	74 The remainder can be caught with \f(CWcut -b 501-\fP. This
meillo@27	75 possibility is important for POSIX, because it allows creating
meillo@27	76 text files with limited line length
meillo@27	77 .[[ http://pubs.opengroup.org/onlinepubs/9699919799/utilities/cut.html#tag_20_28_17 .
meillo@27	78 .PP
meillo@27	79 Although the byte mode was newly introduced, it was meant to
meillo@27	80 behave exactly like the old character mode. The character mode,
meillo@27	81 however, had to be implemented differently. In consequence,
meillo@27	82 the problem wasn't to support the byte mode, but to support the
meillo@27	83 new character mode correctly.
meillo@27	84 .PP
meillo@27	85 Besides the character and byte modes, cut has the field mode,
meillo@27	86 which is activated by \f(CW-f\fP. It selects fields from the
meillo@27	87 input. The delimiting character (by default, the tab) may be
meillo@27	88 changed using \f(CW-d\fP. It applies to the input as well as to
meillo@27	89 the output.
meillo@27	90 .PP
meillo@27	91 The typical example for the use of cut's field mode is the
meillo@27	92 selection of information from the passwd file. Here, for
meillo@27	93 instance, the username and its uid:
meillo@27	94 .CS
meillo@27	95 $ cut -d: -f1,3 /etc/passwd
meillo@27	96 root:0
meillo@27	97 bin:1
meillo@27	98 daemon:2
meillo@27	99 mail:8
meillo@27	100 ...
meillo@27	101 .CE
meillo@27	102 .LP
meillo@27	103 (The values to the command line switches may be appended directly
meillo@27	104 to them or separated by whitespace.)
meillo@27	105 .PP
meillo@27	106 The field mode is suited for simple tabular data, like the
meillo@27	107 passwd file. Beyond that, it soon reaches its limits. In particular,
meillo@27	108 the typical case of whitespace-separated fields is covered poorly
meillo@27	109 by it. Cut's delimiter is exactly one character,
meillo@27	110 therefore one cannot split at both space and tab characters.
meillo@27	111 Furthermore, multiple adjacent delimiter characters lead to
meillo@27	112 empty fields. This is not the expected behavior for
meillo@27	113 the processing of whitespace-separated fields. Some
meillo@27	114 implementations, e.g. the one of FreeBSD, have extensions that
meillo@27	115 handle this case in the expected way. Apart from that, i.e.
meillo@27	116 if one wants to stay portable, awk comes to the rescue.
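The difference described above can be tried out with a small sketch. The sample line and the function names are made up for illustration; only standard cut and awk options are used.

```shell
# Hypothetical sample line: three spaces after "alice", a tab before "42".
line=$(printf 'alice   users\t42')

# cut treats every single space as a delimiter, so the run of three
# spaces produces empty fields; field 2 is one of them.
second_field_cut() {
    printf '%s\n' "$1" | cut -d ' ' -f 2
}

# awk splits on runs of spaces and tabs by default.
second_field_awk() {
    printf '%s\n' "$1" | awk '{ print $2 }'
}

second_field_cut "$line"    # prints an empty line
second_field_awk "$line"    # prints "users"
```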
meillo@27	117 .PP
meillo@27	118 Awk provides another function that cut lacks: changing the order
meillo@27	119 of the fields in the output. For cut, the order of the field
meillo@27	120 selection specification is irrelevant; it doesn't even matter if
meillo@27	121 fields are given multiple times. Thus, the invocation
meillo@27	122 \f(CWcut -c 5-8,1,4-6\fP outputs the characters number
meillo@27	123 1, 4, 5, 6, 7 and 8 in exactly this order. The
meillo@27	124 selection works as in mathematical set theory: Each
meillo@27	125 specified field is part of the solution set. The fields in the
meillo@27	126 solution set are always in the same order as in the input. In
meillo@27	127 the words of the man page in Version 8 Unix:
meillo@27	128 ``In data base parlance, it projects a relation.''
meillo@27	129 .[[ http://man.cat-v.org/unix_8th/1/cut
meillo@27	130 This means that cut applies the database operation \fIprojection\fP
meillo@27	131 to the text input. Wikipedia explains it in the following way:
meillo@27	132 ``In practical terms, it can be roughly thought of as picking a
meillo@27	133 sub-set of all available columns.''
meillo@27	134 .[[ https://en.wikipedia.org/wiki/Projection_(relational_algebra)
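The projection behavior can be illustrated with a short sketch. The input strings and function names are invented for this example; the cut invocation is the one from the text.

```shell
# cut emits the selected characters in input order, regardless of the
# order (or repetition) in the list.
project_chars() {
    printf 'abcdefghij\n' | cut -c 5-8,1,4-6
}

# awk can emit fields in any order; here the uid comes before the
# username. The passwd-style line is a made-up sample.
swap_fields() {
    printf 'root:x:0:0\n' | awk -F: '{ print $3 ":" $1 }'
}

project_chars    # prints "adefgh"
swap_fields      # prints "0:root"
```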
meillo@27	135
meillo@27	136 .SH
meillo@27	137 Historical Background
meillo@27	138 .LP
meillo@27	139 Cut came to public life in 1982 with the release of UNIX System
meillo@27	140 III. Browsing through the sources of System III, one finds cut.c
meillo@27	141 with the timestamp 1980-04-11
meillo@27	142 .[[ http://minnie.tuhs.org/cgi-bin/utree.pl?file=SysIII/usr/src/cmd .
meillo@27	143 This is the oldest implementation of the program I was able to
meillo@27	144 discover. However, the SCCS-ID in the source code speaks of
meillo@27	145 version 1.5. According to Doug McIlroy
meillo@27	146 .[[ http://minnie.tuhs.org/pipermail/tuhs/2015-May/004083.html ,
meillo@27	147 the earlier history likely lies in PWB/UNIX, which was the
meillo@27	148 basis for System III. In the available sources of PWB 1.0 (1977)
meillo@27	149 .[[ http://minnie.tuhs.org/Archive/PDP-11/Distributions/usdl/ ,
meillo@27	150 no cut is present. Of PWB 2.0, no sources or useful documentation
meillo@27	151 seem to be available. PWB 3.0 was later renamed to System III
meillo@27	152 for marketing purposes, hence it is identical to it. A side line
meillo@27	153 of PWB was CB UNIX, which was used only internally at Bell
meillo@27	154 Labs. The manual of CB UNIX Edition 2.1 of November 1979
meillo@27	155 contains the earliest mention of cut that my research brought
meillo@27	156 to light: a man page for it
meillo@27	157 .[[ ftp://sunsite.icm.edu.pl/pub/unix/UnixArchive/PDP-11/Distributions/other/CB_Unix/cbunix_man1_02.pdf .
meillo@27	158 .PP
meillo@27	159 Now a look at BSD: There, my earliest discovery is a cut.c with
meillo@27	160 the file modification date of 1986-11-07
meillo@27	161 .[[ http://minnie.tuhs.org/cgi-bin/utree.pl?file=4.3BSD-UWisc/src/usr.bin/cut
meillo@27	162 as part of the special version 4.3BSD-UWisc
meillo@27	163 .[[ http://gunkies.org/wiki/4.3_BSD_NFS_Wisconsin_Unix ,
meillo@27	164 which was released in January 1987.
meillo@27	165 This implementation is mostly identical to the one in System
meillo@27	166 III. The better known 4.3BSD-Tahoe (1988) does not contain cut.
meillo@27	167 The following 4.3BSD-Reno (1990) does include cut. It is a freshly
meillo@27	168 written one by Adam S. Moskowitz and Marciano Pitargue, which was
meillo@27	169 included in BSD in 1989
meillo@27	170 .[[ http://minnie.tuhs.org/cgi-bin/utree.pl?file=4.3BSD-Reno/src/usr.bin/cut .
meillo@27	171 Its man page
meillo@27	172 .[[ http://minnie.tuhs.org/cgi-bin/utree.pl?file=4.3BSD-Reno/src/usr.bin/cut/cut.1
meillo@27	173 already mentions the expected compliance with POSIX.2.
meillo@27	174 One should note that POSIX.2 was first published in
meillo@27	175 September 1992, about two years after the man page and the
meillo@27	176 program were written. Hence, the program must have been
meillo@27	177 implemented based on a draft version of the standard. A look into
meillo@27	178 the code confirms the assumption. The function to parse the field
meillo@27	179 selection includes the following comment:
meillo@27	180 .QP
meillo@27	181 This parser is less restrictive than the Draft 9 POSIX spec.
meillo@27	182 POSIX doesn't allow lists that aren't in increasing order or
meillo@27	183 overlapping lists.
meillo@27	184 .LP
meillo@27	185 Draft 11.2 of POSIX (1991-09) requires this flexibility already:
meillo@27	186 .QP
meillo@27	187 The elements in list can be repeated, can overlap, and can
meillo@27	188 be specified in any order.
meillo@27	189 .LP
meillo@27	190 The same draft additionally includes all three operation modes,
meillo@27	191 whereas this early BSD cut only implemented the original two.
meillo@27	192 Draft 9 might not have included the byte mode. Without access to
meillo@27	193 Draft 9 or 10, it wasn't possible to verify this guess.
meillo@27	194 .PP
meillo@27	195 The version numbers and change dates of the older BSD
meillo@27	196 implementations are recorded in the SCCS-IDs, which the
meillo@27	197 version control system of that time inserted. For instance
meillo@27	198 in 4.3BSD-Reno: ``5.3 (Berkeley) 6/24/90''.
meillo@27	199 .PP
meillo@27	200 The cut implementation of the GNU coreutils contains the
meillo@27	201 following copyright notice:
meillo@27	202 .CS
meillo@27	203 Copyright (C) 1997-2015 Free Software Foundation, Inc.
meillo@27	204 Copyright (C) 1984 David M. Ihnat
meillo@27	205 .CE
meillo@27	206 .LP
meillo@27	207 The code does have pretty old origins. Further comments show that
meillo@27	208 the source code was reworked by David MacKenzie first and later
meillo@27	209 by Jim Meyering, who put it into the version control system in
meillo@27	210 1992. It is unclear why the years until 1997, at least from
meillo@27	211 1992 on, don't show up in the copyright notice.
meillo@27	212 .PP
meillo@27	213 Despite all those year numbers from the 80s, cut is a rather
meillo@27	214 young tool, at least in relation to the early Unix. Although
meillo@27	215 cut is a decade older than the Linux kernel, Unix had been
meillo@27	216 present for over ten years when cut appeared for the first
meillo@27	217 time. Most notably, cut wasn't part of Version 7 Unix, which
meillo@27	218 became the basis for all modern Unix systems. The more complex
meillo@27	219 tools sed and awk had been part of it already. Hence, the
meillo@27	220 question comes to mind why cut was written at all, as there
meillo@27	221 existed two programs that were able to cover the use cases of
meillo@27	222 cut. One reason for cut surely was its compactness and the
meillo@27	223 resulting speed, in comparison to the then bulky awk. This lean
meillo@27	224 shape goes well with the Unix philosophy: Do one job and do it
meillo@27	225 well! Cut convinced. It found its way to other Unix variants,
meillo@27	226 it became standardized and today it is present everywhere.
meillo@27	227 .PP
meillo@27	228 The original variant (without \f(CW-b\fP) was described by the
meillo@27	229 System V Interface Definition, an important formal description
meillo@27	230 of UNIX System V, already in 1985. In the following years, it
meillo@27	231 appeared in all relevant standards. POSIX.2 in 1992 specified
meillo@27	232 cut for the first time in its modern form (with \f(CW-b\fP).
meillo@27	233
meillo@27	234 .SH
meillo@27	235 Multi-byte support
meillo@27	236 .LP
meillo@27	237 The byte mode and thus the multi-byte support of
meillo@27	238 the POSIX character mode have been standardized since 1992. But
meillo@27	239 how about their presence in the available implementations?
meillo@27	240 Which versions do implement POSIX correctly?
meillo@27	241 .PP
meillo@27	242 The situation is divided into three parts: There are historic
meillo@27	243 implementations, which have only \f(CW-c\fP and \f(CW-f\fP.
meillo@27	244 Then there are implementations that have \f(CW-b\fP but
meillo@27	245 treat it merely as an alias for \f(CW-c\fP. These
meillo@27	246 implementations work correctly for single-byte encodings
meillo@27	247 (e.g. US-ASCII, Latin1) but for multi-byte encodings (e.g.
meillo@27	248 UTF-8) their \f(CW-c\fP behaves like \f(CW-b\fP (and
meillo@27	249 \f(CW-n\fP is ignored). Finally, there are implementations
meillo@27	250 that implement \f(CW-b\fP and \f(CW-c\fP in a POSIX-compliant way.
meillo@27	251 .PP
meillo@27	252 Historic two-mode implementations are the ones of
meillo@27	253 System III, System V and the BSD ones until the mid-90s.
meillo@27	254 .PP
meillo@27	255 Pseudo multi-byte implementations are provided by GNU and
meillo@27	256 modern NetBSD and OpenBSD. The level of POSIX compliance
meillo@27	257 that is presented there is often higher than the level of
meillo@27	258 compliance that is actually provided. Sometimes it takes a
meillo@27	259 close look to discover that \f(CW-c\fP and \f(CW-n\fP don't
meillo@27	260 behave as expected. Some of the implementations take the
meillo@27	261 easy way by simply ignoring multi-byte
meillo@27	262 encodings; at least they state that clearly:
meillo@27	263 .QP
meillo@27	264 Since we don't support multi-byte characters, the \f(CW-c\fP and \f(CW-b\fP
meillo@27	265 options are equivalent, and the \f(CW-n\fP option is meaningless.
meillo@27	266 .[[ http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/usr.bin/cut/cut.c?rev=1.18&content-type=text/x-cvsweb-markup
meillo@27	267 .LP
meillo@27	268 Standard-adhering implementations, ones that treat
meillo@27	269 multi-byte characters correctly, are those of modern
meillo@27	270 FreeBSD and of the Heirloom toolchest. Tim Robbins
meillo@27	271 reimplemented the character mode of FreeBSD cut,
meillo@27	272 conforming to POSIX, in summer 2004
meillo@27	273 .[[ https://svnweb.freebsd.org/base?view=revision&revision=131194 .
meillo@27	274 Why the other BSD systems have not
meillo@27	275 integrated this change is an open question. Maybe the answer can be
meillo@27	276 found in the above quoted statement.
meillo@27	277 .PP
meillo@27	278 How does a user find out if the cut on their own system handles
meillo@27	279 multi-byte characters correctly? First, one needs to check if
meillo@27	280 the system itself uses multi-byte characters, because otherwise
meillo@27	281 characters and bytes are equivalent and the question
meillo@27	282 is irrelevant. One can check this by looking at the locale
meillo@27	283 settings, but it is easier to print a typical multi-byte
meillo@27	284 character, for instance an umlaut or the Euro currency
meillo@27	285 symbol, and check if one or more bytes are output:
meillo@27	286 .CS
meillo@27	287 $ echo ä \| od -c
meillo@27	288 0000000 303 244 \\n
meillo@27	289 0000003
meillo@27	290 .CE
meillo@27	291 .LP
meillo@27	292 In this case there were two bytes: octal 303 and 244. (The
meillo@27	293 newline character is added by echo.)
meillo@27	294 .PP
meillo@27	295 The program iconv converts text to specific encodings. This
meillo@27	296 is the output for Latin1 and UTF-8, for comparison:
meillo@27	297 .CS
meillo@27	298 $ echo ä \| iconv -t latin1 \| od -c
meillo@27	299 0000000 344 \\n
meillo@27	300 0000002
meillo@27	301 .sp .3
meillo@27	302 $ echo ä \| iconv -t utf8 \| od -c
meillo@27	303 0000000 303 244 \\n
meillo@27	304 0000003
meillo@27	305 .CE
meillo@27	306 .LP
meillo@27	307 The output (without the iconv conversion) on many European
meillo@27	308 systems equals one of these two.
meillo@27	309 .PP
meillo@27	310 Now the test of the cut implementation. On a UTF-8 system, a
meillo@27	311 POSIX-compliant implementation behaves as follows:
meillo@27	312 .CS
meillo@27	313 $ echo ä \| cut -c 1 \| od -c
meillo@27	314 0000000 303 244 \\n
meillo@27	315 0000003
meillo@27	316 .sp .3
meillo@27	317 $ echo ä \| cut -b 1 \| od -c
meillo@27	318 0000000 303 \\n
meillo@27	319 0000002
meillo@27	320 .sp .3
meillo@27	321 $ echo ä \| cut -b 1 -n \| od -c
meillo@27	322 0000000 \\n
meillo@27	323 0000001
meillo@27	324 .CE
meillo@27	325 .LP
meillo@27	326 A pseudo-POSIX implementation, in contrast, behaves like the
meillo@27	327 middle one, for all three invocations: Only the first byte is
meillo@27	328 output.
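The check above can be wrapped into a small script. This is only a sketch: the function name and the result strings are invented here, and the outcome depends on the current locale. In a non-UTF-8 locale even a POSIX-compliant cut will correctly report byte-wise behavior.

```shell
# Classify the local cut by the number of bytes that "cut -c 1" returns
# for the two-byte UTF-8 sequence 303 244 (a with umlaut) plus newline.
cut_c_mode() {
    n=$(printf '\303\244\n' | cut -c 1 | wc -c | tr -d ' ')
    case $n in
    3) echo "character-oriented" ;;   # both bytes plus the newline
    2) echo "byte-oriented" ;;        # only the first byte plus the newline
    *) echo "unexpected" ;;
    esac
}

cut_c_mode
```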
meillo@27	329
meillo@27	330 .SH
meillo@27	331 Implementations
meillo@27	332 .LP
meillo@27	333 Let's take a look at the sources of a selection of
meillo@27	334 implementations.
meillo@27	335 .PP
meillo@27	336 A comparison of the amount of source code is good to get a first
meillo@27	337 impression. Typically, it grows through time. This can be seen
meillo@27	338 here, in general but not in all cases. A POSIX-compliant
meillo@27	339 implementation of the character mode requires more code, thus
meillo@27	340 these implementations tend to be the larger ones.
meillo@27	341 .TS
meillo@27	342 center;
meillo@27	343 r r r l l l.
meillo@27	344 SLOC	Lines	Bytes	Belongs to	File time	Category
meillo@27	345 _
meillo@27	346 116	123	2966	System III	1980-04-11	historic
meillo@27	347 118	125	3038	4.3BSD-UWisc	1986-11-07	historic
meillo@27	348 200	256	5715	4.3BSD-Reno	1990-06-25	historic
meillo@27	349 200	270	6545	NetBSD	1993-03-21	historic
meillo@27	350 218	290	6892	OpenBSD	2008-06-27	pseudo-POSIX
meillo@27	351 224	296	6920	FreeBSD	1994-05-27	historic
meillo@27	352 232	306	7500	NetBSD	2014-02-03	pseudo-POSIX
meillo@27	353 340	405	7423	Heirloom	2012-05-20	POSIX
meillo@27	354 382	586	14175	GNU coreutils	1992-11-08	pseudo-POSIX
meillo@27	355 391	479	10961	FreeBSD	2012-11-24	POSIX
meillo@27	356 588	830	23167	GNU coreutils	2015-05-01	pseudo-POSIX
meillo@27	357 .TE
meillo@27	358 .LP
meillo@27	359 Roughly four groups can be seen: (1) The two original
meillo@27	360 implementations, which are mostly identical, with about 100
meillo@27	361 SLOC. (2) The five BSD versions, with about 200 SLOC. (3) The
meillo@27	362 two POSIX-compliant versions and the old GNU one, with a SLOC
meillo@27	363 count in the 300s. And finally (4) the modern GNU cut with
meillo@27	364 almost 600 SLOC.
meillo@27	365 .PP
meillo@27	366 The ratio of the number of newlines in the file
meillo@27	367 (\f(CWwc -l\fP) to the number of logical code lines (SLOC,
meillo@27	368 measured with SLOCcount) ranges from a factor of
meillo@27	369 1.06 for the oldest versions to a factor of 1.5 for GNU. The
meillo@27	370 largest influences on it are empty lines, pure comment lines
meillo@27	371 and the size of the license block at the beginning of the file.
meillo@27	372 .PP
meillo@27	373 Regarding the ratio between the file size (\f(CWwc -c\fP)
meillo@27	374 and the logical code lines, most implementations lie between
meillo@27	375 25 and 30 bytes per statement. With only 21 bytes per
meillo@27	376 statement, the Heirloom implementation marks the lower end;
meillo@27	377 the GNU implementation sets the upper limit at nearly 40. In
meillo@27	378 the case of GNU, the reason is mainly their coding style, with
meillo@27	379 special indent rules and long identifiers. Whether one finds
meillo@27	380 the Heirloom implementation
meillo@27	381 .[[ http://heirloom.cvs.sourceforge.net/viewvc/heirloom/heirloom/cut/cut.c?revision=1.6&view=markup
meillo@27	382 highly cryptic or exceptionally elegant shall be left
meillo@27	383 open to the judgement of the reader. Especially the
meillo@27	384 comparison to the GNU implementation
meillo@27	385 .[[ http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blob;f=src/cut.c;hb=e981643
meillo@27	386 is impressive.
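The two ratios can be recomputed from the table. The following sketch uses three rows of it (SLOC, lines, bytes); the function name and the shortened row labels are made up for this example.

```shell
# Print lines per SLOC and bytes per SLOC for three rows of the table.
# Input columns: label, SLOC, lines, bytes.
ratios() {
    awk '{ printf "%s %.2f %.1f\n", $1, $3 / $2, $4 / $2 }' <<'EOF'
SystemIII 116 123 2966
Heirloom 340 405 7423
GNU2015 588 830 23167
EOF
}

ratios
# SystemIII 1.06 25.6
# Heirloom 1.19 21.8
# GNU2015 1.41 39.4
```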
meillo@27	387 .PP
meillo@27	388 The internal structure of the source code (in all cases it is
meillo@27	389 written in C) is largely similar. Besides the mandatory main
meillo@27	390 function, which does the command line argument processing,
meillo@27	391 there usually exists a function to convert the field
meillo@27	392 selection specification to an internal data structure.
meillo@27	393 Furthermore, almost all implementations have separate
meillo@27	394 functions for each of their operation modes. The POSIX-compliant
meillo@27	395 versions treat the \f(CW-b -n\fP combination as a separate
meillo@27	396 mode and thus implement it in a function of its own. Only the early
meillo@27	397 System III implementation (and its 4.3BSD-UWisc variant) does
meillo@27	398 everything, apart from error handling, in the main function.
meillo@27	399 .PP
meillo@27	400 Implementations of cut typically have two limiting aspects:
meillo@27	401 One being the maximum number of fields that can be handled,
meillo@27	402 the other being the maximum line length. On System III, both
meillo@27	403 numbers are limited to 512. 4.3BSD-Reno and the BSDs of the
meillo@27	404 90s have fixed limits as well (\f(CW_BSD_LINE_MAX\fP or
meillo@27	405 \f(CW_POSIX2_LINE_MAX\fP). Modern FreeBSD, NetBSD, all GNU
meillo@27	406 implementations and the Heirloom cut are able to handle
meillo@27	407 arbitrary numbers of fields and line lengths \(en the memory
meillo@27	408 is allocated dynamically. OpenBSD cut is a hybrid: It has a fixed
meillo@27	409 maximum number of fields, but allows arbitrary line lengths.
meillo@27	410 The limited number of fields does not, however, appear to be
meillo@27	411 a practical problem, because \f(CW_POSIX2_LINE_MAX\fP is
meillo@27	412 guaranteed to be at least 2048 and is thus probably large enough.
meillo@27	413
meillo@27	414 .SH
meillo@27	415 Descriptions
meillo@27	416 .LP
meillo@27	417 Interesting, as well, is a comparison of the short descriptions
meillo@27	418 of cut, as can be found in the headlines of the man
meillo@27	419 pages or at the beginning of the source code files.
meillo@27	420 The following list is roughly sorted by time and grouped by
meillo@27	421 descent:
meillo@27	422 .TS
meillo@27	423 center;
meillo@27	424 l l.
meillo@27	425 CB UNIX	cut out selected fields of each line of a file
meillo@27	426 System III	cut out selected fields of each line of a file
meillo@27	427 System III \(dg	cut and paste columns of a table (projection of a relation)
meillo@27	428 System V	cut out selected fields of each line of a file
meillo@27	429 HP-UX	cut out (extract) selected fields of each line of a file
meillo@27	430 .sp .3
meillo@27	431 4.3BSD-UWisc \(dg	cut and paste columns of a table (projection of a relation)
meillo@27	432 4.3BSD-Reno	select portions of each line of a file
meillo@27	433 NetBSD	select portions of each line of a file
meillo@27	434 OpenBSD 4.6	select portions of each line of a file
meillo@27	435 FreeBSD 1.0	select portions of each line of a file
meillo@27	436 FreeBSD 10.0	cut out selected portions of each line of a file
meillo@27	437 SunOS 4.1.3	remove selected fields from each line of a file
meillo@27	438 SunOS 5.5.1	cut out selected fields of each line of a file
meillo@27	439 .sp .3
meillo@27	440 Heirloom Tools	cut out selected fields of each line of a file
meillo@27	441 Heirloom Tools \(dg	cut out fields of lines of files
meillo@27	442 .sp .3
meillo@27	443 GNU coreutils	remove sections from each line of files
meillo@27	444 .sp .3
meillo@27	445 Minix	select out columns of a file
meillo@27	446 .sp .3
meillo@27	447 Version 8 Unix	rearrange columns of data
meillo@27	448 ``Unix Reader''	rearrange columns of text
meillo@27	449 .sp .3
meillo@27	450 POSIX	cut out selected fields of each line of a file
meillo@27	451 .TE
meillo@27	452 .LP
meillo@27	453 (The descriptions that are marked with `\(dg' were taken from
meillo@27	454 source code files. The POSIX entry contains the description
meillo@27	455 used in the standard. The ``Unix Reader'' is a retrospective
meillo@27	456 document by Doug McIlroy, which lists the availability of
meillo@27	457 tools in the Research Unix versions
meillo@27	458 .[[ http://doc.cat-v.org/unix/unix-reader/contents.pdf .
meillo@27	459 Its description should actually match the one in Version 8
meillo@27	460 Unix. The change could be a transfer mistake or a correction.
meillo@27	461 All other descriptions originate from the various man pages.)
meillo@27	462 .PP
meillo@27	463 Over time, the POSIX description was often adopted or it
meillo@27	464 served as inspiration. One such example is FreeBSD
meillo@27	465 .[[ https://svnweb.freebsd.org/base?view=revision&revision=167101 .
meillo@27	466 .PP
meillo@27	467 It is noteworthy that the GNU coreutils in all versions
meillo@27	468 describe the performed action as a removal of parts of the
meillo@27	469 input, although the user clearly selects the parts that are
meillo@27	470 output. Probably the words ``cut out'' are too misleading.
meillo@27	471 HP-UX made them more concrete.
meillo@27	472 .PP
meillo@27	473 There are also different terms used for the thing being
meillo@27	474 selected. Some talk about fields (POSIX), some talk
meillo@27	475 about portions (BSD) and some call them columns (Research
meillo@27	476 Unix).
meillo@27	477 .PP
meillo@27	478 The seemingly least adequate description, the one of Version
meillo@27	479 8 Unix (``rearrange columns of data''), can be explained by
meillo@27	480 the fact that the man page covers both cut and paste, and in
meillo@27	481 their combination, columns can be rearranged. The use of
meillo@27	482 ``data'' instead of ``text'' might be a lapse, which McIlroy
meillo@27	483 corrected in his Unix Reader ... but, on the other hand, on
meillo@27	484 Unix, the two words are mostly synonymous, because all data
meillo@27	485 is text.
meillo@27	486
meillo@27	487
meillo@27	488 .SH
meillo@27	489 References
meillo@27	490 .LP
meillo@27	491 .nf
meillo@27	492 ._r
meillo@27	493