docs/cut

diff cut.en.ms @ 27:5cefcfc72d42
Added first version of the translation to English
author: markus schnalke <meillo@marmaro.de>
date: Tue, 04 Aug 2015 21:04:10 +0200
children: 0d7329867dd1
     1.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
     1.2 +++ b/cut.en.ms	Tue Aug 04 21:04:10 2015 +0200
     1.3 @@ -0,0 +1,493 @@
     1.4 +.so macros
     1.5 +.lc_ctype en_US.utf8
     1.6 +.pl -4v
     1.7 +
     1.8 +.TL
     1.9 +Cut out selected fields of each line of a file
    1.10 +.AU
    1.11 +markus schnalke <meillo@marmaro.de>
    1.12 +..
    1.13 +.FS
    1.14 +2015-05.
    1.15 +This text is in the public domain (CC0).
    1.16 +It is available online:
    1.17 +.I http://marmaro.de/docs/
    1.18 +.FE
    1.19 +
    1.20 +.LP
    1.21 +Cut is a classic program in the Unix toolchest.
    1.22 +It is present in most tutorials on shell programming, because it
    1.23 +is such a nice and useful tool with good explanatory value.
    1.24 +This text shall take a look behind its surface.
    1.25 +.SH
    1.26 +Usage
    1.27 +.LP
    1.28 +Initially, cut had two operation modes, which were later
    1.29 +supplemented by a third one. Cut may cut specified characters out of the
    1.30 +input lines or it may cut out specified fields, which are defined
    1.31 +by a delimiting character.
    1.32 +.PP
    1.33 +The character mode is well suited to slice fixed-width input
    1.34 +formats into parts. One might, for instance, extract the access
    1.35 +rights from the output of \f(CWls -l\fP, here the rights of the
    1.36 +file's owner:
    1.37 +.CS
    1.38 +	$ ls -l foo
    1.39 +	-rw-rw-r-- 1 meillo users 0 May 12 07:32 foo
    1.40 +.sp .3
    1.41 +	$ ls -l foo | cut -c 2-4
    1.42 +	rw-
    1.43 +.CE
    1.44 +.LP
    1.45 +Or the write permission for the owner, the group and the
    1.46 +world:
    1.47 +.CS
    1.48 +	$ ls -l foo | cut -c 3,6,9
    1.49 +	ww-
    1.50 +.CE
    1.51 +.LP
    1.52 +Cut can also be used to shorten strings:
    1.53 +.CS
    1.54 +	$ long=12345678901234567890
    1.55 +.sp .3
    1.56 +	$ echo "$long" | cut -c -10
    1.57 +	1234567890
    1.58 +.CE
    1.59 +.LP
    1.60 +This command outputs no more than the first 10 characters of
    1.61 +\f(CW$long\fP. (Alternatively, one could use \f(CWprintf
    1.62 +"%.10s\\n" "$long"\fP for this job.)
    1.63 +.PP
    1.64 +However, if the task is not about displaying characters but about
    1.65 +storing them, then \f(CW-c\fP is only partly suited. In former times,
    1.66 +when US-ASCII was the omnipresent character encoding, each
    1.67 +character was stored in exactly one byte. Therefore, \f(CWcut
    1.68 +-c\fP selected output characters and bytes alike. With
    1.69 +the rise of multi-byte encodings (like UTF-8), this assumption
    1.70 +became obsolete. Consequently, a byte mode (option \f(CW-b\fP)
    1.71 +was added to cut, with POSIX.2-1992. To select the first up to
    1.72 +500 bytes of each line (and ignore the rest), one can use:
    1.73 +.CS
    1.74 +	$ cut -b -500
    1.75 +.CE
    1.76 +.LP
    1.77 +The remainder can be caught with \f(CWcut -b 501-\fP. This
    1.78 +possibility is important for POSIX, because it allows one to create
    1.79 +text files with limited line length
    1.80 +.[[ http://pubs.opengroup.org/onlinepubs/9699919799/utilities/cut.html#tag_20_28_17 .
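As a small illustration of this split, here is a sketch with a limit of 5 bytes instead of 500. The sample line and the variable names are assumptions made for this example only.

```shell
# Hypothetical sample input; the limit of 5 bytes is chosen for brevity.
line=abcdefghij

# The first (up to) 5 bytes of the line.
head_part=$(printf '%s\n' "$line" | cut -b -5)

# The remainder, from byte 6 on.
tail_part=$(printf '%s\n' "$line" | cut -b 6-)

printf '%s|%s\n' "$head_part" "$tail_part"
```

Concatenating the two parts gives back the original line, so no bytes are lost by the split.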
    1.81 +.PP
    1.82 +Although the byte mode was newly introduced, it was meant to
    1.83 +behave exactly as the old character mode. The character mode,
    1.84 +however, had to be implemented differently. In consequence,
    1.85 +the problem wasn't to support the byte mode, but to support the
    1.86 +new character mode correctly.
    1.87 +.PP
    1.88 +Besides the character and byte modes, cut has the field mode,
    1.89 +which is activated by \f(CW-f\fP. It selects fields from the
    1.90 +input. The delimiting character (by default, the tab) may be
    1.91 +changed using \f(CW-d\fP. It applies to the input as well as to
    1.92 +the output.
    1.93 +.PP
    1.94 +The typical example for the use of cut's field mode is the
    1.95 +selection of information from the passwd file. Here, for
    1.96 +instance, the username and its uid:
    1.97 +.CS
    1.98 +	$ cut -d: -f1,3 /etc/passwd
    1.99 +	root:0
   1.100 +	bin:1
   1.101 +	daemon:2
   1.102 +	mail:8
   1.103 +	...
   1.104 +.CE
   1.105 +.LP
   1.106 +(The values to the command line switches may be appended directly
   1.107 +to them or separated by whitespace.)
   1.108 +.PP
   1.109 +The field mode is suited for simple tabular data, like the
   1.110 +passwd file. Beyond that, it soon reaches its limits. In particular,
   1.111 +the typical case of whitespace-separated fields is covered poorly
   1.112 +by it. Cut's delimiter is exactly one character,
   1.113 +so one cannot split at both space and tab characters.
   1.114 +Furthermore, multiple adjacent delimiter characters lead to
   1.115 +empty fields. This is not the expected behavior for
   1.116 +the processing of whitespace-separated fields. Some
   1.117 +implementations, e.g. the one of FreeBSD, have extensions that
   1.118 +handle this case in the expected way. Otherwise, i.e.
   1.119 +if one wants to stay portable, awk comes to the rescue.
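The difference can be sketched with a made-up input line that contains a run of three spaces followed by a tab. The sample data and variable names are assumptions for illustration.

```shell
# Sample line (an assumption for illustration): three spaces, then a tab.
line=$(printf 'alpha   beta\tgamma')

# cut sees an empty field between two adjacent spaces, so field 2 is empty.
with_cut=$(printf '%s\n' "$line" | cut -d ' ' -f 2)

# awk splits at runs of blanks and tabs by default, so field 2 is "beta".
with_awk=$(printf '%s\n' "$line" | awk '{ print $2 }')

printf 'cut: [%s]\nawk: [%s]\n' "$with_cut" "$with_awk"
```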
   1.120 +.PP
   1.121 +Awk provides another feature that cut lacks: changing the order
   1.122 +of the fields in the output. For cut, the order of the field
   1.123 +selection specification is irrelevant; it doesn't even matter if
   1.124 +fields are given multiple times. Thus, the invocation
   1.125 +\f(CWcut -c 5-8,1,4-6\fP outputs the characters number
   1.126 +1, 4, 5, 6, 7 and 8 in exactly this order. The
   1.127 +selection works as in mathematical set theory: Each
   1.128 +specified field is part of the solution set. The fields in the
   1.129 +solution set are always in the same order as in the input. In
   1.130 +the words of the man page in Version 8 Unix:
   1.131 +``In data base parlance, it projects a relation.''
   1.132 +.[[ http://man.cat-v.org/unix_8th/1/cut
   1.133 +This means that cut applies the database operation \fIprojection\fP
   1.134 +to the text input. Wikipedia explains it in the following way:
   1.135 +``In practical terms, it can be roughly thought of as picking a
   1.136 +sub-set of all available columns.''
   1.137 +.[[ https://en.wikipedia.org/wiki/Projection_(relational_algebra)
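The following sketch contrasts the two tools on a passwd-style record. The record and the variable names are made up for this example.

```shell
# Sample passwd-style line (an assumption for illustration).
record='root:x:0:0:root:/root:/bin/sh'

# cut ignores the order of the list: fields 1 and 3 appear in input order.
cut_out=$(printf '%s\n' "$record" | cut -d: -f3,1)

# awk prints the fields in the order given; OFS sets the output delimiter.
awk_out=$(printf '%s\n' "$record" | awk -F: -v OFS=: '{ print $3, $1 }')

printf 'cut: %s\nawk: %s\n' "$cut_out" "$awk_out"
```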
   1.138 +
   1.139 +.SH
   1.140 +Historical Background
   1.141 +.LP
   1.142 +Cut came to public life in 1982 with the release of UNIX System
   1.143 +III. Browsing through the sources of System III, one finds cut.c
   1.144 +with the timestamp 1980-04-11
   1.145 +.[[ http://minnie.tuhs.org/cgi-bin/utree.pl?file=SysIII/usr/src/cmd .
   1.146 +This is the oldest implementation of the program that I was able to
   1.147 +discover. However, the SCCS-ID in the source code speaks of
   1.148 +version 1.5. According to Doug McIlroy
   1.149 +.[[ http://minnie.tuhs.org/pipermail/tuhs/2015-May/004083.html ,
   1.150 +the earlier history likely lies in PWB/UNIX, which was the
   1.151 +basis for System III. In the available sources of PWB 1.0 (1977)
   1.152 +.[[ http://minnie.tuhs.org/Archive/PDP-11/Distributions/usdl/ ,
   1.153 +no cut is present. Of PWB 2.0, no sources or useful documentation
   1.154 +seem to be available. PWB 3.0 was later renamed to System III
   1.155 +for marketing purposes, hence it is identical to it. A side branch
   1.156 +of PWB was CB UNIX, which was used only internally at Bell
   1.157 +Labs. The manual of CB UNIX Edition 2.1 of November 1979
   1.158 +contains the earliest mention of cut that my research brought
   1.159 +to light: a man page for it
   1.160 +.[[ ftp://sunsite.icm.edu.pl/pub/unix/UnixArchive/PDP-11/Distributions/other/CB_Unix/cbunix_man1_02.pdf .
   1.161 +.PP
   1.162 +Now a look at BSD: There, my earliest discovery is a cut.c with
   1.163 +the file modification date of 1986-11-07
   1.164 +.[[ http://minnie.tuhs.org/cgi-bin/utree.pl?file=4.3BSD-UWisc/src/usr.bin/cut
   1.165 +as part of the special version 4.3BSD-UWisc
   1.166 +.[[ http://gunkies.org/wiki/4.3_BSD_NFS_Wisconsin_Unix ,
   1.167 +which was released in January 1987.
   1.168 +This implementation is mostly identical to the one in System
   1.169 +III. The better known 4.3BSD-Tahoe (1988) does not contain cut.
   1.170 +The following 4.3BSD-Reno (1990) does include cut. It is a freshly
   1.171 +written one by Adam S. Moskowitz and Marciano Pitargue, which was
   1.172 +included in BSD in 1989
   1.173 +.[[ http://minnie.tuhs.org/cgi-bin/utree.pl?file=4.3BSD-Reno/src/usr.bin/cut .
   1.174 +Its man page
   1.175 +.[[ http://minnie.tuhs.org/cgi-bin/utree.pl?file=4.3BSD-Reno/src/usr.bin/cut/cut.1
   1.176 +already mentions the expected compliance with POSIX.2.
   1.177 +One should note that POSIX.2 was first published in
   1.178 +September 1992, about two years after the man page and the
   1.179 +program were written. Hence, the program must have been
   1.180 +implemented based on a draft version of the standard. A look into
   1.181 +the code confirms the assumption. The function to parse the field
   1.182 +selection includes the following comment:
   1.183 +.QP
   1.184 +This parser is less restrictive than the Draft 9 POSIX spec.
   1.185 +POSIX doesn't allow lists that aren't in increasing order or
   1.186 +overlapping lists.
   1.187 +.LP
   1.188 +Draft 11.2 of POSIX (1991-09) requires this flexibility already:
   1.189 +.QP
   1.190 +The elements in list can be repeated, can overlap, and can
   1.191 +be specified in any order.
   1.192 +.LP
   1.193 +The same draft additionally includes all three operation modes,
   1.194 +whereas this early BSD cut only implemented the original two.
   1.195 +Draft 9 might not have included the byte mode. Without access to
   1.196 +Draft 9 or 10, it wasn't possible to verify this guess.
   1.197 +.PP
   1.198 +The version numbers and change dates of the older BSD
   1.199 +implementations are manifested in the SCCS-IDs, which the
   1.200 +version control system of that time inserted. For instance
   1.201 +in 4.3BSD-Reno: ``5.3 (Berkeley) 6/24/90''.
   1.202 +.PP
   1.203 +The cut implementation of the GNU coreutils contains the
   1.204 +following copyright notice:
   1.205 +.CS
   1.206 +	Copyright (C) 1997-2015 Free Software Foundation, Inc.
   1.207 +	Copyright (C) 1984 David M. Ihnat
   1.208 +.CE
   1.209 +.LP
   1.210 +The code thus has fairly old origins. Further comments show that
   1.211 +the source code was reworked by David MacKenzie first and later
   1.212 +by Jim Meyering, who put it into the version control system in
   1.213 +1992. It is unclear why the years before 1997, at least from
   1.214 +1992 on, don't show up in the copyright notice.
   1.215 +.PP
   1.216 +Despite all those year numbers from the 80s, cut is a rather
   1.217 +young tool, at least in relation to early Unix. Although
   1.218 +cut is a decade older than Linux, the kernel, Unix had been
   1.219 +around for over ten years when cut appeared for the first
   1.220 +time. Most notably, cut wasn't part of Version 7 Unix, which
   1.221 +became the basis for all modern Unix systems. The more complex
   1.222 +tools sed and awk had been part of it already. Hence, the
   1.223 +question comes to mind why cut was written at all, as there
   1.224 +existed two programs that were able to cover the use cases of
   1.225 +cut. One reason for cut surely was its compactness and the
   1.226 +resulting speed, in comparison to the then bulky awk. This lean
   1.227 +shape goes well with the Unix philosophy: Do one job and do it
   1.228 +well! Cut convinced. It found its way to other Unix variants,
   1.229 +it became standardized and today it is present everywhere.
   1.230 +.PP
   1.231 +The original variant (without \f(CW-b\fP) was described by the
   1.232 +System V Interface Definition, an important formal description
   1.233 +of UNIX System V, as early as 1985. In the following years, it
   1.234 +appeared in all relevant standards. POSIX.2 in 1992 specified
   1.235 +cut for the first time in its modern form (with \f(CW-b\fP).
   1.236 +
   1.237 +.SH
   1.238 +Multi-byte support
   1.239 +.LP
   1.240 +The byte mode and thus the multi-byte support of
   1.241 +the POSIX character mode have been standardized since 1992. But
   1.242 +how about their presence in the available implementations?
   1.243 +Which versions implement POSIX correctly?
   1.244 +.PP
   1.245 +The situation is divided into three parts: There are historic
   1.246 +implementations, which have only \f(CW-c\fP and \f(CW-f\fP.
   1.247 +Then there are implementations that have \f(CW-b\fP but
   1.248 +treat it merely as an alias for \f(CW-c\fP. These
   1.249 +implementations work correctly for single-byte encodings
   1.250 +(e.g. US-ASCII, Latin1) but for multi-byte encodings (e.g.
   1.251 +UTF-8) their \f(CW-c\fP behaves like \f(CW-b\fP (and
   1.252 +\f(CW-n\fP is ignored). Finally, there are implementations
   1.253 +that implement \f(CW-b\fP and \f(CW-c\fP in a POSIX-compliant way.
   1.254 +.PP
   1.255 +Historic two-mode implementations are the ones of 
   1.256 +System III, System V and the BSD ones until the mid-90s.
   1.257 +.PP
   1.258 +Pseudo multi-byte implementations are provided by GNU and
   1.259 +modern NetBSD and OpenBSD. The level of POSIX compliance
   1.260 +that is presented there is often higher than the level of
   1.261 +compliance that is actually provided. Sometimes it takes a
   1.262 +close look to discover that \f(CW-c\fP and \f(CW-n\fP don't
   1.263 +behave as expected. Some of the implementations take the
   1.264 +easy way by simply ignoring multi-byte
   1.265 +encodings; at least they state that clearly:
   1.266 +.QP
   1.267 +Since we don't support multi-byte characters, the \f(CW-c\fP and \f(CW-b\fP
   1.268 +options are equivalent, and the \f(CW-n\fP option is meaningless.
   1.269 +.[[ http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/usr.bin/cut/cut.c?rev=1.18&content-type=text/x-cvsweb-markup
   1.270 +.LP
   1.271 +Standard-adhering implementations, ones that treat
   1.272 +multi-byte characters correctly, are those of modern
   1.273 +FreeBSD and of the Heirloom toolchest. Tim Robbins
   1.274 +reimplemented the character mode of FreeBSD cut,
   1.275 +conforming to POSIX, in summer 2004
   1.276 +.[[ https://svnweb.freebsd.org/base?view=revision&revision=131194 .
   1.277 +The question why the other BSD systems have not
   1.278 +integrated this change is an open one. Maybe the answer can be
   1.279 +found in the statement quoted above.
   1.280 +.PP
   1.281 +How does a user find out if the cut on one's own system handles
   1.282 +multi-byte characters correctly? First, one needs to check if
   1.283 +the system itself uses multi-byte characters, because otherwise
   1.284 +characters and bytes are equivalent and the question
   1.285 +is irrelevant. One can check this by looking at the locale
   1.286 +settings, but it is easier to print a typical multi-byte
   1.287 +character, for instance an Umlaut or the Euro currency
   1.288 +symbol, and check if one or more bytes are output:
   1.289 +.CS
   1.290 +	$ echo ä | od -c
   1.291 +	0000000 303 244  \\n
   1.292 +	0000003
   1.293 +.CE
   1.294 +.LP
   1.295 +In this case, two bytes were output: octal 303 and 244. (The
   1.296 +newline character is added by echo.)
   1.297 +.PP
   1.298 +The program iconv converts text to specific encodings. This
   1.299 +is the output for Latin1 and UTF-8, for comparison:
   1.300 +.CS
   1.301 +	$ echo ä | iconv -t latin1 | od -c        
   1.302 +	0000000 344  \\n
   1.303 +	0000002
   1.304 +.sp .3
   1.305 +	$ echo ä | iconv -t utf8 | od -c  
   1.306 +	0000000 303 244  \\n
   1.307 +	0000003
   1.308 +.CE
   1.309 +.LP
   1.310 +The output (without the iconv conversion) on many European
   1.311 +systems equals one of these two.
   1.312 +.PP
   1.313 +Now the test of the cut implementation. On a UTF-8 system, a
   1.314 +POSIX-compliant implementation behaves as follows:
   1.315 +.CS
   1.316 +	$ echo ä | cut -c 1 | od -c
   1.317 +	0000000 303 244  \\n
   1.318 +	0000003
   1.319 +.sp .3
   1.320 +	$ echo ä | cut -b 1 | od -c
   1.321 +	0000000 303  \\n
   1.322 +	0000002
   1.323 +.sp .3
   1.324 +	$ echo ä | cut -b 1 -n | od -c
   1.325 +	0000000  \\n
   1.326 +	0000001
   1.327 +.CE
   1.328 +.LP
   1.329 +A pseudo POSIX implementation, in contrast, behaves like the
   1.330 +middle one, for all three invocations: Only the first byte is
   1.331 +output.
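The first of these checks can be wrapped in a small function. This is only a sketch: the function name is made up, the input is given as octal escapes so that the script does not depend on the encoding of its own source file, and the result is meaningful only when the locale in effect uses UTF-8.

```shell
# cut_c_mode: report whether "cut -c" on this system selects whole
# characters or single bytes. The function name is made up for this sketch.
cut_c_mode() {
	# The octal escapes 303 244 are the UTF-8 encoding of "ä".
	# tr strips the padding that some wc implementations emit.
	n=$(printf '\303\244\n' | cut -c 1 | wc -c | tr -d ' ')
	# A whole character yields 2 bytes plus the newline, a single
	# byte yields 1 byte plus the newline.
	if [ "$n" -eq 3 ]; then
		echo characters
	else
		echo bytes
	fi
}

cut_c_mode
```

In a single-byte locale the function always reports bytes, which is correct there, because characters and bytes coincide.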
   1.332 +
   1.333 +.SH
   1.334 +Implementations
   1.335 +.LP
   1.336 +Let's take a look at the sources of a selection of
   1.337 +implementations.
   1.338 +.PP
   1.339 +A comparison of the amount of source code is a good way to get a first
   1.340 +impression.  Typically, it grows over time. This can be seen
   1.341 +here, in general but not in all cases. A POSIX-compliant
   1.342 +implementation of the character mode requires more code, thus
   1.343 +these implementations tend to be the larger ones.
   1.344 +.TS
   1.345 +center;
   1.346 +r r r l l l.
   1.347 +SLOC	Lines	Bytes	Belongs to  	File time	Category
   1.348 +_
   1.349 +116	123	 2966	System III	1980-04-11	historic
   1.350 +118	125	 3038	4.3BSD-UWisc	1986-11-07	historic
   1.351 +200	256	 5715	4.3BSD-Reno	1990-06-25	historic
   1.352 +200	270	 6545	NetBSD	1993-03-21	historic
   1.353 +218	290	 6892	OpenBSD	2008-06-27	pseudo-POSIX
   1.354 +224	296	 6920	FreeBSD	1994-05-27	historic
   1.355 +232	306	 7500	NetBSD 	2014-02-03	pseudo-POSIX
   1.356 +340	405	 7423	Heirloom	2012-05-20	POSIX
   1.357 +382	586	14175	GNU coreutils	1992-11-08	pseudo-POSIX
   1.358 +391	479	10961	FreeBSD	2012-11-24	POSIX
   1.359 +588	830	23167	GNU coreutils	2015-05-01	pseudo-POSIX
   1.360 +.TE
   1.361 +.LP
   1.362 +Roughly four groups can be seen: (1) The two original
   1.363 +implementations, which are mostly identical, with about 100
   1.364 +SLOC. (2) The five BSD versions, with about 200 SLOC. (3) The
   1.365 +two POSIX-compliant versions and the old GNU one, with a SLOC
   1.366 +count in the 300s. And finally (4) the modern GNU cut with
   1.367 +almost 600 SLOC.
   1.368 +.PP
   1.369 +The ratio between the number of
   1.370 +newlines in the file (\f(CWwc -l\fP) and the number of logical code
   1.371 +lines (SLOC, measured with SLOCcount) ranges from a factor of
   1.372 +1.06 for the oldest versions to a factor of 1.5 for GNU. The
   1.373 +largest influences on it are empty lines, pure comment lines
   1.374 +and the size of the license block at the beginning of the file.
   1.375 +.PP
   1.376 +Regarding the ratio between the file size (\f(CWwc -c\fP) and
   1.377 +the logical code lines, most implementations lie between
   1.378 +25 and 30 bytes per statement. With only 21 bytes per
   1.379 +statement, the Heirloom implementation marks the lower end;
   1.380 +the GNU implementation sets the upper limit at nearly 40. In
   1.381 +the case of GNU, the reason is mainly their coding style, with
   1.382 +special indent rules and long identifiers. Whether one finds
   1.383 +the Heirloom implementation
   1.384 +.[[ http://heirloom.cvs.sourceforge.net/viewvc/heirloom/heirloom/cut/cut.c?revision=1.6&view=markup
   1.385 +highly cryptic or exceptionally elegant shall be left
   1.386 +open to the judgement of the reader. Especially the
   1.387 +comparison to the GNU implementation
   1.388 +.[[ http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blob;f=src/cut.c;hb=e981643
   1.389 +is impressive.
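The two ratios discussed here can be recomputed from the table with a short awk filter. This is a sketch: the function name is made up, and the input format (SLOC, lines, bytes and name, separated by tabs) is an assumption chosen for this example.

```shell
# ratios: read "SLOC<TAB>lines<TAB>bytes<TAB>name" records and print the
# name, the lines per SLOC and the bytes per SLOC. The function name is
# made up for this sketch.
ratios() {
	awk -F '\t' '{ printf "%s: %.2f %.1f\n", $4, $2 / $1, $3 / $1 }'
}

# Two rows copied from the table above.
printf '116\t123\t2966\tSystem III\n340\t405\t7423\tHeirloom\n' | ratios
```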
   1.390 +.PP
   1.391 +The internal structure of the source code (in all cases it is
   1.392 +written in C) is largely similar. Besides the mandatory main
   1.393 +function, which does the command line argument processing,
   1.394 +there usually exists a function to convert the field
   1.395 +selection specification to an internal data structure.
   1.396 +Furthermore, almost all implementations have separate
   1.397 +functions for each of their operation modes. The POSIX-compliant
   1.398 +versions treat the \f(CW-b -n\fP combination as a separate
   1.399 +mode and thus implement it in a function of its own. Only the early
   1.400 +System III implementation (and its 4.3BSD-UWisc variant) does
   1.401 +everything, apart from error handling, in the main function.
   1.402 +.PP
   1.403 +Implementations of cut typically have two limiting aspects:
   1.404 +One being the maximum number of fields that can be handled,
   1.405 +the other being the maximum line length. On System III, both
   1.406 +numbers are limited to 512. 4.3BSD-Reno and the BSDs of the
   1.407 +90s have fixed limits as well (\f(CW_BSD_LINE_MAX\fP or
   1.408 +\f(CW_POSIX2_LINE_MAX\fP). Modern FreeBSD, NetBSD, all GNU
   1.409 +implementations and the Heirloom cut are able to handle
   1.410 +arbitrary numbers of fields and line lengths \(en the memory
   1.411 +is allocated dynamically. OpenBSD cut is a hybrid: It has a fixed
   1.412 +maximum number of fields, but allows arbitrary line lengths.
   1.413 +The limited number of fields does not, however, appear to be
   1.414 +a practical problem, because \f(CW_POSIX2_LINE_MAX\fP is
   1.415 +guaranteed to be at least 2048 and is thus probably large enough.
   1.416 +
   1.417 +.SH
   1.418 +Descriptions
   1.419 +.LP
   1.420 +Interesting, as well, is a comparison of the short descriptions
   1.421 +of cut, as can be found in the headlines of the man
   1.422 +pages or at the beginning of the source code files.
   1.423 +The following list is roughly sorted by time and grouped by
   1.424 +descent:
   1.425 +.TS
   1.426 +center;
   1.427 +l l.
   1.428 +CB UNIX	cut out selected fields of each line of a file
   1.429 +System III	cut out selected fields of each line of a file
   1.430 +System III \(dg	cut and paste columns of a table (projection of a relation)
   1.431 +System V	cut out selected fields of each line of a file
   1.432 +HP-UX	cut out (extract) selected fields of each line of a file
   1.433 +.sp .3
   1.434 +4.3BSD-UWisc \(dg	cut and paste columns of a table (projection of a relation)
   1.435 +4.3BSD-Reno	select portions of each line of a file
   1.436 +NetBSD	select portions of each line of a file
   1.437 +OpenBSD 4.6	select portions of each line of a file
   1.438 +FreeBSD 1.0	select portions of each line of a file
   1.439 +FreeBSD 10.0	cut out selected portions of each line of a file
   1.440 +SunOS 4.1.3	remove selected fields from each line of a file
   1.441 +SunOS 5.5.1	cut out selected fields of each line of a file
   1.442 +.sp .3
   1.443 +Heirloom Tools	cut out selected fields of each line of a file
   1.444 +Heirloom Tools \(dg	cut out fields of lines of files
   1.445 +.sp .3
   1.446 +GNU coreutils	remove sections from each line of files
   1.447 +.sp .3
   1.448 +Minix	select out columns of a file
   1.449 +.sp .3
   1.450 +Version 8 Unix	rearrange columns of data
   1.451 +``Unix Reader''	rearrange columns of text
   1.452 +.sp .3
   1.453 +POSIX	cut out selected fields of each line of a file
   1.454 +.TE
   1.455 +.LP
   1.456 +(The descriptions that are marked with `\(dg' were taken from
   1.457 +source code files. The POSIX entry contains the description
   1.458 +used in the standard. The ``Unix Reader'' is a retrospective
   1.459 +document by Doug McIlroy, which lists the availability of
   1.460 +tools in the Research Unix versions
   1.461 +.[[ http://doc.cat-v.org/unix/unix-reader/contents.pdf .
   1.462 +Its description should actually match the one in Version 8
   1.463 +Unix. The change could be a transfer mistake or a correction.
   1.464 +All other descriptions originate from the various man pages.)
   1.465 +.PP
   1.466 +Over time, the POSIX description was often adopted or it
   1.467 +served as inspiration. One such example is FreeBSD
   1.468 +.[[ https://svnweb.freebsd.org/base?view=revision&revision=167101 .
   1.469 +.PP
   1.470 +It is noteworthy that the GNU coreutils in all versions
   1.471 +describe the performed action as a removal of parts of the
   1.472 +input, although the user clearly selects the parts that are
   1.473 +output. Probably the words ``cut out'' are too misleading.
   1.474 +HP-UX made them more concrete.
   1.475 +.PP
   1.476 +There are also different terms used for the thing being
   1.477 +selected. Some talk about fields (POSIX), some talk
   1.478 +about portions (BSD) and some call them columns (Research
   1.479 +Unix).
   1.480 +.PP
   1.481 +The seemingly least adequate description, the one of Version
   1.482 +8 Unix (``rearrange columns of data''), can be explained by
   1.483 +the fact that the man page covers both cut and paste, and in
   1.484 +their combination, columns can be rearranged. The use of
   1.485 +``data'' instead of ``text'' might be a lapse, which McIlroy
   1.486 +corrected in his Unix Reader ... but, on the other hand, on
   1.487 +Unix, the two words are mostly synonymous, because all data
   1.488 +is text.
   1.489 +
   1.490 +
   1.491 +.SH
   1.492 +References
   1.493 +.LP
   1.494 +.nf
   1.495 +._r
   1.496 +