docs/cut
diff cut.en.ms @ 27:5cefcfc72d42
Added first version of the translation to English
author | markus schnalke <meillo@marmaro.de> |
---|---|
date | Tue, 04 Aug 2015 21:04:10 +0200 |
parents | |
children | 0d7329867dd1 |
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/cut.en.ms	Tue Aug 04 21:04:10 2015 +0200
@@ -0,0 +1,493 @@
+.so macros
+.lc_ctype en_US.utf8
+.pl -4v
+
+.TL
+Cut out selected fields of each line of a file
+.AU
+markus schnalke <meillo@marmaro.de>
+..
+.FS
+2015-05.
+This text is in the public domain (CC0).
+It is available online:
+.I http://marmaro.de/docs/
+.FE
+
+.LP
+Cut is a classic program in the Unix toolchest.
+It is present in most tutorials on shell programming, because it
+is such a nice and useful tool with good explanatory value.
+This text takes a look beneath its surface.
+.SH
+Usage
+.LP
+Initially, cut had two operation modes, which were later
+supplemented by a third. Cut may cut specified characters out of the
+input lines or it may cut out specified fields, which are defined
+by a delimiting character.
+.PP
+The character mode is well suited for slicing fixed-width input
+formats into parts. One might, for instance, extract the access
+rights from the output of \f(CWls -l\fP, here the rights of the
+file's owner:
+.CS
+    $ ls -l foo
+    -rw-rw-r-- 1 meillo users 0 May 12 07:32 foo
+.sp .3
+    $ ls -l foo | cut -c 2-4
+    rw-
+.CE
+.LP
+Or the write permission for the owner, the group and the
+world:
+.CS
+    $ ls -l foo | cut -c 3,6,9
+    ww-
+.CE
+.LP
+Cut can also be used to shorten strings:
+.CS
+    $ long=12345678901234567890
+.sp .3
+    $ echo "$long" | cut -c -10
+    1234567890
+.CE
+.LP
+This command outputs no more than the first 10 characters of
+\f(CW$long\fP. (Alternatively, one could use \f(CWprintf
+"%.10s\\n" "$long"\fP for this job.)
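The character-mode invocations shown above can be collected into a small shell sketch. The function name shorten and the sample ls line are assumptions made purely for illustration; neither is part of cut. The sketch deliberately uses ASCII input only, so the result does not depend on how a given cut treats multi-byte characters.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not part of cut):
# print at most $1 characters of each remaining argument, one per line.
shorten() {
    width=$1
    shift
    for s in "$@"; do
        # "-N" as a list selects characters 1 through N
        printf '%s\n' "$s" | cut -c "-$width"
    done
}

# Sample line, copied from the ls -l example in the text.
line='-rw-rw-r-- 1 meillo users 0 May 12 07:32 foo'

# Write bits of owner, group and world (characters 3, 6 and 9).
printf '%s\n' "$line" | cut -c 3,6,9

# Truncate a long string to its first 10 characters.
shorten 10 12345678901234567890
```

Strings shorter than the requested width pass through unchanged, because cut simply selects whatever characters exist in the given range.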
+.PP
+However, if the concern is not displaying characters but
+storing them, then \f(CW-c\fP is only partly suitable. In former times,
+when US-ASCII was the omnipresent character encoding, each
+character was stored with exactly one byte. Therefore, \f(CWcut
+-c\fP selected both output characters and bytes, equally. With
+the rise of multi-byte encodings (like UTF-8), this assumption
+became obsolete. Consequently, a byte mode (option \f(CW-b\fP)
+was added to cut, with POSIX.2-1992. To select up to the first
+500 bytes of each line (and ignore the rest), one can use:
+.CS
+    $ cut -b -500
+.CE
+.LP
+The remainder can be caught with \f(CWcut -b 501-\fP. This
+possibility is important for POSIX, because it allows the creation of
+text files with limited line length
+.[[ http://pubs.opengroup.org/onlinepubs/9699919799/utilities/cut.html#tag_20_28_17 .
+.PP
+Although the byte mode was newly introduced, it was meant to
+behave exactly like the old character mode. The character mode,
+however, had to be implemented differently. As a consequence,
+the problem wasn't to support the byte mode, but to support the
+new character mode correctly.
+.PP
+Besides the character and byte modes, cut has the field mode,
+which is activated by \f(CW-f\fP. It selects fields from the
+input. The delimiting character (by default, the tab) may be
+changed using \f(CW-d\fP. It applies to the input as well as to
+the output.
+.PP
+The typical example for the use of cut's field mode is the
+selection of information from the passwd file. Here, for
+instance, the username and its uid:
+.CS
+    $ cut -d: -f1,3 /etc/passwd
+    root:0
+    bin:1
+    daemon:2
+    mail:8
+    ...
+.CE
+.LP
+(The arguments to the command line switches may be appended directly
+to them or separated by whitespace.)
+.PP
+The field mode is suited for simple tabular data, like the
+passwd file. Beyond that, it soon reaches its limits. In particular,
+the typical case of whitespace-separated fields is covered poorly
+by it. Cut's delimiter is exactly one character,
+therefore one cannot split at both space and tab characters.
+Furthermore, multiple adjacent delimiter characters lead to
+empty fields. This is not the expected behavior for
+the processing of whitespace-separated fields. Some
+implementations, e.g. the one of FreeBSD, have extensions that
+handle this case in the expected way. Otherwise, i.e.
+if one wants to stay portable, awk comes to the rescue.
+.PP
+Awk provides another function that cut lacks: changing the order
+of the fields in the output. For cut, the order of the field
+selection specification is irrelevant; it doesn't even matter if
+fields are given multiple times. Thus, the invocation
+\f(CWcut -c 5-8,1,4-6\fP outputs the characters number
+1, 4, 5, 6, 7 and 8 in exactly this order. The
+selection works as in mathematical set theory: Each
+specified field is part of the solution set. The fields in the
+solution set are always in the same order as in the input. To
+quote the man page in Version 8 Unix:
+``In data base parlance, it projects a relation.''
+.[[ http://man.cat-v.org/unix_8th/1/cut
+This means that cut applies the database operation \fIprojection\fP
+to the text input.
+Wikipedia explains it in the following way: ``In practical terms,
+it can be roughly thought of as picking a sub-set of all available columns.''
+.[[ https://en.wikipedia.org/wiki/Projection_(relational_algebra)
+
+.SH
+Historical Background
+.LP
+Cut came to public life in 1982 with the release of UNIX System
+III. Browsing through the sources of System III, one finds cut.c
+with the timestamp 1980-04-11
+.[[ http://minnie.tuhs.org/cgi-bin/utree.pl?file=SysIII/usr/src/cmd .
+This is the oldest implementation of the program that I was able to
+discover. However, the SCCS-ID in the source code speaks of
+version 1.5. According to Doug McIlroy
+.[[ http://minnie.tuhs.org/pipermail/tuhs/2015-May/004083.html ,
+the earlier history likely lies in PWB/UNIX, which was the
+basis for System III. In the available sources of PWB 1.0 (1977)
+.[[ http://minnie.tuhs.org/Archive/PDP-11/Distributions/usdl/ ,
+no cut is present. Of PWB 2.0, no sources or useful documentation
+seem to be available. PWB 3.0 was later renamed to System III
+for marketing purposes, hence the two are identical. A side line
+of PWB was CB UNIX, which was only used internally at Bell
+Labs. The manual of CB UNIX Edition 2.1 of November 1979
+contains the earliest mention of cut that my research brought
+to light: a man page for it
+.[[ ftp://sunsite.icm.edu.pl/pub/unix/UnixArchive/PDP-11/Distributions/other/CB_Unix/cbunix_man1_02.pdf .
+.PP
+Now a look at BSD: There, my earliest discovery is a cut.c with
+the file modification date of 1986-11-07
+.[[ http://minnie.tuhs.org/cgi-bin/utree.pl?file=4.3BSD-UWisc/src/usr.bin/cut
+as part of the special version 4.3BSD-UWisc
+.[[ http://gunkies.org/wiki/4.3_BSD_NFS_Wisconsin_Unix ,
+which was released in January 1987.
+This implementation is mostly identical to the one in System
+III. The better known 4.3BSD-Tahoe (1988) does not contain cut.
+The following 4.3BSD-Reno (1990) does include cut. It is a freshly
+written one by Adam S. Moskowitz and Marciano Pitargue, which was
+included in BSD in 1989
+.[[ http://minnie.tuhs.org/cgi-bin/utree.pl?file=4.3BSD-Reno/src/usr.bin/cut .
+Its man page
+.[[ http://minnie.tuhs.org/cgi-bin/utree.pl?file=4.3BSD-Reno/src/usr.bin/cut/cut.1
+already mentions the expected compliance with POSIX.2.
+One should note that POSIX.2 was first published in
+September 1992, about two years after the man page and the
+program were written. Hence, the program must have been
+implemented based on a draft version of the standard. A look into
+the code confirms the assumption. The function to parse the field
+selection includes the following comment:
+.QP
+This parser is less restrictive than the Draft 9 POSIX spec.
+POSIX doesn't allow lists that aren't in increasing order or
+overlapping lists.
+.LP
+Draft 11.2 of POSIX (1991-09) requires this flexibility already:
+.QP
+The elements in list can be repeated, can overlap, and can
+be specified in any order.
+.LP
+The same draft additionally includes all three operation modes,
+whereas this early BSD cut only implemented the original two.
+Draft 9 might not have included the byte mode. Without access to
+Draft 9 or 10, it wasn't possible to verify this guess.
+.PP
+The version numbers and change dates of the older BSD
+implementations are recorded in the SCCS-IDs, which the
+version control system of that time inserted. For instance
+in 4.3BSD-Reno: ``5.3 (Berkeley) 6/24/90''.
+.PP
+The cut implementation of the GNU coreutils contains the
+following copyright notice:
+.CS
+    Copyright (C) 1997-2015 Free Software Foundation, Inc.
+    Copyright (C) 1984 David M. Ihnat
+.CE
+.LP
+The code does have pretty old origins. Further comments show that
+the source code was reworked by David MacKenzie first and later
+by Jim Meyering, who put it into the version control system in
+1992. It is unclear why the years until 1997, at least from
+1992 on, don't show up in the copyright notice.
+.PP
+Despite all those year numbers from the 80s, cut is a rather
+young tool, at least in relation to early Unix. Although cut
+is a decade older than the Linux kernel, Unix had already
+existed for over ten years when cut appeared for the first
+time. Most notably, cut wasn't part of Version 7 Unix, which
+became the basis for all modern Unix systems. The more complex
+tools sed and awk had been part of it already. Hence, the
+question arises why cut was written at all, as there
+existed two programs that were able to cover the use cases of
+cut. One reason for cut surely was its compactness and the
+resulting speed, in comparison to the then bulky awk. This lean
+shape goes well with the Unix philosophy: Do one job and do it
+well! Cut proved convincing. It found its way to other Unix variants,
+it became standardized and today it is present everywhere.
+.PP
+The original variant (without \f(CW-b\fP) was described by the
+System V Interface Definition, an important formal description
+of UNIX System V, already in 1985. In the following years, it
+appeared in all relevant standards. POSIX.2 in 1992 specified
+cut for the first time in its modern form (with \f(CW-b\fP).
+
+.SH
+Multi-byte support
+.LP
+The byte mode and thus the multi-byte support of
+the POSIX character mode have been standardized since 1992. But
+how about their presence in the available implementations?
+Which versions implement POSIX correctly?
+.PP
+The situation divides into three groups: There are historic
+implementations, which have only \f(CW-c\fP and \f(CW-f\fP.
+Then there are implementations, which have \f(CW-b\fP but
+treat it as an alias for \f(CW-c\fP only. These
+implementations work correctly for single-byte encodings
+(e.g. US-ASCII, Latin1) but for multi-byte encodings (e.g.
+UTF-8) their \f(CW-c\fP behaves like \f(CW-b\fP (and
+\f(CW-n\fP is ignored). Finally, there are implementations
+that implement \f(CW-b\fP and \f(CW-c\fP in a POSIX-compliant way.
+.PP
+Historic two-mode implementations are the ones of
+System III, System V and the BSD ones until the mid-90s.
+.PP
+Pseudo multi-byte implementations are provided by GNU and
+modern NetBSD and OpenBSD. The level of POSIX compliance
+that is claimed there is often higher than the level of
+compliance that is actually provided. Sometimes it takes a
+close look to discover that \f(CW-c\fP and \f(CW-n\fP don't
+behave as expected. Some of the implementations take the
+easy way by simply ignoring multi-byte encodings
+altogether; at least they say so clearly:
+.QP
+Since we don't support multi-byte characters, the \f(CW-c\fP and \f(CW-b\fP
+options are equivalent, and the \f(CW-n\fP option is meaningless.
+.[[ http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/usr.bin/cut/cut.c?rev=1.18&content-type=text/x-cvsweb-markup
+.LP
+Standard-adhering implementations, ones that treat
+multi-byte characters correctly, are the one of modern
+FreeBSD and the one in the Heirloom toolchest.
+Tim Robbins reimplemented the character mode of FreeBSD cut,
+conforming to POSIX, in summer 2004
+.[[ https://svnweb.freebsd.org/base?view=revision&revision=131194 .
+The question why the other BSD systems have not
+integrated this change is an open one. Maybe the answer can be
+found in the above quoted statement.
+.PP
+How does a user find out if the cut on their own system handles
+multi-byte characters correctly? First, one needs to check if
+the system itself uses multi-byte characters, because otherwise
+characters and bytes are equivalent and the question
+is irrelevant. One can check this by looking at the locale
+settings, but it is easier to print a typical multi-byte
+character, for instance an umlaut or the euro currency
+symbol, and check if one or more bytes are output:
+.CS
+    $ echo ä | od -c
+    0000000 303 244 \\n
+    0000003
+.CE
+.LP
+In this case there were two bytes: octal 303 and 244. (The
+newline character is added by echo.)
+.PP
+The program iconv converts text to specific encodings. This
+is the output for Latin1 and UTF-8, for comparison:
+.CS
+    $ echo ä | iconv -t latin1 | od -c
+    0000000 344 \\n
+    0000002
+.sp .3
+    $ echo ä | iconv -t utf8 | od -c
+    0000000 303 244 \\n
+    0000003
+.CE
+.LP
+The output (without the iconv conversion) on many European
+systems equals one of these two.
+.PP
+Now the test of the cut implementation.
+On a UTF-8 system, a POSIX-compliant implementation behaves as follows:
+.CS
+    $ echo ä | cut -c 1 | od -c
+    0000000 303 244 \\n
+    0000003
+.sp .3
+    $ echo ä | cut -b 1 | od -c
+    0000000 303 \\n
+    0000002
+.sp .3
+    $ echo ä | cut -b 1 -n | od -c
+    0000000 \\n
+    0000001
+.CE
+.LP
+A pseudo-POSIX implementation, in contrast, behaves like the
+middle one, for all three invocations: Only the first byte is
+output.
+
+.SH
+Implementations
+.LP
+Let's take a look at the sources of a selection of
+implementations.
+.PP
+A comparison of the amount of source code is good to get a first
+impression. Typically, it grows through time. This can be seen
+here, in general but not in all cases. A POSIX-compliant
+implementation of the character mode requires more code, thus
+these implementations tend to be the larger ones.
+.TS
+center;
+r r r l l l.
+SLOC	Lines	Bytes	Belongs to	File time	Category
+_
+116	123	2966	System III	1980-04-11	historic
+118	125	3038	4.3BSD-UWisc	1986-11-07	historic
+200	256	5715	4.3BSD-Reno	1990-06-25	historic
+200	270	6545	NetBSD	1993-03-21	historic
+218	290	6892	OpenBSD	2008-06-27	pseudo-POSIX
+224	296	6920	FreeBSD	1994-05-27	historic
+232	306	7500	NetBSD	2014-02-03	pseudo-POSIX
+340	405	7423	Heirloom	2012-05-20	POSIX
+382	586	14175	GNU coreutils	1992-11-08	pseudo-POSIX
+391	479	10961	FreeBSD	2012-11-24	POSIX
+588	830	23167	GNU coreutils	2015-05-01	pseudo-POSIX
+.TE
+.LP
+Roughly four groups can be seen: (1) The two original
+implementations, which are mostly identical, with about 100
+SLOC. (2) The five BSD versions, with about 200 SLOC. (3) The
+two POSIX-compliant versions and the old GNU one, with a SLOC
+count in the 300s.
+And finally (4) the modern GNU cut with almost 600 SLOC.
+.PP
+The ratio between the number of logical code
+lines (SLOC, measured with SLOCcount) and the number of
+newlines in the file (\f(CWwc -l\fP) ranges from a factor of
+1.06 for the oldest versions to a factor of 1.5 for GNU. The
+largest influences on it are empty lines, pure comment lines
+and the size of the license block at the beginning of the file.
+.PP
+Regarding the ratio between logical code lines and the
+file size (\f(CWwc -c\fP), most implementations lie between
+25 and 30 bytes per statement. With only 21 bytes per
+statement, the Heirloom implementation marks the lower end;
+the GNU implementation sets the upper limit at nearly 40. In
+the case of GNU, the reason is mainly their coding style, with
+special indent rules and long identifiers. Whether one finds
+the Heirloom implementation
+.[[ http://heirloom.cvs.sourceforge.net/viewvc/heirloom/heirloom/cut/cut.c?revision=1.6&view=markup
+highly cryptic or exceptionally elegant, shall be left
+open to the judgement of the reader. Especially the
+comparison to the GNU implementation
+.[[ http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blob;f=src/cut.c;hb=e981643
+is impressive.
+.PP
+The internal structure of the source code (in all cases it is
+written in C) is largely similar. Besides the mandatory main
+function, which does the command line argument processing,
+there usually exists a function to convert the field
+selection specification to an internal data structure.
+Furthermore, almost all implementations have separate
+functions for each of their operation modes. The POSIX-compliant
+versions treat the \f(CW-b -n\fP combination as a separate
+mode and thus implement it in a function of its own.
+Only the early System III implementation (and its 4.3BSD-UWisc variant) does
+everything, apart from error handling, in the main function.
+.PP
+Implementations of cut typically have two limiting aspects:
+One being the maximum number of fields that can be handled,
+the other being the maximum line length. On System III, both
+numbers are limited to 512. 4.3BSD-Reno and the BSDs of the
+90s have fixed limits as well (\f(CW_BSD_LINE_MAX\fP or
+\f(CW_POSIX2_LINE_MAX\fP). Modern FreeBSD, NetBSD, all GNU
+implementations and the Heirloom cut are able to handle
+arbitrary numbers of fields and line lengths \(en the memory
+is allocated dynamically. OpenBSD cut is a hybrid: It has a fixed
+maximum number of fields, but allows arbitrary line lengths.
+The limited number of fields does, however, not appear to be
+any practical problem, because \f(CW_POSIX2_LINE_MAX\fP is
+guaranteed to be at least 2048 and is thus probably large enough.
+
+.SH
+Descriptions
+.LP
+Interesting, as well, is a comparison of the short descriptions
+of cut, as can be found in the headlines of the man
+pages or at the beginning of the source code files.
+The following list is roughly sorted by time and grouped by
+descent:
+.TS
+center;
+l l.
+CB UNIX	cut out selected fields of each line of a file
+System III	cut out selected fields of each line of a file
+System III \(dg	cut and paste columns of a table (projection of a relation)
+System V	cut out selected fields of each line of a file
+HP-UX	cut out (extract) selected fields of each line of a file
+.sp .3
+4.3BSD-UWisc \(dg	cut and paste columns of a table (projection of a relation)
+4.3BSD-Reno	select portions of each line of a file
+NetBSD	select portions of each line of a file
+OpenBSD 4.6	select portions of each line of a file
+FreeBSD 1.0	select portions of each line of a file
+FreeBSD 10.0	cut out selected portions of each line of a file
+SunOS 4.1.3	remove selected fields from each line of a file
+SunOS 5.5.1	cut out selected fields of each line of a file
+.sp .3
+Heirloom Tools	cut out selected fields of each line of a file
+Heirloom Tools \(dg	cut out fields of lines of files
+.sp .3
+GNU coreutils	remove sections from each line of files
+.sp .3
+Minix	select out columns of a file
+.sp .3
+Version 8 Unix	rearrange columns of data
+``Unix Reader''	rearrange columns of text
+.sp .3
+POSIX	cut out selected fields of each line of a file
+.TE
+.LP
+(The descriptions that are marked with `\(dg' were taken from
+source code files. The POSIX entry contains the description
+used in the standard. The ``Unix Reader'' is a retrospective
+document by Doug McIlroy, which lists the availability of
+tools in the Research Unix versions
+.[[ http://doc.cat-v.org/unix/unix-reader/contents.pdf .
+Its description should actually match the one in Version 8
+Unix. The change could be a transcription mistake or a correction.
+All other descriptions originate from the various man pages.)
+.PP
+Over time, the POSIX description was often adopted or it
+served as inspiration. One such example is FreeBSD
+.[[ https://svnweb.freebsd.org/base?view=revision&revision=167101 .
+.PP
+It is noteworthy that the GNU coreutils in all versions
+describe the performed action as a removal of parts of the
+input, although the user clearly selects the parts that are
+output. Probably the words ``cut out'' are too easily misunderstood.
+HP-UX made them more concrete.
+.PP
+There are also different terms used for the thing being
+selected. Some talk about fields (POSIX), some talk
+about portions (BSD) and some call them columns (Research
+Unix).
+.PP
+The seemingly least adequate description, the one of Version
+8 Unix (``rearrange columns of data''), can be explained
+by the fact that the man page covers both cut and paste, and in
+their combination, columns can be rearranged. The use of
+``data'' instead of ``text'' might be a lapse, which McIlroy
+corrected in his Unix Reader ... but, on the other hand, on
+Unix, the two words are mostly synonymous, because all data
+is text.
+
+
+.SH
+References
+.LP
+.nf
+._r
+