comparison cut.en.ms @ 31:106609b64dc4
minor corrections and improvements in the text
author | markus schnalke <meillo@marmaro.de> |
---|---|
date | Tue, 15 Sep 2015 17:20:20 +0200 |
parents | 6977e2ee5dc5 |
children | 5f78bcd34eeb |
comparison
30:6977e2ee5dc5 | 31:106609b64dc4 |
---|---|
88 the input. The field-delimiter character for the input as well | 88 the input. The field-delimiter character for the input as well |
89 as for the output (by default the tab) may be changed using | 89 as for the output (by default the tab) may be changed using |
90 \f(CW-d\fP. | 90 \f(CW-d\fP. |
91 .PP | 91 .PP |
92 The typical example for the use of cut's field mode is the | 92 The typical example for the use of cut's field mode is the |
93 selection of information from the passwd file. Here, for | 93 selection of information from the password file. Here, for |
94 instance, the usernames and their uids: | 94 instance, the usernames and their uids: |
95 .CS | 95 .CS |
96 $ cut -d: -f1,3 /etc/passwd | 96 $ cut -d: -f1,3 /etc/passwd |
97 root:0 | 97 root:0 |
98 bin:1 | 98 bin:1 |
103 .LP | 103 .LP |
104 (The values to the command line switches may be appended directly | 104 (The values to the command line switches may be appended directly |
105 to them or separated by whitespace.) | 105 to them or separated by whitespace.) |
106 .PP | 106 .PP |
107 The field mode is suited for simple tabulary data, like the | 107 The field mode is suited for simple tabulary data, like the |
108 passwd file. Beyond that, it soon reaches its limits. The typical | 108 password file. Beyond that, it soon reaches its limits. The typical |
109 case of whitespace-separated fields, in particular, is covered | 109 case of whitespace-separated fields, in particular, is covered |
110 poorly by it. Cut's delimiter is exactly one character, | 110 poorly by it. Cut's delimiter is exactly one character, |
111 therefore one can not split at both space and tab characters. | 111 therefore one can not split at both space and tab characters. |
112 Furthermore, multiple adjacent delimiter characters lead to | 112 Furthermore, multiple adjacent delimiter characters lead to |
113 empty fields. This is not the expected behavior for | 113 empty fields. This is not the expected behavior for |
203 .CS | 203 .CS |
204 Copyright (C) 1997-2015 Free Software Foundation, Inc. | 204 Copyright (C) 1997-2015 Free Software Foundation, Inc. |
205 Copyright (C) 1984 David M. Ihnat | 205 Copyright (C) 1984 David M. Ihnat |
206 .CE | 206 .CE |
207 .LP | 207 .LP |
208 The code does have old origins. Further comments show that | 208 This code does have old origins. Further comments show that |
209 the source code was reworked by David MacKenzie first and later | 209 the source code was reworked by David MacKenzie first and later |
210 by Jim Meyering, who put it into the version control system in | 210 by Jim Meyering, who put it into the version control system in |
211 1992. It is unclear why the years until 1997, at least from | 211 1992. It is unclear why the years until 1997, at least from |
212 1992 onwards, don't show up in the copyright notice. | 212 1992 onwards, don't show up in the copyright notice. |
213 .PP | 213 .PP |
214 Despite all those year numbers from the 80s, cut is a rather | 214 Despite all those year numbers from the 80s, cut is a rather |
215 young tool, at least in relation to the early Unix. Despite | 215 young tool, at least in relation to the early Unix. Despite |
216 being a decade older than Linux (the kernel), Unix was present | 216 being a decade older than Linux (the kernel), Unix was present |
217 for over ten years by the time cut appeared for the first | 217 for over ten years already by the time cut appeared for the first |
218 time. Most notably, cut wasn't part of Version 7 Unix, which | 218 time. Most notably, cut wasn't part of Version 7 Unix, which |
219 became the basis for all modern Unix systems. The more complex | 219 became the basis for all modern Unix systems. The more complex |
220 tools sed and awk were part of it already. Hence, the | 220 tools sed and awk were part of it already. Hence, the |
221 question comes to mind why cut was written at all, as two | 221 question comes to mind why cut was written at all, as two |
222 programs already existed that were able to cover the use cases of | 222 programs already existed that were able to cover its use |
223 cut. One reason for cut surely was its compactness and the | 223 cases. One reason for cut surely was its compactness and the |
224 resulting speed, in comparison to the then-bulky awk. This lean | 224 resulting speed, in comparison to the then-bulky awk. This lean |
225 shape goes well with the Unix philosopy: Do one job and do it | 225 shape goes well with the Unix philosopy: Do one job and do it |
226 well! Cut was sufficiently convincing. It found its way to | 226 well! Cut was sufficiently convincing. It found its way to |
227 other Unix variants, it became standardized, and today it is | 227 other Unix variants, it became standardized, and today it is |
228 present everywhere. | 228 present everywhere. |
251 \f(CW-n\fP is ignored). Finally, there are implementations | 251 \f(CW-n\fP is ignored). Finally, there are implementations |
252 that implement \f(CW-c\fP and \f(CW-b\fP in a POSIX-compliant | 252 that implement \f(CW-c\fP and \f(CW-b\fP in a POSIX-compliant |
253 way. | 253 way. |
254 .PP | 254 .PP |
255 Historic two-mode implementations are the ones of | 255 Historic two-mode implementations are the ones of |
256 System III, System V, and the BSD ones from the beginning | 256 System III, System V, and the BSD ones until the mid-90s. |
257 until the mid-90s. | |
258 .PP | 257 .PP |
259 Pseudo multi-byte implementations are provided by GNU, | 258 Pseudo multi-byte implementations are provided by GNU, |
260 modern NetBSD, and modern OpenBSD. The level of POSIX compliance | 259 modern NetBSD, and modern OpenBSD. The level of POSIX compliance |
261 that is presented there is often higher than the level of | 260 that is presented there is often higher than the level of |
262 compliance that is actually provided. Sometimes it takes a | 261 compliance that is actually provided. Sometimes it takes a |
268 Since we don't support multi-byte characters, the \f(CW-c\fP | 267 Since we don't support multi-byte characters, the \f(CW-c\fP |
269 and \f(CW-b\fP options are equivalent, and the \f(CW-n\fP | 268 and \f(CW-b\fP options are equivalent, and the \f(CW-n\fP |
270 option is meaningless. | 269 option is meaningless. |
271 .[[ http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/usr.bin/cut/cut.c?rev=1.18&content-type=text/x-cvsweb-markup | 270 .[[ http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/usr.bin/cut/cut.c?rev=1.18&content-type=text/x-cvsweb-markup |
272 .LP | 271 .LP |
273 Standard-adhering implementations, ones that treat | 272 Standard-adhering implementations, i.e. ones that treat |
274 multi-byte characters correctly, are the one of the modern | 273 multi-byte characters correctly, are those of the modern |
275 FreeBSD and the one in the Heirloom toolchest. Tim Robbins | 274 FreeBSD and the Heirloom toolchest. Tim Robbins |
276 reimplemented the character mode of FreeBSD cut, | 275 reimplemented the character mode of FreeBSD cut, |
277 conforming to POSIX, in the summer of 2004 | 276 conforming to POSIX, in the summer of 2004 |
278 .[[ https://svnweb.freebsd.org/base?view=revision&revision=131194 . | 277 .[[ https://svnweb.freebsd.org/base?view=revision&revision=131194 . |
279 The question why the other BSD systems have not | 278 The question why the other BSD systems have not |
280 integrated this change is an open one. Maybe the answer an be | 279 integrated this change is an open one. Maybe the answer an be |
281 found in the above quoted statement. | 280 found in the above quoted statement. |
282 .PP | 281 .PP |
283 How does a user find out if the cut on their own system handles | 282 How do users find out if the cut on their own system handles |
284 multi-byte characters correctly? First, one needs to check if | 283 multi-byte characters correctly? First, one needs to check if |
285 the system itself uses multi-byte characters, because otherwise | 284 the system itself uses multi-byte characters, because otherwise |
286 characters and bytes are equivalent and the question | 285 characters and bytes are equivalent and the question |
287 is irrelevant. One can check this by looking at the locale | 286 is irrelevant. One can check this by looking at the locale |
288 settings, but it is easier to print a typical multi-byte | 287 settings, but it is easier to print a typical multi-byte |
345 implementation of the character mode requires more code, thus | 344 implementation of the character mode requires more code, thus |
346 these implementations tend to be the larger ones. | 345 these implementations tend to be the larger ones. |
347 .TS | 346 .TS |
348 center; | 347 center; |
349 r r r l l l. | 348 r r r l l l. |
350 SLOC Lines Bytes Belongs to File tyime Category | 349 SLOC Lines Bytes Belongs to File time Category |
351 _ | 350 _ |
352 116 123 2966 System III 1980-04-11 historic | 351 116 123 2966 System III 1980-04-11 historic |
353 118 125 3038 4.3BSD-UWisc 1986-11-07 historic | 352 118 125 3038 4.3BSD-UWisc 1986-11-07 historic |
354 200 256 5715 4.3BSD-Reno 1990-06-25 historic | 353 200 256 5715 4.3BSD-Reno 1990-06-25 historic |
355 200 270 6545 NetBSD 1993-03-21 historic | 354 200 270 6545 NetBSD 1993-03-21 historic |
360 382 586 14175 GNU coreutils 1992-11-08 pseudo-POSIX | 359 382 586 14175 GNU coreutils 1992-11-08 pseudo-POSIX |
361 391 479 10961 FreeBSD 2012-11-24 POSIX | 360 391 479 10961 FreeBSD 2012-11-24 POSIX |
362 588 830 23167 GNU coreutils 2015-05-01 pseudo-POSIX | 361 588 830 23167 GNU coreutils 2015-05-01 pseudo-POSIX |
363 .TE | 362 .TE |
364 .LP | 363 .LP |
365 Roughly four groups can be seen: (1) The two original | 364 There are four rough groups: (1) The two original |
366 implementations, which are mostly identical, with about 100 | 365 implementations, which are mostly identical, with about 100 |
367 SLOC. (2) The five BSD versions, with about 200 SLOC. (3) The | 366 SLOC. (2) The five BSD versions, with about 200 SLOC. (3) The |
368 two POSIX-compliant versions and the old GNU one, with a SLOC | 367 two POSIX-compliant versions and the old GNU one, with a SLOC |
369 count in the 300s. And finally, (4) the modern GNU cut with | 368 count in the 300s. And finally, (4) the modern GNU cut with |
370 almost 600 SLOC. | 369 almost 600 SLOC. |
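
The field-mode passage in the comparison above states that cut's delimiter is exactly one character and that adjacent delimiter characters lead to empty fields. A minimal sketch of that behavior follows, assuming a POSIX shell with printf, cut, and tr available. The sample strings and the tr -s workaround are illustrative assumptions, not part of the compared text.

```shell
#!/bin/sh
# Two adjacent ':' delimiters enclose an empty field.
printf 'a::b\n' | cut -d: -f2    # prints an empty line (field 2 is empty)
printf 'a::b\n' | cut -d: -f3    # prints "b"

# The same applies to runs of spaces: every single space is a delimiter,
# so 'a   b' has four fields, two of them empty.
printf 'a   b\n' | cut -d' ' -f2    # prints an empty line

# One common workaround (an assumption here, not taken from the text):
# squeeze repeated spaces with tr before cutting.
printf 'a   b\n' | tr -s ' ' | cut -d' ' -f2    # prints "b"
```

The tr step only handles a single delimiter character. Input that mixes spaces and tabs would still need a tool such as awk, which splits on whitespace runs by default.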