comparison cut.en.ms @ 33:a1589fcfe9f4

spell-checking plus a clarification thanks to Francesc
author markus schnalke <meillo@marmaro.de>
date Fri, 02 Oct 2015 07:01:20 +0200
parents 5f78bcd34eeb
children 04a3cdadc50c
32:5f78bcd34eeb 33:a1589fcfe9f4
71 $ cut -b -500 71 $ cut -b -500
72 .CE 72 .CE
73 .LP 73 .LP
74 The remainder can be caught with \f(CWcut -b 501-\fP. This 74 The remainder can be caught with \f(CWcut -b 501-\fP. This
75 use of cut is important for POSIX, because it provides a 75 use of cut is important for POSIX, because it provides a
76 transformation of text files with arbitrary line lenghts to text 76 transformation of text files with arbitrary line lengths to text
77 files with limited line length 77 files with limited line length
78 .[[ http://pubs.opengroup.org/onlinepubs/9699919799/utilities/cut.html#tag_20_28_17 . 78 .[[ http://pubs.opengroup.org/onlinepubs/9699919799/utilities/cut.html#tag_20_28_17 .
79 .PP 79 .PP
80 The introduction of the new byte mode essentially held the same 80 The introduction of the new byte mode essentially held the same
81 functionality as the old character mode. The character mode, 81 functionality as the old character mode. The character mode,
102 .CE 102 .CE
103 .LP 103 .LP
104 (The values to the command line switches may be appended directly 104 (The values to the command line switches may be appended directly
105 to them or separated by whitespace.) 105 to them or separated by whitespace.)
106 .PP 106 .PP
107 The field mode is suited for simple tabulary data, like the 107 The field mode is suited for simple tabular data, like the
108 password file. Beyond that, it soon reaches its limits. The typical 108 password file. Beyond that, it soon reaches its limits. The typical
109 case of whitespace-separated fields, in particular, is covered 109 case of whitespace-separated fields, in particular, is covered
110 poorly by it. Cut's delimiter is exactly one character, 110 poorly by it. Cut's delimiter is exactly one character,
111 therefore one can not split at both space and tab characters. 111 therefore one can not split at both space and tab characters.
112 Furthermore, multiple adjacent delimiter characters lead to 112 Furthermore, multiple adjacent delimiter characters lead to
220 tools sed and awk were part of it already. Hence, the 220 tools sed and awk were part of it already. Hence, the
221 question comes to mind why cut was written at all, as two 221 question comes to mind why cut was written at all, as two
222 programs already existed that were able to cover its use 222 programs already existed that were able to cover its use
223 cases. One reason for cut surely was its compactness and the 223 cases. One reason for cut surely was its compactness and the
224 resulting speed, in comparison to the then-bulky awk. This lean 224 resulting speed, in comparison to the then-bulky awk. This lean
225 shape goes well with the Unix philosopy: Do one job and do it 225 shape goes well with the Unix philosophy: Do one job and do it
226 well! Cut was sufficiently convincing. It found its way to 226 well! Cut was sufficiently convincing. It found its way to
227 other Unix variants, it became standardized, and today it is 227 other Unix variants, it became standardized, and today it is
228 present everywhere. 228 present everywhere.
229 .PP 229 .PP
230 The original variant (without \f(CW-b\fP) was described already 230 The original variant (without \f(CW-b\fP) was described already
274 FreeBSD and the Heirloom toolchest. Tim Robbins 274 FreeBSD and the Heirloom toolchest. Tim Robbins
275 reimplemented the character mode of FreeBSD cut, 275 reimplemented the character mode of FreeBSD cut,
276 conforming to POSIX, in the summer of 2004 276 conforming to POSIX, in the summer of 2004
277 .[[ https://svnweb.freebsd.org/base?view=revision&revision=131194 . 277 .[[ https://svnweb.freebsd.org/base?view=revision&revision=131194 .
278 The question why the other BSD systems have not 278 The question why the other BSD systems have not
279 integrated this change is an open one. Maybe the answer can be 279 integrated this change is an open one. Maybe the answer is
280 found in the above quoted statement. 280 a general ignorance of internationalization.
281 .PP 281 .PP
282 How do users find out if the cut on their own system handles 282 How do users find out if the cut on their own system handles
283 multi-byte characters correctly? First, one needs to check if 283 multi-byte characters correctly? First, one needs to check if
284 the system itself uses multi-byte characters, because otherwise 284 the system itself uses multi-byte characters, because otherwise
285 characters and bytes are equivalent and the question 285 characters and bytes are equivalent and the question