INTRO (1)

NAME

intro - introduction to commands

CONTENTS

Description
     Multibyte character encodings
     List of commands
     Other manual entries

DESCRIPTION

The Heirloom Toolchest is a collection of standard Unix utilities intended to provide maximum compatibility with traditional Unix while incorporating additional features necessary today. To achieve this, utilities are derived from original Unix sources wherever the licenses permit. This means that material from Unix 6th Edition, Unix 7th Edition, and Unix 32V was used, since these systems were put under an Open Source license by Caldera in January 2002. In addition, 4BSD source (governed by the University's copyright and partially derived from 32V) has been used. (Other sources were Sun's 'OpenSolaris', Caldera's 'Open Source Unix[tm] Tools', the MINIX utility collection, Plan 9, and Info-ZIP's compression code.) Where no freely available Unix sources existed (for example, for tools introduced in System III or System V), utilities were rewritten from scratch. (The exact license terms are provided in a separate document.)

The tools in this collection are modeled on the specifications or systems named below. Since there are some incompatibilities between them, some tools are present in more than one version.
- System V Interface Definition, Third Edition (UNIX System Laboratories, 1992) (SVID3). This specification corresponds to a System V Release 4 or Solaris 2 system. Utilities in /usr/5bin are modeled after this specification and related system environments. If extensions introduced in POSIX.2 or POSIX.1-2001 (see below) did not conflict with the behavior at this level, they were incorporated in these utilities as well. This is the most traditional personality available with the Heirloom Toolchest; most prominently, regular expressions do not have any of the internationalization features (see ed(1) and egrep(1)), and awk is the old version, oawk(1). Use this personality to get the best compatibility with traditional System V behavior.
- System V Interface Definition, Fourth Edition (Novell, Inc., 1995) (SVID4). This specification corresponds to a System V Release 4.2 MP system. Utilities in /usr/5bin/s42 are modeled after this specification and related system environments. If extensions introduced in POSIX.2 or POSIX.1-2001 (see below) did not conflict with the behavior at this level, they were incorporated in these utilities as well. The most essential differences between this and the SVID3 personality are internationalized regular expressions and the choice of the new awk, nawk(1), for awk. Use this personality to get traditional System V behavior combined with internationalized regular expressions.
- ISO/IEC 9945-2:1993 / ANSI/IEEE Std 1003.2-1992 (POSIX.2), with the extensions of The Single UNIX Specification, Version 2 (The Open Group, 1997). Utilities in /usr/5bin/posix are intended to fully comply with this specification even in cases of conflict with historical behavior. Non-conflicting extensions to POSIX.2 found in the environments described above are also present in these utilities. Use this personality if you need POSIX.2 features in preference to traditional System V ones.
- ISO/IEC 9945-1:2001 / ANSI/IEEE Std 1003.1-2001 (POSIX.1-2001), with the extensions of The Single UNIX Specification, Version 3 (The Open Group, 2001). Utilities in /usr/5bin/posix2001 are intended to fully comply with this specification even in cases of conflict with historical behavior. Non-conflicting extensions to POSIX.1-2001 found in the environments described above are also present in these utilities. Use this personality if you need POSIX.1-2001 features in preference to traditional System V ones.

To use the Heirloom Toolchest, select one of these personalities and put the corresponding directory at the beginning of the PATH environment variable, immediately followed by the toolchest base directory, /usr/5bin (which contains the tools that are the same for all personalities). For example, to use the toolchest with an SVID4 personality, execute

PATH=/usr/5bin/s42:/usr/5bin:$PATH; export PATH

You must select exactly one of the personalities above; you do not have access to the complete set of tools otherwise.
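
The selection rule above can be sketched as a small shell function. This is only an illustration, not part of the Heirloom Toolchest: the function name heirloom_path and the personality labels svid3, svid4, posix, and posix2001 are hypothetical, and the sketch assumes that for SVID3 the personality directory and the base directory are both /usr/5bin, so it is listed only once.

```shell
# Hypothetical helper (not shipped with the toolchest): print a PATH value
# that puts the chosen personality directory first, followed by the base
# directory /usr/5bin and then the existing PATH passed as $2.
heirloom_path() {
    base=/usr/5bin
    case $1 in
    svid3)     dir=$base ;;            # assumption: SVID3 tools live in the base directory
    svid4)     dir=$base/s42 ;;
    posix)     dir=$base/posix ;;
    posix2001) dir=$base/posix2001 ;;
    *)         echo "unknown personality: $1" >&2; return 1 ;;
    esac
    if [ "$dir" = "$base" ]; then
        echo "$base:$2"
    else
        echo "$dir:$base:$2"
    fi
}

# Usage (left as a comment so that sourcing this sketch changes nothing):
#   PATH=$(heirloom_path svid4 "$PATH"); export PATH
```

Exactly one personality directory ends up in front of the base directory, which is the property the text requires.
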
The manual pages generally note which behavior corresponds to which utility version. They also mark whether options and arguments were part of System V, were introduced with POSIX.2 or POSIX.1-2001, or are extensions provided by the Heirloom Toolchest (possibly modeled on extensions introduced by other vendors). Such extensions are subject to change without a grace period; they are only intended for interactive usage and should not be included in scripts.
The toolchest also includes some utilities modeled after the BSD Compatibility environment of System V; these roughly correspond to 4.3BSD or SunOS 4 systems. These tools can be found in /usr/ucb. Since they do not form a full personality set like the ones described above, they should be used in addition to one of them, for example:

PATH=/usr/ucb:/usr/5bin/s42:/usr/5bin:$PATH; export PATH

While the Heirloom Toolchest is intended to be as compatible as possible with historical practice in general, annoying static limits of historical implementations are not present any longer. Input lines of unlimited length are generally accepted (as long as enough memory is available); most utilities are also able to handle binary input data (i.e. ASCII NUL characters in the input stream).

    Multibyte character encodings

The Heirloom Toolchest includes support for multibyte character encodings; if the underlying C library supports this and the LC_CTYPE locale (see locale(7) for an introduction) is set appropriately, multiple input bytes can form a single character and are handled as such in regular expressions, display width computations etc.
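
A minimal sketch of the effect, assuming a wc(1) that accepts the POSIX -m (count characters) option and a printf(1) that understands octal escapes. The locale name en_US.UTF-8 is an assumption; use locale -a to see which locales are actually installed. LC_ALL is set here because it overrides LC_CTYPE.

```shell
# The two bytes \303\244 encode a single character (U+00E4) in UTF-8.
printf '\303\244' | wc -c              # counts bytes in any locale
printf '\303\244' | LC_ALL=C wc -m     # C locale: every byte is one character

# Assumption: a locale named en_US.UTF-8 exists on this system. If it does
# and the C library supports multibyte characters, the two bytes are
# counted as one character here.
printf '\303\244' | LC_ALL=en_US.UTF-8 wc -m
```
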

Multibyte character support was designed with special regard to the UTF-8 encoding. Additional supported encodings are EUC-JP, EUC-KR, Big5, Big5-HKSCS, GB 2312, and GBK. Other encodings may also work, with the following restrictions:
- The character set must be a superset of ASCII (more specifically, of the International Reference Version of ISO 646). All ASCII characters must be encoded as a single byte with the same value as the ASCII character. This excludes 7-bit encodings like UTF-7. In addition, the C language implementation must map each ASCII character to a wide character with the same value.
- The first byte of each multibyte character must have the highest bit set, i.e. it must not be an ASCII character. This excludes encodings whose sequences start with ASCII characters like TCVN 5712.
- Locking-shift encodings, like those that use ISO 2022 escape sequences, are not supported.
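
The second restriction can be spot-checked for an individual character with od(1). The following is a sketch only: the function name first_byte_has_high_bit is hypothetical, it assumes an od that supports the POSIX -A, -t, and -N options, and it inspects one encoded character rather than proving anything about a whole encoding.

```shell
# Hypothetical helper: succeed if the first byte on standard input has its
# highest bit set (value 0200 octal, 128 decimal, or above).
first_byte_has_high_bit() {
    # Dump exactly one byte as three octal digits, without the address column.
    oct=$(od -An -to1 -N1 | tr -d ' \n')
    # Empty input has no first byte; otherwise compare as an octal constant.
    [ -n "$oct" ] && [ "$((0$oct))" -ge 128 ]
}

# Usage: the UTF-8 encoding of U+00E4 passes, a plain ASCII letter does not.
printf '\303\244' | first_byte_has_high_bit && echo "lead byte has high bit set"
printf 'A' | first_byte_has_high_bit || echo "ASCII byte"
```
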

Character comparison, regular expression matching and similar tasks are generally performed on the character representation obtained from the locale processing of the C library. A glyph formed by the application of combining characters to a base character will thus not normally be considered equal to the same glyph represented by a single base character. For string comparison, the results depend on the collation mechanism of the locale, which might or might not respect such relations.

Processing of multibyte character encodings is often notably slower than that of singlebyte character encodings. Since many widely-used languages (especially European ones based on Latin letters) contain few multibyte characters if encoded in UTF-8, and since experience shows that large amounts of textual data tend to be machine generated and to contain mostly ASCII characters (e.g. log files), while international language texts are mostly created by humans and tend to be smaller, processing of text in multibyte locales has generally been optimized for ASCII text. The performance penalty for using a multibyte locale is thus usually low if no or few multibyte characters actually occur in the data processed.

A problem with multibyte encodings that does not normally occur in singlebyte encodings is that of illegal byte sequences. In a singlebyte locale, each byte is treated as a character entity even if its value is not defined in the coded character set. For example, bytes with their highest bit set are simply passed through in the default 'C' or 'POSIX' locale, and can appear in option arguments as well as in input data. In multibyte locales, however, byte sequences that do not form a valid character cannot be handled this way, because it is not always clear which bytes are to be grouped together. As an example, suppose that the '\200' byte introduces a multibyte sequence. If this byte occurs in a string to be matched by a utility but is not followed by a valid continuation byte, it is unclear whether it should match any byte sequence containing this byte, including valid ones that form a character, or whether matches should be restricted to occurrences in other incomplete sequences. For this reason, this implementation generally treats illegal byte sequences in command line arguments or programming scripts as syntax errors. Utilities neither issue a warning nor terminate with an error if such sequences appear in input data, though, since this frequently occurs in practice when processing binary or foreign-locale files. In most cases, the sequences are passed to the output unaltered. The fact that data is accepted or generated by a utility thus cannot be taken as an indication of its validity with respect to the current character encoding.
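
Because the utilities pass illegal sequences through silently, validity has to be checked separately when it matters. One possible approach, sketched here under the assumption that an iconv(1) utility is available on the system (it is not among the commands listed in this page), is to convert the data from the encoding to itself and look at the exit status. The function name is_valid_utf8 is hypothetical.

```shell
# Hypothetical helper: succeed only if standard input is valid UTF-8.
# Assumption: iconv(1) is installed and exits with a non-zero status when
# it meets an illegal input sequence.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Usage: a lone \200 byte is not a valid UTF-8 sequence.
printf 'a\200b\n' | is_valid_utf8 || echo "illegal byte sequence in input"
```
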

    List of commands

Name       Appears on Page  Description
apropos    apropos(1)       locate commands by keyword lookup
banner     banner(1)        make posters
basename   basename(1)      return non-directory portion of a pathname
basename   basename(1B)     (BSD) return non-directory portion of a pathname
bc         bc(1)            arbitrary-precision arithmetic language
bdiff      bdiff(1)         big diff
bfs        bfs(1)           big file scanner
cal        cal(1)           print calendar
calendar   calendar(1)      reminder service
cat        cat(1)           concatenate and print files
catman     catman(8)        create the formatted files for the reference manual
chgrp      chown(1)         change owner or group
chmod      chmod(1)         change mode
chown      chown(1)         change owner or group
chown      chown(1B)        (BSD) change file owner
cksum      cksum(1)         write file checksums and sizes
cmp        cmp(1)           compare two files
col        col(1)           filter reverse line feeds
comm       comm(1)          select or reject lines common to two sorted files
copy       copy(1XNX)       (XENIX) copy groups of files
cp         cp(1)            copy files
cpio       cpio(1)          copy file archives in and out
csplit     csplit(1)        context split
cut        cut(1)           cut out selected fields of each line of a file
date       date(1)          print or set the date
dc         dc(1)            desk calculator
dd         dd(1)            convert and copy a file
deroff     deroff(1)        remove nroff/troff, tbl, and eqn constructs
deroff     deroff(1B)       (BSD) remove nroff, troff, tbl and eqn constructs
df         df(1)            disk free
df         df(1B)           (BSD) disk free
dfspace    df(1)            disk free
diff       diff(1)          differential file comparator
diff3      diff3(1)         3-way differential file comparison
dircmp     dircmp(1)        directory comparison
dirname    dirname(1)       return the directory portion of a pathname
du         du(1)            summarize disk usage
du         du(1B)           (BSD) summarize disk usage
echo       echo(1)          echo arguments
echo       echo(1B)         (BSD) echo arguments
ed         ed(1)            text editor
egrep      egrep(1)         search a file for a pattern using full regular expressions
env        env(1)           set environment for command invocation
expand     expand(1)        convert tabs to spaces
expr       expr(1)          evaluate arguments as an expression
factor     factor(1)        factor a number
false      true(1)          provide truth values
fgrep      fgrep(1)         search a file for a character string
file       file(1)          determine file type
find       find(1)          find files
fmt        fmt(1)           simple text formatter
fmtmsg     fmtmsg(1)        display a message in standard format
fold       fold(1)          fold long lines
getconf    getconf(1)       get configuration values
getopt     getopt(1)        parse command options
grep       grep(1)          search a file for a pattern
groups     groups(1)        show group memberships
groups     groups(1B)       (BSD) show group memberships
hd         hd(1XNX)         (XENIX) display files in hexadecimal format
head       head(1)          display first few lines of files
hostname   hostname(1)      set or print name of current host system
id         id(1)            print user and group IDs and names
install    install(1B)      (BSD) install files
join       join(1)          relational database operator
kill       kill(1)          terminate a process
lc         ls(1)            list contents of directory
line       line(1)          read one line
listusers  listusers(1)     print a list of user logins
ln         ln(1)            make a link
ln         ln(1B)           (BSD) make links
logins     logins(1)        list login information
logname    logname(1)       get login name
ls         ls(1)            list contents of directory
ls         ls(1B)           (BSD) list contents of directory
mail       mail(1)          send or receive mail among users
man        man(1)           find and display reference manual pages
mesg       mesg(1)          permit or deny messages
mkdir      mkdir(1)         make a directory
mkfifo     mkfifo(1)        make FIFO special file
mknod      mknod(1M)        build special file
more       more(1)          browse or page through a text file
mt         mt(1)            magnetic tape utility
mv         mv(1)            move or rename files and directories
mvdir      mvdir(1)         move a directory
nawk       nawk(1)          pattern scanning and processing language
newform    newform(1)       change the format of a text file
news       news(1)          print news items
nice       nice(1)          run a command at low priority
nl         nl(1)            line numbering filter
nohup      nohup(1)         run a command immune to hangups
oawk       oawk(1)          pattern scanning and processing language
od         od(1)            octal dump
page       more(1)          browse or page through a text file
paste      paste(1)         merge same lines of several files or subsequent lines of one file
pathchk    pathchk(1)       check pathnames
pax        pax(1)           portable archive interchange
pg         pg(1)            file perusal filter for CRTs
pgrep      pgrep(1)         find or signal processes by name and other attributes
pkill      pgrep(1)         find or signal processes by name and other attributes
pr         pr(1)            print files
printenv   printenv(1)      print out the environment
printf     printf(1)        print a text string
priocntl   priocntl(1)      process scheduler control
ps         ps(1)            process status
ps         ps(1B)           (BSD) process status
psrinfo    psrinfo(1)       displays information about processors
ptime      time(1)          time a command
pwd        pwd(1)           working directory name
random     random(1XNX)     (XENIX) generate a random number
renice     renice(1)        alter priority of running processes
rm         rm(1)            remove directory entries
rmdir      rmdir(1)         remove directories
sdiff      sdiff(1)         print file differences side-by-side
sed        sed(1)           stream editor
setpgrp    setpgrp(1)       set process group ID and session ID
settime    settime(1XNX)    (XENIX) change the access and modification dates of files
shl        shl(1)           shell layer manager
sleep      sleep(1)         suspend execution for an interval
sort       sort(1)          sort or merge files
spell      spell(1)         find spelling errors
split      split(1)         split a file into pieces
stty       stty(1)          set the options for a terminal
stty       stty(1B)         (BSD) set the options for a terminal
su         su(1)            become super-user or another user
sum        sum(1)           sum and count blocks in a file
sum        sum(1B)          (BSD) sum and count blocks in a file
sync       sync(1M)         update the super block
tabs       tabs(1)          set terminal tabs
tail       tail(1)          deliver the last part of a file
tape       tape(1)          magnetic tape maintenance
tapecntl   tapecntl(1)      tape control for tape devices
tar        tar(1)           tape archiver
tcopy      tcopy(1)         copy a magnetic tape
tee        tee(1)           pipe fitting
test       test(1)          condition command
test       test(1B)         (BSD) condition command
time       time(1)          time a command
touch      touch(1)         update file access and modification times
tr         tr(1)            translate characters
tr         tr(1B)           (BSD) translate characters
true       true(1)          provide truth values
tsort      tsort(1)         topological sort
tty        tty(1)           get terminal name
ul         ul(1)            underline
uname      uname(1)         get system name
unexpand   unexpand(1)      convert spaces to tabs
uniq       uniq(1)          report repeated lines in a file
units      units(1)         conversion program
uptime     uptime(1)        show how long system has been up
users      users(1)         display a compact list of users logged in
w          w(1)             who is on and what they are doing
wc         wc(1)            word count
what       what(1)          identify SCCS files
whatis     whatis(1)        display a one-line summary about a keyword
who        who(1)           who is on the system
whoami     whoami(1)        display the effective current username
whodo      whodo(1)         who is doing what
xargs      xargs(1)         construct argument list(s) and execute command
yes        yes(1XNX)        (XENIX) print string repeatedly

    Other manual entries

Page      Description
fspec(5)  format specifications in text files
intro(1)  introduction to commands
man(7)    macros to typeset manual


Heirloom Toolchest INTRO (1) 1/22/06