hpnls(5) hpnls(5)
NAME
hpnls - HP Native Language Support (NLS) Model
DESCRIPTION
Native Language Support (NLS) reduces or eliminates the barriers that
would otherwise make HP-UX difficult to use in a non-English-speaking
work environment. NLS is available at the user-command level as well
as through commands and libraries that can be used to develop
international software applications.
Many existing C library routines have been modified to operate based
upon a program's locale. A locale is the run-time NLS environment of
a program which is loaded by setlocale() (see setlocale(3C)). For a
complete list of the library routines that are affected by setlocale(),
see setlocale(3C).
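The following is a minimal sketch of how a program might load its locale. The helper name load_locale is hypothetical and is used here for illustration only; setlocale() and LC_ALL are the standard C interfaces. Passing an empty string as the locale name asks setlocale() to take the locale from the internationalization environment variables.

```c
#include <locale.h>
#include <stddef.h>

/*
 * Hypothetical helper: load the locale named by `name` for all categories.
 * An empty string ("") selects the locale described by the environment.
 * Returns the name of the locale in effect after the call. If the request
 * cannot be honored, the locale is left unchanged and its name is returned.
 */
const char *load_locale(const char *name)
{
    const char *result = setlocale(LC_ALL, name);

    if (result == NULL) {
        /* A NULL locale argument only queries the current setting. */
        result = setlocale(LC_ALL, NULL);
    }
    return result;
}
```

A program would typically call load_locale("") once, early in main(), before calling any locale-sensitive routine.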
In addition to routines that operate based on the program's locale,
there are also commands and routines to provide a messaging system for
accessing program messages based on the language requirements of the
end-user.
Many HP-UX commands have been modified to operate in a manner
sensitive to the language requirements of the end-user. These
language requirements are established through the internationalization
environment variables (see environ(5)). The EXTERNAL
INFLUENCES/Environment Variables section of the manual entry for each
command that has NLS capabilities describes which environment
variables the command is sensitive to.
In addition, the portnls routines are a set of library routines that
perform miscellaneous language-dependent operations. portnls is
intended to provide portability between HP-UX and MPE (another HP
operating system). See portnls(5) for more information.
The following areas of functionality are considered
language-sensitive:
Character Handling
NLS provides for handling characters outside the 7-bit
USASCII codeset. Most languages require a minimum of 8 bits
to support all the characters needed to communicate in that
language. Characters must be handled according to the
requirements of the language they represent.
Codesets with 8-bit characters have been defined to support
phonetic languages, such as the Western European languages.
The use of an 8-bit character allows for an additional 128
characters beyond the USASCII codeset.
More than 8 bits are needed to uniquely define codes for
characters required by ideographic languages such as
Japanese. For such languages, multibyte codesets are used
in which a character is represented by a sequence of one or
more bytes. Multibyte codesets are defined according to the
rules of a multibyte encoding scheme. Encoding schemes
define the particular sequences of byte values that can be
used to form characters. The EUC encoding scheme is
supported by HP-UX. However, only the one- and two-byte
forms of EUC are currently supported. Refer to the Native
Language Support User's Guide for more information about
EUC.
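As an illustration of why byte counts and character counts differ in a multibyte codeset, the sketch below counts characters in a string. It is not the HP-UX implementation. It assumes a deliberately simplified one- and two-byte scheme in which any byte with the high bit set is the first byte of a two-byte character; the helper name euc_char_count is hypothetical.

```c
#include <stddef.h>

/*
 * Hypothetical helper: count the characters in a NUL-terminated string.
 * Simplifying assumption: a byte with the high bit set begins a two-byte
 * character; any other byte is a one-byte character. The second byte is
 * not validated, only checked for presence.
 */
size_t euc_char_count(const unsigned char *s)
{
    size_t count = 0;

    while (*s != '\0') {
        if ((*s & 0x80) != 0 && s[1] != '\0') {
            s += 2;     /* two-byte character */
        } else {
            s += 1;     /* one-byte character, or a truncated final byte */
        }
        count++;
    }
    return count;
}
```

Portable programs would normally rely on the locale-aware multibyte routines of the C library, such as mblen(), rather than inspecting byte values directly.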
Character Classification
Characters have many attributes associated with them. For
example, characters may be classified as printable,
alphabetic, numeric, etc. These attributes are commonly
referred to as ctype characteristics. Characters and their
associated attributes differ between languages. Character
processing that depends on character classification must be
sensitive to these differences.
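A short sketch of locale-sensitive classification follows. The helper name count_alpha is hypothetical; isalpha() is the standard C routine, and its result for characters outside USASCII depends on the locale that is loaded. The sketch handles single-byte characters only.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper: count the bytes in s that the current locale
 * classifies as alphabetic.
 */
size_t count_alpha(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        /* Cast first: passing a negative char value to isalpha() is undefined. */
        if (isalpha((unsigned char)*s)) {
            n++;
        }
    }
    return n;
}
```

The same string can therefore produce different counts under different locales when it contains 8-bit characters.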
Shifting
The notion of uppercase and lowercase differs between
languages. For example, in some languages accents are
discarded when characters are shifted to uppercase. Some
languages have no notion of uppercase and lowercase
characters. For example, shifting a character has no effect
in ideographic languages.
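Shifting can be sketched in the same way. The helper name upshift is hypothetical; toupper() is the standard C routine and maps a character according to the current locale, returning the character unchanged when no uppercase form is defined. The sketch assumes single-byte characters and is not suitable for multibyte codesets.

```c
#include <ctype.h>

/*
 * Hypothetical helper: shift a NUL-terminated string of single-byte
 * characters to uppercase in place, according to the current locale.
 */
void upshift(char *s)
{
    for (; *s != '\0'; s++) {
        /* Characters with no uppercase form are returned unchanged. */
        *s = (char)toupper((unsigned char)*s);
    }
}
```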
Collating
Collating sequences differ between languages and most
languages require multiple collating sequences. The
following collation features are available to provide a full
``dictionary-'' or ``context-based'' language-dependent
comparison:
Two-to-one conversions
Some languages, such as Spanish, require two
adjacent characters to occupy one position in
the collating sequence. Examples are CH (which
follows C) and LL (which follows L).
One-to-two conversions
Some languages, such as German, require one
character (such as ``sharp S'') to occupy two
adjacent positions in the collating sequence.
Don't-care characters
Some languages designate certain characters to
be ignored in character comparisons. For
example, if - is a don't-care character, the
strings REACT and RE-ACT would equal each other
when compared.
Uppercase/lowercase and accent priority
Many languages require a ``two-pass'' collating
algorithm. In the first pass, accents are
stripped from their letters and the resulting
two strings are compared. If they are equal, a
second pass with the accents reinserted is
performed to break the tie.
Uppercase/lowercase differences can also be
first ignored then used to break ties in this
fashion.
Two common methods of collation for phonetic languages are folded and
unfolded. A folded collating sequence intermixes the uppercase and
lowercase characters. An unfolded collating sequence places all the
uppercase characters before the lowercase characters. For example,
collating the characters a b c A B C with folded collation would
result in the following order:
A a B b C c
Collating the same characters with unfolded collation would result in
the following order:
A B C a b c
For languages in which both folded and unfolded collation methods are
defined, HP-UX uses folded as the default. The setlocale modifier
nofold can be used to enable the unfolded collating method (see
environ(5)). The nlsinfo command reports the collating methods
supported for each language (see nlsinfo(1)).
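The sketch below shows one way a program might sort strings in the collating order of its locale. The names compare_collated and sort_collated are hypothetical; strcoll() and qsort() are standard C routines. In the default "C" locale strcoll() compares byte values, so the characters a b c A B C sort as A B C a b c; under a locale that defines a different collating sequence the result is expected to follow that sequence instead.

```c
#include <stdlib.h>
#include <string.h>

/* qsort() callback: order two strings by the locale's collating sequence. */
static int compare_collated(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;

    return strcoll(*sa, *sb);
}

/* Hypothetical helper: sort an array of n strings in collating order. */
void sort_collated(const char **strings, size_t n)
{
    qsort(strings, n, sizeof strings[0], compare_collated);
}
```

Using strcoll() rather than strcmp() leaves the choice of collating sequence to the locale instead of fixing it in the program.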
Directionality
Two properties of text files and native languages must be
understood to process text in non-Western languages: the
mode of the language and the order of the characters.
Mode refers to the direction in which a language is naturally
read. European languages read from left to right, some
Middle Eastern languages read from right to left, and Far
Eastern languages usually use vertical columns, beginning
from the right.
Order describes the order in which characters are written,
stored in a file, or displayed. Keyboard order refers to
the order of keystrokes by a user. Screen order refers to
the order in which characters are displayed on a terminal
screen or printed.
Screen order can differ from keyboard order when using a
terminal that supports mixing Latin and non-Latin text, each
requiring different directionality. In the following
example, the text mode is right-to-left; n represents a
non-Latin character, l represents a Latin character, and the
numbers represent the order in which the sequence is typed.
In keyboard order, the letters would be stored in a file as
follows:
n1 n2 n3 l4 l5 l6
In screen order, the letters would be stored in a file as
follows:
n1 n2 n3 l6 l5 l4
However, both screen-order and keyboard-order sequences would
look identical on the screen because the terminal would be
configured to display the characters properly according to
the directionality requirements of both the Latin and non-
Latin languages.
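The relationship between the two orders in the example above can be sketched as a reversal of each run of Latin characters. This is an illustration only, not the algorithm used by any HP-UX terminal or library. The helper names are hypothetical, and lowercase USASCII letters stand in for Latin characters while every other byte stands in for a non-Latin character.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative assumption: lowercase USASCII letters represent Latin text. */
static int is_latin(char c)
{
    return c >= 'a' && c <= 'z';
}

/*
 * Hypothetical helper: convert a line of right-to-left-mode text from
 * keyboard order to screen order in place by reversing each Latin run.
 */
void keyboard_to_screen(char *line)
{
    size_t len = strlen(line);
    size_t i = 0;

    while (i < len) {
        size_t start, lo, hi;

        if (!is_latin(line[i])) {
            i++;
            continue;
        }
        start = i;
        while (i < len && is_latin(line[i])) {
            i++;
        }
        /* Reverse the Latin run occupying line[start] through line[i - 1]. */
        for (lo = start, hi = i - 1; lo < hi; lo++, hi--) {
            char tmp = line[lo];
            line[lo] = line[hi];
            line[hi] = tmp;
        }
    }
}
```

Applying the same reversal a second time restores keyboard order, since each run is simply reversed again.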
Local Customs
NLS supports customs that are specific to a particular
geographic region such as representation of numeric and
monetary data, date, and time. These customs can differ not
only between languages, but also between regions that share
a common language.
Representation of numbers
The character used to denote the radix of a
decimal number varies for different regions.
Similarly, the use of a "thousands" indicator
or grouping of digits can vary with local
custom. Characters used to represent digits
can also vary for different regions.
Monetary representation
The currency symbol and the formatting of
monetary values vary from country to
country. For instance, the symbol can either
precede or follow the monetary value. Some
currencies allow decimal fractions while
others use alternate methods of representing
smaller monetary values.
Date and time representation
While the Gregorian calendar is most common,
some countries use other methods for
determining meridian day and year, usually
based on seasonal, astronomical, or
historical events. Month and weekday names
as well as the format of date and time vary
from country to country. Even when a
strictly numeric date/time representation is
used, the order of year, month and day, and
the delimiters that separate them, are not
universal.
The HP-UX system clock runs on Coordinated
Universal Time. Time zone adjustments for a
particular region can be specified through
the TZ environment variable (see environ(5)).
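The local customs described above can be queried through standard C routines. The sketch below uses localeconv() for the radix character and currency symbol and strftime() with the %x conversion for the locale's date representation. All helper names are hypothetical, and the monetary formatting is intentionally simplified: it only places the symbol before or after an amount that is already formatted.

```c
#include <locale.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: the radix character string of the current locale. */
const char *radix_string(void)
{
    return localeconv()->decimal_point;
}

/*
 * Hypothetical helper: join an already formatted amount and a currency
 * symbol, with the symbol either preceding or following the amount.
 */
int format_money(char *buf, size_t size, const char *amount,
                 const char *symbol, int symbol_precedes)
{
    if (symbol_precedes) {
        return snprintf(buf, size, "%s%s", symbol, amount);
    }
    return snprintf(buf, size, "%s%s", amount, symbol);
}

/* Hypothetical helper: as format_money(), using the locale's conventions. */
int format_money_locale(char *buf, size_t size, const char *amount)
{
    const struct lconv *lc = localeconv();

    /* An undefined p_cs_precedes (CHAR_MAX) is treated here as "precedes". */
    return format_money(buf, size, amount, lc->currency_symbol,
                        lc->p_cs_precedes != 0);
}

/* Hypothetical helper: format a date in the locale's preferred form. */
size_t format_local_date(char *buf, size_t size, int year, int month, int day)
{
    struct tm t = {0};

    t.tm_year = year - 1900;    /* years since 1900 */
    t.tm_mon = month - 1;       /* months are numbered from 0 */
    t.tm_mday = day;
    return strftime(buf, size, "%x", &t);
}
```

Because every value comes from the locale, the same program can print 1,234.56 in one region and 1.234,56 in another without being recompiled.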
Messages
Messages issued by a program should be sensitive to the
language of the end-user. NLS provides a messaging facility
for extracting hard-coded strings (messages) from an
application's source code and storing them externally to the
code. Utilities are provided to aid the translation of
messages such that at runtime the program accesses messages
that coincide with the end-user's native language.
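A minimal sketch of message retrieval at run time follows. catopen() and catgets() are the routines listed under SEE ALSO; the helper name lookup_message and the catalog name used in the usage note are hypothetical. The hard-coded default string is returned whenever the catalog or the requested message is unavailable, so the program still produces output when no translation is installed.

```c
#include <nl_types.h>

/*
 * Hypothetical helper: return message msg_id of set set_id from an open
 * catalog, or the fallback string if the catalog could not be opened or
 * the message is missing.
 */
const char *lookup_message(nl_catd catalog, int set_id, int msg_id,
                           const char *fallback)
{
    if (catalog == (nl_catd)-1) {
        return fallback;        /* catopen() failed earlier */
    }
    return catgets(catalog, set_id, msg_id, fallback);
}
```

A program might open its catalog once with catopen("myprog", 0), pass the descriptor to lookup_message() for each message, and call catclose() before exiting. Strings returned from a catalog should not be used after the catalog is closed.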
FILES
/usr/lib/nls/*
AUTHOR
hpnls was developed by HP.
SEE ALSO
insertmsg(1), gencat(1), catgets(3C), catopen(3C), setlocale(3C),
wconv(3X), wctype(3X), wstring(3X), environ(5), lang(5).
Native Language Support User's Guide.
For additional information, see the EXTERNAL INFLUENCES/Environment
Variables section of applicable manual entries for commands and
library routines.
Hewlett-Packard Company - 5 - HP-UX Release 9.0: August 1992