 hpnls(5)                                                           hpnls(5)





 NAME
      hpnls - HP Native Language Support (NLS) Model

 DESCRIPTION
      Native Language Support (NLS) reduces or eliminates the barriers that
      would otherwise make HP-UX difficult to use in a non-English-speaking
      work environment.  NLS is available at the user-command level as well
      as through commands and libraries that can be used to develop
      international software applications.

      Many existing C library routines have been modified to operate based
      upon a program's locale.  A locale is the run-time NLS environment of
      a program, which is loaded by setlocale().  For a complete list of
      the library routines affected by setlocale(), see setlocale(3C).
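
      As a minimal sketch of how a program loads its locale, the function
      below calls setlocale() with an empty locale name, so that the locale
      is taken from the environment.  The name init_locale and the fallback
      to the "C" locale are illustrative assumptions, not part of HP-UX.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: load the locale named by the environment.
 * If the environment names a locale that cannot be loaded, fall
 * back to the "C" locale.  Returns the name of the locale in use. */
const char *init_locale(void)
{
    const char *name = setlocale(LC_ALL, "");   /* "" = use environment */

    if (name == NULL)
        name = setlocale(LC_ALL, "C");          /* standard default locale */
    assert(name != NULL);
    return name;
}
```

      A program would typically call such a helper once, near the start of
      main(), before using any locale-sensitive library routine.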

      In addition to routines that operate based on the program's locale,
      there are also commands and routines to provide a messaging system for
      accessing program messages based on the language requirements of the
      end-user.

      Many HP-UX commands have been modified to operate in a manner
      sensitive to the language requirements of the end-user.  These
      language requirements are established through the internationalization
      environment variables (see environ(5)).  The EXTERNAL
      INFLUENCES/Environment Variables section of the manual entry for each
      command that has NLS capabilities describes which environment
      variables the command is sensitive to.

      In addition, the portnls routines are a set of library routines that
      perform miscellaneous language-dependent operations.  portnls is
      intended to provide portability between HP-UX and MPE (another HP
      operating system).  See portnls(5) for more information.

      Below are the areas of functionality that are considered
      language-sensitive:

      Character Handling
                NLS provides for handling characters outside the 7-bit
                USASCII codeset.  Most languages require a minimum of 8 bits
                to support all the characters needed to communicate in that
                language.  Characters must be handled according to the
                requirements of the language they represent.

                Codesets with 8-bit characters have been defined to support
                phonetic languages, such as the Western European languages.
                The use of an 8-bit character allows for an additional 128
                characters beyond the USASCII codeset.

                More than 8 bits are needed to uniquely define codes for
                characters required by ideographic languages such as

 Hewlett-Packard Company            - 1 -     HP-UX Release 9.0: August 1992

                Japanese.  For such languages, multibyte codesets are used
                in which a character is represented by a sequence of one or
                more bytes.  Multibyte codesets are defined according to the
                rules of a multibyte encoding scheme.  Encoding schemes
                define the particular sequences of byte values that can be
                used to form characters.  The EUC encoding scheme is
                supported by HP-UX.  However, only the one- and two-byte
                forms of EUC are currently supported.  Refer to the Native
                Language Support User's Guide for more information about
                EUC.
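
                Because a character may occupy more than one byte, code
                that walks a string must advance by characters rather than
                by bytes.  The following is a rough sketch using the
                standard C routine mblen(); the function name count_chars
                is hypothetical, and the sketch assumes the program's
                locale has already been loaded.

```c
#include <assert.h>
#include <stdlib.h>

/* Count the characters (not bytes) in s under the current locale.
 * Returns -1 if s contains a byte sequence that is not a valid
 * character in the locale's codeset. */
long count_chars(const char *s)
{
    long n = 0;

    assert(s != NULL);
    (void)mblen(NULL, 0);                 /* reset any shift state */
    while (*s != '\0') {
        int len = mblen(s, MB_CUR_MAX);   /* bytes in the next character */

        if (len <= 0)
            return -1;                    /* invalid byte sequence */
        s += len;
        n++;
    }
    return n;
}
```

                In a single-byte locale the result equals the byte length
                of the string; in a multibyte locale it can be smaller.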

      Character Classification
                Characters have many attributes associated with them.  For
                example, characters may be classified as printable,
                alphabetic, numeric, etc.  These attributes are commonly
                referred to as ctype characteristics.  Characters and their
                associated attributes differ between languages.  Character
                processing that depends on character classification must be
                sensitive to these differences.
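
                As an illustration, the sketch below counts the alphabetic
                characters in a string with the standard isalpha() routine,
                whose result depends on the program's locale.  The function
                name count_alpha is hypothetical, and the sketch handles
                single-byte characters only.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Count the bytes of s that the current locale classifies as
 * alphabetic.  Single-byte codesets only. */
size_t count_alpha(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        /* The cast avoids passing a negative value for 8-bit characters. */
        if (isalpha((unsigned char)*s))
            n++;
    }
    return n;
}
```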

      Shifting
                The notion of uppercase and lowercase differs between
                languages.  For example, in some languages accents are
                discarded when characters are shifted to uppercase.  Some
                languages have no notion of uppercase and lowercase
                characters.  For example, shifting a character has no effect
                in ideographic languages.

      Collating
                Collating sequences differ between languages and most
                languages require multiple collating sequences.  The
                following collation features are available to provide a full
                ``dictionary-'' or ``context-based'' language-dependent
                comparison:

                   Two-to-one conversions
                             Some languages, such as Spanish, require two
                             adjacent characters to occupy one position in
                             the collating sequence.  Examples are CH (which
                             follows C) and LL (which follows L).

                   One-to-two conversions
                             Some languages, such as German, require one
                             character (such as ``sharp S'') to occupy two
                             adjacent positions in the collating sequence.

                   Don't-care characters
                             Some languages designate certain characters to
                             be ignored in character comparisons.  For
                             example, if - is a don't-care character, the
                             strings REACT and RE-ACT would equal each other
                             when compared.

                   Uppercase/lowercase and accent priority
                             Many languages require a ``two-pass'' collating
                             algorithm.  In the first pass, accents are
                             stripped from their letters and the resulting
                             two strings are compared.  If they are equal, a
                             second pass with the accents reinserted is
                             performed to break the tie.
                             Uppercase/lowercase differences can also be
                             first ignored then used to break ties in this
                             fashion.
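
                The two-pass idea can be sketched with a toy comparison
                that treats only uppercase/lowercase differences (accents
                are omitted for brevity).  This is not the HP-UX collation
                algorithm; the function names are hypothetical, and the
                tie-break assumes an ASCII-compatible codeset in which
                uppercase letters have lower codes than lowercase letters.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Pass 1: compare the strings with case differences ignored. */
static int compare_ignoring_case(const char *a, const char *b)
{
    for (;; a++, b++) {
        int ca = tolower((unsigned char)*a);
        int cb = tolower((unsigned char)*b);

        if (ca != cb)
            return ca < cb ? -1 : 1;
        if (ca == '\0')
            return 0;                 /* both strings ended together */
    }
}

/* Two-pass comparison: negative, zero, or positive, like strcmp(). */
int two_pass_compare(const char *a, const char *b)
{
    int r;

    assert(a != NULL && b != NULL);
    r = compare_ignoring_case(a, b);
    if (r != 0)
        return r;
    /* Pass 2: break the tie using the original characters. */
    return strcmp(a, b);
}
```

                Under these assumptions "Abc" sorts immediately before
                "abc", and both sort before "ABD".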

      Two common methods of collation for phonetic languages are folded and
      nonfolded.  A folded collating sequence is made up of the uppercase
      and lowercase characters intermixed.  A nonfolded collating sequence
      is made up of all the uppercase characters followed by the lowercase
      characters.  For example, collating the characters a b c A B C with
      folded collation would result in the following order:

           A a B b C c

      Collating the same characters with nonfolded collation would result in
      the following order:

           A B C a b c

      For languages in which both folded and nonfolded collation methods
      are defined, HP-UX uses folded collation by default.  The setlocale
      modifier nofold can be used to enable the nonfolded collating method
      (see environ(5)).  The nlsinfo command reports the collating methods
      supported for each language (see nlsinfo(1)).
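
      A program normally obtains language-dependent ordering by comparing
      strings with the standard strcoll() routine instead of strcmp().  The
      sketch below sorts an array of strings that way; the function names
      are hypothetical, and the resulting order depends on the locale in
      effect when the function is called.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort() callback: compare two strings by the locale's collation. */
static int collate_cmp(const void *pa, const void *pb)
{
    const char *a = *(const char *const *)pa;
    const char *b = *(const char *const *)pb;

    return strcoll(a, b);
}

/* Sort the n strings in v according to the current locale. */
void sort_by_collation(const char **v, size_t n)
{
    assert(v != NULL);
    qsort(v, n, sizeof v[0], collate_cmp);
}
```

      In the "C" locale strcoll() orders strings exactly as strcmp() does,
      so any folded or nonfolded behavior appears only after a language
      locale has been loaded.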

      Directionality
                Two properties of text files and native languages must be
                understood to process text in non-Western languages.  They
                are the mode of the language, and the order of the
                characters.

                Mode refers to the direction that a language is naturally
                read.  European languages read from left to right, some
                Middle Eastern languages read from right to left, and Far
                Eastern languages usually use vertical columns, beginning
                from the right.

                Order describes the order in which characters are written,
                stored in a file, or displayed.  Keyboard order refers to
                the order of keystrokes by a user.  Screen order refers to
                the order in which characters are displayed on a terminal
                screen or printed.

                Screen order can differ from keyboard order when using a
                terminal that supports mixing Latin and non-Latin text, each
                requiring different directionality.  In the following
                example, the text mode is right-to-left; n represents a
                non-Latin character, l represents a Latin character, and the
                numbers represent the order in which the sequence is typed.

                In keyboard order, the letters would be stored in a file as
                follows:

                n1 n2 n3 l4 l5 l6

                In screen order, the letters would be stored in a file as
                follows:

                n1 n2 n3 l6 l5 l4

                However, both screen-order and keyboard-order sequences would
                look identical on the screen because the terminal would be
                configured to display the characters properly according to
                the directionality requirements of both the Latin and non-
                Latin languages.
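
                The relationship between the two orders in the example
                above can be sketched with a toy model.  It assumes, purely
                for illustration, that lowercase letters stand for Latin
                characters and every other character stands for a non-Latin
                character; real bidirectional text handling is more
                involved.  The function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Reverse the characters from first through last, inclusive. */
static void reverse_span(char *first, char *last)
{
    while (first < last) {
        char t = *first;

        *first++ = *last;
        *last-- = t;
    }
}

/* Convert a keyboard-order string to screen order for right-to-left
 * text mode by reversing each run of Latin characters.
 * Toy model: lowercase letters are Latin, all else is non-Latin. */
void keyboard_to_screen(char *s)
{
    size_t i = 0;

    assert(s != NULL);
    while (s[i] != '\0') {
        if (islower((unsigned char)s[i])) {
            size_t start = i;

            while (islower((unsigned char)s[i]))
                i++;                          /* find end of Latin run */
            reverse_span(s + start, s + i - 1);
        } else {
            i++;
        }
    }
}
```

                Because reversing a run twice restores it, applying the
                same function to a screen-order string yields keyboard
                order again.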

      Local Customs
                NLS supports customs that are specific to a particular
                geographic region such as representation of numeric and
                monetary data, date, and time.  These customs can differ not
                only between languages, but also between regions that share
                a common language.

                     Representation of numbers
                               The character used to denote the radix of a
                               decimal number varies for different regions.
                               Similarly, the use of a "thousands" indicator
                               or grouping of digits can vary with local
                               custom.  Characters used to represent digits
                               can also vary for different regions.

                     Monetary representation
                               The currency symbol and the formatting of
                               monetary values vary from country to
                               country.  For instance, the symbol can either
                               precede or follow the monetary value.  Some
                               currencies allow decimal fractions while
                               others use alternate methods of representing
                               smaller monetary values.

                     Date and time representation
                               While the Gregorian calendar is most common,
                               some countries use other methods for
                               determining meridian day and year, usually
                               based on seasonal, astronomical, or
                               historical events.  Month and weekday names,
                               as well as the format of date and time, vary
                               from country to country.  Even when a
                               strictly numeric date/time representation is
                               used, the order of year, month, and day, and
                               the delimiters that separate them, are not
                               universal.

                               The HP-UX system clock runs on Coordinated
                               Universal Time.  Time zone adjustments for a
                               particular region can be specified through
                               the TZ environment variable (see environ(5)).
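
                As a sketch of how a program can honor these customs, the
                functions below use the standard C routines localeconv()
                and strftime().  The function names are hypothetical, and
                the sketch assumes the radix character is a single byte.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <time.h>

/* Return the radix (decimal point) character of the current locale.
 * Assumes the radix character occupies a single byte. */
char radix_character(void)
{
    const struct lconv *lc = localeconv();

    assert(lc != NULL && lc->decimal_point != NULL);
    return lc->decimal_point[0];
}

/* Format the date in *when using the locale's preferred date
 * representation ("%x").  Returns the number of bytes stored in
 * buf, or 0 if the result does not fit in size bytes. */
size_t format_local_date(char *buf, size_t size, const struct tm *when)
{
    assert(buf != NULL && when != NULL);
    return strftime(buf, size, "%x", when);
}
```

                In the "C" locale the radix character is a period and "%x"
                produces a month/day/year form; other locales may change
                both.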

      Messages
                Messages issued by a program should be sensitive to the
                language of the end-user.  NLS provides a messaging facility
                for extracting hard-coded strings (messages) from an
                application's source code and storing them externally to the
                code.  Utilities are provided to aid the translation of
                messages such that at runtime the program accesses messages
                that coincide with the end-user's native language.
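
                The run-time side of the messaging facility can be sketched
                with catopen(), catgets(), and catclose().  The function
                name lookup_message, the catalog name passed by the caller,
                and the set and message numbers are all hypothetical; the
                hard-coded fallback string is used whenever the catalog or
                the message cannot be found.

```c
#include <assert.h>
#include <nl_types.h>
#include <string.h>

/* Copy message msg_id of set set_id from the named catalog into out
 * (at most size bytes, always terminated).  If the catalog cannot be
 * opened or the message is missing, copy the fallback string instead. */
void lookup_message(const char *catalog, int set_id, int msg_id,
                    const char *fallback, char *out, size_t size)
{
    nl_catd catd;
    const char *text = fallback;

    assert(catalog != NULL && fallback != NULL);
    assert(out != NULL && size > 0);

    catd = catopen(catalog, 0);
    if (catd != (nl_catd)-1)
        text = catgets(catd, set_id, msg_id, fallback);

    /* Copy before closing: the text may live inside the catalog. */
    strncpy(out, text, size - 1);
    out[size - 1] = '\0';

    if (catd != (nl_catd)-1)
        catclose(catd);
}
```

                Copying the text before calling catclose() avoids relying
                on the returned pointer after the catalog has been closed.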

 FILES
      /usr/lib/nls/*

 AUTHOR
      hpnls was developed by HP.

 SEE ALSO
      insertmsg(1), gencat(1), catgets(3C), catopen(3C), setlocale(3C),
      wconv(3X), wctype(3X), wstring(3X), environ(5), lang(5).

      Native Language Support User's Guide.

      For additional information, see the EXTERNAL INFLUENCES/Environment
      Variables section of applicable manual entries for commands and
      library routines.