 collate8(4)                                                     collate8(4)





 NAME
      collate8 - collating sequence table for languages with 8-bit character
      sets

 DESCRIPTION
      There are four language-dependent collation algorithms for European
      languages.  These algorithms are:

           Two_to_one conversions:
                     Some languages such as Spanish require two adjacent
                     characters to occupy one position in the collating
                     sequence.  Examples are ``CH'' (which follows ``C'')
                     and ``LL'' (which follows ``L'').

           One_to_two conversions:
                     Some languages such as German require one character
                     (e.g. ``sharp S'') to occupy two adjacent positions in
                     the collating sequence.

           Don't-care characters:
                     Some languages designate certain characters to be
                     ignored in character comparisons.  For example, if - is
                     a ``don't-care'' character, the strings REACT and
                     RE-ACT compare as equal.

           Case and accent priority:
                     Many languages require a ``two-pass'' collating
                     algorithm: in pass one, the accents are stripped off
                     the letters and the resulting two strings are compared;
                     if they are equal, a second pass with the accents
                     replaced is performed to break the tie.
                     Uppercase/lowercase differentiation of letters can also
                     be handled in this fashion.
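
      The don't-care and two-pass rules above can be sketched in a few
      lines of C.  This is only an illustration: the weights are a toy
      assumption (letters get sequence numbers 1 to 26 regardless of case,
      every other character is a don't-care, and uppercase takes priority
      over lowercase), and the names toy_seq, toy_prio and toy_collate are
      hypothetical.  Both passes are folded into one loop by remembering
      the first priority difference.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Toy weights (assumption, not a real collate8 table): letters get
 * sequence numbers 1..26 regardless of case; every other character is a
 * don't-care character with sequence number 0. */
static int toy_seq(unsigned char c)
{
    return isalpha(c) ? tolower(c) - 'a' + 1 : 0;
}

/* Toy priority (assumption): uppercase sorts before lowercase on a tie. */
static int toy_prio(unsigned char c)
{
    return islower(c) ? 1 : 0;
}

/* Advance past don't-care characters. */
static const unsigned char *skip_ignored(const unsigned char *s)
{
    while (*s != '\0' && toy_seq(*s) == 0)
        s++;
    return s;
}

/* Returns <0, 0 or >0, like strcmp(). */
int toy_collate(const char *a, const char *b)
{
    const unsigned char *p, *q;
    int tie = 0;                 /* first priority difference, if any */

    assert(a != NULL && b != NULL);
    p = (const unsigned char *)a;
    q = (const unsigned char *)b;

    for (;;) {
        p = skip_ignored(p);
        q = skip_ignored(q);
        if (*p == '\0' || *q == '\0')
            break;
        if (toy_seq(*p) != toy_seq(*q))
            return toy_seq(*p) - toy_seq(*q);   /* pass one decides */
        if (tie == 0)
            tie = toy_prio(*p) - toy_prio(*q);  /* remember for pass two */
        p++;
        q++;
    }
    if (*p != '\0')
        return 1;                /* a has significant characters left */
    if (*q != '\0')
        return -1;               /* b has significant characters left */
    return tie;                  /* pass two breaks the tie */
}
```

      With these toy weights, toy_collate("REACT", "RE-ACT") returns 0,
      while "REACT" and "react" differ only in the second pass.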

    Table Description
      The collating-sequence table has four sections: a file header, a
      sequence table, a two_to_one mapping table, and a one_to_two mapping
      table.

      File Header:
      The file header has the following format:

           struct header {
               short int  table_len;       /* Table length */
               short int  lang_id;         /* Language id number */
               short int  reserved1;       /* Reserved */
               short int  seq_tab;         /* Address of sequence table */
               short int  seq_len;         /* Length of sequence table */
               short int  two_to_one;      /* Address of two_to_one table */
               short int  two_to_one_len;  /* Length of two_to_one table */
               short int  one_to_two;      /* Address of one_to_two table */
               short int  one_to_two_len;  /* Length of one_to_two table */
               char       low_char;        /* Lowest character */
               char       high_char;       /* Highest character */
           };
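
      For code that has to read such a header on another system, the
      layout can be restated with fixed-width types.  The sketch below
      assumes a 16-bit short int, which makes the header 20 bytes long with
      no padding; byte order and the units of the address and length fields
      are not stated here and are left open.  The names collate8_header and
      collate8_in_range are hypothetical, and reading low_char and
      high_char as an inclusive range of byte values is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed-width restatement of the header (assumes 16-bit short int). */
struct collate8_header {
    int16_t table_len;        /* Table length */
    int16_t lang_id;          /* Language id number */
    int16_t reserved1;        /* Reserved */
    int16_t seq_tab;          /* Address of sequence table */
    int16_t seq_len;          /* Length of sequence table */
    int16_t two_to_one;       /* Address of two_to_one table */
    int16_t two_to_one_len;   /* Length of two_to_one table */
    int16_t one_to_two;       /* Address of one_to_two table */
    int16_t one_to_two_len;   /* Length of one_to_two table */
    unsigned char low_char;   /* Lowest character (unsigned for 8-bit) */
    unsigned char high_char;  /* Highest character */
};

/* Compile-time check: nine 2-byte fields plus two bytes is 20 bytes. */
typedef char collate8_header_is_20_bytes
    [sizeof(struct collate8_header) == 20 ? 1 : -1];

/* Assumption: low_char..high_char is an inclusive range of byte values. */
int collate8_in_range(const struct collate8_header *h, unsigned char c)
{
    assert(h != NULL);
    return c >= h->low_char && c <= h->high_char;
}
```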

      Sequence Table:
      Sequence table entries have the following format:

           struct seq_ent {
               unsigned char   seq_no;      /* Sequence number */
               unsigned char   type_info;   /* Character type  */
           };

      The byte value of a given character is used as an index into the
      sequence table.  The first two bits of type_info are used to keep
      track of the character type.  A value of zero means the character is
      a one_to_one character, and the other six bits in type_info contain
      its priority.  A value of one or two means that type_info contains an
      index into the two_to_one or the one_to_two mapping table,
      respectively.  A value of zero in seq_no means the character is a
      ``don't-care'' character.
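
      Decoding a sequence table entry might look like the following
      sketch.  It assumes that the ``first two bits'' are the two most
      significant bits of type_info; the text does not say which end is
      meant, so the shift and mask would change if the low-order bits are
      intended.  The enum and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct seq_ent {
    unsigned char seq_no;      /* Sequence number */
    unsigned char type_info;   /* Character type  */
};

/* Character types as described in the text. */
enum seq_type {
    SEQ_ONE_TO_ONE = 0,
    SEQ_TWO_TO_ONE = 1,
    SEQ_ONE_TO_TWO = 2
};

/* Assumption: the type lives in the two most significant bits. */
unsigned seq_type_of(const struct seq_ent *e)
{
    assert(e != NULL);
    return e->type_info >> 6;
}

/* Remaining six bits: the priority for a one_to_one character,
 * otherwise an index into the matching mapping table. */
unsigned seq_low_bits(const struct seq_ent *e)
{
    assert(e != NULL);
    return e->type_info & 0x3F;
}

/* A zero sequence number marks a don't-care character. */
int seq_is_dont_care(const struct seq_ent *e)
{
    assert(e != NULL);
    return e->seq_no == 0;
}
```

      A caller would index the sequence table with the byte value of the
      character and then branch on seq_type_of() for the entry found.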

      Mapping Table for two_to_one Mapped Characters:
      Entries in the two_to_one table have the following format:

           struct two_to_one {
               char  reserved1;     /* Reserved */
               char  legal_char;    /* Legal character */
               struct seq_ent  seq2;     /* Sequence entry for this pair */
           };

      ``Legal'' two_to_one characters are listed for each particular
      character.  ``Legal'' means that the combination of the two characters
      is treated as a single character.  If a match is found, the
      corresponding sequence entry is used for the pair.  Whenever a legal
      successor is not found in the table, the character is treated
      according to one_to_one mapping: the priority in the last entry,
      combined with the sequence number of the character, forms the
      sequence entry.
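
      The lookup just described might be sketched as follows.  Several
      details are assumptions: how the end of a character's group of
      entries is marked in the file is not stated, so the sketch takes the
      group length as a parameter; the last entry is treated as a pure
      fallback rather than a legal successor; and the priority is taken
      from the low six bits of type_info.  The function name
      two_to_one_lookup is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct seq_ent {
    unsigned char seq_no;      /* Sequence number */
    unsigned char type_info;   /* Character type  */
};

struct two_to_one {
    char reserved1;            /* Reserved */
    char legal_char;           /* Legal character */
    struct seq_ent seq2;       /* Sequence entry for this pair */
};

/* group/n: the entries listed for the first character (n >= 1).
 * first:   the sequence table entry of the first character.
 * next:    the character that follows it in the string.
 * Returns the number of input characters consumed (1 or 2) and stores
 * the sequence entry to use in *out. */
int two_to_one_lookup(const struct two_to_one *group, size_t n,
                      struct seq_ent first, unsigned char next,
                      struct seq_ent *out)
{
    size_t i;

    assert(group != NULL && out != NULL && n >= 1);

    /* Search the legal successors; the last entry is the fallback. */
    for (i = 0; i + 1 < n; i++) {
        if ((unsigned char)group[i].legal_char == next) {
            *out = group[i].seq2;      /* the pair collates as one unit */
            return 2;
        }
    }

    /* No legal successor: one_to_one treatment, using the character's
     * own sequence number and the priority from the last entry. */
    out->seq_no = first.seq_no;
    out->type_info = group[n - 1].seq2.type_info & 0x3F;
    return 1;
}
```

      For a Spanish-style table, the group for ``C'' would hold one entry
      whose legal_char is ``H'', followed by the fallback entry.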

      Mapping Table for one_to_two Mapped Characters:
      Entries in the one_to_two mapping table have the same format as
      entries in the sequence table.  The sequence number of the first
      character is known from the entry in the sequence table.  The sequence
      number of the second character is found in the one_to_two mapping
      entry, and the priority is used for both characters.
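
      A minimal sketch of this expansion follows.  It assumes that the low
      six bits of the first character's type_info index the one_to_two
      table, and that ``the priority'' is the low six bits of the mapping
      entry's type_info; neither detail is spelled out above.  The
      function name one_to_two_expand is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct seq_ent {
    unsigned char seq_no;      /* Sequence number */
    unsigned char type_info;   /* Character type  */
};

/* tab:     the one_to_two mapping table.
 * first:   the sequence table entry of the one_to_two character.
 * seq_out: receives the two sequence numbers the character occupies.
 * Returns the priority shared by both positions. */
unsigned char one_to_two_expand(const struct seq_ent *tab,
                                struct seq_ent first,
                                unsigned char seq_out[2])
{
    const struct seq_ent *second;

    assert(tab != NULL && seq_out != NULL);

    /* Assumption: the low six bits of type_info hold the table index. */
    second = &tab[first.type_info & 0x3F];

    seq_out[0] = first.seq_no;     /* known from the sequence table */
    seq_out[1] = second->seq_no;   /* found in the mapping entry */
    return second->type_info & 0x3F;
}
```

      For German ``sharp S'', both sequence numbers would typically be
      that of ``s'', so the character collates like ``ss''.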

 WARNING
      This file is provided for historical reasons only.  The recommended
      interfaces for native language support collation are the routines
      nl_strcmp() and nl_strncmp() (see string(3C)).




 AUTHOR
      collate8 was developed by the Hewlett-Packard Company.

 SEE ALSO
      sort(1), nl_string(3C).

 Hewlett-Packard Company            - 3 -     HP-UX Release 9.0: August 1992