Package org.w3c.tidy

Class TidyUtils


  • public final class TidyUtils
    extends java.lang.Object
    Utility class with handy methods, mainly for String handling or for reproducing C behaviours.
    • Field Summary

      Fields 
      Modifier and Type Field Description
      private static short DIGIT
      char type: digit.
      private static short LETTER
      char type: letter.
      private static short[] lexmap
      used to classify chars for lexical purposes.
      private static short LOWERCASE
      char type: lowercase.
      private static short NAMECHAR
      char type: namechar.
      private static short NEWLINE
      char type: newline.
      private static short UPPERCASE
      char type: uppercase.
      private static short WHITE
      char type: whitespace.
    • Constructor Summary

      Constructors 
      Modifier Constructor Description
      private TidyUtils()
      utility class, don't instantiate.
    • Method Summary

      All Methods Static Methods Concrete Methods 
      Modifier and Type Method Description
      static boolean findBadSubString​(java.lang.String s, java.lang.String p, int len)
      Return true if substring s is in p and isn't all in upper case.
      static char foldCase​(char c, boolean tocaps, boolean xmlTags)
      Fold case of a char.
      static byte[] getBytes​(java.lang.String str)
      Should always be able to convert to/from UTF-8, so encoding exceptions are converted to an Error to avoid adding throws declarations in lots of methods.
      static java.lang.String getString​(byte[] bytes, int offset, int length)
      Should always be able to convert to/from UTF-8, so encoding exceptions are converted to an Error to avoid adding throws declarations in lots of methods.
      static boolean isCharEncodingSupported​(java.lang.String name)
      Is the given character encoding supported?
      static boolean isDigit​(char c)
      Is the given char a digit?
      (package private) static boolean isInValuesIgnoreCase​(java.lang.String[] validValues, java.lang.String valueToCheck)
      Check if the string valueToCheck is contained in validValues array (case-insensitive comparison).
      static boolean isLetter​(char c)
      Is the given char a letter?
      static boolean isLower​(char c)
      Determines if the specified character is a lowercase character.
      static boolean isNamechar​(char c)
      Is the given char valid in name? (letter, digit or "-", ".", ":", "_")
      (package private) static boolean isQuote​(int c)
      Is the given character a single or double quote?
      static boolean isUpper​(char c)
      Determines if the specified character is an uppercase character.
      static boolean isWhite​(char c)
      Determines if the specified character is whitespace.
      (package private) static boolean isxdigit​(char c)
      Is the character a hex digit?
      (package private) static boolean isXMLLetter​(char c)
      Is the given char a valid xml letter?
      (package private) static boolean isXMLNamechar​(char c)
      Is the given char valid in xml name?
      static int lastChar​(java.lang.String str)
      Return the last char in string.
      private static short map​(char c)
      Returns the constant which defines the classification of char in lexmap.
      private static void mapStr​(java.lang.String str, short code)
      Classify chars in String and put them in lexmap.
      (package private) static boolean toBoolean​(int value)
      Converts an int to a boolean.
      static char toLower​(char c)
      Maps the given character to its lowercase equivalent.
      (package private) static int toUnsigned​(int c)
      Converts an int to unsigned (& 0xFF).
      static char toUpper​(char c)
      Maps the given character to its uppercase equivalent.
      (package private) static int wstrnchr​(java.lang.String s1, int len1, char cc)
      return offset of cc from beginning of s1, -1 if not found.
      (package private) static boolean wsubstr​(java.lang.String s1, java.lang.String s2)
      Same as wsubstrn, but without a specified length.
      (package private) static boolean wsubstrn​(java.lang.String s1, int len1, java.lang.String s2)
      check if the first String contains the second one.
      (package private) static boolean wsubstrncase​(java.lang.String s1, int len1, java.lang.String s2)
      check if the first String contains the second one (ignore case).
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • TidyUtils

        private TidyUtils()
        utility class, don't instantiate.
    • Method Detail

      • toBoolean

        static boolean toBoolean​(int value)
        Converts an int to a boolean.
        Parameters:
        value - int value
        Returns:
        true if value != 0
      • toUnsigned

        static int toUnsigned​(int c)
        Converts an int to unsigned (& 0xFF).
        Parameters:
        c - signed int
        Returns:
        unsigned int
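        Both conversions above reproduce C idioms that Java does not offer directly. The following is a minimal sketch of the documented behaviour, not JTidy's own code; the class name CConversionsSketch is hypothetical, and the methods are reimplemented here because the originals are package private.

```java
// Illustrative sketch only; CConversionsSketch is a hypothetical name.
final class CConversionsSketch {
    private CConversionsSketch() { }

    /** C-style truthiness: any non-zero int counts as true. */
    static boolean toBoolean(int value) {
        return value != 0;
    }

    /** Keeps the low 8 bits, so a sign-extended byte maps to 0..255. */
    static int toUnsigned(int c) {
        return c & 0xFF;
    }

    public static void main(String[] args) {
        byte b = (byte) 0xE9;               // -23 when read as a signed byte
        System.out.println(toUnsigned(b));  // prints 233
        System.out.println(toBoolean(-1));  // prints true
    }
}
```

        The mask matters when bytes read from a stream are widened to int: without it, values of 0x80 and above turn negative.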
      • wsubstrn

        static boolean wsubstrn​(java.lang.String s1,
                                int len1,
                                java.lang.String s2)
        check if the first String contains the second one.
        Parameters:
        s1 - full String
        len1 - maximum position in String
        s2 - String to search for
        Returns:
        true if s1 contains s2 in the range 0-len1
      • wsubstrncase

        static boolean wsubstrncase​(java.lang.String s1,
                                    int len1,
                                    java.lang.String s2)
        check if the first String contains the second one (ignore case).
        Parameters:
        s1 - full String
        len1 - maximum position in String
        s2 - String to search for
        Returns:
        true if s1 contains s2 in the range 0-len1
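        A length-bounded search like wsubstrn and wsubstrncase can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: the class and method names are hypothetical, and treating len1 as an exclusive upper bound on the matched region is an assumption the documentation does not confirm.

```java
// Illustrative sketch only; names and boundary handling are assumptions.
final class BoundedSearchSketch {
    private BoundedSearchSketch() { }

    /** True if s2 occurs entirely within the first len1 chars of s1. */
    static boolean containsWithin(String s1, int len1, String s2, boolean ignoreCase) {
        int limit = Math.min(len1, s1.length());
        int n = s2.length();
        for (int i = 0; i + n <= limit; i++) {
            // regionMatches compares in place, so no substrings are allocated.
            if (s1.regionMatches(ignoreCase, i, s2, 0, n)) {
                return true;
            }
        }
        return false;
    }
}
```

        Passing ignoreCase = true gives the wsubstrncase flavour; false gives the wsubstrn flavour.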
      • wstrnchr

        static int wstrnchr​(java.lang.String s1,
                            int len1,
                            char cc)
        return offset of cc from beginning of s1, -1 if not found.
        Parameters:
        s1 - String
        len1 - maximum offset (occurrences beyond len1 are ignored, and -1 is returned)
        cc - character to search for
        Returns:
        index of cc in s1, or -1 if not found
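        A bounded character search in the spirit of wstrnchr might look like the sketch below. The names are hypothetical, and whether the real method treats len1 as inclusive or exclusive is not stated in the documentation; this sketch assumes exclusive.

```java
// Illustrative sketch only; the exclusive bound is an assumption.
final class BoundedIndexSketch {
    private BoundedIndexSketch() { }

    /** Offset of the first cc in s1 if it lies before len1, otherwise -1. */
    static int indexWithin(String s1, int len1, char cc) {
        int idx = s1.indexOf(cc);
        return (idx >= 0 && idx < len1) ? idx : -1;
    }
}
```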
      • wsubstr

        static boolean wsubstr​(java.lang.String s1,
                               java.lang.String s2)
        Same as wsubstrn, but without a specified length.
        Parameters:
        s1 - full String
        s2 - String to search for
        Returns:
        true if s2 is found in s1 (case insensitive search)
      • isxdigit

        static boolean isxdigit​(char c)
        Is the character a hex digit?
        Parameters:
        c - char
        Returns:
        true if the given character is a hex digit
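        The method name mirrors C's isxdigit. A minimal sketch of such a check follows, assuming only ASCII digits and the letters a-f/A-F qualify (as in the C locale); the class name is hypothetical and the real method may differ.

```java
// Illustrative sketch only; ASCII-only behaviour is an assumption.
final class HexDigitSketch {
    private HexDigitSketch() { }

    static boolean isHexDigit(char c) {
        return (c >= '0' && c <= '9')
            || (c >= 'a' && c <= 'f')
            || (c >= 'A' && c <= 'F');
    }
}
```

        Explicit ranges are used instead of Character.digit(c, 16), which also accepts non-ASCII digits.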
      • isInValuesIgnoreCase

        static boolean isInValuesIgnoreCase​(java.lang.String[] validValues,
                                            java.lang.String valueToCheck)
        Check if the string valueToCheck is contained in validValues array (case-insensitive comparison).
        Parameters:
        validValues - array of valid values
        valueToCheck - value to search for
        Returns:
        true if valueToCheck is found in validValues
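        A case-insensitive membership test of this kind can be sketched as a linear scan. The class name is hypothetical, and skipping null array entries is an assumption, not documented behaviour.

```java
// Illustrative sketch only; null handling is an assumption.
final class ValueLookupSketch {
    private ValueLookupSketch() { }

    static boolean isInValuesIgnoreCase(String[] validValues, String valueToCheck) {
        for (String candidate : validValues) {
            // equalsIgnoreCase(null) is false, so a null valueToCheck never matches.
            if (candidate != null && candidate.equalsIgnoreCase(valueToCheck)) {
                return true;
            }
        }
        return false;
    }
}
```

        For example, checking "CENTER" against {"left", "right", "center"} succeeds, while "justify" does not.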
      • findBadSubString

        public static boolean findBadSubString​(java.lang.String s,
                                               java.lang.String p,
                                               int len)
        Return true if substring s is in p and isn't all in upper case. This is used to check the case of SYSTEM, PUBLIC, DTD and EN.
        Parameters:
        s - substring
        p - full string
        len - how many chars to check in p
        Returns:
        true if substring s is in p and isn't all in upper case
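        One way to read the contract above: find the first case-insensitive occurrence of s within the first len chars of p, then report whether it differs from s exactly. The sketch below follows that reading. It assumes s is passed in upper case (for example "PUBLIC"), uses a hypothetical class name, and is not taken from the JTidy source.

```java
// Illustrative sketch only; assumes s is supplied in upper case.
final class DoctypeCaseSketch {
    private DoctypeCaseSketch() { }

    static boolean findBadSubString(String s, String p, int len) {
        int limit = Math.min(len, p.length());
        int n = s.length();
        for (int i = 0; i + n <= limit; i++) {
            if (p.regionMatches(true, i, s, 0, n)) {
                // The first hit decides: bad if it is not an exact (upper-case) match.
                return !p.regionMatches(false, i, s, 0, n);
            }
        }
        return false; // s does not occur at all
    }
}
```

        With this sketch, a doctype containing "public" or "Public" is flagged, while one containing "PUBLIC" or no such keyword is not.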
      • isXMLLetter

        static boolean isXMLLetter​(char c)
        Is the given char a valid xml letter?
        Parameters:
        c - char
        Returns:
        true if the char is a valid xml letter
      • isXMLNamechar

        static boolean isXMLNamechar​(char c)
        Is the given char valid in xml name?
        Parameters:
        c - char
        Returns:
        true if the char is a valid xml name char
      • isQuote

        static boolean isQuote​(int c)
        Is the given character a single or double quote?
        Parameters:
        c - char
        Returns:
        true if c is " or '
      • getBytes

        public static byte[] getBytes​(java.lang.String str)
        Should always be able to convert to/from UTF-8, so encoding exceptions are converted to an Error to avoid adding throws declarations in lots of methods.
        Parameters:
        str - String
        Returns:
        utf8 bytes
        See Also:
        String.getBytes()
      • getString

        public static java.lang.String getString​(byte[] bytes,
                                                 int offset,
                                                 int length)
        Should always be able to convert to/from UTF-8, so encoding exceptions are converted to an Error to avoid adding throws declarations in lots of methods.
        Parameters:
        bytes - byte array
        offset - starting offset in byte array
        length - length in byte array starting from offset
        Returns:
        same as new String(bytes, offset, length, "UTF8")
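        The pattern described for getBytes and getString, wrapping the checked UnsupportedEncodingException in an unchecked Error, can be sketched as below. The class name and the error message are hypothetical; only the general approach is taken from the descriptions above.

```java
import java.io.UnsupportedEncodingException;

// Illustrative sketch only; Utf8Sketch and the message text are hypothetical.
final class Utf8Sketch {
    private Utf8Sketch() { }

    static byte[] getBytes(String str) {
        try {
            return str.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform must support UTF-8, so this is unrecoverable.
            throw new Error("UTF-8 should always be supported", e);
        }
    }

    static String getString(byte[] bytes, int offset, int length) {
        try {
            return new String(bytes, offset, length, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new Error("UTF-8 should always be supported", e);
        }
    }
}
```

        On Java 7 and later, the overloads that take java.nio.charset.StandardCharsets.UTF_8 declare no checked exception, so new code can avoid the wrapper altogether.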
      • lastChar

        public static int lastChar​(java.lang.String str)
        Return the last char in string. This is useful when the trailing quote mark is missing on an attribute.
        Parameters:
        str - String
        Returns:
        last char in String
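        Returning int rather than char leaves room for a sentinel when there is no last character. The sketch below assumes 0 is returned for a null or empty string; that sentinel and the class name are assumptions, not documented behaviour.

```java
// Illustrative sketch only; the 0 sentinel is an assumption.
final class LastCharSketch {
    private LastCharSketch() { }

    static int lastChar(String str) {
        if (str != null && str.length() > 0) {
            return str.charAt(str.length() - 1);
        }
        return 0; // assumed sentinel for "no last char"
    }
}
```

        A caller could compare the result with the opening quote character to detect an attribute value such as "abc that was never closed.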
      • isWhite

        public static boolean isWhite​(char c)
        Determines if the specified character is whitespace.
        Parameters:
        c - char
        Returns:
        true if char is whitespace.
      • isDigit

        public static boolean isDigit​(char c)
        Is the given char a digit?
        Parameters:
        c - char
        Returns:
        true if the given char is a digit
      • isLetter

        public static boolean isLetter​(char c)
        Is the given char a letter?
        Parameters:
        c - char
        Returns:
        true if the given char is a letter
      • isNamechar

        public static boolean isNamechar​(char c)
        Is the given char valid in name? (letter, digit or "-", ".", ":", "_")
        Parameters:
        c - char
        Returns:
        true if char is a name char.
      • isLower

        public static boolean isLower​(char c)
        Determines if the specified character is a lowercase character.
        Parameters:
        c - char
        Returns:
        true if char is lower case.
      • isUpper

        public static boolean isUpper​(char c)
        Determines if the specified character is an uppercase character.
        Parameters:
        c - char
        Returns:
        true if char is upper case.
      • toLower

        public static char toLower​(char c)
        Maps the given character to its lowercase equivalent.
        Parameters:
        c - char
        Returns:
        lowercase char.
      • toUpper

        public static char toUpper​(char c)
        Maps the given character to its uppercase equivalent.
        Parameters:
        c - char
        Returns:
        uppercase char.
      • foldCase

        public static char foldCase​(char c,
                                    boolean tocaps,
                                    boolean xmlTags)
        Fold case of a char.
        Parameters:
        c - char
        tocaps - convert to caps
        xmlTags - use XML tags? If true, no change is performed
        Returns:
        folded char
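        The parameter descriptions suggest the following control flow. This is a sketch, not the library's code: it assumes that tocaps = false folds to lower case and that only ASCII letters are affected, neither of which the documentation states. The class name is hypothetical.

```java
// Illustrative sketch only; the lower-case branch and ASCII range are assumptions.
final class FoldCaseSketch {
    private FoldCaseSketch() { }

    static char foldCase(char c, boolean tocaps, boolean xmlTags) {
        if (xmlTags) {
            return c; // XML names are case sensitive, so leave the char untouched
        }
        if (tocaps) {
            return (c >= 'a' && c <= 'z') ? (char) (c - 'a' + 'A') : c;
        }
        return (c >= 'A' && c <= 'Z') ? (char) (c - 'A' + 'a') : c;
    }
}
```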
      • mapStr

        private static void mapStr​(java.lang.String str,
                                   short code)
        Classify chars in String and put them in lexmap.
        Parameters:
        str - String
        code - code associated to chars in the String
      • map

        private static short map​(char c)
        Returns the constant which defines the classification of char in lexmap.
        Parameters:
        c - char
        Returns:
        char type
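        The lexmap field, the char-type constants, and the mapStr/map pair suggest a lookup table of bit flags. The sketch below shows one way such a table could work. The bit values, the 128-entry table size, the character sets passed to mapStr, and the class name are all assumptions; the documentation only names the constants.

```java
// Illustrative sketch only; bit values and table contents are assumptions.
final class LexMapSketch {
    private static final short DIGIT = 1;
    private static final short LETTER = 2;
    private static final short NAMECHAR = 4;
    private static final short WHITE = 8;
    private static final short NEWLINE = 16;
    private static final short LOWERCASE = 32;
    private static final short UPPERCASE = 64;

    // One entry per ASCII char; each entry is an OR of the flags above.
    private static final short[] LEXMAP = new short[128];

    static {
        mapStr("\r\n\f", (short) (NEWLINE | WHITE));
        mapStr(" \t", WHITE);
        mapStr("-.:_", NAMECHAR);
        mapStr("0123456789", (short) (DIGIT | NAMECHAR));
        mapStr("abcdefghijklmnopqrstuvwxyz", (short) (LOWERCASE | LETTER | NAMECHAR));
        mapStr("ABCDEFGHIJKLMNOPQRSTUVWXYZ", (short) (UPPERCASE | LETTER | NAMECHAR));
    }

    private LexMapSketch() { }

    /** Adds code to the classification of every char in str. */
    private static void mapStr(String str, short code) {
        for (int i = 0; i < str.length(); i++) {
            LEXMAP[str.charAt(i)] |= code;
        }
    }

    /** Returns the flags for c, or 0 for chars outside the table. */
    private static short map(char c) {
        return c < 128 ? LEXMAP[c] : 0;
    }

    static boolean isWhite(char c)    { return (map(c) & WHITE) != 0; }
    static boolean isDigit(char c)    { return (map(c) & DIGIT) != 0; }
    static boolean isLetter(char c)   { return (map(c) & LETTER) != 0; }
    static boolean isNamechar(char c) { return (map(c) & NAMECHAR) != 0; }
}
```

        With a table like this, each classification test is a single array read and a bit mask, and a char can belong to several classes at once (a digit is also a name char).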
      • isCharEncodingSupported

        public static boolean isCharEncodingSupported​(java.lang.String name)
        Is the given character encoding supported?
        Parameters:
        name - character encoding name
        Returns:
        true if encoding is supported, false otherwise.
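        A check of this kind can be written with java.nio.charset.Charset, as sketched below. This is one possible approach under the assumption that illegal or null names should simply yield false; it is not necessarily how TidyUtils implements the method, and the class name is hypothetical.

```java
import java.nio.charset.Charset;

// Illustrative sketch only; returning false for illegal names is an assumption.
final class EncodingCheckSketch {
    private EncodingCheckSketch() { }

    static boolean isCharEncodingSupported(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalArgumentException e) {
            // Covers null and syntactically illegal charset names
            // (IllegalCharsetNameException is a subclass).
            return false;
        }
    }
}
```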