Package org.w3c.tidy
Class TidyUtils
- java.lang.Object
-
- org.w3c.tidy.TidyUtils
-
public final class TidyUtils extends java.lang.Object
Utility class with handy methods, mainly for String handling or for reproducing C behaviours.
- Version:
- $Revision $ ($Author $)
-
-
Field Summary
Fields
private static short DIGIT
    char type: digit.
private static short LETTER
    char type: letter.
private static short[] lexmap
    used to classify chars for lexical purposes.
private static short LOWERCASE
    char type: lowercase.
private static short NAMECHAR
    char type: namechar.
private static short NEWLINE
    char type: newline.
private static short UPPERCASE
    char type: uppercase.
private static short WHITE
    char type: whitespace.
-
Constructor Summary
Constructors
private TidyUtils()
    utility class, don't instantiate.
-
Method Summary
All Methods
static boolean findBadSubString(java.lang.String s, java.lang.String p, int len)
    Return true if substring s is in p and isn't all in upper case.
static char foldCase(char c, boolean tocaps, boolean xmlTags)
    Fold case of a char.
static byte[] getBytes(java.lang.String str)
    Converts a String to UTF-8 bytes. Conversion to and from UTF-8 should always be possible, so encoding exceptions are converted to an Error to avoid adding throws declarations in lots of methods.
static java.lang.String getString(byte[] bytes, int offset, int length)
    Converts UTF-8 bytes to a String. Conversion to and from UTF-8 should always be possible, so encoding exceptions are converted to an Error to avoid adding throws declarations in lots of methods.
static boolean isCharEncodingSupported(java.lang.String name)
    Is the given character encoding supported?
static boolean isDigit(char c)
    Is the given char a digit?
(package private) static boolean isInValuesIgnoreCase(java.lang.String[] validValues, java.lang.String valueToCheck)
    Check if the string valueToCheck is contained in validValues array (case insensitive comparison).
static boolean isLetter(char c)
    Is the given char a letter?
static boolean isLower(char c)
    Determines if the specified character is a lowercase character.
static boolean isNamechar(char c)
    Is the given char valid in a name? (letter, digit or "-", ".", ":", "_")
(package private) static boolean isQuote(int c)
    Is the given character a single or double quote?
static boolean isUpper(char c)
    Determines if the specified character is an uppercase character.
static boolean isWhite(char c)
    Determines if the specified character is whitespace.
(package private) static boolean isxdigit(char c)
    Is the character a hex digit?
(package private) static boolean isXMLLetter(char c)
    Is the given char a valid XML letter?
(package private) static boolean isXMLNamechar(char c)
    Is the given char valid in an XML name?
static int lastChar(java.lang.String str)
    Return the last char in string.
private static short map(char c)
    Returns the constant which defines the classification of char in lexmap.
private static void mapStr(java.lang.String str, short code)
    Classify chars in String and put them in lexmap.
(package private) static boolean toBoolean(int value)
    Converts an int to a boolean.
static char toLower(char c)
    Maps the given character to its lowercase equivalent.
(package private) static int toUnsigned(int c)
    Converts an int to unsigned (& 0xFF).
static char toUpper(char c)
    Maps the given character to its uppercase equivalent.
(package private) static int wstrnchr(java.lang.String s1, int len1, char cc)
    Returns the offset of cc from the beginning of s1, or -1 if not found.
(package private) static boolean wsubstr(java.lang.String s1, java.lang.String s2)
    Same as wsubstrn, but without a specified length.
(package private) static boolean wsubstrn(java.lang.String s1, int len1, java.lang.String s2)
    Checks if the first String contains the second one.
(package private) static boolean wsubstrncase(java.lang.String s1, int len1, java.lang.String s2)
    Checks if the first String contains the second one (ignoring case).
-
-
-
Field Detail
-
DIGIT
private static final short DIGIT
char type: digit.
- See Also:
- Constant Field Values
-
LETTER
private static final short LETTER
char type: letter.
- See Also:
- Constant Field Values
-
NAMECHAR
private static final short NAMECHAR
char type: namechar.
- See Also:
- Constant Field Values
-
WHITE
private static final short WHITE
char type: whitespace.
- See Also:
- Constant Field Values
-
NEWLINE
private static final short NEWLINE
char type: newline.
- See Also:
- Constant Field Values
-
LOWERCASE
private static final short LOWERCASE
char type: lowercase.
- See Also:
- Constant Field Values
-
UPPERCASE
private static final short UPPERCASE
char type: uppercase.
- See Also:
- Constant Field Values
-
lexmap
private static short[] lexmap
used to classify chars for lexical purposes.
-
-
Method Detail
-
toBoolean
static boolean toBoolean(int value)
Converts an int to a boolean.
- Parameters:
- value - int value
- Returns:
- true if value != 0
-
toUnsigned
static int toUnsigned(int c)
Converts an int to unsigned (& 0xFF).
- Parameters:
- c - signed int
- Returns:
- unsigned int
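
The two conversions above mirror common C idioms. The following is a minimal standalone sketch of the behaviour as documented, not the library's source; the class name CStyleConversions is hypothetical.

```java
/** Standalone sketch of the documented toBoolean / toUnsigned behaviour (hypothetical class). */
final class CStyleConversions {
    private CStyleConversions() {
        // utility class, not meant to be instantiated
    }

    /** C-style truthiness: any non-zero int counts as true. */
    static boolean toBoolean(int value) {
        return value != 0;
    }

    /** Keeps only the low 8 bits, so a sign-extended byte maps back to 0..255. */
    static int toUnsigned(int c) {
        return c & 0xFF;
    }

    public static void main(String[] args) {
        byte raw = (byte) 0xE9;               // stored as -23 in a signed Java byte
        System.out.println(toUnsigned(raw));  // 233
        System.out.println(toBoolean(0));     // false
        System.out.println(toBoolean(-1));    // true
    }
}
```

The mask matters mostly when reading bytes: Java bytes are signed, so without & 0xFF a value such as 0xE9 would be treated as a negative int.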
-
wsubstrn
static boolean wsubstrn(java.lang.String s1, int len1, java.lang.String s2)
Checks if the first String contains the second one.
- Parameters:
- s1 - full String
- len1 - maximum position in String
- s2 - String to search for
- Returns:
- true if s1 contains s2 in the range 0-len1
-
wsubstrncase
static boolean wsubstrncase(java.lang.String s1, int len1, java.lang.String s2)
Checks if the first String contains the second one (ignoring case).
- Parameters:
- s1 - full String
- len1 - maximum position in String
- s2 - String to search for
- Returns:
- true if s1 contains s2 in the range 0-len1
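
One possible reading of "in the range 0-len1" is that the match has to start at an index no greater than len1. The sketch below assumes that reading; the class and method names are hypothetical and the code is not taken from the library.

```java
import java.util.Locale;

/** Sketch of a length-bounded substring test (hypothetical names, assumed semantics). */
final class BoundedSearchSketch {
    private BoundedSearchSketch() {
    }

    /** True if s2 occurs in s1 and its first occurrence starts at index <= len1. */
    static boolean containsWithin(String s1, int len1, String s2) {
        int idx = s1.indexOf(s2);
        return idx > -1 && idx <= len1;
    }

    /**
     * Case-insensitive variant. Lower-casing both sides is adequate for ASCII
     * keywords; some non-ASCII characters change length when case-mapped.
     */
    static boolean containsWithinIgnoreCase(String s1, int len1, String s2) {
        return containsWithin(s1.toLowerCase(Locale.ROOT), len1, s2.toLowerCase(Locale.ROOT));
    }
}
```

For example, containsWithin("html PUBLIC", 2, "PUBLIC") is false under this reading because the match starts at index 5, beyond the limit.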
-
wstrnchr
static int wstrnchr(java.lang.String s1, int len1, char cc)
Returns the offset of cc from the beginning of s1, or -1 if not found.
- Parameters:
- s1 - String
- len1 - maximum offset (values greater than len1 are ignored and returned as -1)
- cc - character to search for
- Returns:
- index of cc in s1
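
A bounded character search of this kind can be sketched with String.indexOf. This is an illustration of the documented contract only; the class and method names are hypothetical.

```java
/** Sketch of a bounded character search (hypothetical names). */
final class BoundedCharSearchSketch {
    private BoundedCharSearchSketch() {
    }

    /** Index of the first cc in s1, or -1 if absent or located past offset len1. */
    static int indexWithin(String s1, int len1, char cc) {
        int idx = s1.indexOf(cc);
        return (idx > -1 && idx <= len1) ? idx : -1;
    }

    public static void main(String[] args) {
        System.out.println(indexWithin("a=b", 3, '='));   // 1
        System.out.println(indexWithin("abc=", 2, '='));  // -1, match lies past the limit
    }
}
```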
-
wsubstr
static boolean wsubstr(java.lang.String s1, java.lang.String s2)
Same as wsubstrn, but without a specified length.
- Parameters:
- s1 - full String
- s2 - String to search for
- Returns:
- true if s2 is found in s1 (case insensitive search)
-
isxdigit
static boolean isxdigit(char c)
Is the character a hex digit?
- Parameters:
- c - char
- Returns:
- true if the given character is a hex digit
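
A hex-digit test in the spirit of C's isxdigit can be written with explicit ASCII ranges. The sketch below is an assumption about the intended behaviour (ASCII only), with a hypothetical class name.

```java
/** Sketch of an ASCII-only hex digit test (hypothetical class). */
final class HexDigitSketch {
    private HexDigitSketch() {
    }

    /** True for 0-9, a-f and A-F; everything else, including non-ASCII digits, is rejected. */
    static boolean isHexDigit(char c) {
        return (c >= '0' && c <= '9')
            || (c >= 'a' && c <= 'f')
            || (c >= 'A' && c <= 'F');
    }
}
```

Explicit ranges are used here instead of Character.digit(c, 16), which also accepts some non-ASCII digit characters.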
-
isInValuesIgnoreCase
static boolean isInValuesIgnoreCase(java.lang.String[] validValues, java.lang.String valueToCheck)
Check if the string valueToCheck is contained in validValues array (case insensitive comparison).
- Parameters:
- validValues - array of valid values
- valueToCheck - value to search for
- Returns:
- true if valueToCheck is found in validValues
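
The lookup described above amounts to a linear scan with String.equalsIgnoreCase. A minimal sketch, assuming validValues contains no null entries; the class name is hypothetical.

```java
/** Sketch of a case-insensitive membership test (hypothetical class). */
final class ValueLookupSketch {
    private ValueLookupSketch() {
    }

    /** True if any entry of validValues equals valueToCheck, ignoring case. */
    static boolean isInValuesIgnoreCase(String[] validValues, String valueToCheck) {
        for (String valid : validValues) {
            // equalsIgnoreCase returns false for a null argument, so a null
            // valueToCheck simply yields "not found"
            if (valid.equalsIgnoreCase(valueToCheck)) {
                return true;
            }
        }
        return false;
    }
}
```

A typical use would be validating an attribute value, for example checking "CENTER" against {"left", "right", "center"}.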
-
findBadSubString
public static boolean findBadSubString(java.lang.String s, java.lang.String p, int len)
Return true if substring s is in p and isn't all in upper case. This is used to check the case of SYSTEM, PUBLIC, DTD and EN.
- Parameters:
- s - substring
- p - full string
- len - how many chars to check in p
- Returns:
- true if substring s is in p and isn't all in upper case
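
One way to read this check: look for the keyword ignoring case, then report a problem if the text actually found is not the upper-case spelling. The sketch below assumes s is passed in upper case (as with SYSTEM, PUBLIC, DTD and EN) and that len limits how much of p is scanned; it is not the library's code and the class name is hypothetical.

```java
/** Sketch of a "keyword present but wrongly cased" check (hypothetical class, assumed semantics). */
final class KeywordCaseSketch {
    private KeywordCaseSketch() {
    }

    /**
     * Scans the first len chars of p for s, ignoring case.
     * Returns true only when s is found and the matched text differs from s,
     * i.e. the keyword is present but not written entirely in upper case.
     */
    static boolean findBadSubString(String s, String p, int len) {
        int n = s.length();
        int limit = Math.min(len, p.length());
        for (int i = 0; i + n <= limit; i++) {
            if (p.regionMatches(true, i, s, 0, n)) {
                // found ignoring case; "bad" unless it matches exactly
                return !p.regionMatches(false, i, s, 0, n);
            }
        }
        return false;
    }
}
```

With the doctype text html public "-//W3C//DTD HTML 4.01//EN", the keyword PUBLIC is reported as badly cased, while DTD is not.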
-
isXMLLetter
static boolean isXMLLetter(char c)
Is the given char a valid XML letter?
- Parameters:
- c - char
- Returns:
- true if the char is a valid XML letter
-
isXMLNamechar
static boolean isXMLNamechar(char c)
Is the given char valid in an XML name?
- Parameters:
- c - char
- Returns:
- true if the char is a valid XML name char
-
isQuote
static boolean isQuote(int c)
Is the given character a single or double quote?
- Parameters:
- c - char
- Returns:
- true if c is " or '
-
getBytes
public static byte[] getBytes(java.lang.String str)
Converts a String to UTF-8 bytes. Conversion to and from UTF-8 should always be possible, so encoding exceptions are converted to an Error to avoid adding throws declarations in lots of methods.
- Parameters:
- str - String
- Returns:
- utf8 bytes
- See Also:
- String.getBytes()
-
getString
public static java.lang.String getString(byte[] bytes, int offset, int length)
Converts UTF-8 bytes to a String. Conversion to and from UTF-8 should always be possible, so encoding exceptions are converted to an Error to avoid adding throws declarations in lots of methods.
- Parameters:
- bytes - byte array
- offset - starting offset in byte array
- length - length in byte array starting from offset
- Returns:
- same as new String(bytes, offset, length, "UTF8")
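
The pattern described for getBytes and getString, wrapping the checked UnsupportedEncodingException in an Error, might look roughly like this. It is a sketch of the idea with a hypothetical class name and error messages, not the library's source.

```java
import java.io.UnsupportedEncodingException;

/** Sketch of UTF-8 helpers that hide the checked encoding exception (hypothetical class). */
final class Utf8Sketch {
    private Utf8Sketch() {
    }

    /** Encodes str as UTF-8; an unsupported encoding is rethrown as an Error. */
    static byte[] getBytes(String str) {
        try {
            return str.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            // every Java platform must support UTF-8, so this is not expected to happen
            throw new Error("String to UTF-8 conversion failed: " + e.getMessage());
        }
    }

    /** Decodes length bytes starting at offset as UTF-8. */
    static String getString(byte[] bytes, int offset, int length) {
        try {
            return new String(bytes, offset, length, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new Error("UTF-8 to String conversion failed: " + e.getMessage());
        }
    }
}
```

On Java 7 and later, the overloads that take java.nio.charset.StandardCharsets.UTF_8 avoid the checked exception altogether, so new code may not need this wrapper.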
-
lastChar
public static int lastChar(java.lang.String str)
Return the last char in string. This is useful when the trailing quote mark is missing on an attribute.
- Parameters:
- str - String
- Returns:
- last char in String
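
To illustrate the missing-quote use case, the sketch below combines a lastChar-style helper with a quote test. Returning 0 for a null or empty string is an assumption, and endsWithQuote is a hypothetical helper that does not appear in the class.

```java
/** Sketch: detecting a missing trailing quote on a raw attribute value (hypothetical class). */
final class LastCharSketch {
    private LastCharSketch() {
    }

    /** Last char of str, or 0 for a null or empty string (assumed convention). */
    static int lastChar(String str) {
        if (str != null && str.length() > 0) {
            return str.charAt(str.length() - 1);
        }
        return 0;
    }

    /** True if c is a single or double quote. */
    static boolean isQuote(int c) {
        return c == '\'' || c == '"';
    }

    /** Hypothetical helper: does the raw attribute text end with a quote mark? */
    static boolean endsWithQuote(String rawValue) {
        return isQuote(lastChar(rawValue));
    }
}
```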
-
isWhite
public static boolean isWhite(char c)
Determines if the specified character is whitespace.
- Parameters:
- c - char
- Returns:
- true if char is whitespace.
-
isDigit
public static boolean isDigit(char c)
Is the given char a digit?
- Parameters:
- c - char
- Returns:
- true if the given char is a digit
-
isLetter
public static boolean isLetter(char c)
Is the given char a letter?
- Parameters:
- c - char
- Returns:
- true if the given char is a letter
-
isNamechar
public static boolean isNamechar(char c)
Is the given char valid in a name? (letter, digit or "-", ".", ":", "_")
- Parameters:
- c - char
- Returns:
- true if char is a name char.
-
isLower
public static boolean isLower(char c)
Determines if the specified character is a lowercase character.
- Parameters:
- c - char
- Returns:
- true if char is lower case.
-
isUpper
public static boolean isUpper(char c)
Determines if the specified character is an uppercase character.
- Parameters:
- c - char
- Returns:
- true if char is upper case.
-
toLower
public static char toLower(char c)
Maps the given character to its lowercase equivalent.
- Parameters:
- c - char
- Returns:
- lowercase char.
-
toUpper
public static char toUpper(char c)
Maps the given character to its uppercase equivalent.
- Parameters:
- c - char
- Returns:
- uppercase char.
-
foldCase
public static char foldCase(char c, boolean tocaps, boolean xmlTags)
Fold case of a char.
- Parameters:
- c - char
- tocaps - convert to caps
- xmlTags - use xml tags? If true no change will be performed
- Returns:
- folded char
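
The folding rule for toLower, toUpper and foldCase can be sketched as follows. The sketch assumes ASCII-only case mapping, which the documentation above does not state explicitly, and the class name is hypothetical.

```java
/** Sketch of ASCII case folding with an XML opt-out (hypothetical class, ASCII assumed). */
final class CaseFoldSketch {
    private CaseFoldSketch() {
    }

    /** Maps A-Z to a-z and leaves every other char unchanged. */
    static char toLower(char c) {
        return (c >= 'A' && c <= 'Z') ? (char) (c + ('a' - 'A')) : c;
    }

    /** Maps a-z to A-Z and leaves every other char unchanged. */
    static char toUpper(char c) {
        return (c >= 'a' && c <= 'z') ? (char) (c - ('a' - 'A')) : c;
    }

    /** No change when xmlTags is true; otherwise folds to upper or lower case. */
    static char foldCase(char c, boolean tocaps, boolean xmlTags) {
        if (xmlTags) {
            return c; // XML names are case-sensitive, so they are left alone
        }
        return tocaps ? toUpper(c) : toLower(c);
    }
}
```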
-
mapStr
private static void mapStr(java.lang.String str, short code)
Classify chars in String and put them in lexmap.
- Parameters:
- str - String
- code - code associated with chars in the String
-
map
private static short map(char c)
Returns the constant which defines the classification of char in lexmap.
- Parameters:
- c - char
- Returns:
- char type
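
A classification table of this kind is commonly built as an array of bit flags indexed by character code. The sketch below shows how mapStr, map and the is* predicates could fit together; the flag values, the 128-entry table size and the exact character sets are assumptions for illustration, not values taken from the class.

```java
/** Sketch of a bit-flag character classification table (assumed flag values and table size). */
final class LexMapSketch {
    // Assumed bit values; each flag occupies its own bit so flags can be combined.
    static final short DIGIT = 1;
    static final short LETTER = 2;
    static final short NAMECHAR = 4;
    static final short WHITE = 8;
    static final short NEWLINE = 16;
    static final short LOWERCASE = 32;
    static final short UPPERCASE = 64;

    private static final short[] LEXMAP = new short[128];

    static {
        mapStr("\r\n\f", (short) (NEWLINE | WHITE));
        mapStr(" \t", WHITE);
        mapStr("-.:_", NAMECHAR);
        mapStr("0123456789", (short) (DIGIT | NAMECHAR));
        mapStr("abcdefghijklmnopqrstuvwxyz", (short) (LOWERCASE | LETTER | NAMECHAR));
        mapStr("ABCDEFGHIJKLMNOPQRSTUVWXYZ", (short) (UPPERCASE | LETTER | NAMECHAR));
    }

    private LexMapSketch() {
    }

    /** ORs code into the table entry of every char in str. */
    private static void mapStr(String str, short code) {
        for (int i = 0; i < str.length(); i++) {
            LEXMAP[str.charAt(i)] |= code;
        }
    }

    /** Flags for c, or 0 for chars outside the table. */
    private static short map(char c) {
        return c < 128 ? LEXMAP[c] : 0;
    }

    static boolean isWhite(char c)    { return (map(c) & WHITE) != 0; }
    static boolean isDigit(char c)    { return (map(c) & DIGIT) != 0; }
    static boolean isLetter(char c)   { return (map(c) & LETTER) != 0; }
    static boolean isNamechar(char c) { return (map(c) & NAMECHAR) != 0; }
    static boolean isLower(char c)    { return (map(c) & LOWERCASE) != 0; }
    static boolean isUpper(char c)    { return (map(c) & UPPERCASE) != 0; }
}
```

Each predicate is then a single array lookup and bit test, which is cheap enough to call once per input character.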
-
isCharEncodingSupported
public static boolean isCharEncodingSupported(java.lang.String name)
Is the given character encoding supported?
- Parameters:
- name - character encoding name
- Returns:
- true if encoding is supported, false otherwise.
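
One way to implement such a check on a current JDK is java.nio.charset.Charset.isSupported. Whether the class itself uses this call is not stated above, so the sketch below is an assumption with a hypothetical class name.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

/** Sketch of an encoding-availability check (hypothetical class). */
final class EncodingSupportSketch {
    private EncodingSupportSketch() {
    }

    /** True if the running JVM can encode and decode the named charset. */
    static boolean isCharEncodingSupported(String name) {
        if (name == null) {
            return false; // Charset.isSupported rejects null with an exception
        }
        try {
            return Charset.isSupported(name);
        } catch (IllegalCharsetNameException e) {
            // syntactically invalid names are treated as unsupported
            return false;
        }
    }
}
```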
-
-