edu.rice.cs.plt.text
Class TextUtil

java.lang.Object
  extended by edu.rice.cs.plt.text.TextUtil

public final class TextUtil
extends Object


Nested Class Summary
static class TextUtil.SplitString
          The result of a split() invocation.
 
Field Summary
static String NEWLINE
          The system-dependent "line.separator" property.
static String NEWLINE_PATTERN
          A regex matching any line break: \r\n, \n, or \r.
 
Method Summary
static boolean contains(String s, int character)
          Determine if the given character occurs in s.
static boolean contains(String s, String piece)
          Determine if the given string occurs in s.
static boolean containsAll(String s, int... characters)
          Determine if all of the given characters occur in s.
static boolean containsAll(String s, String... pieces)
          Determine if all of the given strings occur in s.
static boolean containsAllIgnoreCase(String s, String... pieces)
          Determine if all of the given strings occur in s, ignoring differences in case.
static boolean containsAny(String s, int... characters)
          Determine if any of the given characters occurs in s.
static boolean containsAny(String s, String... pieces)
          Determine if any of the given strings occurs in s.
static boolean containsAnyIgnoreCase(String s, String... pieces)
          Determine if any of the given strings occurs in s, ignoring differences in case.
static boolean containsIgnoreCase(String s, String piece)
          Determine if the given string occurs in s, ignoring differences in case.
static boolean endsWithAny(String s, String... suffixes)
          Determine if any of the given strings is a suffix of s.
static SizedIterable<String> getLines(String s)
          Break a string into a list of lines.
static String htmlEscape(String s)
          Convert the given string to an escaped form compatible with HTML.
static String htmlUnescape(String s)
          Interpret all HTML character entities in the given string.
static int indexOfFirst(String s, int... characters)
          Find the first occurrence of any of the given characters in s.
static int indexOfFirst(String s, String... pieces)
          Find the first occurrence of any of the given strings in s.
static boolean isDecimalDigit(char c)
           
static boolean isHexDigit(char c)
           
static boolean isOctalDigit(char c)
           
static String javaEscape(String s)
          Convert the given string to a form compatible with the Java language specification for character and string literals (see JLS 3.10.6).
static String javaUnescape(String s)
          Convert a string potentially containing Java character escapes (as in javaEscape(java.lang.String)) to its unescaped equivalent.
static String padLeft(String s, char c, int length)
          Create a string of (at least) the given length by filling in copies of c to the left of s.
static String padRight(String s, char c, int length)
          Create a string of (at least) the given length by filling in copies of c to the right of s.
static String prefix(String s, int delim)
          Extract the portion of s before the first occurrence of the given delimiter.
static String regexEscape(String s)
          Produce a regular expression that matches the given string.
static String removePrefix(String s, int delim)
          Extract the portion of s after the first occurrence of the given delimiter.
static String removeSuffix(String s, int delim)
          Extract the portion of s before the last occurrence of the given delimiter.
static String repeat(char c, int copies)
          Produce a string by concatenating the given number (copies) of instances of c.
static String repeat(String s, int copies)
          Produce a string by concatenating the given number (copies) of instances of s.
static String sgmlEscape(String s, Map<Character,String> entities, boolean convertToAscii)
          Convert the given string to a form containing SGML character entities.
static String sgmlUnescape(String s, Map<String,Character> entities)
          Interpret all SGML character entities in the given string according to the provided name-character mapping.
static TextUtil.SplitString split(String s, String delimRegex, Bracket... brackets)
          An extended version of String.split(String, int) that recognizes nested matched brackets and only splits where the delimiter occurs at the top level.
static TextUtil.SplitString split(String s, String delimRegex, int limit, Bracket... brackets)
          An extended version of String.split(String, int) that recognizes nested matched brackets and only splits where the delimiter occurs at the top level.
static TextUtil.SplitString splitWithParens(String s, String delimRegex)
          An extended version of String.split(String, int) that recognizes nested parentheses and only splits where the delimiter occurs at the top level.
static TextUtil.SplitString splitWithParens(String s, String delimRegex, int limit)
          An extended version of String.split(String, int) that recognizes nested parentheses and only splits where the delimiter occurs at the top level.
static boolean startsWithAny(String s, String... prefixes)
          Determine if any of the given strings is a prefix of s.
static String suffix(String s, int delim)
          Extract the portion of s after the last occurrence of the given delimiter.
static String toHexString(byte[] bs)
          Express a byte array as a sequence of unsigned hexadecimal bytes.
static String toHexString(byte[] bs, int offset, int length)
          Express a byte array as a sequence of unsigned hexadecimal bytes.
static String toString(Object o)
          Convert the given object to a string.
static String unicodeEscape(String s)
          Convert all non-ASCII characters in the string to Unicode escapes, as specified by JLS 3.3.
static String unicodeUnescape(String s)
          Convert all Unicode escapes in the string into their equivalent Unicode characters, as specified by JLS 3.3.
static String unicodeUnescapeOnce(String s)
          Convert all one-level Unicode escapes in the string to their equivalent characters, as specified by JLS 3.3.
static String xmlEscape(String s)
          Convert the given string to an escaped form compatible with XML.
static String xmlEscape(String s, boolean convertToAscii)
          Convert the given string to an escaped form compatible with XML.
static String xmlUnescape(String s)
          Interpret all XML character entities in the given string.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

NEWLINE

public static final String NEWLINE
The system-dependent "line.separator" property.


NEWLINE_PATTERN

public static final String NEWLINE_PATTERN
A regex matching any line break: \r\n, \n, or \r.

Method Detail

toString

public static String toString(Object o)
Convert the given object to a string. This method invokes RecurUtil.safeToString(Object) to provide simple, safe handling of null values, arrays, and self-referential data structures (with cooperation from the toString() method of the relevant class).


getLines

public static SizedIterable<String> getLines(String s)
Break a string into a list of lines. "\n", "\r", and "\r\n" are considered line delimiters. The empty string is taken to contain 0 lines. A single trailing line delimiter, if present, is ignored.
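
The line-breaking rules described for getLines can be approximated with the JDK alone. The following is a minimal sketch, not the library's implementation: the class and method names are hypothetical, it returns a List rather than a SizedIterable, and the regex constant is an assumption modeled on the description of NEWLINE_PATTERN.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for getLines; not the library's implementation.
final class GetLinesSketch {
    // Assumed equivalent of NEWLINE_PATTERN: the two-character break is tried first.
    static final String LINE_BREAK = "\r\n|\n|\r";

    static List<String> lines(String s) {
        // The empty string is taken to contain 0 lines.
        if (s.isEmpty()) {
            return Collections.<String>emptyList();
        }
        // A negative limit keeps trailing empty pieces so they can be inspected.
        List<String> result = new ArrayList<String>(Arrays.asList(s.split(LINE_BREAK, -1)));
        // Drop the empty piece produced by a single trailing line break.
        if (result.get(result.size() - 1).isEmpty()) {
            result.remove(result.size() - 1);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(lines("one\ntwo\r\nthree\n")); // [one, two, three]
        System.out.println(lines("").size());             // 0
    }
}
```

Blank lines in the middle of the input are preserved as empty strings in this sketch; only the final break is absorbed.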


repeat

public static String repeat(String s,
                            int copies)
Produce a string by concatenating the given number (copies) of instances of s.


repeat

public static String repeat(char c,
                            int copies)
Produce a string by concatenating the given number (copies) of instances of c.


padLeft

public static String padLeft(String s,
                             char c,
                             int length)
Create a string of (at least) the given length by filling in copies of c to the left of s.


padRight

public static String padRight(String s,
                              char c,
                              int length)
Create a string of (at least) the given length by filling in copies of c to the right of s.
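
To make the "(at least) the given length" wording concrete, here is a small sketch of padding built on a repeat helper. The class name is hypothetical and the code is an illustration rather than the library's source; it assumes that an input already longer than length is returned unchanged.

```java
// Hypothetical illustration of repeat/padLeft/padRight; not the library's source.
final class PadSketch {
    static String repeat(char c, int copies) {
        StringBuilder sb = new StringBuilder();
        // A zero or negative count yields the empty string.
        for (int i = 0; i < copies; i++) {
            sb.append(c);
        }
        return sb.toString();
    }

    static String padLeft(String s, char c, int length) {
        // If s is already long enough, no padding is added.
        return repeat(c, length - s.length()) + s;
    }

    static String padRight(String s, char c, int length) {
        return s + repeat(c, length - s.length());
    }

    public static void main(String[] args) {
        System.out.println(padLeft("7", '0', 3));     // 007
        System.out.println(padRight("ab", '.', 5));   // ab...
        System.out.println(padLeft("12345", '0', 3)); // 12345
    }
}
```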


contains

public static boolean contains(String s,
                               int character)
Determine if the given character occurs in s. Defined in terms of String.indexOf(int).


contains

public static boolean contains(String s,
                               String piece)
Determine if the given string occurs in s. Defined in terms of String.indexOf(String). The JDK provides the same test as String.contains(CharSequence); this method is defined here for legacy support.


containsAny

public static boolean containsAny(String s,
                                  int... characters)
Determine if any of the given characters occurs in s. Defined in terms of String.indexOf(int).


containsAny

public static boolean containsAny(String s,
                                  String... pieces)
Determine if any of the given strings occurs in s. Defined in terms of String.indexOf(String).


containsAll

public static boolean containsAll(String s,
                                  int... characters)
Determine if all of the given characters occur in s. Defined in terms of String.indexOf(int).


containsAll

public static boolean containsAll(String s,
                                  String... pieces)
Determine if all of the given strings occur in s. Defined in terms of String.indexOf(String).
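
The character-based containsAny and containsAll variants are described as thin layers over String.indexOf(int). A minimal sketch of that idea follows; the class name is hypothetical, and the behavior for an empty argument list (false for "any", true for "all") is an assumption that the documentation does not state.

```java
// Hypothetical illustration of containsAny/containsAll over String.indexOf(int).
final class ContainsSketch {
    static boolean containsAny(String s, int... characters) {
        for (int c : characters) {
            if (s.indexOf(c) >= 0) {
                return true;
            }
        }
        return false; // assumed result when no characters are given
    }

    static boolean containsAll(String s, int... characters) {
        for (int c : characters) {
            if (s.indexOf(c) < 0) {
                return false;
            }
        }
        return true; // assumed result when no characters are given
    }

    public static void main(String[] args) {
        System.out.println(containsAny("a/b", '/', '\\')); // true
        System.out.println(containsAll("a/b", '/', '\\')); // false
    }
}
```

Taking int rather than char lets callers pass supplementary code points, which String.indexOf(int) also accepts.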


containsIgnoreCase

public static boolean containsIgnoreCase(String s,
                                         String piece)
Determine if the given string occurs in s, ignoring differences in case. Unlike String.equalsIgnoreCase(java.lang.String), this test only compares the lower-case conversion of s to the lower-case conversion of piece.


containsAnyIgnoreCase

public static boolean containsAnyIgnoreCase(String s,
                                            String... pieces)
Determine if any of the given strings occurs in s, ignoring differences in case. Defined in terms of containsIgnoreCase(java.lang.String, java.lang.String).


containsAllIgnoreCase

public static boolean containsAllIgnoreCase(String s,
                                            String... pieces)
Determine if all of the given strings occur in s, ignoring differences in case. Defined in terms of containsIgnoreCase(java.lang.String, java.lang.String).
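
The case-insensitive family can be pictured as lower-casing both operands and then delegating, as the descriptions above suggest. The sketch below uses a hypothetical class name, and the choice of Locale.ROOT for the conversion is an assumption; the library may use the default locale instead.

```java
import java.util.Locale;

// Hypothetical illustration of the ignore-case containment tests.
final class IgnoreCaseSketch {
    static boolean containsIgnoreCase(String s, String piece) {
        // Compare the lower-case conversions of both strings (locale is an assumption).
        return s.toLowerCase(Locale.ROOT).contains(piece.toLowerCase(Locale.ROOT));
    }

    static boolean containsAnyIgnoreCase(String s, String... pieces) {
        for (String piece : pieces) {
            if (containsIgnoreCase(s, piece)) {
                return true;
            }
        }
        return false;
    }

    static boolean containsAllIgnoreCase(String s, String... pieces) {
        for (String piece : pieces) {
            if (!containsIgnoreCase(s, piece)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(containsIgnoreCase("Hello World", "WORLD"));             // true
        System.out.println(containsAllIgnoreCase("Hello World", "hello", "WORLD")); // true
        System.out.println(containsAnyIgnoreCase("Hello", "x", "y"));               // false
    }
}
```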


startsWithAny

public static boolean startsWithAny(String s,
                                    String... prefixes)
Determine if any of the given strings is a prefix of s. Defined in terms of String.startsWith(java.lang.String, int).


endsWithAny

public static boolean endsWithAny(String s,
                                  String... suffixes)
Determine if any of the given strings is a suffix of s. Defined in terms of String.endsWith(java.lang.String).


indexOfFirst

public static int indexOfFirst(String s,
                               int... characters)
Find the first occurrence of any of the given characters in s. If none are present, the result is -1. Defined in terms of String.indexOf(int).


indexOfFirst

public static int indexOfFirst(String s,
                               String... pieces)
Find the first occurrence of any of the given strings in s. If none are present, the result is -1. Defined in terms of String.indexOf(String).
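
One way to read indexOfFirst is "the smallest non-negative result of indexOf over all candidates". The sketch below shows that reading for the character overload; the class name is hypothetical and this is not taken from the library's source.

```java
// Hypothetical illustration of indexOfFirst for characters.
final class IndexOfFirstSketch {
    static int indexOfFirst(String s, int... characters) {
        int best = -1;
        for (int c : characters) {
            int idx = s.indexOf(c);
            // Keep the smallest index among the characters that actually occur.
            if (idx >= 0 && (best < 0 || idx < best)) {
                best = idx;
            }
        }
        return best; // -1 if none of the characters are present
    }

    public static void main(String[] args) {
        System.out.println(indexOfFirst("key=value;x", ';', '=')); // 3
        System.out.println(indexOfFirst("abc", 'x'));              // -1
    }
}
```

The order of the candidate characters does not affect the result here; only their positions in s matter.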


prefix

public static String prefix(String s,
                            int delim)
Extract the portion of s before the first occurrence of the given delimiter. The result is s if the delimiter is not found.


removePrefix

public static String removePrefix(String s,
                                  int delim)
Extract the portion of s after the first occurrence of the given delimiter. The result is s if the delimiter is not found.


suffix

public static String suffix(String s,
                            int delim)
Extract the portion of s after the last occurrence of the given delimiter. The result is s if the delimiter is not found.


removeSuffix

public static String removeSuffix(String s,
                                  int delim)
Extract the portion of s before the last occurrence of the given delimiter. The result is s if the delimiter is not found.
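
The four delimiter-based extractors differ only in which occurrence they look for and which side they keep. The following sketch makes the distinction visible on a dotted name. The class name is hypothetical and the code is an illustration built on String.indexOf(int) and String.lastIndexOf(int), not the library's source.

```java
// Hypothetical illustration of prefix/removePrefix/suffix/removeSuffix.
final class DelimSketch {
    static String prefix(String s, int delim) {
        int i = s.indexOf(delim);
        return i < 0 ? s : s.substring(0, i);
    }

    static String removePrefix(String s, int delim) {
        int i = s.indexOf(delim);
        // charCount is 2 for supplementary code points, 1 otherwise.
        return i < 0 ? s : s.substring(i + Character.charCount(delim));
    }

    static String suffix(String s, int delim) {
        int i = s.lastIndexOf(delim);
        return i < 0 ? s : s.substring(i + Character.charCount(delim));
    }

    static String removeSuffix(String s, int delim) {
        int i = s.lastIndexOf(delim);
        return i < 0 ? s : s.substring(0, i);
    }

    public static void main(String[] args) {
        String name = "edu.rice.cs.plt";
        System.out.println(prefix(name, '.'));       // edu
        System.out.println(removePrefix(name, '.')); // rice.cs.plt
        System.out.println(suffix(name, '.'));       // plt
        System.out.println(removeSuffix(name, '.')); // edu.rice.cs
    }
}
```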


splitWithParens

public static TextUtil.SplitString splitWithParens(String s,
                                                   String delimRegex)
An extended version of String.split(String, int) that recognizes nested parentheses and only splits where the delimiter occurs at the top level. This convenience method sets limit to 0 (unlimited number of matches) and brackets to Bracket.PARENTHESES. See split(String, String, int, Bracket[]) for a full specification.


splitWithParens

public static TextUtil.SplitString splitWithParens(String s,
                                                   String delimRegex,
                                                   int limit)
An extended version of String.split(String, int) that recognizes nested parentheses and only splits where the delimiter occurs at the top level. This convenience method sets brackets to Bracket.PARENTHESES. See split(String, String, int, Bracket[]) for a full specification.


split

public static TextUtil.SplitString split(String s,
                                         String delimRegex,
                                         Bracket... brackets)
An extended version of String.split(String, int) that recognizes nested matched brackets and only splits where the delimiter occurs at the top level. This convenience method sets limit to 0 (unlimited number of matches). See split(String, String, int, Bracket[]) for a full specification.


split

public static TextUtil.SplitString split(String s,
                                         String delimRegex,
                                         int limit,
                                         Bracket... brackets)
An extended version of String.split(String, int) that recognizes nested matched brackets and only splits where the delimiter occurs at the top level. For convenience when the delimiter is a nontrivial regular expression, the result includes both the split strings and the matched delimiters. Ignoring these extensions, the behavior is roughly equivalent: s.split(delimRegex, limit) is equivalent to TextUtil.split(s, delimRegex, limit).array(), with the exception that trailing empty strings (separated by delimiters) are never discarded here.

Parameters:
s - A string to split
delimRegex - A regular expression recognizing delimiters
limit - The number of non-delimiter pieces to produce. Consistent with String.split(), limit-1 is the number of delimiters to search for. If 0 or negative, the search continues until the string is exhausted. Unlike String.split(), trailing empty strings (separated by delimiters) are never discarded, even when limit == 0.
brackets - Bracket pairs that should be recognized. A delimiter match that occurs within one of these bracket pairs (at any nonzero nesting depth) is not considered a delimiter. A left bracket increases the nesting level only if it is at the top level or follows another left bracket that supports nesting; a right bracket reduces the nesting level only if it matches the most recent left bracket. If delimRegex recognizes part of a valid bracket (e.g., "*" is the delimiter and "/*" is a bracket), how relevant text is handled is unspecified (it would be nice, but difficult, to fix this). If multiple brackets overlap, an expected right bracket will match before a left bracket, and the first left bracket listed in brackets has priority over later left brackets.
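
The idea of splitting only at the top level can be illustrated without the Bracket and SplitString types. The sketch below is a simplified stand-in, not the library's algorithm: it hard-codes parentheses as the only bracket pair, ignores limit, returns only the pieces (not the matched delimiters), and uses hypothetical class and method names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified top-level splitter; parentheses are the only brackets.
final class TopLevelSplitSketch {
    static List<String> splitTopLevel(String s, String delimRegex) {
        List<String> pieces = new ArrayList<String>();
        Matcher m = Pattern.compile(delimRegex).matcher(s);
        int depth = 0;      // current parenthesis nesting depth
        int pieceStart = 0; // index where the current piece begins
        int i = 0;
        while (i < s.length()) {
            char ch = s.charAt(i);
            if (ch == '(') {
                depth++;
                i++;
            } else if (ch == ')' && depth > 0) {
                depth--;
                i++;
            } else if (depth == 0 && m.region(i, s.length()).lookingAt() && m.end() > i) {
                // A non-empty delimiter match at nesting depth 0 ends the current piece.
                pieces.add(s.substring(pieceStart, i));
                i = m.end();
                pieceStart = i;
            } else {
                i++;
            }
        }
        // The last piece is always added, so trailing empty strings are kept.
        pieces.add(s.substring(pieceStart));
        return pieces;
    }

    public static void main(String[] args) {
        System.out.println(splitTopLevel("f(a, b), g(c, (d, e)), h", ",\\s*"));
        // [f(a, b), g(c, (d, e)), h]
        System.out.println(splitTopLevel("a,b,", ",").size()); // 3
        System.out.println("a,b,".split(",").length);          // 2
    }
}
```

The last two lines contrast the trailing-empty-string behavior with String.split, which discards the final empty piece when no limit is given.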

toHexString

public static String toHexString(byte[] bs)
Express a byte array as a sequence of unsigned hexadecimal bytes.


toHexString

public static String toHexString(byte[] bs,
                                 int offset,
                                 int length)
Express a byte array as a sequence of unsigned hexadecimal bytes.
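
Because Java bytes are signed, "unsigned hexadecimal bytes" implies masking each byte before formatting. The sketch below shows that step. The class and method names are hypothetical, and the output format (lower-case digits, two per byte, no separators) and the reading of length as a byte count are assumptions the documentation does not confirm.

```java
// Hypothetical illustration of unsigned hex formatting for byte arrays.
final class HexSketch {
    static String toHex(byte[] bs, int offset, int length) {
        StringBuilder sb = new StringBuilder(length * 2);
        for (int i = offset; i < offset + length; i++) {
            // Mask with 0xff so that negative bytes map to 80..ff rather than ffffff80..ffffffff.
            sb.append(String.format("%02x", bs[i] & 0xff));
        }
        return sb.toString();
    }

    static String toHex(byte[] bs) {
        return toHex(bs, 0, bs.length);
    }

    public static void main(String[] args) {
        System.out.println(toHex(new byte[] {0, 15, -1}));         // 000fff
        System.out.println(toHex(new byte[] {1, 2, 3, 4}, 1, 2));  // 0203
    }
}
```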


isDecimalDigit

public static boolean isDecimalDigit(char c)

isOctalDigit

public static boolean isOctalDigit(char c)

isHexDigit

public static boolean isHexDigit(char c)

unicodeEscape

public static String unicodeEscape(String s)
Convert all non-ASCII characters in the string to Unicode escapes, as specified by JLS 3.3. As suggested by JLS, an additional u is added to existing escapes in the string; instances of \ that precede a non-ASCII character or a malformed Unicode escape will be encoded as \u005c. The original string may be safely reconstructed with unicodeUnescapeOnce(java.lang.String); to safely interpret all Unicode escapes, including those in the original string, use unicodeUnescape(java.lang.String) (in either case, this method guarantees an absence of IllegalArgumentExceptions).
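
As a rough picture of the core transformation only, the sketch below replaces each non-ASCII char with a JLS 3.3 style escape. It deliberately omits the handling of existing escapes and backslashes described above, so it is not a substitute for unicodeEscape. The class name is hypothetical, and the use of lower-case hex digits is an assumption.

```java
// Hypothetical, partial illustration: escapes non-ASCII chars only.
final class UnicodeEscapeSketch {
    static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0x7F) {
                // Emit a backslash, the letter u, and four hex digits for this UTF-16 unit.
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String cafe = "caf" + (char) 0xE9; // "caf" followed by e-acute
        System.out.println(escapeNonAscii(cafe));
    }
}
```

Working per UTF-16 unit means a supplementary character becomes two escapes, one for each surrogate, which is how JLS 3.3 represents such characters.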


unicodeUnescapeOnce

public static String unicodeUnescapeOnce(String s)
Convert all one-level Unicode escapes in the string to their equivalent characters, as specified by JLS 3.3. Higher-level escapes (containing multiple 'u' characters) will have a single 'u' removed.

Throws:
IllegalArgumentException - If a backslash-u escape in the string is not followed by 4 hex digits

unicodeUnescape

public static String unicodeUnescape(String s)
Convert all Unicode escapes in the string into their equivalent Unicode characters, as specified by JLS 3.3.

Throws:
IllegalArgumentException - If a backslash-u escape in the string is not followed by 4 hex digits

javaEscape

public static String javaEscape(String s)
Convert the given string to a form compatible with the Java language specification for character and string literals (see JLS 3.10.6). The characters \, ", and ' are replaced with escape sequences. All control characters between \u0000 and \u001F, along with \u007F, are replaced with mnemonic escape sequences (such as "\n"), or octal escape sequences if no mnemonic exists.
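
The escaping rules above map naturally onto a switch over each character. The following is a sketch under stated assumptions, not the library's code: the class name is hypothetical, and the three-digit zero-padded form of the octal escape is an assumption chosen so that a following digit cannot be absorbed into the escape.

```java
// Hypothetical illustration of Java literal escaping (JLS 3.10.6 style).
final class JavaEscapeSketch {
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\'': sb.append("\\'");  break;
                case '\b': sb.append("\\b");  break;
                case '\t': sb.append("\\t");  break;
                case '\n': sb.append("\\n");  break;
                case '\f': sb.append("\\f");  break;
                case '\r': sb.append("\\r");  break;
                default:
                    if (c < 0x20 || c == 0x7F) {
                        // No mnemonic exists: fall back to a zero-padded octal escape.
                        sb.append(String.format("\\%03o", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("say \"hi\"\n"));           // say \"hi\"\n
        System.out.println(escape(String.valueOf((char) 1))); // \001
    }
}
```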


javaUnescape

public static String javaUnescape(String s)
Convert a string potentially containing Java character escapes (as in javaEscape(java.lang.String)) to its unescaped equivalent. Note that Unicode escapes are not interpreted (strings from Java source code should first be processed by unicodeUnescape(java.lang.String)).

Throws:
IllegalArgumentException - If the character \ is followed by an invalid escape character or the end of the string.

regexEscape

public static String regexEscape(String s)

Produce a regular expression that matches the given string. Backslash escape sequences are used for all characters that potentially clash with regular expression syntax. For simplicity, escapes are applied to all control characters (\u0000 to \u001F and \u007F) and to all non-alphanumeric, non-space ASCII characters (in the range \u0020 to \u007E), including those that have no special meaning in the regular expression syntax (such as @, ", and ~). Where a mnemonic escape for control characters exists, it is used; otherwise, the hexadecimal \xhh notation is used.

Note: a similar method is available in Java 5: Pattern.quote(java.lang.String). It has the same basic contract (produce a regex to match the given string) but produces different, equivalent results.
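
The difference between the two approaches can be seen side by side. In the sketch below, Pattern.quote is the real JDK method, while backslashEscape is a hypothetical, partial imitation of the backslash style described above: it covers printable ASCII punctuation only and leaves control characters untouched.

```java
import java.util.regex.Pattern;

// Hypothetical comparison of backslash escaping with the JDK's Pattern.quote.
final class RegexEscapeSketch {
    static String backslashEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean printableNonSpaceAscii = c > 0x20 && c <= 0x7E;
            // Java regex permits a backslash before any non-alphabetic character.
            if (printableNonSpaceAscii && !Character.isLetterOrDigit(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "1+1=2?";
        System.out.println(backslashEscape(text)); // 1\+1\=2\?
        System.out.println(Pattern.quote(text));   // \Q1+1=2?\E
        // Both escaped forms match the literal text; the raw text used as a regex does not.
        System.out.println(Pattern.matches(backslashEscape(text), text)); // true
        System.out.println(Pattern.matches(Pattern.quote(text), text));   // true
        System.out.println(Pattern.matches(text, text));                  // false
    }
}
```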


sgmlEscape

public static String sgmlEscape(String s,
                                Map<Character,String> entities,
                                boolean convertToAscii)
Convert the given string to a form containing SGML character entities. All characters appearing in entities will be translated to their corresponding entity names; if convertToAscii is true, all other non-ASCII characters will be converted to numeric references.


sgmlUnescape

public static String sgmlUnescape(String s,
                                  Map<String,Character> entities)
Interpret all SGML character entities in the given string according to the provided name-character mapping.

Throws:
IllegalArgumentException - If the string contains a malformed or unrecognized character entity
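
Entity interpretation driven by a name-to-character map might look roughly like the sketch below. It is an illustration with a hypothetical class name, not the library's parser: it assumes every reference ends with a semicolon and accepts decimal (&#233;) and hexadecimal (&#xE9;) numeric references, details the documentation does not spell out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of map-driven entity unescaping.
final class SgmlUnescapeSketch {
    static String unescape(String s, Map<String, Character> entities) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c != '&') {
                sb.append(c);
                i++;
                continue;
            }
            int semi = s.indexOf(';', i);
            if (semi < 0) {
                throw new IllegalArgumentException("Unterminated entity at index " + i);
            }
            String name = s.substring(i + 1, semi);
            if (name.startsWith("#x")) {
                // NumberFormatException is a subclass of IllegalArgumentException.
                sb.appendCodePoint(Integer.parseInt(name.substring(2), 16));
            } else if (name.startsWith("#")) {
                sb.appendCodePoint(Integer.parseInt(name.substring(1), 10));
            } else {
                Character ch = entities.get(name);
                if (ch == null) {
                    throw new IllegalArgumentException("Unrecognized entity: " + name);
                }
                sb.append(ch.charValue());
            }
            i = semi + 1;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Character> entities = new HashMap<String, Character>();
        entities.put("lt", '<');
        entities.put("amp", '&');
        System.out.println(unescape("a &lt; b &amp; c", entities)); // a < b & c
    }
}
```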

xmlEscape

public static String xmlEscape(String s)
Convert the given string to an escaped form compatible with XML. The standard XML named entities (", &, ', <, and >) will be replaced with named references (such as &quot;), and all non-ASCII characters will be replaced with numeric references.


xmlEscape

public static String xmlEscape(String s,
                               boolean convertToAscii)
Convert the given string to an escaped form compatible with XML. The standard XML named entities (", &, ', <, and >) will be replaced with named references (such as &quot;); if convertToAscii is true, all non-ASCII characters will be replaced with numeric references.
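
The effect of the convertToAscii flag is easiest to see on a string that mixes markup characters with a non-ASCII letter. The sketch below is a hypothetical stand-in for xmlEscape, not the library's source; in particular, emitting decimal numeric references (rather than hexadecimal) is an assumption.

```java
// Hypothetical illustration of XML escaping with an optional ASCII conversion.
final class XmlEscapeSketch {
    static String xmlEscape(String s, boolean convertToAscii) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            // Iterate by code point so that surrogate pairs become one reference.
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            switch (cp) {
                case '"':  sb.append("&quot;"); break;
                case '&':  sb.append("&amp;");  break;
                case '\'': sb.append("&apos;"); break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                default:
                    if (convertToAscii && cp > 0x7F) {
                        sb.append("&#").append(cp).append(';');
                    } else {
                        sb.appendCodePoint(cp);
                    }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String cafe = "caf" + (char) 0xE9; // "caf" followed by e-acute
        System.out.println(xmlEscape("a<b & \"c\"", true)); // a&lt;b &amp; &quot;c&quot;
        System.out.println(xmlEscape(cafe, true));          // caf&#233;
    }
}
```

With convertToAscii set to false, the non-ASCII letter passes through unchanged while the five markup characters are still replaced.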


xmlUnescape

public static String xmlUnescape(String s)
Interpret all XML character entities in the given string.

Throws:
IllegalArgumentException - If the string contains a malformed or unrecognized character entity

htmlEscape

public static String htmlEscape(String s)
Convert the given string to an escaped form compatible with HTML. All named entities supported by HTML 4.0 will be replaced with named references, and all other non-ASCII characters will be replaced with numeric references. The ' character will also be replaced with a numeric reference.


htmlUnescape

public static String htmlUnescape(String s)
Interpret all HTML character entities in the given string.

Throws:
IllegalArgumentException - If the string contains a malformed or unrecognized character entity