Provides a language-independent way to break Unicode text into meaningful semantic units (e.g. words).
This interface is implemented by the following components:
Next(): gets the begin and end offsets of the next unit in the current text.
- text: the text to be scanned
- length: the number of characters in the text to be processed
- pos: the current position
- isLastBuffer: whether this buffer is the last one
- begin: the begin offset of the next unit
- end: the end offset of the next unit
- returns: whether the current text has more units
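The contract of Next() can be illustrated with a minimal C++ sketch. It is not Mozilla's implementation and flattens the XPCOM method into a plain function. The names Next, IsUnitChar and SplitUnits, the definition of a unit (a maximal run of ASCII letters, ASCII digits or non-ASCII code units), and the choice to report "no unit yet" when a unit touches the end of a buffer that is not the last one are all assumptions made for illustration.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical unit definition (assumption): ASCII letters/digits, plus any
// non-ASCII UTF-16 code unit, count as part of a unit; everything else separates.
static bool IsUnitChar(char16_t c) {
    return (c >= u'0' && c <= u'9') || (c >= u'A' && c <= u'Z') ||
           (c >= u'a' && c <= u'z') || c >= 0x80;
}

// Hypothetical stand-in for Next(): scans text[pos, length), reports the
// half-open range [begin, end) of the next unit, and returns whether one was found.
bool Next(const char16_t* text, int32_t length, int32_t pos, bool isLastBuffer,
          int32_t* begin, int32_t* end) {
    // Skip separators (spaces, punctuation) in front of the next unit.
    while (pos < length && !IsUnitChar(text[pos])) ++pos;
    *begin = pos;
    if (pos >= length) {  // no unit left in this buffer
        *end = pos;
        return false;
    }
    int32_t i = pos;
    while (i < length && IsUnitChar(text[i])) ++i;
    if (i == length && !isLastBuffer) {
        // Assumption: the unit may continue in the next buffer, so report no
        // unit yet; the caller can carry text[begin, length) over.
        *end = pos;
        return false;
    }
    *end = i;
    return true;
}

// Typical calling loop over a single (last) buffer.
std::vector<std::u16string> SplitUnits(const std::u16string& s) {
    std::vector<std::u16string> units;
    const int32_t len = static_cast<int32_t>(s.size());
    int32_t pos = 0, begin = 0, end = 0;
    while (Next(s.data(), len, pos, /*isLastBuffer=*/true, &begin, &end)) {
        units.push_back(s.substr(begin, end - begin));
        pos = end;  // resume scanning right after the unit just returned
    }
    return units;
}
```

The loop in SplitUnits shows the usual pattern: pass the previous end offset back in as pos until the scanner reports that no unit remains.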
Start(): starts up the semantic unit scanner with an optional character set, which serves as a hint for tuning the heuristics used to determine the language(s) of the processed text.
- characterSet: the character set the text was originally encoded in (can be NULL)
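How a nullable character-set hint might feed a heuristic can be sketched in C++ as follows. Everything here is hypothetical: the class HintedScanner, its accessors, and the rule that a "GB2312" or "Big5" hint makes each Han ideograph its own unit are illustrative assumptions, not the behavior of any Mozilla component.

```cpp
#include <string>

// Hypothetical scanner state showing how start() could consume its hint.
class HintedScanner {
public:
    // Mirrors start(): characterSet may be null, which means "no hint".
    void Start(const char* characterSet) {
        charset_ = characterSet ? characterSet : "";
        // Hypothetical heuristic: a Chinese legacy charset hint suggests
        // treating each Han ideograph as a separate unit.
        hanPerChar_ = (charset_ == "GB2312" || charset_ == "Big5");
    }
    const std::string& Charset() const { return charset_; }
    bool HanPerChar() const { return hanPerChar_; }

private:
    std::string charset_;     // empty when no hint was given
    bool hanPerChar_ = false; // heuristic switch derived from the hint
};
```

Because the hint is optional, the sketch treats a null pointer the same as an unknown charset and falls back to the default heuristics.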
Reference documentation is generated from Mozilla's source.