06374210 is referenced by 69 patents.

A system (100) is capable of segmenting a connected text, such as a Japanese or Chinese sentence, into words. The system includes means (110) for reading an input string representing the connected text. Segmentation means (120) identifies at least one word sequence in the connected text by iteratively building a tree structure representing the word sequence(s) in the input string. Initially the input string is taken as the working string. Each word of a dictionary (122) is compared with the beginning of the working string. A match is represented by a node in the tree, and the process is continued with the remaining part of the input string. The system further includes means (130) for outputting at least one of the identified word sequences. A language model may be used to select between candidate sequences. Preferably the system is used in a speech recognition system to update the lexicon based on representative texts.
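
The tree-building step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `Node`, `build_tree`, `sequences` and `segment` are hypothetical, the dictionary is a plain list of strings, and discarding branches that cannot consume the whole input is an assumption made here to keep the output small. The language-model selection step is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One matched dictionary word; children segment the text that follows it."""
    word: str
    children: list["Node"] = field(default_factory=list)


def build_tree(working, dictionary):
    """Return a node for every dictionary word that begins the working string."""
    nodes = []
    for word in dictionary:
        if word and working.startswith(word):
            rest = working[len(word):]
            node = Node(word, build_tree(rest, dictionary))
            # Assumption: keep a node only if the remainder is empty or can
            # itself be segmented, so dead-end branches are discarded.
            if not rest or node.children:
                nodes.append(node)
    return nodes


def sequences(nodes):
    """Yield each root-to-leaf path of the tree as a list of words."""
    for node in nodes:
        if not node.children:
            yield [node.word]
        else:
            for tail in sequences(node.children):
                yield [node.word] + tail


def segment(text, dictionary):
    """Return every word sequence that covers the whole input string."""
    return list(sequences(build_tree(text, dictionary)))
```

With the toy dictionary `["ab", "a", "b", "c", "bc"]`, `segment("abc", ...)` yields the three candidate sequences `ab c`, `a b c` and `a bc`; choosing among such candidates is where a language model could be applied.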

Title
Automatic segmentation of a text
Application Number
9/449231
Publication Number
6374210 (B1)
Application Date
November 24, 1999
Publication Date
April 16, 2002
Inventor
Ya Cherng Chu
Taipei
US
Agent
Daniel J Piotrowski
US
Assignee
U S Philips Corporation
NY, US
IPC
G06F 17/27