
This page pertains to UD version 2.

Treebank Statistics: UD_Chinese-Beginner: POS Tags: NUM

There are 32 NUM lemmas (2%), 32 NUM types (2%) and 640 NUM tokens (3%). Out of 15 observed tags, the rank of NUM is: 7 in number of lemmas, 7 in number of types and 9 in number of tokens.

The 10 most frequent NUM lemmas: 一、 两、 十、 三、 几、 五、 1、 二、 八、 四

The 10 most frequent NUM types: 一、 两、 十、 三、 几、 五、 1、 二、 八、 四

The 10 most frequent ambiguous lemmas: 一 (NUM 170, ADV 2, DET 2), 几 (NUM 41, DET 1, PRON 1), 二 (NUM 23, NOUN 1), 半 (NOUN 28, NUM 11), 第 (NUM 9, NOUN 3), 零 (NUM 5, ADV 1), 多 (ADJ 86, ADV 40, NUM 4), 双 (NOUN 2, NUM 1)

The 10 most frequent ambiguous types: 一 (NUM 170, ADV 2, DET 2), 几 (NUM 41, DET 1, PRON 1), 二 (NUM 23, NOUN 1), 半 (NOUN 28, NUM 11), 第 (NUM 9, NOUN 3), 零 (NUM 5, ADV 1), 多 (ADJ 86, ADV 40, NUM 4), 双 (NOUN 2, NUM 1)

Morphology

The form / lemma ratio of NUM is 1.000000 (the average of all parts of speech is 1.000000).
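As a rough illustration of the ratio above, the sketch below assumes it is the number of distinct word forms divided by the number of distinct lemmas for the tag. That definition is an assumption, not something this page states. The function name `form_lemma_ratio` and the toy token list are hypothetical.

```python
def form_lemma_ratio(pairs):
    """Return distinct forms / distinct lemmas for (form, lemma) pairs.

    Assumption: this is how the ratio is defined; the page does not say.
    """
    forms = {form for form, _ in pairs}
    lemmas = {lemma for _, lemma in pairs}
    if not lemmas:
        raise ValueError("no tokens given")
    return len(forms) / len(lemmas)


# Hypothetical toy data: every form is identical to its lemma,
# as is the case for all NUM tokens reported on this page.
toy_num_tokens = [("一", "一"), ("两", "两"), ("一", "一"), ("1", "1")]
ratio = form_lemma_ratio(toy_num_tokens)
print(f"{ratio:.6f}")
```

Under the same assumption, the 32 NUM types and 32 NUM lemmas reported above give 32 / 32 = 1.0, which agrees with the stated ratio.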

The 1st highest number of forms (1) was observed with the lemma “0”: 0.

The 2nd highest number of forms (1) was observed with the lemma “1”: 1.

The 3rd highest number of forms (1) was observed with the lemma “2”: 2.

NUM occurs with 1 feature: NumType (599; 94% instances)

NUM occurs with 2 feature-value pairs: NumType=Card, NumType=Ord

NUM occurs with 3 feature combinations. The most frequent feature combination is NumType=Card (592 tokens). Examples: 一、 两、 十、 三、 几、 五、 1、 二、 八、 四

Relations

NUM nodes are attached to their parents using 8 different relations: nummod (474; 74% instances), flat (106; 17% instances), dep (23; 4% instances), nmod (12; 2% instances), conj (10; 2% instances), obj (9; 1% instances), obl (3; 0% instances), root (3; 0% instances)
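The percentages in the relation list above can be checked against the raw counts. The sketch below copies the counts from that list and assumes the page rounds each share to the nearest whole percent. The helper name `percentages` is hypothetical.

```python
# Counts copied from the relation list above.
parent_relations = {
    "nummod": 474, "flat": 106, "dep": 23, "nmod": 12,
    "conj": 10, "obj": 9, "obl": 3, "root": 3,
}


def percentages(counts):
    """Map each label to its share of the total, rounded to a whole percent.

    Assumption: rounding to the nearest integer, which matches the
    figures shown on this page.
    """
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


total_num_tokens = sum(parent_relations.values())
shares = percentages(parent_relations)
print(total_num_tokens)
print(shares["nummod"], shares["flat"], shares["root"])
```

The counts sum to 640, the number of NUM tokens reported at the top of the page.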

Parents of NUM nodes belong to 4 different parts of speech: NOUN (486; 76% instances), NUM (139; 22% instances), VERB (12; 2% instances), none, i.e. the NUM node is the root (3; 0% instances)

501 (78%) NUM nodes are leaves.

106 (17%) NUM nodes have one child.

17 (3%) NUM nodes have two children.

16 (3%) NUM nodes have three or more children.

The highest child degree of a NUM node is 7.
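The child-degree figures above count, for each NUM node, how many other nodes name it as their head. A minimal sketch of that count for a single sentence in CoNLL-U format follows. The toy sentence and its annotation are hypothetical and not taken from the treebank, and the function name `num_child_degrees` is an assumption for illustration.

```python
from collections import Counter

# Hypothetical toy sentence in CoNLL-U format (illustrative annotation only).
TOY_CONLLU = (
    "1\t我\t我\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
    "2\t买\t买\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\t二\t二\tNUM\t_\tNumType=Card\t5\tnummod\t_\t_\n"
    "4\t十\t十\tNUM\t_\tNumType=Card\t3\tflat\t_\t_\n"
    "5\t本\t本\tNOUN\t_\t_\t6\tnmod\t_\t_\n"
    "6\t书\t书\tNOUN\t_\t_\t2\tobj\t_\t_\n"
)


def num_child_degrees(conllu_sentence):
    """Return {node id: number of children} for NUM nodes of ONE sentence."""
    rows = []
    for line in conllu_sentence.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword-token ranges and empty nodes
        # Columns: ID (0), UPOS (3), HEAD (6).
        rows.append((int(cols[0]), cols[3], int(cols[6])))
    children = Counter(head for _, _, head in rows)
    return {nid: children[nid] for nid, upos, _ in rows if upos == "NUM"}


degrees = num_child_degrees(TOY_CONLLU)
print(degrees)
```

In this toy sentence, node 3 has one child (node 4) and node 4 is a leaf. On the page itself, the four degree buckets (501 + 106 + 17 + 16) add up to the 640 NUM tokens.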

Children of NUM nodes are attached using 16 different relations: flat (106; 53% instances), nummod (16; 8% instances), advmod (15; 7% instances), nmod (15; 7% instances), det (11; 5% instances), conj (10; 5% instances), punct (6; 3% instances), advcl (4; 2% instances), nsubj (4; 2% instances), cc (3; 1% instances), cop (3; 1% instances), obl (3; 1% instances), amod (2; 1% instances), case (1; 0% instances), discourse (1; 0% instances), parataxis (1; 0% instances)

Children of NUM nodes belong to 11 different parts of speech: NUM (139; 69% instances), ADV (15; 7% instances), NOUN (12; 6% instances), DET (11; 5% instances), ADJ (7; 3% instances), PUNCT (6; 3% instances), AUX (3; 1% instances), CCONJ (3; 1% instances), PART (2; 1% instances), PRON (2; 1% instances), VERB (1; 0% instances)