
This page pertains to UD version 2.

Treebank Statistics: UD_Chinese-PatentChar: POS Tags: PUNCT

There is 1 PUNCT lemma (7%), 15 PUNCT types (2%) and 560 PUNCT tokens (12%). Out of 15 observed tags, the rank of PUNCT is: 12 in number of lemmas, 8 in number of types and 3 in number of tokens.
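As a rough illustration of how the three counts above could be derived, the following sketch reads CoNLL-U text and counts distinct lemmas, distinct word forms (types) and tokens for one UPOS tag. The helper names `read_tokens` and `tag_counts` and the sample rows are hypothetical, and the official statistics script may normalize word forms differently (for example by case).

```python
def read_tokens(conllu_text):
    """Yield (form, lemma, upos) for every regular token line."""
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank sentence separator or sentence-level comment
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # multiword-token range (1-2) or empty node (1.1)
        yield cols[1], cols[2], cols[3]


def tag_counts(conllu_text, upos):
    """Return (distinct lemmas, distinct word forms, tokens) for one UPOS tag."""
    lemmas, types, tokens = set(), set(), 0
    for form, lemma, tag in read_tokens(conllu_text):
        if tag == upos:
            lemmas.add(lemma)
            types.add(form)
            tokens += 1
    return len(lemmas), len(types), tokens


# Hypothetical character-level sample, not taken from the treebank.
ROWS = [
    ("1", "水", "_", "NOUN", "_", "_", "0", "root", "_", "_"),
    ("2", ",", "_", "PUNCT", "_", "_", "1", "punct", "_", "_"),
    ("3", "油", "_", "NOUN", "_", "_", "1", "conj", "_", "_"),
    ("4", ",", "_", "PUNCT", "_", "_", "3", "punct", "_", "_"),
    ("5", "气", "_", "NOUN", "_", "_", "1", "conj", "_", "_"),
    ("6", "。", "_", "PUNCT", "_", "_", "1", "punct", "_", "_"),
]
SAMPLE = "\n".join("\t".join(row) for row in ROWS)

print(tag_counts(SAMPLE, "PUNCT"))  # (1, 2, 3)
```

Because every lemma in the sample is the placeholder `_`, the lemma count is 1 regardless of how many forms occur, which mirrors the situation reported for this treebank.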

The 10 most frequent PUNCT lemmas: _

The 10 most frequent PUNCT types: ,, ;, 。, :, 、, 1., 2., /, 1.1, 1.2

The 10 most frequent ambiguous lemmas: _ (NOUN 1661, VERB 948, PUNCT 560, ADJ 474, PART 346, ADP 259, NUM 185, CCONJ 106, ADV 68, PROPN 60, PRON 48, DET 39, X 14, SCONJ 10, AUX 6)
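The per-tag counts listed for the lemma `_` can be used to cross-check the token share and rank of PUNCT. This is only a sanity check under one assumption: that every token in the treebank carries the lemma `_`, so that these counts cover the whole treebank.

```python
# Counts copied from the ambiguous-lemma line for "_".
# Assumption: every token has the lemma "_", so these cover all tokens.
counts = {
    "NOUN": 1661, "VERB": 948, "PUNCT": 560, "ADJ": 474, "PART": 346,
    "ADP": 259, "NUM": 185, "CCONJ": 106, "ADV": 68, "PROPN": 60,
    "PRON": 48, "DET": 39, "X": 14, "SCONJ": 10, "AUX": 6,
}

total = sum(counts.values())
share = 100 * counts["PUNCT"] / total
rank = sorted(counts, key=counts.get, reverse=True).index("PUNCT") + 1

print(total)         # 4784
print(round(share))  # 12
print(rank)          # 3
```

Under that assumption the result agrees with the figures reported above: 15 tags, 560 of 4784 tokens (about 12%), and rank 3 by token count.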

The 10 most frequent ambiguous types: 1.3 (NUM 1, PUNCT 1), 2.1 (NUM 1, PUNCT 1), 2.2 (NUM 1, PUNCT 1)

Morphology

The form / lemma ratio of PUNCT is 15.000000 (the average of all parts of speech is 50.400000).
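The ratio above appears to be the number of distinct forms divided by the number of distinct lemmas, which for PUNCT is 15 / 1 = 15. A minimal sketch of that calculation, assuming this definition and using a hypothetical helper `form_lemma_ratio` with made-up pairs:

```python
def form_lemma_ratio(pairs):
    """Distinct forms divided by distinct lemmas for (form, lemma) pairs."""
    forms = {form for form, _ in pairs}
    lemmas = {lemma for _, lemma in pairs}
    return len(forms) / len(lemmas)


# Hypothetical pairs: three distinct forms sharing the placeholder lemma "_".
pairs = [("1.", "_"), ("2.", "_"), ("/", "_"), ("1.", "_")]
print(form_lemma_ratio(pairs))  # 3.0
```

When lemmas are not annotated and every token has the lemma `_`, this ratio simply equals the number of distinct forms.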

The 1st highest number of forms (15) was observed with the lemma “_”: /, 1., 1.1, 1.2, 1.3, 2., 2.1, 2.2, 2.3, :, 、, 。, ,, :, ;.

PUNCT does not occur with any features.

Relations

PUNCT nodes are attached to their parents using 1 relation: punct (560; 100% instances)

Parents of PUNCT nodes belong to 10 different parts of speech: VERB (381; 68% instances), NOUN (152; 27% instances), ADJ (7; 1% instances), ADP (7; 1% instances), ADV (5; 1% instances), PROPN (4; 1% instances), AUX (1; 0% instances), CCONJ (1; 0% instances), DET (1; 0% instances), PART (1; 0% instances)

560 (100%) PUNCT nodes are leaves.

The highest child degree of a PUNCT node is 0.
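The relation statistics above (incoming relation, parent part of speech, leaf status) could be gathered along the following lines. This is a sketch rather than the script that produced this page: `attachment_stats` and the sample sentence are hypothetical, and the code assumes well-formed CoNLL-U with sentences separated by blank lines.

```python
from collections import Counter


def attachment_stats(conllu_text, upos="PUNCT"):
    """Count deprels, parent UPOS tags and leaf nodes for one UPOS tag."""
    deprels, parents, leaves = Counter(), Counter(), 0
    for block in conllu_text.strip().split("\n\n"):
        rows = [line.split("\t") for line in block.splitlines()
                if line and not line.startswith("#")]
        # Keep regular tokens only (skip ranges like 1-2 and empty nodes).
        rows = [r for r in rows if r[0].isdigit()]
        tag_of = {r[0]: r[3] for r in rows}   # token ID -> UPOS
        heads = {r[6] for r in rows}          # IDs that have at least one child
        for r in rows:
            if r[3] != upos:
                continue
            deprels[r[7]] += 1
            parents[tag_of.get(r[6], "ROOT")] += 1
            if r[0] not in heads:
                leaves += 1                   # no token points to this node
    return deprels, parents, leaves


# Hypothetical sentence, not taken from the treebank.
ROWS = [
    ("1", "加", "_", "VERB", "_", "_", "0", "root", "_", "_"),
    ("2", "水", "_", "NOUN", "_", "_", "1", "obj", "_", "_"),
    ("3", ",", "_", "PUNCT", "_", "_", "2", "punct", "_", "_"),
    ("4", "搅", "_", "VERB", "_", "_", "1", "conj", "_", "_"),
    ("5", "。", "_", "PUNCT", "_", "_", "1", "punct", "_", "_"),
]
SAMPLE = "\n".join("\t".join(row) for row in ROWS)

deprels, parents, leaves = attachment_stats(SAMPLE)
print(dict(deprels))  # {'punct': 2}
print(dict(parents))  # {'NOUN': 1, 'VERB': 1}
print(leaves)         # 2
```

A leaf count equal to the token count, as reported here (560 of 560), corresponds to a highest child degree of 0.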