
This page pertains to UD version 2.

UD for Bavarian

We largely follow the German guidelines; this page describes where the Bavarian annotation differs from them.

Tokenization and Word Segmentation

Tokens are mostly delimited by whitespace and punctuation.
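As a rough illustration of this baseline, the Python sketch below splits on whitespace and turns every punctuation mark into its own token. The regular expression and the name `tokenize` are assumptions, not the tokenizer actually used for the treebank, and the sketch does not yet split the fused forms discussed in the following sections, so zum and zmochn stay whole.

```python
import re

# Hypothetical first-pass tokenizer: a token is a run of word characters
# (Python's \w is Unicode-aware, so ä and ö are included) or a single
# punctuation character; whitespace only separates tokens.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text: str) -> list[str]:
    """Split raw text on whitespace and punctuation (no multi-word token handling)."""
    return TOKEN_RE.findall(text)


print(tokenize("genau 60 Kafääbaunan zum oozöön, um si draus a Schalal Mokka zmochn."))
```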

Multi-word tokens

We treat fused prepositions and determiners as multi-word tokens (following the German UD annotation guidelines). Since there is both phonetic and orthographic variation in the forms of the determiners, we do not normalize them, and instead simply split the tokens into substrings (even if this occasionally results in slightly awkward tokenizations, see the second example):

When zum (zun, zan, …) is used in infinitive constructions (Ludwig van Beethoven hod de Gwohnheit ghobt, genau 60 Kafääbaunan zum oozöön […] “Ludwig van Beethoven had a habit of counting exactly 60 coffee beans”; sentence from the Wikipedia article Kafää), we handle it the same way, although the parts of speech differ:
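The splitting rule can be sketched in Python as a function that emits CoNLL-U lines for a multi-word token. The name `split_mwt` and the idea of passing split points as character offsets are hypothetical; only the ID and FORM columns are filled, and the remaining columns are left as `_`. Because the syntactic words are plain substrings, zum becomes zu + m and zan becomes za + n without any normalization.

```python
def split_mwt(surface: str, cut_points: list[int], start_id: int) -> list[str]:
    """Return CoNLL-U lines (ID and FORM filled, other columns "_") for a
    multi-word token split at the given character offsets.

    The syntactic words are plain substrings of the surface form, so no
    normalization happens and "".join(words) == surface always holds.
    """
    if not cut_points or list(cut_points) != sorted(set(cut_points)) or not all(
        0 < c < len(surface) for c in cut_points
    ):
        raise ValueError("cut points must be strictly increasing offsets inside the word")
    bounds = [0, *cut_points, len(surface)]
    words = [surface[a:b] for a, b in zip(bounds, bounds[1:])]
    end_id = start_id + len(words) - 1
    # Range line first, e.g. "5-6<TAB>zum<TAB>_ ..." (10 tab-separated columns).
    lines = ["\t".join([f"{start_id}-{end_id}", surface] + ["_"] * 8)]
    # Then one line per syntactic word, numbered consecutively.
    for offset, word in enumerate(words):
        lines.append("\t".join([str(start_id + offset), word] + ["_"] * 8))
    return lines


for line in split_mwt("zum", [2], start_id=5):
    print(line)
```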

Tokens split with SpaceAfter=No

We split off shortened determiners and adpositions in noun phrases as separate tokens, marking the split with the SpaceAfter=No attribute in the MISC column:

When a verb or conjunction is immediately followed by one or more pronouns written in the same word, we split the pronouns off as separate tokens and likewise mark the split with SpaceAfter=No:
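The following Python sketch shows how such a split keeps the raw text recoverable. The function names and the input gibts (read as gibt + s) are hypothetical illustrations, not taken from the treebank. Unlike a multi-word token, each piece here is an ordinary token; only the SpaceAfter=No value in MISC records that no space separated it from the next one.

```python
def split_with_space_after_no(word: str, cut_points: list[int]) -> list[tuple[str, str]]:
    """Split one orthographic word into (FORM, MISC) pairs.

    Every piece except the last is immediately followed by the next piece in
    the raw text, so it gets SpaceAfter=No. The last piece gets "_"; in real
    data it would also need SpaceAfter=No if punctuation follows directly.
    """
    if not cut_points or list(cut_points) != sorted(set(cut_points)) or not all(
        0 < c < len(word) for c in cut_points
    ):
        raise ValueError("cut points must be strictly increasing offsets inside the word")
    bounds = [0, *cut_points, len(word)]
    pieces = [word[a:b] for a, b in zip(bounds, bounds[1:])]
    last = len(pieces) - 1
    return [(p, "SpaceAfter=No" if i < last else "_") for i, p in enumerate(pieces)]


def detokenize(tokens: list[tuple[str, str]]) -> str:
    """Rebuild the raw text from (FORM, MISC) pairs."""
    parts = []
    for form, misc in tokens:
        parts.append(form)
        if "SpaceAfter=No" not in misc.split("|"):
            parts.append(" ")
    return "".join(parts).rstrip(" ")


tokens = split_with_space_after_no("gibts", [4])  # hypothetical: gibt + s
print(tokens)              # [('gibt', 'SpaceAfter=No'), ('s', '_')]
print(detokenize(tokens))  # gibts
```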

Tokens NOT split apart

Morphology

Tags

Below is a copy of the German guidelines, adjusted for Bavarian:

Features

There are no feature-related guidelines for Bavarian at the moment.

Syntax

Below is a copy of the German guidelines, adjusted for Bavarian:

Core Arguments, Oblique Arguments and Adjuncts

Non-verbal Clauses

Relations Overview

Specific annotation decisions for Bavarian

Postponed adjectives

When an adjective is placed after the noun for emphasis, we annotate it as an apposition (appos); for example, da freche Bua “the cheeky boy” can be rearranged as da Bua, da freche:

da Bua , da freche \n the boy , the cheeky
appos(Bua, freche)
det(Bua, da-1)
det(freche, da-4)
punct(freche, ,-3)
appos(boy, cheeky)
det(boy, the-7)
det(cheeky, the-10)
punct(cheeky, ,-9)
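The example blocks on this page list the tokens (with the English gloss after `\n`) followed by relation(head, dependent) lines, where a `-N` suffix picks out the N-th token when a form occurs more than once. The Python sketch below resolves such lines to token positions. Two rules are assumptions inferred from the indices used on this page rather than from a specification: the `\n` separator occupies one position, and a bare form without `-N` refers to its first occurrence. The function names are hypothetical.

```python
import re

RELATION = re.compile(r"^(?P<rel>[\w:]+)\((?P<head>.+?), (?P<dep>.+)\)$")
TAG_SUFFIX = re.compile(r"/[A-Z]+$")                       # e.g. "zu/PART" -> "zu"
INDEX_SUFFIX = re.compile(r"^(?P<form>.+)-(?P<idx>\d+)$")  # e.g. "da-4"


def token_forms(token_line: str) -> list[str]:
    # The literal "\n" between text and gloss stays in the list, so it
    # occupies one position (assumption inferred from the indices on this page).
    return [TAG_SUFFIX.sub("", tok) for tok in token_line.split()]


def resolve(ref: str, forms: list[str]) -> int:
    """Map "form-N" to position N, or a bare "form" to its first occurrence."""
    m = INDEX_SUFFIX.match(ref)
    if m:
        idx = int(m.group("idx"))
        if 1 <= idx <= len(forms) and forms[idx - 1] == m.group("form"):
            return idx
    for pos, form in enumerate(forms, start=1):
        if form == ref:
            return pos
    raise ValueError(f"{ref!r} does not match any token")


def parse_relations(token_line: str, relation_lines: list[str]) -> list[tuple[str, int, int]]:
    """Return (relation, head position, dependent position) triples."""
    forms = token_forms(token_line)
    triples = []
    for line in relation_lines:
        m = RELATION.match(line.strip())
        if m is None:
            raise ValueError(f"not a relation line: {line!r}")
        triples.append((m.group("rel"), resolve(m.group("head"), forms), resolve(m.group("dep"), forms)))
    return triples


example = r"da Bua , da freche \n the boy , the cheeky"
print(parse_relations(example, ["appos(Bua, freche)", "det(freche, da-4)", "det(boy, the-7)"]))
```

Under these assumptions, every index in the appos example above resolves to the intended token, for instance det(boy, the-7) to positions 8 and 7.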

Infinitives with z(u)

Many infinitive constructions use either the cliticized infinitive marker z followed by a verbal infinitive, or the marker zu (za) combined with a cliticized dative determiner m (n) and a nominalized infinitive. We annotate such cases as follows, reusing the example sentence from above: Ludwig van Beethoven hod de Gwohnheit ghobt, genau 60 Kafääbaunan zum oozöön, um si draus a Schalal Mokka zmochn. “Ludwig van Beethoven had a habit of counting exactly 60 coffee beans in order to brew a cup of coffee from them”.

[...] genau 60 Kafääbaunan zu/PART m/DET oozöön/NOUN , um si draus a Schalal Mokka z/PART mochn/VERB  \n [...] exactly 60 coffee.beans to/PART the/DET counting/NOUN , in.order.to himself out.of.it a cup coffee to/PART make/VERB
mark(oozöön, zu)
det(oozöön, m)
mark(mochn, z)
mark(counting, to)
det(counting, the)
mark(make, to-32)
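The two patterns can be written down as small rule functions. This is a hypothetical Python sketch (the names `Word`, `z_plus_verb` and `zu_plus_det_plus_noun` are made up); it only fixes the parts of speech and the mark/det attachments shown in the example, and leaves the infinitive's own head and relation open, since those depend on the rest of the sentence.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Word:
    form: str
    upos: str
    head: Optional[int]    # 1-based position within the fragment; None = outside it
    deprel: Optional[str]  # None when the relation depends on the wider sentence


def z_plus_verb(marker: str, verb: str) -> list[Word]:
    """Cliticized z + verbal infinitive (z mochn): z is PART, attached as mark."""
    return [Word(marker, "PART", 2, "mark"), Word(verb, "VERB", None, None)]


def zu_plus_det_plus_noun(marker: str, det: str, noun: str) -> list[Word]:
    """zu (za) + dative determiner m (n) + nominalized infinitive (zu m oozöön)."""
    return [
        Word(marker, "PART", 3, "mark"),  # infinitive marker -> nominalized infinitive
        Word(det, "DET", 3, "det"),       # dative determiner -> nominalized infinitive
        Word(noun, "NOUN", None, None),
    ]


for w in zu_plus_det_plus_noun("zu", "m", "oozöön"):
    print(w.form, w.upos, w.head, w.deprel)
```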

Relative markers

We tag relative pronouns (dea, de, des) as PRON and give them the relation that matches their function in the relative clause (nsubj/obj/…). For the relative marker wo (wos, wej), we use the tag SCONJ and the relation mark:

da Disch , (dea)/PRON wo/SCONJ hier steht
nsubj(steht, (dea))
mark(steht, wo)
acl:relcl(Disch, steht)
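This decision can be sketched as a lookup in Python, under the assumption that a separate step has already identified the word as introducing a relative clause (forms such as de and des also occur as ordinary determiners). The names are hypothetical, and the word lists contain only the forms named above. In the example, (dea) receives nsubj and wo receives mark.

```python
from typing import Optional, Tuple

# Word lists as given on this page; other spelling variants would need adding.
RELATIVE_PRONOUNS = {"dea", "de", "des"}  # PRON, relation = function in the clause
RELATIVE_MARKERS = {"wo", "wos", "wej"}   # SCONJ, relation = mark


def annotate_relative_word(form: str, clause_function: Optional[str] = None) -> Tuple[str, str]:
    """Return (UPOS, DEPREL) for a word already known to introduce a relative clause.

    clause_function is the role a relative pronoun plays inside the relative
    clause, e.g. "nsubj" or "obj". The clause's verb itself attaches to the
    modified noun as acl:relcl, which this helper does not handle.
    """
    word = form.lower()
    if word in RELATIVE_MARKERS:
        return ("SCONJ", "mark")
    if word in RELATIVE_PRONOUNS:
        if clause_function is None:
            raise ValueError("a relative pronoun needs its function in the clause")
        return ("PRON", clause_function)
    raise ValueError(f"{form!r} is not a relative word listed here")


print(annotate_relative_word("wo"))            # ('SCONJ', 'mark')
print(annotate_relative_word("dea", "nsubj"))  # ('PRON', 'nsubj')
```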

Treebanks

There is one Bavarian UD treebank: