This page pertains to UD version 2.

UD Old French PROFITEROLE

Language: Old French (code: fro)
Family: Indo-European, Romance

This treebank has been part of Universal Dependencies since the UD v2.2 release.

The following people have contributed to making this treebank part of UD: Sophie Prévost, Aurélie Collomb, Kim Gerdes, Isabelle Tellier, Marine Courtin, Alexei Lavrentiev, Céline Guillot-Barbance, Loïc Grobol, Mathilde Regnault.

Repository: UD_Old_French-PROFITEROLE
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.13

License: CC BY-NC-SA 3.0

Genre: nonfiction, legal, poetry

Questions, comments? General annotation questions (either Old French-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on Github. If you want to collaborate, please contact [sophie • prevost (æt) ens • psl • eu]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.

Annotation	Source
Lemmas	not available
UPOS	annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion
XPOS	annotated manually
Features	assigned by a program, not checked manually
Relations	assigned by a program, with some manual corrections, but not a full manual verification

Description

UD_Old_French-PROFITEROLE is an expansion of the earlier UD_Old_French-SRCMF treebank, which was a conversion of part of the SRCMF corpus (Syntactic Reference Corpus of Medieval French, srcmf.org).

UD_Old_French-PROFITEROLE includes the texts of the earlier UD_Old_French-SRCMF, plus Old French texts annotated within the funded PROFITEROLE project (Projet ANR-16-CE38-0010, 2017-2022, supervised by Sophie Prévost). These texts were automatically annotated with parts of speech and dependencies and are currently undergoing manual correction. They will be released in UD as they are corrected. Middle French texts annotated in the PROFITEROLE project can be found in UD_Middle_French-PROFITEROLE.

UD_Old_French-PROFITEROLE consists of 12 texts spanning the 9th to 13th centuries. It includes 19765 sentences and 227137 tokens.

Sentences are annotated with the following metadata:

- sent_id: a unique id for each sentence in the treebank
- text: the text of the sentence
- newdoc id: a unique id for each text, which can be split on underscores to recover:
  - the name of the text
  - the date
  - the form: verse and/or prose
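In the CoNLL-U files, this metadata is stored as comment lines of the form `# key = value` (for example `# sent_id = …`, `# text = …`, `# newdoc id = …`). The minimal Python sketch below reads such comments and splits a newdoc id; `parse_comments` and `split_newdoc_id` are hypothetical helper names. A plain split on every underscore would break `Aucassin_early13_verse_prose` into four parts, so the sketch assumes that the name and the date never contain an underscore and keeps the remainder as the form.

```python
import re

# A CoNLL-U comment line looks like "# key = value".
COMMENT_RE = re.compile(r"^#\s*(?P<key>[^=]+?)\s*=\s*(?P<value>.*)$")


def parse_comments(lines):
    """Collect "# key = value" comments into a dict (hypothetical helper)."""
    meta = {}
    for line in lines:
        match = COMMENT_RE.match(line.rstrip("\n"))
        if match:
            meta[match.group("key")] = match.group("value")
    return meta


def split_newdoc_id(doc_id):
    """Split a newdoc id into name, date and form (hypothetical helper).

    Assumes name and date contain no underscore; maxsplit=2 keeps a form
    such as "verse_prose" in one piece.
    """
    name, date, form = doc_id.split("_", 2)
    return {"name": name, "date": date, "form": form}


# Made-up comment block, for illustration only.
meta = parse_comments([
    "# newdoc id = Roland_1100_verse",
    "# text = Carles li reis, nostre emperere magnes,",
])
print(split_newdoc_id(meta["newdoc id"]))
# {'name': 'Roland', 'date': '1100', 'form': 'verse'}
print(split_newdoc_id("Aucassin_early13_verse_prose")["form"])  # verse_prose
```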

The following table lists the texts used in this treebank:

ID	Name of the text	Author	Tokens	Trees
Strasbourg_842_prose	Serments de Strasbourg	anonymous	131	3
StEulalie_900_verse	Séquence de Sainte Eulalie	anonymous	212	21
StLegier_1000_verse	Vie de saint Léger	anonymous	1665	189
StAlexis_1050_verse	Vie de saint Alexis	anonymous	5662	572
Roland_1100_verse	Chanson de Roland	anonymous	34803	3890
Lapidaire_mid12_prose	Lapidaire en prose	anonymous	5494	524
QuatreLivresReis_late12_prose	Quatre livres des reis	anonymous	15030	1509
BeroulTristan_late12_verse	Tristan de Beroul	Beroul	32596	3310
TroyesYvain_1180_verse	Yvain de Chrestien	Chrestien de Troyes	47964	3880
Aucassin_early13_verse_prose	Aucassin et Nicolet	anonymous	11639	1038
Graal_1225_prose	Queste del Saint Graal	anonymous	44715	3114
ClariConstantinople_1300_prose	Conqueste de Constantinople	Robert de Clari	27226	1715
Total			227137	19765
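As a quick consistency check, the per-text counts can be summed and compared with the totals. The sketch below simply transcribes the ID, Tokens and Trees columns of the table above.

```python
# (ID, tokens, trees), transcribed from the table above.
TEXTS = [
    ("Strasbourg_842_prose", 131, 3),
    ("StEulalie_900_verse", 212, 21),
    ("StLegier_1000_verse", 1665, 189),
    ("StAlexis_1050_verse", 5662, 572),
    ("Roland_1100_verse", 34803, 3890),
    ("Lapidaire_mid12_prose", 5494, 524),
    ("QuatreLivresReis_late12_prose", 15030, 1509),
    ("BeroulTristan_late12_verse", 32596, 3310),
    ("TroyesYvain_1180_verse", 47964, 3880),
    ("Aucassin_early13_verse_prose", 11639, 1038),
    ("Graal_1225_prose", 44715, 3114),
    ("ClariConstantinople_1300_prose", 27226, 1715),
]

total_tokens = sum(tokens for _doc, tokens, _trees in TEXTS)
total_trees = sum(trees for _doc, _tokens, trees in TEXTS)
print(len(TEXTS), total_tokens, total_trees)  # 12 227137 19765
```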

Acknowledgments

UD_Old_French-PROFITEROLE results from UD_Old_French-SRCMF and from the automatic annotation (PROFITEROLE project, 2017-2022) of other Old French texts, with the SRCMF corpus used as training data; these texts were, or are being, manually corrected according to the UD guidelines. The contributors to the syntactic part of the PROFITEROLE project were: Prévost, Sophie; Villemonte de la Clergerie, Eric; Regnault, Mathilde; Grobol, Loïc; Crabbé, Benoît; Dehouck, Mathieu; Lavrentiev, Alexei.

UD_Old_French-SRCMF resulted from the conversion of part of the SRCMF corpus (Syntactic Reference Corpus of Medieval French, srcmf.org). The SRCMF corpus was produced by the SRCMF project (2008-2012), funded by the ANR (France) and the DFG (Germany) and supervised by Sophie Prévost and Achim Stein.

The SRCMF project consisted of the manual syntactic annotation of 15 texts (251,000 tokens) from the 9th to the 13th centuries. For most of these texts, part-of-speech tags were taken from their existing tagging, which stems from the Base de français médiéval (Lyon, ENS de Lyon, IHRIM Laboratory, http://txm.bfm-corpus.org) and the Nouveau Corpus d'Amsterdam (http://www.uni-stuttgart.de/lingrom/stein/corpus#nca).

The contributors to the SRCMF project were: Stein, Achim; Prévost, Sophie; Rainsford, Tom; Mazziotta, Nicolas; Bischoff, Béatrice; Glikman, Julie; Lavrentiev, Alexei; Heiden, Serge; Guillot-Barbance, Céline; Marchello-Nizia, Christiane.

The whole SRCMF corpus (251,000 tokens) was converted into UD dependencies, but only 172,000 tokens had undergone significant checking.

The conversion from the original SRCMF annotation to the SRCMF-UD annotation was done automatically, for both the POS tags and the syntactic relations, using a set of elaborate rules. About 1,200 syntactic relations left unlabelled were then annotated manually by Sophie Prévost, and substantial spot-checking was carried out, focusing on potentially difficult cases (e.g. the conj relation).

This conversion was carried out by Aurélie Collomb during an internship funded by the Lattice laboratory (Paris, CNRS, ENS & Université Sorbonne Nouvelle Paris 3, PSL & USPC), under the supervision of Sophie Prévost, Isabelle Tellier and Kim Gerdes. Marine Courtin handled the deposit of the files and, in particular, took charge of validating the corpus through the successive steps of the process.

A major review of this initial release was carried out for the UD 2.6 release by Loïc Grobol and Sophie Prévost, within the ANR PROFITEROLE project, to improve the corpus's compliance with the UD guidelines. This review included both automatic corrections and extensive manual corrections.

A significant import of data from the Base de français médiéval was carried out by Loïc Grobol, Alexei Lavrentiev and Sophie Prévost for the UD 2.9 release. Most notably, this release adds punctuation tokens to most trees, adds around 350 new trees (mostly verbless sentences), and fixes a number of deviations from the UD guidelines. See the full list of changes in the upstream repository.

References

Prévost, Sophie, Mathieu Dehouck, Alexei Lavrentiev, Serge Heiden, and Loïc Grobol. To appear. ‘Profiterole : un corpus morpho-syntaxique et syntaxique de français médiéval’. Corpus.
Stein, Achim, and Sophie Prévost. 2013. ‘Syntactic Annotation of Medieval Texts: The Syntactic Reference Corpus of Medieval French (SRCMF)’. In New Methods in Historical Corpora, edited by Paul Bennett, Martin Durrell, Silke Scheible, and Richard J. Whitt, 275–82. Corpus Linguistics and International Perspectives on Language. Gunter Narr Verlag.

Statistics of UD Old French PROFITEROLE

POS Tags

ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PRON – PROPN – PUNCT – SCONJ – VERB – X

Features

Definite – Foreign – Morph – NumType – Polarity – Poss – PronType – Tense – VerbForm

Relations

acl – acl:relcl – advcl – advmod – advmod:obl – amod – appos – aux – aux:pass – case – case:det – cc – cc:nc – ccomp – compound – conj – cop – csubj – dep – det – discourse – dislocated – expl – fixed – flat – iobj – mark – mark:advmod – nmod – nsubj – nsubj:advmod – nsubj:obj – nsubj:outer – nummod – obj – obj:advmod – obj:advneg – obj:obl – obl – obl:advmod – orphan – parataxis – punct – root – vocative – xcomp

Tokenization and Word Segmentation

This corpus contains 19765 sentences and 227137 tokens.

This corpus contains 32167 tokens (14%) that are not followed by a space.
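The 14% figure is 32167 / 227137 ≈ 0.142. In CoNLL-U, a token that is not followed by a space carries `SpaceAfter=No` in its MISC (10th) column. The sketch below counts such tokens; `count_no_space_after` is a hypothetical helper, and it assumes that only lines with a plain integer ID should be counted (multiword-token ranges and empty nodes are skipped). The sample sentence is made up for illustration.

```python
def count_no_space_after(lines):
    """Return (tokens with SpaceAfter=No, all tokens) for CoNLL-U lines.

    Hypothetical helper: only lines whose ID is a plain integer are counted,
    so multiword-token ranges ("1-2") and empty nodes ("3.1") are skipped.
    """
    no_space = total = 0
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # comment, blank line, range or empty node
        total += 1
        if "SpaceAfter=No" in cols[9].split("|"):
            no_space += 1
    return no_space, total


# Made-up two-token sentence, for illustration only.
sample = [
    "# text = l'escu",
    "1\tl'\t_\tDET\t_\t_\t2\tdet\t_\tSpaceAfter=No",
    "2\tescu\t_\tNOUN\t_\t_\t0\troot\t_\t_",
    "",
]
print(count_no_space_after(sample))  # (1, 2)
print(round(100 * 32167 / 227137))  # 14, the share reported above
```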

This corpus does not contain words with spaces.

This corpus contains 155 types of words that contain both letters and punctuation. Examples: l', qu', s', n', d', m', .i., t', c', j', jusqu', .ii., l'en, entr', .iiii., .iii., g', q', .xx., .xii., .c., .vii., ensembl', un', ·l, quanqu', ch', .v., .xxx., c., tresqu', .x., .c.m., entresqu', k', .xv., .l., .vi., .xxiiii., .ix., josqu', .viii., an.ii., cest', ·s, .XL., .iiij.m., .lx., .xxxvi.m., jesqu'
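Whether a word "contains both letters and punctuation" can be tested with Unicode general categories, as in the sketch below. The helper name is hypothetical, and this may not be the exact criterion used to compute the statistic above. It treats the apostrophe, the full stop and the middle dot `·` as punctuation, which covers forms such as `l'`, `.iii.` and `·l`.

```python
import unicodedata


def has_letters_and_punct(word):
    """True if word mixes letters (L*) with punctuation (P*) characters."""
    categories = [unicodedata.category(ch) for ch in word]
    has_letter = any(cat.startswith("L") for cat in categories)
    has_punct = any(cat.startswith("P") for cat in categories)
    return has_letter and has_punct


# Made-up word list; duplicates collapse because types are counted.
words = ["l'", "escu", ".", ".iii.", "·l", "l'"]
print(sorted({w for w in words if has_letters_and_punct(w)}))
# ['.iii.', "l'", '·l']
```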

Morphology

Nominal Features

Definite

Def
- ADP: au, des, del, as, el, al, du, dou, ou, es
- AUX: es
- DET: li, la, le, l', les, lo, lu, lé, lí, lis

Ind
- DET: un, une, .i., uns, unes, un', úne, u·, ún, U
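Each list in this section pairs a feature value with UPOS tags and example forms, information that lives in the FORM, UPOS and FEATS columns of the CoNLL-U files. The sketch below shows one way such an index could be built; the function names and input shape are hypothetical, and ranking forms by frequency is an assumption about how examples might be chosen. A multi-valued feature such as `PronType=Prs,Rel` is kept as a single value string, which matches the `Prs,Rel` heading further down.

```python
from collections import Counter, defaultdict


def parse_feats(feats):
    """Parse a CoNLL-U FEATS string such as 'Definite=Def|PronType=Art'."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def feature_index(tokens):
    """Map (feature, value) -> UPOS -> Counter of forms.

    tokens: iterable of (form, upos, feats) triples (hypothetical input
    shape). A value such as "Prs,Rel" is kept as one string.
    """
    index = defaultdict(lambda: defaultdict(Counter))
    for form, upos, feats in tokens:
        for feature, value in parse_feats(feats).items():
            index[(feature, value)][upos][form] += 1
    return index


# Made-up tokens, for illustration only.
tokens = [
    ("li", "DET", "Definite=Def|PronType=Art"),
    ("la", "DET", "Definite=Def|PronType=Art"),
    ("li", "DET", "Definite=Def|PronType=Art"),
    ("del", "ADP", "Definite=Def|PronType=Art"),
    ("qui", "PRON", "PronType=Prs,Rel"),
    ("rei", "NOUN", "_"),
]
index = feature_index(tokens)
print(index[("Definite", "Def")]["DET"].most_common())  # [('li', 2), ('la', 1)]
print(sorted(index[("Definite", "Def")]))  # ['ADP', 'DET']
```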

Degree and Polarity

Polarity

Neg
- ADV: ne, n', mie, pas, non, point, nen, nun, nes, nient
- PRON: nel, nes, nu, nen, nul

Verbal Features

Tense

Past
- ADJ: hardi, hardiz, barbee, quarré
- ADJ-Part: barbee, hardiz, quarré
- AUX: esté, este, fait, éste, estet
- AUX-Part: esté, fait, estet
- NOUN-Part: morz, Seignurs, adubez, asolue, comandet, guariz, loee, parjurez, preisez
- VERB: fait, dit, mis, mort, venuz, fet, pris, morz, prise, oï
- VERB-Part: fait, dit, mort, mis, fet, venuz, pris, morz, oï, prise

Pres
- ADJ-Part: dolanz, dolantes
- VERB: querant, curant, plorant, recreant, parlant, recreanz, trenchant, veant, curanz, dolans
- VERB-Part: querant, curant, plorant, parlant, recreant, recreanz, trenchant, veant, curanz, dolans

Pronouns, Determiners, Quantifiers

PronType

Art
- ADP: au, des, del, as, el, du, dou, al, ou, es
- AUX: es
- DET: li, la, le, l', les, un, une, .i., uns, unes

Dem
- ADP: Ches, an
- ADV: en, i, an, í, em, u, o, ent, enn, ·n
- AUX: en
- DET: ceste, cest, cele, cel, ces, cil, cez, cist, ches, chu
- PRON: ce, cil, ço, çó, chou, celui, cele, cels, chil, che

Ind
- ADJ: autre, meïsmes, autres, tel, altre, nule, meïsme, tex, altres, meesme
- ADV: tout, tot, tut, tant, po, alques, tous, Tel, Tute, el
- DET: tel, toz, nule, tote, nul, tuit, autre, tot, tout, toutes
- PRON: on, autre, tuit, nus, rien, uns, un, autres, l'en, en
- SCONJ: quant, que

Int
- ADV: cum, comant, purquei, con, coment, que, Cument, porqoi, ou, conment
- DET: quel, qel, quels, quele, Qanz, itels, quex
- PRON: que, qui, coi, ou, qu', quoi, quei, ki, liquels, q'

Prs
- ADV: nen, s', nel
- DET: les, l', le, li, me, la, lor
- PRON: il, vos, li, s', le, l', je, se, ele, lui
- SCONJ: s', se

Prs,Rel
- ADP: U, ou
- ADV: Don
- CCONJ: que, c', Ou, U, qu'
- DET: laquele
- PRON: qui, que, ki, qu', ou, cui, quoi, dunt, dom, don
- SCONJ: que, qu', q', c', chi, k'

Rel
- ADV: Dun, que, u
- CCONJ: ou
- DET: quel, quele, quelque, quiex, qel, quels, qual, quex, laquele, ques
- PRON: qui, ou, que, qu', dont, donc, ki, coi, dom, cui
- SCONJ: que, qu', queque, quanque, Quequ', ke

NumType

Card
- ADJ: premereins, dui, .iii., .vii., ambesdous, anbedui, mille, premer, premerein, troi
- DET: mile, .I., deus, .XXIIII., .iij., .l., .vij.c., Un, ambdui, chens
- NUM: deus, .ii., trois, dous, cent, dui, quatre, .iiii., milie, .iii.
- PRON: milie, trois, deus, dui, andui, .ii., un, troi, quatre, uns

Ord
- ADJ: premiere, tierche, premeraine, premeraines, premiers, primiers, quarte, tier, tierz
- DET: tierz, premiere, tierce
- PRON: tierz, quarte, terce, disme, quarz, sedme, noefme, premere, quinte, siste

Poss

Yes
- ADJ: mien, vostre, suen, sue, men, nostre, meie, moie, soe, miens
- DET: sa, son, ses, sun, vostre, lor, ma, nostre, mon, mes
- PRON: suen, mien, suens, noz, sien, vostre, lur, soe, leur, lor

Other Features

Foreign
- Yes
  - ADP: in, en
  - ADV: illo
  - NOUN: corpus, domini, damno, verbe
  - X: Explycit

Morph
- VFin
  - ADJ: asuage
  - ADP: a, ad
  - ADV: oi
  - CCONJ: Et
  - INTJ: Os
  - NOUN: acorde, aiüe, alge, chastie, curt, dreit, duinst, esrages, estencele, façon
  - PROPN: cuntredie
- VInf
  - ADJ: droiturier, ácustumiers
- VPar
  - ADP: voiant, oiant
  - ADV: errant
  - PROPN: Flurit, Perdut, Sevree

Syntax

Auxiliary Verbs and Copula

This corpus uses 2 lemmas as copulas (cop). Examples: _, estre.

This corpus uses 8 lemmas as auxiliaries (aux). Examples: _, avoir, pöoir, estre, voloir, devoir, savoir, souloir.
This corpus uses 2 lemmas as passive auxiliaries (aux:pass). Examples: _, estre.
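These lemma lists can be obtained by collecting the LEMMA of every dependent attached with `cop`, `aux` or `aux:pass`. The `_` listed among them is presumably the CoNLL-U placeholder for an unspecified lemma (the annotation table above notes that lemmas are not available). The hypothetical sketch below keeps the placeholder, as the counts above appear to, and offers a separate filter that drops it.

```python
from collections import defaultdict

MISSING = "_"  # CoNLL-U placeholder for an unspecified field


def lemmas_by_relation(rows, relations=("cop", "aux", "aux:pass")):
    """Map each relation to the sorted lemmas of its dependents.

    rows: iterable of (lemma, deprel) pairs (hypothetical input shape).
    """
    found = defaultdict(set)
    for lemma, deprel in rows:
        if deprel in relations:
            found[deprel].add(lemma)
    return {rel: sorted(found[rel]) for rel in relations}


def known(lemmas):
    """Drop the placeholder so that only real lemmas remain."""
    return [lemma for lemma in lemmas if lemma != MISSING]


# Made-up rows, for illustration only.
rows = [("estre", "cop"), ("_", "cop"), ("avoir", "aux"),
        ("estre", "aux:pass"), ("_", "aux:pass"), ("rei", "nsubj")]
result = lemmas_by_relation(rows)
print(result["cop"], known(result["cop"]))  # ['_', 'estre'] ['estre']
```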

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).

nsubj
- VERB--NOUN (76)
- VERB--PRON (98)
- VERB-Fin--NOUN (3257)
- VERB-Fin--NOUN-ADP(_) (3)
- VERB-Fin--PRON (8138)
- VERB-Fin--PRON-ADP(_) (3)
- VERB-Inf--NOUN (161)
- VERB-Inf--PRON (846)
- VERB-Part--NOUN (927)
- VERB-Part--PRON (1739)

obj
- VERB--NOUN (75)
- VERB--PRON (61)
- VERB-Fin--NOUN (5204)
- VERB-Fin--NOUN-ADP(_) (71)
- VERB-Fin--NOUN-ADP(dalez) (1)
- VERB-Fin--NOUN-ADP(de) (20)
- VERB-Fin--NOUN-ADP(en) (2)
- VERB-Fin--NOUN-ADP(par) (1)
- VERB-Fin--PRON (5010)
- VERB-Fin--PRON-ADP(_) (5)
- VERB-Fin--PRON-ADP(de) (2)
- VERB-Fin--PRON-ADP(por) (1)
- VERB-Inf--NOUN (1014)
- VERB-Inf--NOUN-ADP(_) (9)
- VERB-Inf--NOUN-ADP(de) (3)
- VERB-Inf--PRON (939)
- VERB-Inf--PRON-ADP(_) (2)
- VERB-Inf--PRON-ADP(por) (1)
- VERB-Part--NOUN (761)
- VERB-Part--NOUN-ADP(_) (8)
- VERB-Part--NOUN-ADP(de) (2)
- VERB-Part--PRON (949)
- VERB-Part--PRON-ADP(_) (1)

iobj
- VERB--PRON (19)
- VERB--PRON-ADP(_) (7)
- VERB--PRON-ADP(_)-ADP(_) (2)
- VERB-Fin--NOUN (1)
- VERB-Fin--PRON (2446)
- VERB-Fin--PRON-ADP(_) (257)
- VERB-Fin--PRON-ADP(_)-ADP(_) (3)
- VERB-Fin--PRON-ADP(a) (19)
- VERB-Fin--PRON-ADP(après) (3)
- VERB-Fin--PRON-ADP(avuec) (4)
- VERB-Fin--PRON-ADP(contre) (3)
- VERB-Fin--PRON-ADP(dalez) (1)
- VERB-Fin--PRON-ADP(de) (12)
- VERB-Fin--PRON-ADP(de)-ADP(avuec) (1)
- VERB-Fin--PRON-ADP(devant) (7)
- VERB-Fin--PRON-ADP(devers) (2)
- VERB-Fin--PRON-ADP(en) (2)
- VERB-Fin--PRON-ADP(encontre) (3)
- VERB-Fin--PRON-ADP(entre) (3)
- VERB-Fin--PRON-ADP(environ) (1)
- VERB-Fin--PRON-ADP(lez) (2)
- VERB-Fin--PRON-ADP(par)-ADP(devant) (1)
- VERB-Fin--PRON-ADP(por) (1)
- VERB-Fin--PRON-ADP(sor) (2)
- VERB-Fin--PRON-ADP(vers) (7)
- VERB-Inf--PRON (231)
- VERB-Inf--PRON-ADP(_) (42)
- VERB-Inf--PRON-ADP(a) (6)
- VERB-Inf--PRON-ADP(dalez) (1)
- VERB-Inf--PRON-ADP(de) (4)
- VERB-Inf--PRON-ADP(devant) (4)
- VERB-Inf--PRON-ADP(en) (1)
- VERB-Inf--PRON-ADP(vers) (2)
- VERB-Part--PRON (427)
- VERB-Part--PRON-ADP(_) (43)
- VERB-Part--PRON-ADP(a) (4)
- VERB-Part--PRON-ADP(de) (2)
- VERB-Part--PRON-ADP(devant) (1)
- VERB-Part--PRON-ADP(en) (1)
- VERB-Part--PRON-ADP(entre) (3)
- VERB-Part--PRON-ADP(environ) (1)
- VERB-Part--PRON-ADP(par) (5)
- VERB-Part--PRON-ADP(par)-ADP(dalez) (1)
- VERB-Part--PRON-ADP(sor) (2)
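The labels above appear to combine the parent's UPOS (suffixed with its VerbForm, when it has one), a double hyphen, the child's UPOS, and one `ADP(lemma)` suffix per preposition attached to the child, with `_` where the lemma is unavailable. The sketch below rebuilds labels under that reading, which is inferred from the labels themselves rather than from documented tooling; `pos_key` and `signature` are hypothetical names.

```python
from collections import Counter


def pos_key(upos, feats):
    """UPOS plus VerbForm, e.g. 'VERB-Fin'; plain UPOS if no VerbForm."""
    verbform = feats.get("VerbForm")
    return f"{upos}-{verbform}" if verbform else upos


def signature(parent_upos, parent_feats, child_upos, case_lemmas):
    """Build a label such as 'VERB-Fin--NOUN-ADP(de)'.

    case_lemmas: lemmas of ADP dependents of the child, in order;
    '_' stands for an unavailable lemma. Inferred reading, not official.
    """
    label = f"{pos_key(parent_upos, parent_feats)}--{child_upos}"
    for lemma in case_lemmas:
        label += f"-ADP({lemma})"
    return label


# Made-up obj edges: (parent UPOS, parent FEATS, child UPOS, ADP lemmas).
edges = [
    ("VERB", {"VerbForm": "Fin"}, "NOUN", []),
    ("VERB", {"VerbForm": "Fin"}, "NOUN", ["de"]),
    ("VERB", {"VerbForm": "Fin"}, "NOUN", []),
    ("VERB", {}, "PRON", ["_", "_"]),
]
counts = Counter(signature(*edge) for edge in edges)
for label, n in counts.most_common():
    print(f"- {label} ({n})")
```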

Relations Overview

This corpus uses 13 relation subtypes: acl:relcl, advmod:obl, aux:pass, case:det, cc:nc, mark:advmod, nsubj:advmod, nsubj:obj, nsubj:outer, obj:advmod, obj:advneg, obj:obl, obl:advmod
The following 4 relation types are not used in this corpus at all: clf, list, goeswith, reparandum
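Both figures in this overview can be reproduced from the list under Relations: a relation containing a colon is a subtype, and the unused types are the universal UD v2 relations whose base name never occurs in the corpus. A minimal sketch:

```python
# The 37 universal relations of UD v2.
UNIVERSAL = set("""
acl advcl advmod amod appos aux case cc ccomp clf compound conj cop csubj
dep det discourse dislocated expl fixed flat goeswith iobj list mark nmod
nsubj nummod obj obl orphan parataxis punct reparandum root vocative xcomp
""".split())

# Relations used in this corpus, copied from the Relations list above.
USED = """
acl acl:relcl advcl advmod advmod:obl amod appos aux aux:pass case case:det
cc cc:nc ccomp compound conj cop csubj dep det discourse dislocated expl
fixed flat iobj mark mark:advmod nmod nsubj nsubj:advmod nsubj:obj
nsubj:outer nummod obj obj:advmod obj:advneg obj:obl obl obl:advmod orphan
parataxis punct root vocative xcomp
""".split()

subtypes = [rel for rel in USED if ":" in rel]
unused = sorted(UNIVERSAL - {rel.split(":")[0] for rel in USED})
print(len(UNIVERSAL), len(subtypes))  # 37 13
print(unused)  # ['clf', 'goeswith', 'list', 'reparandum']
```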