Conversion from UD to SUD
This page describes the process used to convert from UD to SUD. It also explains how the process can be adapted to the specificities of each language.
The main sequence
- Onf (eud_to_ud): Remove all enhanced annotation; the conversion assumes that the input is in basic UD format. This step can safely be applied to basic UD input, whose annotations are left unchanged.
- Onf (idioms): Add the features that encode idioms in SUD, namely ExtPos, PhraseType, InTitle and InIdiom (see Idioms and titles). Relations are not changed at this step.
- specific_expr_init: Add an explicit node for each ExtPos. TODO: give details and an example.
- Onf (sub_relations): Transform UD relations with subtypes into their SUD equivalents.
- Onf (rel_extensions): Transform the remaining UD subtypes (those not handled in sub_relations) into the SUD deep feature. For instance, the Polish cop:locat is transformed into cop@locat.
- Onf (relations): Transform the main UD relations into their SUD equivalents (except case, aux, mark and cop; see next step).
- reverse_relations.main: Reverse the relations case, aux, mark and cop. See below for details about reversing relations.
- Move the dependents of a conjunction from the left conjunct to the right conjunct. The dependencies conj, discourse, parataxis and punct are not moved. This is done with the packages:
  - Onf (shared_left_conj-dep)
  - Onf (unshared_left_conj-dep)
  - Onf (minimize_right_conj-dep)
- Onf (add_conj_emb): Mark embedded conj relations with the extension emb.
- Onf (chained_relations): Dependencies of type conj and flat:* that are grouped into a bouquet are reorganised into a chain.
- specific_expr_close: Remove the specific nodes and edges introduced by the dual package specific_expr_init.
- Onf (unk_rel): Rename all non-SUD relations to unk (backoff package).
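To make concrete what the eud_to_ud step removes, here is a minimal Python sketch that works directly on CoNLL-U text instead of on a Grew graph. It assumes the standard 10-column CoNLL-U layout, in which enhanced dependencies are stored in the DEPS column and empty nodes have decimal IDs such as 8.1. The function name strip_enhanced is hypothetical and is not part of the conversion package.

```python
def strip_enhanced(conllu: str) -> str:
    """Hypothetical analogue of eud_to_ud: keep only the basic UD annotation."""
    out = []
    for line in conllu.splitlines():
        # Sentence separators and comment lines (# sent_id, # text) are kept as-is.
        if not line or line.startswith("#"):
            out.append(line)
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"not a 10-column CoNLL-U line: {line!r}")
        # Empty nodes (IDs such as 8.1) exist only in the enhanced graph.
        if "." in cols[0]:
            continue
        # DEPS (9th column) holds the enhanced dependencies; blank it.
        cols[8] = "_"
        out.append("\t".join(cols))
    return "\n".join(out)
```

On basic UD input (DEPS already "_", no empty nodes), the function returns its input unchanged apart from a possible trailing newline, which mirrors the note that the step is safe on basic UD.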
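The label-level effect of the rel_extensions and unk_rel steps can be approximated by two small Python functions. This is only a sketch: the set of subtypes already handled by sub_relations and the inventory of SUD relation names are passed in as parameters, and the names to_deep, backoff and EXAMPLE_SUD_LABELS are hypothetical. Checking only the part before @ in backoff is also an assumption about how extended labels are treated.

```python
def to_deep(label: str, handled: set[str]) -> str:
    """Hypothetical rel_extensions analogue: 'cop:locat' -> 'cop@locat'.

    Labels listed in `handled` (assumed to be dealt with by sub_relations)
    are left untouched.
    """
    if ":" in label and label not in handled:
        base, sub = label.split(":", 1)
        return f"{base}@{sub}"
    return label


def backoff(label: str, sud_labels: set[str]) -> str:
    """Hypothetical unk_rel analogue: unknown relations become 'unk'.

    Assumption: only the part before '@' (the deep extension) is checked.
    """
    base = label.split("@", 1)[0]
    return label if base in sud_labels else "unk"


# Illustrative, deliberately incomplete subset of SUD relation names.
EXAMPLE_SUD_LABELS = {"subj", "comp:obj", "comp:aux", "comp:pred", "mod"}
```

For example, to_deep("cop:locat", set()) gives "cop@locat", as in the Polish case above.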
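The bouquet-to-chain reorganisation performed by chained_relations can be sketched in Python on a plain list of (head, label, dependent) edges with token positions. In a bouquet, all conjuncts depend on the first one; in the chain, each conjunct depends on the previous one. This is a simplified, hypothetical sketch (bouquet_to_chain is not a Grew function): it assumes that chained dependents follow their head and that only edges with exactly the same label form a bouquet, so a label such as conj@emb is not grouped with plain conj.

```python
from collections import defaultdict

# (head position, label, dependent position)
Edge = tuple[int, str, int]


def is_chainable(label: str) -> bool:
    """conj, flat and flat:* relations are reorganised; everything else is kept."""
    return label in ("conj", "flat") or label.startswith("flat:")


def bouquet_to_chain(edges: list[Edge]) -> list[Edge]:
    """Hypothetical sketch: turn h->c1, h->c2, h->c3 into h->c1, c1->c2, c2->c3."""
    groups: dict[tuple[int, str], list[int]] = defaultdict(list)
    result: list[Edge] = []
    for head, label, dep in edges:
        if is_chainable(label):
            groups[(head, label)].append(dep)
        else:
            result.append((head, label, dep))
    for (head, label), deps in groups.items():
        previous = head
        # Attach each dependent to the previous element, in word order.
        for dep in sorted(deps):
            result.append((previous, label, dep))
            previous = dep
    return result
```

For "apples , pears and plums" with conj edges 1→3 and 1→5, the sketch returns 1→3 and 3→5.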
Reversing relations
Defining rules for reversing relations is tricky, mainly for two reasons:
- When several relations to be reversed share the same head, the order in which they are reversed changes the output. Some mechanism is needed to specify the desired order.
- When reversing a relation from N to M into a relation from M to N, we have to decide, for each dependent of N, whether it should be lifted up to M or stay on N.
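A single reversal can be sketched in Python on a tree stored as a mapping from each dependent to its (head, label) pair. In this hypothetical sketch, M takes over N's incoming edge, N becomes a dependent of M, and every other dependent of N is lifted to M only when a caller-supplied predicate says so. The names reverse and should_lift, and the label given to the reversed edge, are assumptions for illustration: the actual lifting criteria are not described on this page.

```python
from typing import Callable

# dependent position -> (head position, label); the root has head 0
Tree = dict[int, tuple[int, str]]


def reverse(tree: Tree, n: int, m: int, new_label: str,
            should_lift: Callable[[str], bool]) -> Tree:
    """Hypothetical sketch: turn the edge N -rel-> M into M -new_label-> N."""
    if tree[m][0] != n:
        raise ValueError("m must be a dependent of n")
    head_n, label_n = tree[n]
    result = dict(tree)  # leave the input tree unchanged
    # M takes N's place in the tree; N now depends on M.
    result[m] = (head_n, label_n)
    result[n] = (m, new_label)
    # Other dependents of N move to M only if the (hypothetical) predicate says so.
    for dep, (head, label) in tree.items():
        if head == n and dep != m and should_lift(label):
            result[dep] = (m, label)
    return result
```

Making should_lift a parameter keeps the lifting decision separate from the reversal itself, which is the second difficulty listed above.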
Choosing the order when reversing relations
To constrain the order, a numeric level is given to each edge to be reversed, and then:
- the edge with the smaller level has higher priority;
- if two edges have the same level and are on the same side of the head, the closer one has higher priority;
- if two edges have the same level and are on opposite sides of the head, the one after the head has higher priority.
By default, the four relations case, cop, aux and mark (and their subtypes) are given level 10.
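Taken together, the three constraints define a total order on the edges of one head, which can be sketched as a Python sort key: level first, then edges after the head before edges before it, then distance to the head. This is a simplified, hypothetical sketch (reversal_order is not a Grew function): it orders the candidate edges of a single head once, whereas the actual rules are applied to the graph as it evolves after each reversal.

```python
def reversal_order(head: int, edges: list[tuple[int, int]]) -> list[int]:
    """Order the dependents of `head`, given as (position, level) pairs, for reversal."""

    def priority(edge: tuple[int, int]) -> tuple[int, int, int]:
        dep, level = edge
        after_head = dep > head
        # 1. smaller level first; 2. after the head before before the head;
        # 3. on the same side, the closer dependent first.
        return (level, 0 if after_head else 1, abs(dep - head))

    return [dep for dep, _ in sorted(edges, key=priority)]
```

With the default level 10 everywhere, a head at position 5 with dependents at 3 and 6 gives [6, 3] (constraint 3), while dependents at 2 and 4 give [4, 2] (constraint 2).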
Below are examples of conversions in which several relations are reversed.
In Japanese and German, the default levels are used.
The order can be changed by assigning different levels to specific relations before calling the strategy reverse_relations.main (see the examples below for French and Wolof).
Japanese
In Japanese, all the UD relations case, cop, aux and mark are left-headed (the head precedes the dependent), so constraint 2 applies.
German
In German, there are many cases with edges on both sides of the head, so constraint 3 applies:
French
In French, levels are set to:
- case or case:* → 10
- cop or cop:* → 20
- aux:caus or aux:pass → 30
- aux or aux:* (other than aux:caus and aux:pass) → 40
- mark or mark:* → 50
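The French table can be written as a lookup function that returns the level of a relation label. This is a hypothetical Python sketch mirroring the list above (french_level is not the name of the actual Grew rules). Since the edge with the smallest level is reversed first, case relations are reversed first and mark relations last.

```python
def french_level(label: str) -> int:
    """Hypothetical mirror of the French levels for the relations to reverse."""
    base = label.split(":", 1)[0]
    if base == "case":
        return 10
    if base == "cop":
        return 20
    # The two special aux subtypes must be tested before the generic aux case.
    if label in ("aux:caus", "aux:pass"):
        return 30
    if base == "aux":
        return 40
    if base == "mark":
        return 50
    raise ValueError(f"{label!r} is not a relation to reverse")
```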
From the UD annotation:
The universal conversion produces:
And the conversion with the French specific levels (see GitHub):
Wolof
In Wolof, the lemma na must always be the head of the whole structure, so the aux relation attaching it must be the last one to be reversed. This can be specified with a rule that gives this relation a higher level than the default:
rule na {
pattern { e: V -[aux]-> A; A[lemma="na"] }
commands { e.level = 100 }
}
From the UD annotation:
The universal conversion produces:
And the conversion with the new na rule produces (see GitHub):
More examples of na as the head of a double aux construction: Grew-match.
Lifting dependencies
TODO