GRToBI (GREEK TONES AND BREAK INDICES)
Please note:
(a) this page is perpetually under construction, so we would appreciate it if you would notify us of any mistakes, inconsistencies or problems;
(b) the most complete description of GRToBI currently available is Arvaniti & Baltazani (2005);
(c) this page provides downloadable files and includes some updates to the system that could not be included in Arvaniti & Baltazani (2005), but it is by necessity brief and should in no way be seen as a substitute for the published papers on GRToBI.
GRToBI is a tool for the annotation of Greek spoken corpora; it provides a system for annotating intonational, prosodic and (limited) phonetic information, though users can add tiers that encode other types of information as well. Although it was originally designed to work with Waves+, the audio and annotation files have been converted so that they can now be read in PRAAT. You can download them from this site.
GRToBI is not a transcription system for Greek intonation, i.e. it is not equivalent to a list of IPA symbols for Greek intonation. This is so for two related reasons. First, GRToBI assumes a particular view of prosody and of the relationship between phonetics and phonology. In particular, GRToBI is based on the autosegmental-metrical framework of intonational phonology, which assumes a principled distinction between phonetics and phonology. The analysis on which GRToBI is based is phonological; in other words, GRToBI is not meant to be a surface phonetic transcription of Greek intonation. Thus, the annotation labels may not always be phonetically transparent, and are not intended to capture all possible variations in phonetic realization. On the other hand, the differences indicated by the autosegmental representations are meant to be meaningful, that is, to reflect
pragmatic differences related to the melody. Second, GRToBI has been designed to represent the prosodic system of Greek as spoken in Athens. It is not designed for other varieties of Greek and may not include entities that are attested in those varieties; further, the phonetic realization of the same entity may differ between the Athenian variety assumed here and other varieties; finally, it is quite possible that certain configurations (e.g. the combinations of phrasal tones described below) do not occur in all varieties of Greek or, if they do occur, have different pragmatics.
In terms of design, GRToBI is similar to the original ToBI system for American English, or MAE ToBI (MAE stands for Mainstream American English; see Silverman et al., 1992; Beckman et al., 2005). GRToBI has been adapted from this original design so that prosodic phenomena requiring special attention in Greek, such as sandhi, can be more easily annotated.
The prosodic and intonational analyses assumed in GRToBI are based
(a) on existing research on various aspects of Greek prosody (see Bibliography); since new findings are continuously added to this body of research, particular aspects of the phonological analysis assumed in GRToBI may be revised in the light of new evidence;
(b) on the transcription of a corpus of spoken Standard Greek that includes data from several speakers using a variety of styles (read text, news broadcasting, interviews, spontaneous speech).
GRToBI is being used to develop a publicly available corpus of annotated utterances. Prosodically annotated corpora are an important language resource for several reasons. Corpus-based research can contribute to a better understanding of prosody (see e.g. Arvaniti & Pelekanou 2002), a neglected aspect of spoken language whose importance in speech production, speech perception and language acquisition has now begun to emerge. A better understanding of prosody will also lead to more natural speech synthesis and, possibly, more efficient speech recognition systems.
Further,
GRToBI
is
based on an
analysis of Greek prosodic structure. This analysis, developed on the
basis of
existing research and the annotation of the GRToBI
corpus
itself,
is
the first
systematic description of Greek prosody and, as such, useful for
theoretical
reasons.
Thus,
the
GRToBI
database
and prosodic analysis system can be used for
several
purposes:
by
searching
the
corpus;
GRToBI
annotated
utterances
as
examples;
A GRToBI annotated file consists of an audio recording of the utterance (in wav format) and an annotation file (currently in the PRAAT TextGrid format). When the wav and TextGrid files are opened together, the following appear: the waveform of the utterance, the spectrogram and pitch contour of the utterance (in Hz), and the five GRToBI annotation tiers (see Figure 1 for an example):
For the intonational analysis of Greek utterances we recognize three types of tonal events (pitch accents, phrase accents, and boundary tones) and two levels of phrasing: the intermediate phrase (ip) and the intonational phrase (IP).
A pitch accent is a melody that is phonologically associated with a metrically strong syllable; phonetically, a pitch accent co-occurs (more or less) with the stressed syllable it is phonologically associated with. Pitch accents are “prominence cueing” (a term coined by Francis Nolan) in that they indicate that the syllable with which they co-occur is meant to be construed as being metrically prominent (i.e. stressed). In Greek there is typically one prominent syllable in each content word; its position is lexically specified. Most function words do not have prominent syllables (though they may carry orthographic accent), but some do, such as the word [katá] when it means ‘against’. In some cases, Greek words may carry two pitch accents: this happens when a word is stressed on the antepenult and is followed by an enclitic (e.g. [to aftokínitó mu] ‘the car my’), or when a word is stressed on the penult and is followed by two enclitics (e.g. [fére mú to] ‘bring me it’). In such cases, both stressed syllables can be accented; if there is only one accent, this falls on the rightmost stressed syllable of the entire group.
Thus, in Greek we can distinguish between unstressed syllables, such as [re] in [fére mú to], stressed but unaccented syllables, such as [fe] in [fere mú to], and stressed and accented syllables, such as [fé] and [mú] in [fére mú to].
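The enclitic rule described above can be sketched in Python. This is a minimal illustration, not part of GRToBI itself: the function name `stressed_syllable_count` and its parameters are hypothetical, and it encodes only the two cases stated in the text (antepenultimate stress followed by an enclitic, penultimate stress followed by two enclitics).

```python
def stressed_syllable_count(stress_from_end: int, n_enclitics: int) -> int:
    """Return how many syllables carry stress in a host + enclitic group.

    stress_from_end: position of the lexical stress counted from the end of
        the host word (1 = final, 2 = penult, 3 = antepenult).
    n_enclitics: number of enclitics that follow the host word.
    Both parameter names are illustrative, not GRToBI terminology.
    """
    # Assumption: lexical stress falls on one of the last three syllables.
    if stress_from_end not in (1, 2, 3):
        raise ValueError("stress_from_end must be 1, 2 or 3")
    if n_enclitics < 0:
        raise ValueError("n_enclitics cannot be negative")

    # Case 1: antepenultimate stress followed by an enclitic.
    antepenult_case = stress_from_end == 3 and n_enclitics >= 1
    # Case 2: penultimate stress followed by two enclitics.
    penult_case = stress_from_end == 2 and n_enclitics >= 2
    return 2 if (antepenult_case or penult_case) else 1


# [to aftokínitó mu]: antepenultimate stress, one enclitic
print(stressed_syllable_count(3, 1))
# [fére mú to]: penultimate stress, two enclitics
print(stressed_syllable_count(2, 2))
```

Whether each stressed syllable is also accented is a separate question: as noted above, if the group carries only one accent, it falls on the rightmost stressed syllable.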
Our research suggests that in Greek we can distinguish five pitch accents: H*, L*, L*+H, L+H* and H*+L. The typical distribution and phonetic realization of these accents are described below and shown in Figures 1, 2, 3, and 4.
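For users who process annotation labels programmatically, the inventory can be checked with a short Python sketch. It is an illustration under stated assumptions, not an official GRToBI tool: the function name `parse_pitch_accent` and the returned field names are hypothetical, and the prefixes `w` (undershot), `>` (early aligned) and the downstep mark `!` are taken from the labels used in the figure captions on this page (e.g. wL*+H, >L*+H, L+!H*).

```python
# The five pitch accents listed in the text.
PITCH_ACCENTS = {"H*", "L*", "L*+H", "L+H*", "H*+L"}

# Realization prefixes as they appear in the figure captions (assumed to be
# written immediately before the accent label).
REALIZATION_PREFIXES = {"w": "undershot", ">": "early aligned"}


def parse_pitch_accent(label: str) -> dict:
    """Split a pitch accent label into base accent, downstep and realization."""
    text = label.strip()
    realization = None
    if text[:1] in REALIZATION_PREFIXES:
        realization = REALIZATION_PREFIXES[text[0]]
        text = text[1:]
    # Downstep is written with "!" before the H tone, e.g. !H* or L+!H*.
    downstepped = "!" in text
    base = text.replace("!", "")
    if base not in PITCH_ACCENTS:
        raise ValueError(f"not a recognized pitch accent: {label!r}")
    return {"base": base, "downstepped": downstepped, "realization": realization}


print(parse_pitch_accent("L+!H*"))
print(parse_pitch_accent(">L*+H"))
```

A label that does not reduce to one of the five accents raises `ValueError`, which makes typing slips in a tones tier easy to spot.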
Each utterance has at least one pitch accent, but typically more, since in Greek most stressed syllables are also accented; thus, while an English phrase such as Mary loves John is most likely uttered without an accent on loves, in a similar Greek phrase, such as [i ma'ria mi'lai sto 'mano] ‘Maria is talking to Manos’, all three content words are accented. The last accent of an utterance is called the nucleus. By default, the nucleus falls on the last content word of an utterance (if it is a declarative; for other types of utterance, see below), but it may occur on an earlier word under narrow or contrastive focus.
whereas
H*
signals
broad
focus (Baltazani 2003). The H* accent lacks the initial
dip
associated with the L tone of the L+H* (Arvaniti
et
al.
2006b) and its peak is probably aligned
earlier
in
the
accented
vowel, though quantitative data on this point are not yet
available.
Baltazani
&
Jun
1999;
Baltazani
2007b), and in the “suspicious” calling contour (that is, when
the vocative is to be interpreted as “is that you?”)
L
tone
is
aligned
at or slightly before the onset of the accented syllable, and
the H
tone is aligned at the beginning of the first post-accentual vowel
(Arvaniti
& Ladd, 1995;
Arvaniti et al. 1998). The realization
of L*+H
is different in contexts showing tonal crowding
(Arvaniti et al. 2000).
lies
in
the
alignment
of the H tone: the H tone of L+H* is well within the
accented
vowel, whereas the H tone of the L*+H aligns early in the first
post-accentual
vowel.
throughout
this
syllable
(see
the last accent in Figure 9). In
terms of meaning, the use of H*+L conveys a sense of “stating the
obvious” that is, the implication that the addressee
should
have
known
or
expected the answer.
At
this
stage,
it
is not clear whether there is any particular meaning
associated with
downstep, or with particular downstepped accents.
In
GRToBI,
three
phrase accents, H-, L- and !H-, are assumed to
exist in Greek. As mentioned earlier, our analysis suggests that Greek
has two
levels of phrasing,
the
intermediate
and
intonational phrase (ip and IP
respectively), and it is assumed here that phrase accents demarcate the
right
boundary of intermediate phrases.
This
analysis
is
based on the following observations. Tones
associated with ips typically show simple F0 movements, unlike those
associated
with IPs which can show complex
pitch
configurations.
Further, there is a difference
in scaling between ips and IPs, in cases where both have similar
pitch
movements, with ip boundaries exhibiting less extreme
F0
values
than
IP boundaries (i.e. a H- is scaled lower than a
H-H% configuration). In addition, the pauses after IPs (even non-final
ones)
are longer and more frequent
than
those
for
ips. Recent research also suggests that left ip
and IP boundaries are associated with “prosodic strengthening”
manifested as lengthening of ip and IP initial
consonants
(Kastrinaki
2003). Non-final intermediate phrases typically have a H- or L-
phrase
accent at their right edge. !H-, on the other hand, is used only
in
certain types
of
stylized
intonation
and then only in utterance-final ips
(i.e. it is always followed by a boundary tone).
More recently, Arvaniti et al. (2006a)
have
shown that the melody of Greek polar questions is difficult to
accommodate with
this inventory of phrase accents, and suggest
that
the
phrase
accent of these questions is a bitonal L+H-. In addition,
this phrase accent does not always co-occur with the right edge of the
intermediate phrase it is
associated
with;
rather,
when the nucleus of the polar question
is on a non-final word, the L+H- phrase accent aligns with the last
stressed
syllable of the question. The
two
patterns
of
alignment of the L+H- used in Greek polar
questions are illustrated in Figure 4.
GRToBI
includes
three
types of boundary tone, H%, L%, and !H%.
These boundary tones demarcate the right edges of intonational phrases.
They
combine with most phrase
accents
into
configurations
which are frequently interpreted in
the ways shown below (though the list is only indicative; the
interpretation of
a contour depends also on
the
utterance
and
the context in which it is used).
| L-L% | declaratives, negative declaratives, imperatives, wh-questions |
| L-H% | “involved” continuation rise, “suspicious” calls |
| H-L% | yes-no questions, requesting calling contour (note: according to Arvaniti et al. 2006a this combination is L+H-L%) |
| H-H% | continuation rise, questioning calling contour |
| L-!H% | “involved” wh-questions, negative declaratives (showing reservation), requesting imperatives |
| H-!H% | stylized continuation rise |
| !H-!H% | stylized call, incredulous questions |
| !H-H% | polite stylized call |
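The table can also be expressed as a lookup, which may be convenient when searching annotated files. The Python sketch below is illustrative only: the names `TYPICAL_USES`, `split_configuration` and `typical_uses` are hypothetical, the interpretations are copied from the table and remain indicative, and the bitonal L+H- phrase accent proposed by Arvaniti et al. (2006a) is deliberately left out.

```python
PHRASE_ACCENTS = ("H-", "L-", "!H-")
BOUNDARY_TONES = ("H%", "L%", "!H%")

# Indicative interpretations, copied from the table above.
TYPICAL_USES = {
    "L-L%": ["declaratives", "negative declaratives", "imperatives", "wh-questions"],
    "L-H%": ["'involved' continuation rise", "'suspicious' calls"],
    "H-L%": ["yes-no questions", "requesting calling contour"],
    "H-H%": ["continuation rise", "questioning calling contour"],
    "L-!H%": ["'involved' wh-questions",
              "negative declaratives (showing reservation)",
              "requesting imperatives"],
    "H-!H%": ["stylized continuation rise"],
    "!H-!H%": ["stylized call", "incredulous questions"],
    "!H-H%": ["polite stylized call"],
}


def split_configuration(label: str) -> tuple:
    """Split e.g. '!H-!H%' into its phrase accent and boundary tone."""
    text = label.strip()
    cut = text.find("-") + 1  # the phrase accent ends at the first hyphen
    phrase_accent, boundary_tone = text[:cut], text[cut:]
    if phrase_accent not in PHRASE_ACCENTS or boundary_tone not in BOUNDARY_TONES:
        raise ValueError(f"not a phrase accent + boundary tone label: {label!r}")
    return phrase_accent, boundary_tone


def typical_uses(label: str) -> list:
    """Return the indicative uses of a configuration, or [] if none is listed."""
    phrase_accent, boundary_tone = split_configuration(label)
    return TYPICAL_USES.get(phrase_accent + boundary_tone, [])


print(split_configuration("!H-!H%"))
print(typical_uses("H-!H%"))
```

A well-formed combination that the table does not list (such as !H-L%) returns an empty list rather than an error, since the table is not meant to be exhaustive.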
The Prosodic Words (PrWords) Tier is a phonetic transcription using ASCII characters (see Appendix I for conventions).
This
tier
facilitates the analysis of sandhi
(connected
speech
phenomena,
such as segment assimilations and
deletions across word boundaries), and fast speech rules, by encoding
their
outcome. Like all
transcriptions,
this
tier
has its limitations, and is not meant
to be a substitute for acoustic analysis; rather, it allows annotators
to flag
instances of sandhi for
more
detailed
acoustic
analysis. The PrWords Tier provides
information about stress, since this information cannot be deduced from
the
transliteration in the
Words
Tier
or
derived from a dictionary (e.g., as mentioned,
content Greek monosyllables, as well as some function words, are
normally
stressed and pitch accented
in
speech,
but
not in orthography; in contrast, disyllabic
function words are orthographically accented, but most do not normally
carry
stress in speech). In this tier,
each
prosodic
word
(defined as a sequence of items showing total
cohesion) is transcribed as one label.
This
tier
is
a transliterated version of the text, equivalent to
the Orthographic Tier in the American English ToBI (see Appendix II
for
conventions).
There
are
four
break indices, 0, 1, 2, and 3. Break indices mark
cohesion (or the lack thereof) between constituents in an utterance.
that
may
bear
only
one pitch accent (with the noted exception of hosts and clitics
with two
accents). Several types of sandhi may occur across a BI 0 boundary; however, sandhi is not necessary for a BI 0 to be used. For example, a proclitic
particle like /na/ and the following verb are perceived as one PrWord
by native
speakers,
but
no
sandhi
can occur between /na/ and a consonant-initial verb.
all
PrWords
following
an
early focus are de-accented;
Baltazani
& Jun, 1999; Botinis, 1998). In general, if an item is accented,
then it
should be considered
a
separate
PrWord. On
the
other
hand,
the absence of accent, as mentioned, does not constitute evidence
that an
item is not an independent PrWord.
Four
diacritics
are
used to provide more details on the prosodic
structure of utterances.
Greek
(see
Nespor
&
Vogel, 1986; Kaisse, 1985; for a different point of view,
see
Arvaniti, 1991, and results in Baltazani
2006b;
for a review of the relevant
literature,
see
Arvaniti in press). Our corpus
confirmed
previous studies that used naturally occurring data (e.g. Fallon, 1994)
in
showing that sandhi can apply
across
larger
constituents
than
postulated by, e.g., Nespor & Vogel (1986); see Figures
7 and 9 for
sandhi across
PrWords, Figure 6 for sandhi across an ip
boundary.
Further
annotation
of
the GRToBI corpus has also shown that some of the sandhi
rules of Greek are better described as gestural overlap
(Pelekanou, 2000; Arvaniti &
Pelekanou 2002; Baltazani 2006b). Since
the
presence
of sandhi does not necessarily signal
cohesion, we have decided to use the
diacritic
s
for sandhi at all prosodic levels, and thus provide an easy way of
searching
the database for such instances. We believe that the sandhi phenomena
will
be
better
understood
if a large corpus of natural spoken data is
investigated.
in
which
the
context
for sandhi exists but nevertheless sandhi does not take place.
For example, if a sequence like /ton 'pono/ ‘the pain (acc.)’ is pronounced [ton 'pono], it should be marked as 0m, since in general it should be pronounced [to'mbono] or [to'bono]. The m diacritic should be used with BIs 1, 2, and 3 to mark the presence of a boundary without the tonal events that normally accompany it.
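Break index labels such as 0m can be decoded mechanically. The following Python sketch assumes that a label is a single digit from 0 to 3 followed by any diacritics, as in the 0m example; the function name `parse_break_index` and the field names are hypothetical. Only the s and m diacritics are interpreted; any other trailing characters are passed through untouched.

```python
import re

# Assumed label shape: one break index digit (0-3), then optional diacritics.
BI_PATTERN = re.compile(r"^([0-3])(.*)$")


def parse_break_index(label: str) -> dict:
    """Decode a break index label into its level and diacritics."""
    match = BI_PATTERN.match(label.strip())
    if match is None:
        raise ValueError(f"not a break index label: {label!r}")
    level = int(match.group(1))
    marks = match.group(2)
    return {
        "level": level,
        "s": "s" in marks,  # sandhi is present at this boundary
        "m": "m" in marks,  # BI 0: sandhi context without sandhi;
                            # BIs 1-3: boundary without its usual tonal events
        "other": "".join(ch for ch in marks if ch not in ("s", "m")),
    }


print(parse_break_index("0m"))
print(parse_break_index("2s"))
```

Collecting every label for which `s` is true gives a quick list of sandhi sites across all prosodic levels, which is the kind of search the s diacritic is meant to support.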
This
tier
should
be used for annotating non-structural
information that may be useful in interpreting the file, such as
coughing,
disfluency, pitch halving or rate of speech.
|
Figure 1: This example (‘Do the flowers really smell?’ lit. ‘the flowers smell?’), uttered in a surprised manner, illustrates the different alignment of the L*+H (the accent on [lu'ludia]) and L+H* (the accent on [mi'rizune]). Note the difference in the alignment of the H tone in the two accents. |
|
Figure
2:
This
example
(‘S/he is talking to Charalampos’) shows a downstepped !H*
nuclear accent. Note the lack of a dip at the beginning of this accent
and compare it to the L+H* in Figure 1. |
|
|
3a
(left) |
|
Figure
4:
This illustration shows two typical L* accents on segmentally identical
questions, but with focus on the word [mi'rizune] ‘they smell’ on the
left and on the word [lu'ludia] ‘flowers’ on the right. The questions
mean ‘Do the flowers SMELL?’ and ‘Is it the FLOWERS that smell?’
respectively. Note also the different alignment of the H-, which is on
(unstressed) [zu] in the question on the left, but the stressed
syllable [ri] in the question on the right. |
|
|
Figure 5: This example
(gloss: ‘Dalida was scolding the baby [when she fainted]!’) exemplifies
the L-H% phrasal configuration, which is preceded in this case by a
L+!H* accent on [mo'ro]. Note also the undershot (wL*+H) and early
aligned (>L*+H) realizations of the two L*+H accents on ['malone]
and [iDali'Da] respectively. |
|
Figure 6: This example (‘The north wind and the sun agreed…’) illustrates the difference in scaling between H- and H-H%. Further, the example shows the diphthongal realization of /o 'iLos/, which together with the conjunction /ke/ ‘and’ forms one PrWord with a rising diphthong [oI], ['coILos]; note the alignment of the L*, which aligns with the whole syllable [coI]. Finally, the canonical alignment of the L*s of ['coILos] and [si'mfonisan], which are manifested as low plateaus, can be contrasted with the wL* of [mo'ro] in Figure 9. |
|
Figure 7: This example ([ta.meseona] ‘We do not live in the Middle Ages’) illustrates the typical pattern of a negative declarative expressing reservation. Note that the negative particle /Den/, which is considered a phonological clitic, carries the nuclear (and only) pitch accent of the utterance, and thus forms a separate PrWord from the de-accented verb ['zume] ‘we live’; yet sandhi (/n/-deletion before the fricative [z]) does take place as well. The rest of the utterance is de-accented, with the L- spreading until after the last stressed syllable ([se] of [me'seona] ‘Middle Ages’). Finally, compare the scaling of the !H% (relative to that of the L+H* peak) to the scaling of the H- and H% tones in Figures 5, 6 and 9 relative to the accentual Hs in those examples. |
|
|
Figure 8: This example (gloss: ‘our focus is…’) illustrates the stylized H-!H% configuration on the word ['ine] ‘is’. Note also the presence of two accents on the word [epi'cedrosi] ‘focus’, which here is followed by the enclitic [mas] ‘ours’, and thus carries enclitic stress on its last syllable [si]. |
|
|
|
|
Figure 9: This example,
(gloss: ‘Dalida was scolding the baby when the phone rang’), shows two
different realizations of L*+H under tonal crowding, >L*+H which is
realized earlier than it canonically would (the H tone is aligned with
the accented vowel, instead of the first postaccentual vowel), and
wL*+H, in which the L* tone is undershot, while the H shows the typical
late alignment of H in L*+H accents. In this utterance there is also an
undershot L* (wL*) on [mo'ro], realized as a rise from low pitch
throughout the accented syllable (cf. the canonical L*s in Figure 6). |
|
|
Figure 10: This example (gloss: ‘[You] BECOME-PART of society through dance’) illustrates de-accenting after early focus. Note also the several instances of sandhi and fast speech rules. |
Acknowledgments
The development of GRToBI largely took place while the first author was
a
visitor at the Ohio
State
University
Linguistics
Laboratory. We would like to thank
the
members
of
the Laboratory,
particularly Mary Beckman, Julie McGory, Shu-hui Peng, Amanda Miller
and
Mariapaola D’Imperio for their encouragement and
input
during
the
development of
GRToBI, and also for long distance technical support. Thanks are also
due to
Georgios Tserdanelis for providing wav files for this
site,
and
to
the students in Prof.
Beckman’s and Dr McGory’s ToBI course for useful feedback at a
first presentation of GRToBI. We are grateful to Sun-Ah Jun
who
brought
us
together, suggested
we develop the system and played devil's advocate at the early stages
of its
gestation. Finally, we would like to express our
thanks
to
Jenny
and Peter Ladefoged
for their kind hospitality to the first author during her stay at
II: Gesture, Segment,
Prosody,
398-423.
Cambridge
University
Press.
University of Salzburg.
Typology:
The
Phonology
of
Intonation and Phrasing,
pp.
84-117.
Oxford: Oxford
University Press.
Congress of Phonetic
Sciences,
4:
220-23.
Stockholm.
Laboratory
Phonology V.
·
Arvaniti, A., D. R. Ladd & I. Mennen (2006a) Phonetic effects of focus and “tonal crowding” in intonation: Evidence from Greek polar questions. Speech Communication 48: 667-696.
·
Arvaniti, A., D. R. Ladd & I. Mennen (2006b) Tonal association and tonal alignment: Evidence from Greek polar questions and emphatic statements. Language and Speech 49: 421-450.
Actes
du 5e
Colloque international de linguistique grecque, pp. 71-74. Paris : L’Harmattan.
·
Baltazani,
M. (2002) Quantifier
scope
and
the
role of intonation in Greek. Ph.D.
dissertation, UCLA.
·
Baltazani,
M. (2003) Broad Focus across sentence types in Greek. Proceedings
of
Eurospeech-2003, Geneva, Switzerland.
·
Baltazani,
M. (2004) The prosodic structure of quantificational sentences in
Greek. Proceedings
of the 38th meeting of the Chicago Linguistic Society.
·
Baltazani,
M. (2006a) On
/s/-voicing in
Greek. Proceedings of the 7th
International
Conference on Greek Linguistics, York, UK.
·
Baltazani, M. (2006b) Focusing, prosodic phrasing and Hiatus resolution in Greek. In Laboratory Phonology 8, Louis Goldstein, Douglas Whalen, Catherine Best (eds.), Mouton de Gruyter, Berlin, 473-494.
·
Baltazani,
M. (2006c) Intonation
and
pragmatic
interpretation
of negation in Greek. Journal of
Pragmatics,
38, Issue 10, p. 1658-1676, Elsevier.
·
Baltazani,
M. (2007a) Prosodic
rhythm
and
the
status of vowel reduction in Greek. In Selected
Papers
on Theoretical and Applied
Linguistics from the 17th International
Symposium
on
Theoretical
& Applied Linguistics, Volume 1,
Department of Theoretical and Applied Linguistics, Salonica, p.
31-43.
·
Baltazani, M. (2007b) Intonation of polar questions and the location of nuclear stress in Greek. In Tones and Tunes, Volume II, Experimental Studies in Word and Sentence Prosody, Carlos Gussenhoven & Tomas Riad (eds.), Mouton de Gruyter, Berlin, p. 387-405.
·
Baltazani, M. & S. Jun (1999) Focus and topic intonation in Greek. Proceedings of the XIVth International Congress of Phonetic Sciences, 2: 1305-8. San Francisco.
Prosodic Typology: The
Phonology of Intonation
and Phrasing,
pp.
9-54.
Oxford: Oxford University Press.
Themes in Greek
Linguistics,
217-224.
London:
John
Benjamins Publishing Co.
Spoken
Language
Processing,
867-879.
In
addition
the
following
symbols and conventions should be used:
·
Noticeably
centralized
vowels
should be transcribed as @.
·
Noticeably
nasalized
vowels
should be transcribed with a
following ~; e.g. a~ .
·
In
cases
of
vowel coalescence, both vowels should be transcribed
and joined by +; e.g. u+o resulting from a sequence of
/u/ and /o/ (usually
across a word
boundary).
·
Whispered
vowels
should
be transcribed in brackets.
·
Vowels
that
phonologically
form separate syllables but are
phonetically manifested as a rising diphthong (on the basis,
e.g., of tonal
alignment evidence), should
be transcribed with the second vowel capitalized; stress should be
placed
before the diphthong.
·
Stress
should
be
marked before the consonant(s) of the stressed
syllable, following IPA conventions. (At present we are
agnostic as to
syllabification, so we suggest
that transcribers mark maximal onsets, unless tonal alignment or their
own
intuitions suggests otherwise.)
separated
by
fullstops;
e.g.
a.i.d’oni "nightingale".
mark
in
the
Words
tier.
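Some of the transcription symbols listed above can be detected automatically in a PrWords label. The Python sketch below is a rough illustration: the function name `prwords_flags` and the flag names are hypothetical, and it assumes that stress is typed as an ASCII apostrophe, as in examples such as [to'mbono] on this page. It does not attempt to detect whispered vowels or rising diphthongs, since brackets and capital letters cannot be identified reliably by a simple character test.

```python
def prwords_flags(label: str) -> dict:
    """Report which special transcription symbols occur in a PrWords label."""
    return {
        "has_stress_mark": "'" in label,        # assumed ASCII stress mark
        "has_centralized_vowel": "@" in label,  # noticeably centralized vowel
        "has_nasalized_vowel": "~" in label,    # vowel followed by ~
        "has_coalescence": "+" in label,        # vowels joined by +
    }


print(prwords_flags("to'mbono"))
print(prwords_flags("u+o"))
```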