GRToBI
(GREEK TONES AND BREAK INDICES)
Please note:
(a) this page is perpetually under construction, so we would appreciate it if
you would notify us of any mistakes, inconsistencies or problems;
(b) the most complete description of GRToBI currently available is Arvaniti & Baltazani (2005);
(c) this page provides downloadable files and includes some updates to the
system which could not be included in Arvaniti & Baltazani (2005),
but it is by necessity brief and should in no way be seen as a
substitute for the published papers on GRToBI.
GRToBI is a tool for the annotation of Greek spoken corpora; it
provides a system for annotating intonational, prosodic and (limited) phonetic
information,
though users can add tiers that encode other types of
information as well. Although it was originally designed to work on Waves+, the
audio and annotation
files have been converted so that they can now be read in PRAAT. You can download them from this site.
GRToBI is not a transcription system for Greek intonation, i.e.
it is not equivalent to a list of IPA symbols for Greek
intonation. This is so for two related reasons.
First, GRToBI assumes a particular view of prosody and of the
relationship between phonetics and phonology. In particular, GRToBI is based on
the
autosegmental-metrical framework of intonational phonology which
assumes a principled distinction between phonetics and phonology. The analysis
on which GRToBI
is based is phonological; in other words, GRToBI is not meant to be a
surface phonetic transcription of Greek intonation. Thus, the
annotation labels may not always
be phonetically transparent, and are not intended to capture all
possible variations in phonetic realization. On the other hand, the differences
indicated by the autosegmental
representations are meant to be meaningful, that is, to reflect
pragmatic differences related to the melody. Second, GRToBI has been
designed to represent
the prosodic system of Greek as spoken in Athens; it does not necessarily cover other varieties
of Greek and may not include entities that are attested in those
varieties; further, the phonetic realization of the same entity may differ
between the Athenian
variety assumed here and other varieties; finally, it is quite
possible that certain configurations (e.g. the combinations of phrasal
tones described below) do not
occur in all varieties of Greek or, if they do occur, that they have
different pragmatics.
In terms of design, GRToBI is
similar to the original ToBI
system for American English or MAE ToBI (MAE stands for Mainstream American
English; see Silverman
et al., 1992; Beckman et al. 2005). GRToBI has
been adapted from this original design so that prosodic phenomena requiring
special attention in Greek, such as sandhi,
can be more easily annotated.
The prosodic and intonational
analyses assumed in GRToBI have been based
(a) on existing research on various aspects of Greek prosody (see Bibliography); since new findings are continuously
added to this body of research, it is natural
that particular aspects of the
phonological analysis assumed in GRToBI may be revised in the light of new
evidence.
(b) on the transcription of a corpus of spoken Standard Greek that
includes data from several speakers using a variety of styles (read text, news
broadcasting,
interviews, spontaneous speech).
GRToBI is being used to develop a publicly available corpus of
annotated utterances. Prosodically annotated corpora are an important language
resource for
several reasons. Corpus-based research can contribute to the
better understanding of prosody (see e.g. Arvaniti
& Pelekanou 2002), a neglected aspect of
spoken language, the importance of which in speech production,
speech perception and language acquisition has now begun to emerge. A better
understanding
of prosody will also lead to more natural speech synthesis and,
possibly, more efficient speech recognition systems.
Further, GRToBI is based on an
analysis of Greek prosodic structure. This analysis, developed on the basis of
existing research and the annotation of the GRToBI
corpus itself, is the first
systematic description of Greek prosody and, as such, useful for theoretical
reasons.
Thus,
the GRToBI database and prosodic analysis system can be used for several
purposes:
by searching
the corpus;
GRToBI
annotated utterances as examples;
A GRToBI annotated file consists of an audio recording of
the utterance (in wav format) and an annotation file (currently in the PRAAT
textgrid format).
When the wav and textgrid files are opened together the
following appear: the waveform of the utterance, the spectrogram and pitch
contour of the utterance
(in Hz), as well as the five GRToBI annotation tiers (see Figure 1 for an
example).
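As a rough illustration of what such an annotation file looks like to a script, the following minimal Python sketch lists the tiers found in a TextGrid saved in PRAAT's long text format. The sample file content, the tier names ("Tones", "Words") and the function name list_tiers are hypothetical placeholders chosen for the example; they are not taken from an actual GRToBI file, and real files may contain further tiers and may be stored in a different text encoding.

```python
import re

# A tiny TextGrid in PRAAT's long text format. The tier names and labels
# are hypothetical placeholders, not copied from a real GRToBI file.
SAMPLE_TEXTGRID = '''File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 1.5
tiers? <exists>
size = 2
item []:
    item [1]:
        class = "TextTier"
        name = "Tones"
        xmin = 0
        xmax = 1.5
        points: size = 1
        points [1]:
            number = 0.42
            mark = "L*+H"
    item [2]:
        class = "IntervalTier"
        name = "Words"
        xmin = 0
        xmax = 1.5
        intervals: size = 1
        intervals [1]:
            xmin = 0
            xmax = 1.5
            text = "ta luludia"
'''

# Each tier starts with a 'class = "..."' line followed by a 'name = "..."' line.
TIER_HEADER = re.compile(r'class = "(\w+)"\s+name = "([^"]*)"')


def list_tiers(textgrid_text):
    """Return (tier_name, tier_class) pairs in the order they appear."""
    return [(name, cls) for cls, name in TIER_HEADER.findall(textgrid_text)]


for tier_name, tier_class in list_tiers(SAMPLE_TEXTGRID):
    print(tier_name, tier_class)
```

To use the sketch on a downloaded file, read the file into a string first and pass that string to list_tiers; point tiers are reported as "TextTier" and interval tiers as "IntervalTier".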
For the intonational analysis of Greek utterances we recognize
three types of tonal events (pitch accents, phrase accents, and boundary
tones) and two levels of phrasing, the intermediate phrase (ip) and the
intonational phrase (IP).
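In the labels used on this page, the three types of tonal event can be told apart by their notation: pitch accents contain a star (e.g. L*+H), phrase accents end in a hyphen (e.g. H-), and boundary tones end in a percent sign (e.g. !H%). The Python sketch below classifies a single label on that basis. The function name classify_tone_label is hypothetical, and the sketch assumes that each label encodes one tonal event; it does not check whether the label belongs to the GRToBI inventory.

```python
def classify_tone_label(label):
    """Classify one tonal label by its notation.

    Assumption: the label encodes a single tonal event. A star marks a
    pitch accent, a final '-' a phrase accent, a final '%' a boundary tone.
    """
    label = label.strip()
    if "*" in label:
        return "pitch accent"
    if label.endswith("%"):
        return "boundary tone"
    if label.endswith("-"):
        return "phrase accent"
    return "unknown"


for example in ["L*+H", "!H*", "L+H-", "!H%", "xyz"]:
    print(example, "->", classify_tone_label(example))
```

Diacritics and downstep marks that precede the tone (as in wL*+H, >L*+H or !H-) do not affect the result, since only the star and the final symbol are inspected.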
A pitch accent is a melody that is phonologically associated
with a metrically strong syllable; phonetically, a pitch accent co-occurs (more
or less) with the stressed
syllable it is phonologically associated with. Pitch
accents are “prominence cueing” (a term coined by Francis Nolan) in that they
indicate that the syllable with
which they co-occur is meant to be construed as being metrically
prominent (i.e. stressed). In Greek there is typically one prominent syllable
in each content word;
its position is lexically specified. Most function words do not
have prominent syllables (though they may carry orthographic accent), but some
do, such as the word
[katá] when it means ‘against’. In some cases, Greek
words may carry two pitch accents: this happens when a word is stressed on the
antepenult and is followed
by an enclitic (e.g. [to aftokínitó mu] ‘the car
my’), or when a word is stressed on the penult and is followed by two
enclitics (e.g. [fére mú to] ‘bring me it’). In such
cases, both stressed syllables can be accented; if there is only
one accent, this falls on the rightmost stressed syllable of the entire group.
Thus, in Greek we can
distinguish between unstressed syllables, such as [re] in
[fére mú to], stressed but
unaccented syllables, such as [fe] in [fere
mú to], and stressed and accented
syllables, such as [fé] and [mú] in [fére mú to].
Our research suggests that in Greek we can distinguish five pitch accents: H*,
L*, L*+H, L+H* and H*+L. The typical distribution and phonetic realization
of
these accents are described below and shown in Figures 1, 2, 3, and 4. Each utterance has at least
one pitch accent, but typically more, since in Greek most stressed
syllables are also accented; thus, while an English phrase such
as Mary loves John is most likely uttered without an accent on loves,
in a similar Greek phrase, such
as [i ma'ria mi'lai sto 'mano] ‘Maria is talking to
Manos’, all three content words are accented. The last accent of an
utterance is called the nucleus. By default,
the nucleus falls on the last content word of an utterance (if
it is a declarative—for other types of utterance, see below), but it may
occur on an earlier word under
narrow or contrastive focus.
whereas H*
signals broad focus (Baltazani 2003). The H* accent lacks the initial dip
associated with the L tone of the L+H* (Arvaniti
et al. 2006b) and its peak is probably aligned
earlier in
the accented vowel, though quantitative data on this point are not yet
available.
Baltazani
& Jun 1999; Baltazani
2007b), and in the “suspicious” calling contour (that is, when
the vocative is to be interpreted as “is that you?”)
L tone is
aligned at or slightly before the onset of the accented syllable, and the H
tone is aligned at the beginning of the first post-accentual vowel (Arvaniti
& Ladd, 1995;
Arvaniti et al. 1998). The realization of L*+H
is different in contexts showing tonal crowding (Arvaniti et al. 2000).
lies
in the alignment of the H tone: the H tone of L+H* is well within the accented
vowel, whereas the H tone of the L*+H aligns early in the first post-accentual
vowel.
throughout
this syllable (see the last accent in Figure 9). In
terms of meaning, the use of H*+L conveys a sense of “stating the
obvious” that is, the implication that the addressee
should have
known or expected the answer.
At this
stage, it is not clear whether there is any particular meaning associated with
downstep, or with particular downstepped accents.
In GRToBI, three phrase accents, H-, L- and !H-, are assumed to
exist in Greek. As mentioned earlier, our analysis suggests that Greek has two
levels of phrasing,
the intermediate and intonational phrase (ip and IP
respectively), and it is assumed here that phrase accents demarcate the right
boundary of intermediate phrases.
This analysis is based on the following observations. Tones
associated with ips typically show simple F0 movements, unlike those associated
with IPs which can show complex
pitch configurations. Further, there is a difference
in scaling between ips and IPs, in cases where both have similar pitch
movements, with ip boundaries exhibiting less extreme
F0 values than IP boundaries (i.e. a H- is scaled lower than a
H-H% configuration). In addition, the pauses after IPs (even non-final ones)
are longer and more frequent
than those for ips. Recent research also suggests that left ip
and IP boundaries are associated with “prosodic strengthening”
manifested as lengthening of ip and IP initial
consonants (Kastrinaki
2003). Non-final intermediate phrases typically have a H- or L- phrase
accent at their right edge. !H-, on the other hand, is used only in
certain types
of stylized intonation and then only in utterance-final ips
(i.e. it is always followed by a boundary tone).
More recently, Arvaniti et al. (2006a) have
shown that the melody of Greek polar questions is difficult to accommodate with
this inventory of phrase accents, and suggest
that the phrase accent of these questions is a bitonal L+H-. In addition,
this phrase accent does not always co-occur with the right edge of the
intermediate phrase it is
associated with; rather, when the nucleus of the polar question
is on a non-final word, the L+H- phrase accent aligns with the last stressed
syllable of the question. The
two patterns of alignment of the L+H- used in Greek polar
questions are illustrated in Figure 4.
GRToBI includes three types of boundary tone, H%, L%, and !H%.
These boundary tones demarcate the right edges of intonational phrases. They
combine with most phrase
accents into configurations which are frequently interpreted in
the ways shown below (though the list is only indicative; the interpretation of
a contour depends also on
the utterance and the context in which it is used).
Configuration | Typical interpretation
L-L% | declaratives, negative declaratives, imperatives, wh-questions
L-H% | “involved” continuation rise, “suspicious” calls
H-L% | yes-no questions, requesting calling contour (note: according to Arvaniti et al. 2006a this combination is L+H-L%)
H-H% | continuation rise, questioning calling contour
L-!H% | “involved” wh-questions, negative declaratives (showing reservation), requesting imperatives
H-!H% | stylized continuation rise
!H-!H% | stylized call, incredulous questions
!H-H% | polite stylized call
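For readers who want to query these configurations in a script, the list above can be stored as a simple lookup table. The Python sketch below is only an illustration: the names INTERPRETATIONS and interpretations are hypothetical, the entries simply restate the list above, and, as noted there, the interpretations are indicative rather than exhaustive.

```python
# Phrase accent + boundary tone configurations and their frequent
# interpretations, restated from the list above (indicative only).
INTERPRETATIONS = {
    ("L-", "L%"): ["declaratives", "negative declaratives", "imperatives",
                   "wh-questions"],
    ("L-", "H%"): ["'involved' continuation rise", "'suspicious' calls"],
    # According to Arvaniti et al. (2006a) this combination is L+H-L%.
    ("H-", "L%"): ["yes-no questions", "requesting calling contour"],
    ("H-", "H%"): ["continuation rise", "questioning calling contour"],
    ("L-", "!H%"): ["'involved' wh-questions",
                    "negative declaratives (showing reservation)",
                    "requesting imperatives"],
    ("H-", "!H%"): ["stylized continuation rise"],
    ("!H-", "!H%"): ["stylized call", "incredulous questions"],
    ("!H-", "H%"): ["polite stylized call"],
}


def interpretations(phrase_accent, boundary_tone):
    """Return the listed interpretations, or an empty list if the
    configuration does not appear in the list above."""
    return INTERPRETATIONS.get((phrase_accent, boundary_tone), [])


print(interpretations("H-", "!H%"))
print(interpretations("!H-", "L%"))
```

An empty result only means that the configuration is not in the list above; it says nothing about whether the configuration can occur in Greek.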
The Prosodic Words (PrWords) Tier is a phonetic
transcription using ASCII characters (see Appendix I for conventions). This tier
facilitates the analysis of sandhi
(connected speech phenomena, such as segment assimilations and
deletions across word boundaries), and fast speech rules, by encoding their
outcome. Like all
transcriptions, this tier has its limitations, and is not meant
to be a substitute for acoustic analysis; rather, it allows annotators to flag
instances of sandhi for
more detailed acoustic analysis. The PrWords Tier provides
information about stress, since this information cannot be deduced from the
transliteration in the
Words Tier or derived from a dictionary (e.g., as mentioned,
Greek content monosyllables, as well as some function words, are normally
stressed and pitch accented
in speech, but not in orthography; in contrast, disyllabic
function words are orthographically accented, but most do not normally carry
stress in speech). In this tier,
each prosodic word (defined as a sequence of items showing total
cohesion) is transcribed as one label.
This tier is a transliterated version of the text, equivalent to
the Orthographic Tier in the American English ToBI (see Appendix II for
conventions).
There are four break indices, 0, 1, 2, and 3. Break indices mark
cohesion (or the lack thereof) between constituents in an utterance.
that may bear
only one pitch accent (with the noted exception of hosts and clitics with two
accents). Several types of sandhi may occur across a BI 0 boundary;
however,
sandhi is not necessary for a BI 0 to be used. For example, a proclitic
particle like /na/ and the following verb are perceived as one PrWord by native
speakers,
but no sandhi can occur between /na/ and a consonant-initial verb.
all PrWords
following an early focus are de-accented; Baltazani
& Jun, 1999; Botinis, 1998). In general, if an item is accented, then it
should be considered
a separate
PrWord. On the other
hand, the absence of accent, as mentioned, does not constitute evidence that an
item is not an independent PrWord.
Four diacritics are used to provide more details on the prosodic
structure of utterances.
Greek (see
Nespor & Vogel, 1986; Kaisse, 1985; for a different point of view, see
Arvaniti, 1991, and results in Baltazani 2006b;
for a review of the relevant
literature,
see Arvaniti in press). Our corpus confirmed
previous studies that used naturally occurring data (e.g. Fallon, 1994) in
showing that sandhi can apply
across
larger constituents than postulated by, e.g., Nespor & Vogel (1986); see Figures 7 and 9 for sandhi across
PrWords, Figure 6 for sandhi across an ip
boundary.
Further annotation of the GRToBI corpus has also shown that some of the sandhi
rules of Greek are better described as gestural overlap
(Pelekanou, 2000; Arvaniti & Pelekanou 2002; Baltazani 2006b). Since the presence of sandhi does not necessarily signal
cohesion, we have decided to use the
diacritic s
for sandhi at all prosodic levels, and thus provide an easy way of searching
the database for such instances. We believe that the sandhi phenomena
will be
better understood if a large corpus of natural spoken data is investigated.
in which the
context for sandhi exists but nevertheless sandhi does not take place. For
example, if a sequence like /ton 'pono/ ‘the pain (acc.)’ is
pronounced
[ton 'pono] it should be marked as 0m, since in general it should be pronounced
[to'mbono] or [to'bono]. The m diacritic should be
used with
BIs 1, 2, and 3 to mark the presence of a boundary without the tonal events
that normally accompany it.
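Break index labels of this kind are easy to check automatically. The Python sketch below splits a label such as 0m or 1s into its index and diacritics. It rests on several assumptions: the diacritic is written directly after the digit (as in the 0m example above), several diacritics may follow one index, and only the s and m diacritics mentioned in this section are recognized; any other character is reported separately rather than interpreted. The function name parse_break_label is hypothetical.

```python
import re

# One digit from 0 to 3, optionally followed by non-digit diacritics.
BREAK_LABEL = re.compile(r"([0-3])(\D*)")


def parse_break_label(label):
    """Split a break index label into its index and diacritics.

    Assumptions: diacritics directly follow the digit; only 's' and 'm'
    are recognized, and anything else is returned under 'other'.
    """
    match = BREAK_LABEL.fullmatch(label.strip())
    if match is None:
        raise ValueError("not a break index label: %r" % label)
    marks = match.group(2)
    return {
        "index": int(match.group(1)),
        "has_s": "s" in marks,
        "has_m": "m" in marks,
        "other": "".join(ch for ch in marks if ch not in "sm"),
    }


print(parse_break_label("0m"))
print(parse_break_label("1s"))
```

A label with an index outside the range 0 to 3 raises ValueError, which makes it straightforward to flag mistyped labels while searching the Break Indices tier.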
This tier should be used for annotating non-structural
information that may be useful in interpreting the file, such as coughing,
disfluency, pitch halving or rate of speech.
Figure 1: This example (‘Do the flowers really
smell?’ lit. ‘the flowers smell?’), uttered in a surprised
manner, illustrates the different alignment of the L*+H (the accent on
[lu'ludia]) and L+H* (the accent on [mi'rizune]). Note the difference in the
alignment of the H tone in the two accents.
Figure 2: This example (‘S/he is talking to Charalampos’) shows a
downstepped !H* nuclear accent. Note the lack of a dip at the beginning of
this accent and compare it to the L+H* in Figure 1.
Figure 3a (left)
Figure 4: This
illustration shows two typical L* accents on segmentally identical questions,
but with focus on the word [mi'rizune] ‘they smell’ on the left
and on the word [lu'ludia] ‘flowers’ on the right. The questions
mean ‘Do the flowers SMELL?’ and ‘Is it the FLOWERS that
smell?’ respectively. Note also the different alignment of the H-,
which is on (unstressed) [zu] in the question on the left, but the stressed
syllable [ri] in the question on the right.
Figure 5: This example (gloss:
‘Dalida was scolding the baby [when she fainted]!’) exemplifies
the L-H% phrasal configuration, which is preceded in this case by an L+!H*
accent on [mo'ro]. Note also the undershot (wL*+H) and early aligned
(>L*+H) realizations of the two L*+H accents on ['malone] and [iDali'Da]
respectively.
Figure 6: This example
(‘The north wind and the sun agreed…’) illustrates the
difference in scaling between H- and H-H%. Further, the example shows the
diphthongal realization of /o 'iLos/, which together with the
conjunction /ke/ ‘and’ forms one PrWord with a rising diphthong
[oI], ['coILos]; note that the L* aligns with the whole syllable
[coI]. Finally, the canonical alignment of the L*s of ['coILos] and
[si'mfonisan], which are manifested as low plateaus, can be contrasted with
the wL* of [mo'ro] in Figure 9.
Figure 7: This example ([ta.meseona] ‘We do
not live in the Middle Ages’) illustrates the typical pattern of a
negative declarative expressing reservation. Note that the negative particle
/Den/, which is considered a phonological clitic, carries the nuclear (and
only) pitch accent of the utterance, and thus forms a separate PrWord from the
de-accented verb ['zume] ‘we live’; yet, sandhi (/n/-deletion
before the fricative [z]) does take place as well. The rest of the utterance
is de-accented, with the L- spreading until after the last stressed syllable
([se] of [me'seona] ‘Middle Ages’). Finally, compare the scaling
of the !H% (relative to that of the L+H* peak) to the scaling of the H- and
H% tones in Figures 5, 6 and 9 relative to the accentual Hs in those examples.
Figure 8: This example (gloss: ‘our focus is…’) illustrates the stylized H-!H% configuration on the word ['ine] ‘is’. Note also the presence of two accents on the word [epi'cedrosi] ‘focus’, which here is followed by the enclitic [mas] ‘ours’, and thus carries enclitic stress on its last syllable [si].
Figure 9: This example (gloss:
‘Dalida was scolding the baby when the phone rang’) shows two
different realizations of L*+H under tonal crowding: >L*+H, which is
realized earlier than it canonically would be (the H tone is aligned with the
accented vowel, instead of the first post-accentual vowel), and wL*+H, in
which the L* tone is undershot, while the H shows the typical late alignment
of H in L*+H accents. In this utterance there is also an undershot L* (wL*)
on [mo'ro], realized as a rise from low pitch throughout the accented
syllable (cf. the canonical L*s in Figure 6).
Figure 10: This example (gloss:
‘[You] BECOME-PART of society through dance’) illustrates
de-accenting after early focus. Note also the several instances of sandhi
and fast speech rules.
Acknowledgments
The development of GRToBI largely took place while the first author was a
visitor at the Ohio State
University Linguistics Laboratory. We would like to thank
the members of the Laboratory,
particularly Mary Beckman, Julie McGory, Shu-hui Peng, Amanda Miller and
Mariapaola D’Imperio for their encouragement and
input during the development of
GRToBI, and also for long distance technical support. Thanks are also due to
Georgios Tserdanelis for providing wav files for this
site, and to the students in Prof.
Beckman’s and Dr McGory’s ToBI course for useful feedback at a
first presentation of GRToBI. We are grateful to Sun-Ah Jun
who brought us together, suggested
we develop the system and played devil's advocate at the early stages of its
gestation. Finally, we would like to express our
thanks to Jenny and Peter Ladefoged
for their kind hospitality to the first author during her stay at
II: Gesture, Segment, Prosody, 398-423.
Cambridge University Press.
University of Salzburg.
Typology:
The Phonology of Intonation and Phrasing, pp. 84-117. Oxford: Oxford
University Press.
Congress of Phonetic Sciences, 4: 220-23.
Stockholm.
Laboratory Phonology V.
· Arvaniti, A., D. R. Ladd & I. Mennen (2006a). Phonetic effects of focus and “tonal crowding” in intonation: Evidence from Greek polar questions. Speech Communication 48: 667-696.
· Arvaniti, A., D. R. Ladd & I. Mennen (2006b). Tonal association and tonal alignment: Evidence from Greek polar questions and emphatic statements. Language and Speech 49: 421-450.
Actes du 5e
Colloque international de linguistique grecque, pp. 71-74. Paris : L’Harmattan.
· Baltazani, M. (2002). Quantifier scope and the role of intonation in Greek. Ph.D. dissertation, UCLA.
· Baltazani, M. (2003). Broad focus across sentence types in Greek. Proceedings of Eurospeech-2003, Geneva, Switzerland.
· Baltazani, M. (2004). The prosodic structure of quantificational sentences in Greek. Proceedings of the 38th meeting of the Chicago Linguistic Society.
· Baltazani, M. (2006a). On /s/-voicing in Greek. Proceedings of the 7th International Conference on Greek Linguistics, York, UK.
· Baltazani, M. (2006b). Focusing, prosodic phrasing and hiatus resolution in Greek. In Louis Goldstein, Douglas Whalen & Catherine Best (eds.), Laboratory Phonology 8, pp. 473-494. Berlin: Mouton de Gruyter.
· Baltazani, M. (2006c). Intonation and pragmatic interpretation of negation in Greek. Journal of Pragmatics 38(10): 1658-1676.
· Baltazani, M. (2007a). Prosodic rhythm and the status of vowel reduction in Greek. In Selected Papers on Theoretical and Applied Linguistics from the 17th International Symposium on Theoretical & Applied Linguistics, Volume 1, pp. 31-43. Salonica: Department of Theoretical and Applied Linguistics.
· Baltazani, M. (2007b). Intonation of polar questions and the location of nuclear stress in Greek. In Carlos Gussenhoven & Tomas Riad (eds.), Tones and Tunes, Volume II: Experimental Studies in Word and Sentence Prosody, pp. 387-405. Berlin: Mouton de Gruyter.
· Baltazani, M. & S.-A. Jun (1999). Focus and topic intonation in Greek. Proceedings of the XIVth International Congress of Phonetic Sciences, 2: 1305-8. San Francisco.
Prosodic Typology: The Phonology of Intonation
and Phrasing, pp. 9-54. Oxford: Oxford University Press.
Themes in Greek Linguistics, 217-224.
London: John Benjamins Publishing Co.
Spoken
Language Processing, 867-879.
In addition, the following symbols and conventions should be used:
· Noticeably centralized vowels should be transcribed as @.
· Noticeably nasalized vowels should be transcribed with a following ~; e.g. a~.
· In cases of vowel coalescence, both vowels should be transcribed and joined by +; e.g. u+o resulting from a sequence of /u/ and /o/ (usually across a word boundary).
· Whispered vowels should be transcribed in brackets.
· Vowels that phonologically form separate syllables but are phonetically manifested as a rising diphthong (on the basis, e.g., of tonal alignment evidence) should be transcribed with the second vowel capitalized; stress should be placed before the diphthong.
· Stress should be marked before the consonant(s) of the stressed syllable, following IPA conventions. (At present we are agnostic as to syllabification, so we suggest that transcribers mark maximal onsets, unless tonal alignment or their own intuitions suggest otherwise.)
separated by
full stops; e.g. a.i.d’oni "nightingale".
mark in the
Words tier.