The Acoustic Architecture of Invisible Phonology: Challenges in the Symbolization and Detection of Non-Segmental Linguistic Shifts
March 17, 2026
Max Barrett
MaximillianGroup
California, United States
The mapping of human speech onto symbolic systems has traditionally prioritized segmental phonology, where discrete consonants and vowels serve as the primary units of meaning. However, a significant subset of the world’s languages employs "invisible" sound shifts—modulations in voice quality, nasal resonance, duration, and airflow mechanisms—that function phonemically but evade traditional orthographic representation. These shifts represent a departure from the linear concatenation of phonemes, instead utilizing the texture, timing, and aerodynamic source of the signal to distinguish lexical identity. The complexity of these phenomena presents a formidable challenge for both orthography and computational detection, particularly when attempting to normalize signals across diverse speaker populations including men, women, and children.
Voice quality, or phonation, refers to the physiological state of the larynx during sound production. While modal voice—characterized by efficient vocal fold vibration with normal tension—is the cross-linguistic baseline, languages such as Gujarati and Jalapa Mazatec utilize deviations from this norm to differentiate words. These shifts are "invisible" because the primary place and manner of articulation often remain identical, while the laryngeal settings vary.
In Gujarati, breathy phonation, often referred to as "murmur," distinguishes words that would otherwise be homophonous. A classic minimal triplet is /baɾ/ (twelve), /ba̤ɾ/ (outside), and /bʱaɾ/ (burden) [1]. The acoustic distinction between a modal vowel and a breathy vowel lies primarily in the amplitude difference between the first and second harmonics (H1–H2). Breathy voice involves a more open glottis: the vocal folds do not close completely, or they remain open for a larger portion of the glottal cycle. This shows up as a higher H1–H2 value, reflecting a larger open quotient (OQ) [1].
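Once f0 is known, H1–H2 can be read directly off a short-time spectrum. The following is a minimal sketch, not a reference implementation. It assumes numpy, an analysis frame that spans several glottal cycles, and an f0 value supplied by a separate pitch tracker. The function names and the ±20% search band are hypothetical choices. The sketch reports uncorrected values, whereas much published work applies a formant correction (H1*–H2*).

```python
import numpy as np


def harmonic_amplitude_db(frame, fs, f0_hz, k, tolerance=0.2):
    """Peak spectral level (dB) within +/- tolerance * f0 of the k-th harmonic."""
    windowed = frame * np.hanning(len(frame))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    band = np.abs(freqs - k * f0_hz) <= tolerance * f0_hz
    return 20 * np.log10(spectrum[band].max() + 1e-12)


def h1_minus_h2(frame, fs, f0_hz):
    """Uncorrected H1-H2 in dB; f0 is assumed to come from a separate pitch tracker."""
    return harmonic_amplitude_db(frame, fs, f0_hz, 1) - harmonic_amplitude_db(frame, fs, f0_hz, 2)


def harmonic_frame(amplitudes, f0_hz, fs, dur_s=0.1):
    """Synthetic test frame: a sum of harmonics with the given amplitudes."""
    t = np.arange(int(dur_s * fs)) / fs
    return sum(a * np.sin(2 * np.pi * (k + 1) * f0_hz * t) for k, a in enumerate(amplitudes))


fs, f0 = 16000, 150.0
breathy_like = harmonic_frame([1.0, 0.3, 0.1], f0, fs)  # dominant first harmonic
modal_like = harmonic_frame([0.5, 1.0, 0.6], f0, fs)    # second harmonic stronger
print(round(h1_minus_h2(breathy_like, fs, f0), 1))  # 10.5
print(round(h1_minus_h2(modal_like, fs, f0), 1))    # -6.0
```

The synthetic frames put H1 and H2 exactly on FFT bin centers. As a result, the measured differences match the amplitude ratios used to build them: 20·log10(1/0.3) ≈ 10.5 dB and 20·log10(0.5) ≈ −6.0 dB. On real speech, the values also depend on windowing, frame placement, and f0 accuracy.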
In a spectral slice, this shift appears as a significant increase in spectral tilt, measured by H1–A3, where A3 is the amplitude of the strongest harmonic in the third-formant region. In breathy vowels, the higher frequencies are significantly dampened relative to the fundamental [1]. Periodicity is also disrupted. The Cepstral Peak Prominence (CPP), a measure of how clearly harmonics emerge from background noise, "dips" significantly at the midpoint of the breathy vowel. This indicates that the vowel starts and ends with more modal-like characteristics, while the center of the segment shows increased laryngeal noise and aperiodicity [1].
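CPP can be sketched in a similar way. Take the cepstrum of the log spectrum, find its peak in the quefrency range where a glottal period is expected, and measure how far that peak rises above a straight-line trend. This is a simplified variant written for illustration. It assumes numpy, a 60–300 Hz f0 search range, and a frame at least two periods of the lowest f0 long; all three are assumptions. Established implementations differ in windowing, smoothing, and how the regression line is fitted, so absolute values will not match them.

```python
import numpy as np


def cepstral_peak_prominence(frame, fs, f0_min=60.0, f0_max=300.0):
    """Simplified CPP (dB): height of the cepstral peak above a linear trend,
    searched over quefrencies that correspond to f0 between f0_min and f0_max."""
    windowed = frame * np.hanning(len(frame))
    log_spectrum = 20 * np.log10(np.abs(np.fft.rfft(windowed)) + 1e-12)
    cepstrum = np.abs(np.fft.irfft(log_spectrum, n=len(frame)))
    cepstrum_db = 20 * np.log10(cepstrum + 1e-12)
    quefrency = np.arange(len(cepstrum)) / fs            # seconds
    lo, hi = int(fs / f0_max), int(fs / f0_min)
    q, c = quefrency[lo:hi + 1], cepstrum_db[lo:hi + 1]
    slope, intercept = np.polyfit(q, c, 1)                # straight-line trend
    peak = int(np.argmax(c))
    return float(c[peak] - (slope * q[peak] + intercept))


fs = 16000
t = np.arange(int(0.04 * fs)) / fs                        # one 40 ms frame
rng = np.random.default_rng(0)
periodic = sum(np.sin(2 * np.pi * k * 150.0 * t) / k for k in range(1, 50))
periodic = periodic + 0.01 * rng.standard_normal(len(t))  # strongly periodic, tiny noise
aperiodic = rng.standard_normal(len(t))                   # white noise

cpp_periodic = cepstral_peak_prominence(periodic, fs)
cpp_aperiodic = cepstral_peak_prominence(aperiodic, fs)
print(f"periodic CPP {cpp_periodic:.1f} dB, noise CPP {cpp_aperiodic:.1f} dB")
```

A strongly periodic frame should yield a clearly higher CPP than white noise. Computing the value frame by frame across a vowel is what would expose the midpoint dip.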
The symbolization of this shift in Gujarati orthography is rooted historically in an inter-syllabic /h/. In modern spoken Gujarati, however, this sequence is often realized as a single breathy vowel [V̤] [2]. The difficulty for the writer lies in the continuum of production. A speaker might produce a full [VɦV] sequence in formal registers but a subtle [V̤] in connected speech. This makes a standardized symbol difficult to implement without imposing an artificial rigidity on the phonological reality [2].
Jalapa Mazatec uses a three-way contrast between modal, breathy, and creaky voice. Creaky voice, or vocal fry, involves high adductive tension and low longitudinal tension. The result is a constricted glottis with thick, slowly vibrating vocal folds [4]. Unlike the "sighing" quality of Gujarati breathy voice, Mazatec creaky voice is characterized by a low fundamental frequency (f0) and extreme irregularity. Individual glottal pulses often become audible, creating a "percept of roughness" [5].
Acoustically, creaky voice is identified by a lower H1–H2 value than modal voice, signaling increased glottal constriction and a smaller open quotient [5]. Spectral noise is elevated across all frequency bands, which appears as a lower Harmonic-to-Noise Ratio (HNR), particularly at lower frequencies. In Mazatec, laryngealization can co-occur with high tones. This creates a "tense voice" variant in which f0 is high but the glottis remains constricted [5]. The variant complicates machine detection: a system cannot rely on low pitch alone to identify creak and must instead look for spectral-slope and harmonic-irregularity markers [5].
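A toy rule can illustrate why creak cannot be identified from low pitch alone. The rule ignores f0 entirely. It combines a constriction cue (low H1–H2) with a pulse-irregularity cue: local jitter, defined as the mean absolute difference between consecutive glottal periods divided by the mean period. Both thresholds and the label names are hypothetical placeholders rather than values from the cited studies. The glottal periods are assumed to come from a separate pulse detector.

```python
import numpy as np


def local_jitter(periods_s):
    """Mean absolute difference between consecutive glottal periods,
    divided by the mean period."""
    p = np.asarray(periods_s, dtype=float)
    return float(np.mean(np.abs(np.diff(p))) / np.mean(p))


def label_voice_quality(periods_s, h1_h2_db,
                        h1_h2_constricted_db=0.0, jitter_irregular=0.05):
    """Toy decision rule that deliberately ignores f0.

    Both thresholds are hypothetical placeholders, not published values.
    """
    constricted = h1_h2_db < h1_h2_constricted_db
    irregular = local_jitter(periods_s) > jitter_irregular
    if constricted and irregular:
        return "creaky"
    if constricted:
        return "tense"  # constricted but regular pulses, e.g. under a high tone
    return "non-constricted"


high_tone_periods = [1 / 250] * 12                   # regular pulses at a high f0
creak_periods = [0.008, 0.012, 0.007, 0.013, 0.009]  # long, irregular pulses
print(label_voice_quality(high_tone_periods, h1_h2_db=-3.0))  # tense
print(label_voice_quality(creak_periods, h1_h2_db=-3.0))      # creaky
print(label_voice_quality(high_tone_periods, h1_h2_db=4.0))   # non-constricted
```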
Nasalization occurs when the velopharyngeal port is opened, coupling the nasal cavity with the oral tract. This introduces a complex set of poles (resonances) and zeros (anti-resonances) into the acoustic signal. These can vary significantly with the degree of coupling and the specific vowel being nasalized [6].
In French, the contrast between /bo/ (beau) and /bɔ̃/ (bon) depends entirely on this shift. Acoustic analysis shows that French nasal vowels are not merely "nasalized" versions of their oral counterparts; they often involve secondary oral adjustments. For instance, the nasal vowel /ɛ̃/ is consistently lower and more back than the oral /ɛ/ [8].
The primary acoustic markers of French nasalization are a widening of the first-formant (F1) bandwidth and a general rise in F1 frequency [6]. These changes occur because the nasal tract acts as a side-branch resonator that absorbs energy and shifts the oral resonances. In many French nasal vowels, a stable nasal formant appears around 900 Hz. A nasal antiformant can also appear near an oral formant and potentially cancel it out [6].
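F1 bandwidth is commonly estimated from linear prediction (LPC). Each complex pole at angle θ and radius r maps to a resonance with frequency F = θ·fs/(2π) and bandwidth B = −(fs/π)·ln r. The following minimal numpy sketch uses the autocorrelation method. The LPC order, the absence of pre-emphasis, and the synthetic test resonator (500 Hz, 80 Hz bandwidth) are assumptions for illustration, not the procedure of the cited studies. Pure LPC models only poles, so it does not directly represent the nasal antiformants discussed above.

```python
import numpy as np


def lpc_coefficients(x, order):
    """Autocorrelation-method LPC: solve the normal equations R a = r."""
    x = np.asarray(x, dtype=float)
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])


def formants_from_lpc(x, fs, order):
    """(frequency_hz, bandwidth_hz) pairs from the complex poles of the LPC polynomial."""
    a = lpc_coefficients(x, order)
    poles = np.roots(np.concatenate(([1.0], -a)))   # roots of A(z) = 1 - sum a_k z^-k
    poles = poles[np.imag(poles) > 0]               # keep one pole per conjugate pair
    freqs = np.angle(poles) * fs / (2 * np.pi)
    bandwidths = -np.log(np.abs(poles)) * fs / np.pi
    idx = np.argsort(freqs)
    return [(float(f), float(b)) for f, b in zip(freqs[idx], bandwidths[idx])]


# Synthetic test: white noise through one resonator at 500 Hz with 80 Hz bandwidth.
fs, f_true, b_true = 16000, 500.0, 80.0
radius = np.exp(-np.pi * b_true / fs)
theta = 2 * np.pi * f_true / fs
c1, c2 = 2 * radius * np.cos(theta), -radius ** 2
rng = np.random.default_rng(0)
excitation = rng.standard_normal(fs)                # 1 s of noise
y = np.zeros(fs)
for n in range(2, fs):
    y[n] = c1 * y[n - 1] + c2 * y[n - 2] + excitation[n]

formants = formants_from_lpc(y, fs, order=2)
print(formants)  # one (frequency, bandwidth) pair, close to (500, 80)
```

On real vowels, the order is usually set to roughly two poles per expected formant plus a few extra. The candidates must then be filtered before one of them is labeled F1.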
Orthographically, French uses the letters "n" and "m" to signal nasality, but this system is inconsistent. In words like chant or fin, the nasal consonant is silent and serves only as a diacritic for the vowel. This creates confusion for learners and speech recognition systems alike, because the "invisible" nasal feature must be inferred from a letter that, in other contexts, represents a full consonant [10].
Guaraní presents one of the most sophisticated examples of "nasal spreading," in which nasality is not confined to a single segment but acts as a prosodic feature of the entire word [12]. This spreading is triggered by a stressed nasal vowel and propagates bidirectionally until it reaches a blocker, usually a stressed oral vowel [14].
A major point of linguistic debate in Guaraní is the behavior of the voiceless stops /p, t, k/. These have traditionally been labeled "transparent" because they allow the nasal feature to "skip" over them without themselves becoming voiced nasal stops [12]. However, modern acoustic and aerodynamic studies reveal that these stops are "partial undergoers." The closure remains largely oral to maintain the high pressure required for a stop, yet there is evidence of nasal airflow at the onset of the closure [16]. Furthermore, the Voice Onset Time (VOT) of /p/ and /t/ in Guaraní shifts in nasal environments. This suggests that the "invisible" nasality does affect the temporal coordination of the stop, even though the stop is not realized as a nasal consonant [15].
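The spreading pattern described above can be simulated with a toy rule:

- Start from each stressed nasal vowel and walk left and right.
- Nasalize undergoers along the way.
- Skip voiceless stops without nasalizing them, following the traditional "transparent" analysis.
- Stop at a stressed oral vowel.

The segment representation and the example forms are hypothetical and are not real Guaraní words. The sketch also does not model the partial-undergoer effects reported in the acoustic studies (nasal airflow at closure onset, VOT shifts).

```python
from dataclasses import dataclass, replace


@dataclass
class Segment:
    symbol: str
    kind: str               # "vowel", "voiceless_stop", or "other" (hypothetical labels)
    stressed: bool = False
    nasal: bool = False


def spread_nasality(word):
    """Toy bidirectional nasal spreading.

    Triggers: stressed nasal vowels. Blockers: stressed oral vowels.
    Voiceless stops are transparent: skipped over but not nasalized.
    """
    result = [replace(seg) for seg in word]        # copy; the input is left unchanged
    for t, seg in enumerate(word):
        if not (seg.kind == "vowel" and seg.stressed and seg.nasal):
            continue
        for step in (-1, 1):                       # spread leftward, then rightward
            i = t + step
            while 0 <= i < len(word):
                cur = word[i]
                if cur.kind == "vowel" and cur.stressed and not cur.nasal:
                    break                          # a stressed oral vowel blocks spreading
                if cur.kind != "voiceless_stop":
                    result[i].nasal = True         # undergoer
                i += step
    return result


def render(word):
    """Mark nasal segments with a combining tilde (U+0303)."""
    return "".join(s.symbol + ("\u0303" if s.nasal else "") for s in word)


toy = [Segment("a", "vowel"), Segment("k", "voiceless_stop"), Segment("a", "vowel"),
       Segment("r", "other"), Segment("a", "vowel", stressed=True, nasal=True)]
blocked = [Segment("a", "vowel"), Segment("o", "vowel", stressed=True),
           Segment("r", "other"), Segment("a", "vowel", stressed=True, nasal=True)]
print([s.nasal for s in spread_nasality(toy)])      # [True, False, True, True, True]
print([s.nasal for s in spread_nasality(blocked)])  # [False, False, True, True]
print(render(spread_nasality(toy)))
```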
Holding a sound for a fraction of a second longer is a common way to mark emphasis, but in languages like Finnish and Estonian it is a phonemic requirement. Estonian is uniquely complex because of its three-way quantity system: short (Q1), long (Q2), and overlong (Q3) [17].
The distinction between Q2 and Q3 in Estonian is particularly difficult to visualize because it is not just a matter of absolute duration; it is a feature of the entire disyllabic foot [19]. In a Q1 foot such as sada (hundred), the first syllable is short and the second is relatively long. In a Q2 foot such as saada (to get), the first syllable is long and the second is short. In a Q3 foot such as saada (send!), which is spelled identically to the Q2 form, the first syllable is even longer and is often accompanied by a distinct falling pitch contour [21].
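The Q2/Q3 contrast lives in the foot rather than in one segment, so a detector has to compare the two syllables. A toy classifier can use two cues: the ratio of first- to second-syllable duration, and the size of the pitch fall in semitones (12·log2 of the f0 ratio). The boundary ratios of 1.0 and 2.0 and the 3-semitone fall are hypothetical placeholders chosen for illustration. In practice such boundaries would be fitted to measured data.

```python
import math


def semitone_fall(f0_start_hz, f0_end_hz):
    """Pitch fall in semitones (positive when f0 falls)."""
    return 12.0 * math.log2(f0_start_hz / f0_end_hz)


def classify_quantity(s1_ms, s2_ms, f0_start_hz=None, f0_end_hz=None,
                      q1_q2_ratio=1.0, q2_q3_ratio=2.0, fall_st=3.0):
    """Toy foot-level quantity classifier.

    s1_ms, s2_ms: durations of the first and second syllable of the foot.
    All three thresholds are hypothetical placeholders, not published values.
    """
    ratio = s1_ms / s2_ms
    if ratio < q1_q2_ratio:
        return "Q1"  # short first syllable, relatively long second syllable
    falling = (f0_start_hz is not None and f0_end_hz is not None
               and semitone_fall(f0_start_hz, f0_end_hz) >= fall_st)
    if ratio >= q2_q3_ratio or falling:
        return "Q3"  # extreme ratio and/or a distinct pitch fall
    return "Q2"


print(classify_quantity(100, 150))           # Q1
print(classify_quantity(200, 130))           # Q2
print(classify_quantity(260, 100))           # Q3
print(classify_quantity(200, 130, 120, 95))  # Q3 (a fall of about 4 semitones)
```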
The symbolization of this system in Estonian orthography is notably insufficient. Q1 is marked with a single letter and Q2 with a double letter, but Q3 is generally not distinguished from Q2 in writing (except for stop consonants) [18]. For example, the spelling koolis can represent both the inessive singular (Q2) and another form in Q3, requiring the reader to rely entirely on syntactic context. This "orthographic gap" exists because the difference between Q2 and Q3 involves prosodic cues, specifically the tonal drop and the extreme syllable ratio, that standard Latin-based alphabets are not equipped to capture [23].
Click consonants, prevalent in Xhosa and other Nguni languages, are produced using a lingual ingressive airstream mechanism. This involves two closures: an "initiatory" closure at the velar or uvular position and an "articulatory" closure at the dental, alveolar, or lateral position [25]. The rarefaction of air between these two points creates a suction that, when released, produces the characteristic click burst.
Clicks are described as "unencoded speech" because they do not coarticulate with their phonetic environment in the way pulmonic sounds do [26]. A "t" or "p" leaves "transitional features" (formant transitions) on the following vowel, whereas a click release is almost entirely self-contained. Spectrograms of Xhosa clicks show very distinct noise-burst properties:
Dental Clicks [ǀ] (orthographic 'c'): These have a diffuse spectrum with energy spread across a wide range (0-9000 Hz) but at a lower overall amplitude [27]. They are often described as "affricative" because the release is more gradual [28].
Alveolar Clicks [ǃ] (orthographic 'q', also described as palato-alveolar): These are "instantaneous" and "compact." The spectral energy is concentrated in lower frequencies, typically between 1000 and 1700 Hz [27].
Lateral Clicks [ǁ] (orthographic 'x'): These have a diffuse spectrum but with a distinct peak between 1000 and 2000 Hz, reflecting the resonance of the lateral side-cavities [27].
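One way to operationalize the "compact" versus "diffuse" descriptions is to measure two things for each burst: the frequency of the spectral peak, and the share of burst energy that falls near that peak. This sketch is a hypothetical operationalization. The ±300 Hz window, the 2048-point FFT, and the synthetic bursts are assumptions. The cited studies may use different measures, such as spectral moments.

```python
import numpy as np


def peak_and_compactness(burst, fs, half_width_hz=300.0, n_fft=2048):
    """Return (spectral peak in Hz, share of burst energy within +/- half_width_hz of it).

    No taper is applied, so energy at the burst onset is not attenuated;
    bursts longer than n_fft samples would be truncated by the FFT.
    """
    power = np.abs(np.fft.rfft(burst, n=n_fft)) ** 2
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    peak_hz = freqs[np.argmax(power)]
    near_peak = np.abs(freqs - peak_hz) <= half_width_hz
    return float(peak_hz), float(power[near_peak].sum() / power.sum())


fs = 22050
t = np.arange(int(0.02 * fs)) / fs  # 20 ms synthetic bursts
rng = np.random.default_rng(0)
# "Compact" stand-in: a rapidly decaying resonance at 1400 Hz plus a little noise.
compact = np.exp(-t / 0.003) * np.sin(2 * np.pi * 1400 * t) + 0.01 * rng.standard_normal(len(t))
# "Diffuse" stand-in: broadband noise.
diffuse = rng.standard_normal(len(t))

compact_peak, compact_share = peak_and_compactness(compact, fs)
diffuse_peak, diffuse_share = peak_and_compactness(diffuse, fs)
print(f"compact: peak {compact_peak:.0f} Hz, share {compact_share:.2f}")
print(f"diffuse: peak {diffuse_peak:.0f} Hz, share {diffuse_share:.2f}")
```

For the diffuse burst the peak location is essentially arbitrary. The low energy share is what marks it as diffuse.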
Because standard Latin letters were never designed for lingual ingressive sounds, the symbols "c," "x," and "q" were arbitrarily assigned to these sounds in Xhosa. This creates a disconnect for non-native speakers who associate these letters with their European values, further obscuring the "invisible" mechanics of the click [31].
Whistled languages represent the most radical shift of all. In whistled Turkish, spoken Turkish is transposed into a series of frequency modulations [33]. In the village of Kuşköy, this "Bird Language" is used for long-distance communication in conditions where ordinary vocalizations would be lost to ambient noise and distance [35].
In whistled Turkish, the vocal folds remain inactive, and the whistle acts as a "pure tone" carrier wave. Its frequency is modulated by changing the volume of the resonant oral cavity, primarily through anteroposterior movement of the tongue [35]. The whistled signal essentially emulates the second formant (F2) of spoken Turkish, since F2 is the primary carrier of vowel identity in non-tonal languages [37].
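A whistle can be modeled as a pure-tone carrier that tracks an F2-like contour. The sketch interpolates a frequency contour and integrates it into phase, so the output glides continuously rather than stepping between discrete symbols. The breakpoint frequencies are hypothetical and are not measured whistled-Turkish values.

```python
import numpy as np


def whistle_from_contour(breakpoints, fs=16000):
    """Render a pure-tone whistle whose frequency follows an F2-like contour.

    breakpoints: (time_s, frequency_hz) pairs with increasing times. Frequency is
    linearly interpolated between them and integrated into phase, so the tone
    glides continuously instead of jumping between discrete values.
    """
    times, freqs = zip(*breakpoints)
    t = np.arange(int(round(times[-1] * fs))) / fs
    inst_freq = np.interp(t, times, freqs)
    phase = 2 * np.pi * np.cumsum(inst_freq) / fs  # integrate frequency to get phase
    return np.sin(phase), inst_freq


# Hypothetical contour: a high F2-like target held, then gliding down to a lower one.
signal, contour = whistle_from_contour([(0.0, 2200.0), (0.08, 2200.0), (0.2, 1400.0)])

# Sanity check on a steady tone: the FFT peak should sit at the requested frequency.
steady, _ = whistle_from_contour([(0.0, 2000.0), (0.1, 2000.0)])
spectrum = np.abs(np.fft.rfft(steady))
peak_hz = np.fft.rfftfreq(len(steady), d=1 / 16000)[np.argmax(spectrum)]
print(len(signal), peak_hz)  # 3200 2000.0
```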
However, this shift requires significant phonetic reduction. Spoken Turkish's 32 phonemes are condensed into approximately six whistled phonetic groups. For instance, the bilabial stops /p/ and /b/ merge into /f/, because the lips cannot close during whistling without stopping the sound [35].
Symbolizing a whistled language is virtually impossible with a standard alphabet, because the signal is continuous and lacks the discrete boundaries of spoken phonemes. Researchers use frequency intervals to categorize these sounds. Native whistlers, by contrast, appear to rely on a "right-brain" decoding mechanism that tracks melody and rhythm rather than discrete letters [35].
Detecting these subtle sound shifts in children, women, and men requires a robust normalization process. Because of the "lack of invariance" in speech, a man's breathy voice might have an H1–H2 value similar to a woman's modal voice, which makes absolute thresholds useless [38].
To detect "invisible" shifts across speaker types, machines must follow a specific logic pathway that isolates linguistic intent from physiological variation:
1. Signal Acquisition and Pre-processing: The raw audio is sampled (typically at 16-44.1 kHz) and divided into short frames (20-30 ms) [7].
2. Vocal Tract Length Normalization (VTLN): The system estimates the length of the speaker's vocal tract and applies a warping factor (α) to the frequency axis. This rescales the formants as if they had been produced by a reference tract, reducing the difference between a child's short vocal tract and a man's long one [39].
3. Feature Extraction: Mel-frequency cepstral coefficients (MFCCs) are extracted to capture the spectral envelope. For shifts like phonation or nasality, higher-level parameters are added:
   - Phonation: H1–H2, H1–A3, and CPP [1].
   - Nasalization: F1 bandwidth and F1 frequency [9].
   - Duration: syllable and segment ratios [19].
4. Temporal Trajectory Analysis: Instead of taking a single measurement, the system evaluates the shift across the duration of the segment (e.g., at the 10%, 50%, and 90% points). This is crucial for Guaraní nasal spreading or Gujarati midpoint breathiness [7].
5. Classification: The normalized features are fed into a Support Vector Machine (SVM) or an XGBoost model. By training on multi-speaker datasets, the model learns the "boundary" of the shift independently of the speaker's f0 [7].
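The framing, VTLN, and trajectory-sampling steps in the pathway above can be sketched with numpy alone. MFCC extraction would normally come from an audio library and is omitted here. Several choices are assumptions for illustration:

- the 10 ms hop;
- the linear warp f' = α·f;
- computing α as the ratio of a reference F3 (2500 Hz) to the speaker's median F3.

Production VTLN systems more often use piecewise-linear warps and select α by a maximum-likelihood grid search.

```python
import numpy as np


def frame_signal(x, fs, frame_ms=25.0, hop_ms=10.0):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    x = np.asarray(x, dtype=float)
    frame_len = int(round(fs * frame_ms / 1000))
    hop = int(round(fs * hop_ms / 1000))
    if len(x) < frame_len:
        return np.empty((0, frame_len))
    n_frames = 1 + (len(x) - frame_len) // hop
    starts = hop * np.arange(n_frames)
    return x[starts[:, None] + np.arange(frame_len)[None, :]]


def estimate_warp_factor(speaker_f3_hz, reference_f3_hz=2500.0):
    """Hypothetical formant-ratio heuristic: alpha = reference F3 / speaker's median F3."""
    return reference_f3_hz / float(np.median(speaker_f3_hz))


def warp_frequencies(freqs_hz, alpha):
    """Linear VTLN warp f' = alpha * f."""
    return alpha * np.asarray(freqs_hz, dtype=float)


def sample_trajectory(track, proportions=(0.1, 0.5, 0.9)):
    """Linearly interpolate a per-frame feature track at proportional time points."""
    track = np.asarray(track, dtype=float)
    positions = np.linspace(0.0, 1.0, len(track))
    return np.interp(proportions, positions, track)


fs = 16000
frames = frame_signal(np.zeros(fs), fs)                  # 1 s of (silent) audio
print(frames.shape)                                      # (98, 400)
alpha = estimate_warp_factor([3400.0, 3500.0, 3600.0])   # hypothetical child F3 values
print(warp_frequencies([3500.0], alpha))                 # about 2500 Hz
print(sample_trajectory(np.arange(11.0)))                # about [1, 5, 9]
```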
Research on child speech normalization indicates that z-score standardization (scaling features by age- and sex-specific means and standard deviations) significantly improves detection [44]. For instance, a child's "modal" voice naturally contains more noise than an adult's. A machine must be "taught" that a higher H1–H2 in a child may simply be that child's baseline, whereas the same value in an adult male would signal breathiness [44].
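Group-wise z-scoring followed by a linear classifier can be sketched as follows. Everything here is illustrative. The synthetic H1–H2 values are invented: a 2 dB adult-male baseline, an 8 dB child baseline, and a 6 dB breathiness offset. The classifier is a from-scratch linear SVM trained with the Pegasos subgradient method, so the example needs only numpy. A real system would use a library SVM or XGBoost, as the classification step describes.

```python
import numpy as np


def zscore_by_group(features, groups):
    """Standardize each feature column within each speaker group (e.g. an age/sex band)."""
    features = np.asarray(features, dtype=float)
    groups = np.asarray(groups)
    out = np.empty_like(features)
    for g in np.unique(groups):
        mask = groups == g
        mu = features[mask].mean(axis=0)
        sd = features[mask].std(axis=0)
        out[mask] = (features[mask] - mu) / np.where(sd > 0, sd, 1.0)
    return out


def train_linear_svm(X, y, lam=0.01, n_iter=5000, seed=0):
    """Linear SVM trained with the Pegasos stochastic subgradient method; y in {-1, +1}."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((len(X), 1))])  # constant column acts as a bias term
    w = np.zeros(Xb.shape[1])
    for t in range(1, n_iter + 1):
        i = rng.integers(len(Xb))
        eta = 1.0 / (lam * t)
        margin = y[i] * (Xb[i] @ w)
        w *= 1.0 - eta * lam                    # regularization (shrink) step
        if margin < 1.0:                        # hinge loss active: move toward the example
            w += eta * y[i] * Xb[i]
    return w


def predict(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.where(Xb @ w >= 0, 1, -1)


# Synthetic, hypothetical H1-H2 values (dB): children have a higher baseline,
# and breathiness adds the same offset in both groups.
rng = np.random.default_rng(1)
values, groups, labels = [], [], []
for group, baseline in (("adult_male", 2.0), ("child", 8.0)):
    for label, offset in ((-1, 0.0), (1, 6.0)):  # -1 = modal, +1 = breathy
        values.extend(baseline + offset + rng.normal(0.0, 1.0, 100))
        groups.extend([group] * 100)
        labels.extend([label] * 100)
X_raw = np.array(values)[:, None]
y = np.array(labels)

acc_raw = np.mean(predict(train_linear_svm(X_raw, y), X_raw) == y)
X_norm = zscore_by_group(X_raw, groups)
acc_norm = np.mean(predict(train_linear_svm(X_norm, y), X_norm) == y)
print(f"raw accuracy {acc_raw:.2f}, group-normalized accuracy {acc_norm:.2f}")
```

In this toy setup, no single raw threshold can separate the classes, because a child's modal tokens sit where an adult's breathy tokens do. After per-group standardization, one boundary works for both groups. In deployment, group means and standard deviations should be estimated from held-out reference speakers rather than from the tokens being classified.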
The challenge of "invisible" sound shifts highlights a fundamental limitation of human orthography: it is designed for ease of reading, not acoustic precision. Standard alphabets are "selective." They ignore sounds that do not change meaning in the designer's language and simplify those that do for the sake of usability [45]. This creates a "symbolic gap" in which the most complex and nuanced aspects of human speech (the sigh of a Gujarati breathy vowel, the suction of a Xhosa click, or the melody of a Turkish whistle) are rendered invisible.
Computational linguistics provides a bridge across this gap. By using VTLN, harmonic-to-noise ratios, and spectral tilt measurements, machines can "hear" the shifts that orthography ignores. However, the ultimate normalization remains a human cognitive process. Whether through "multiple-listing" of phonetic variants in memory or "top-down" parsing using semantic context, the human ear remains the most sophisticated detector of these invisible phonological architectures [38]. For the field of linguistics, the ongoing task is to refine these digital models until they can mirror the human ability to find constancy in a signal that is constantly shifting.
Works cited
1. Breathy phonation in Gujarati: an acoustic and electroglottographic ... - Reed College, accessed March 15, 2026, https://www.reed.edu/linguistics/khan/assets/ASA2010-Khan.pdf
2. Distinguishing breathy consonants and vowels in Gujarati - Reed College, accessed March 15, 2026, https://www.reed.edu/linguistics/khan/assets/Esposito%20ea%202019%20Distinguishing%20breathy%20consonants%20and%20vowels%20in%20Gujarati.pdf
3. Acoustic and Electroglottographic Study of Breathy and Modal Vowels as Produced by Heritage and Native Gujarati Speakers - ISCA Archive, accessed March 15, 2026, https://www.isca-archive.org/interspeech_2017/nara17_interspeech.pdf
4. Voice-quality contrasts, accessed March 15, 2026, https://www.phonetik.uni-muenchen.de/~hoole/kurse/phil_demos/artikul/sowl/gujarati/voice_quality_contrasts.html
5. Keating and Garellek, 2015 LSA poster, accessed March 15, 2026, https://idiom.ucsd.edu/~mgarellek/files/Keating_Garellek_2015_LSA.pdf
6. Acoustic Aspects of French Nasal Vowels (ERIC document resume, 20 pp.) - ERIC, accessed March 15, 2026, https://files.eric.ed.gov/fulltext/ED119525.pdf
7. Automatic measurement and comparison of vowel ... - ICPhS 2011 Proceedings, accessed March 15, 2026, https://www.internationalphoneticassociation.org/icphs-proceedings/ICPhS2011/OnlineProceedings/RegularSession/Yuan/Yuan.pdf
8. French nasal vowels: acoustic and articulatory properties - ResearchGate, accessed March 15, 2026, https://www.researchgate.net/publication/221484514_French_nasal_vowels_acoustic_and_articulatory_properties
9. On the Acoustical and Perceptual Features of Vowel Nasality, accessed March 15, 2026, https://wstyler.ucsd.edu/talks/defense_huge_handout.html
10. More Harm than Good: Why Dictionaries Using Orthographic Transcription Instead of the IPA Should Be Handled with Care - ResearchGate, accessed March 15, 2026, https://www.researchgate.net/publication/366672289_More_Harm_than_Good_Why_Dictionaries_Using_Orthographic_Transcription_Instead_of_the_IPA_Should_Be_Handled_with_Care
11. The Effects of Orthography on the Pronunciation of Nasal Vowels by L1 Japanese Learners of L3 French: Evidence from a Longitudinal Study of Speech in Interaction - MDPI, accessed March 15, 2026, https://www.mdpi.com/2227-7102/14/3/234
12. Guaraní Voiceless Stops in Oral versus Nasal Contexts - Bruce Hayes, accessed March 15, 2026, https://brucehayes.org/251VowelHarmony/Readings/Walker_Guarani.pdf
13. Nasal harmony in Paraguayan Guarani (University of California, Los Angeles) - eScholarship.org, accessed March 15, 2026, https://escholarship.org/content/qt75s541nd/qt75s541nd_noSplash_ad1f65f59d8f1c6f9fb3fb23e2c53275.pdf
14. Nasal spreading in Paraguayan Guaraní: Introducing long-distance continuous spreading - Amerindia, accessed March 15, 2026, https://amerindia.cnrs.fr/wp-content/uploads/2021/02/Kaiser-E.-Nasal-spreading-in-Paraguayan-Guarani%CC%81-Introducing-long-distance-continuous-spreading.pdf
15. Walker, thesis (Rutgers Optimality Archive, ROA-405), accessed March 15, 2026, https://roa.rutgers.edu/files/405-0800/roa-405-walker-5.pdf
16. Piaroa Voiceless Stops as Partial Undergoers of Nasal Harmony - NSF PAR, accessed March 15, 2026, https://par.nsf.gov/servlets/purl/10448581
17. Distinctive duration of speech sounds in Estonian - Journal.fi, accessed March 15, 2026, https://journal.fi/fuf/article/download/114298/67464
18. Estonian Language - Structure, Writing & Alphabet - MustGo.com, accessed March 15, 2026, https://www.mustgo.com/worldlanguages/estonian/
19. Feet, syllables, moras and the Estonian quantity system - ROA, accessed March 15, 2026, https://roa.rutgers.edu/content/article/files/1266_prillop_1.pdf
20. Variation in vowel quality as a feature of Estonian quantity - ISCA Archive, accessed March 15, 2026, https://www.isca-archive.org/speechprosody_2010/lippus10_speechprosody.html
21. The Production of Estonian Vowels in Three Quantity Degrees by Spanish L1 Speakers - ICPhS 2019, accessed March 15, 2026, https://assta.org/proceedings/ICPhS2019Microsite/pdf/full-paper_732.pdf
22. On the Quantity and Quality of Estonian Vowels of Three Phonological Degrees of Length - ICPhS 1961, accessed March 15, 2026, https://www.coli.uni-saarland.de/groups/FK/speech_science/icphs/ICPhS1961/p4.1_682.pdf
23. Estonian has short, long and over-long vowels. Where can I hear a recording of the over-long vowels? - Quora, accessed March 15, 2026, https://www.quora.com/Estonian-has-short-long-and-over-long-vowels-Where-can-I-hear-a-recording-of-the-over-long-vowels
24. How do you distinguish between long and overlong vowels in Estonian? - Talkpal, accessed March 15, 2026, https://talkpal.ai/culture/how-do-you-distinguish-between-long-and-overlong-vowels-in-estonian/
25. Phonetic Analysis of Clicks, Plosives and Implosives of IsiXhosa: A Preliminary Report - Florida Online Journals, accessed March 15, 2026, https://journals.flvc.org/floridalinguisticspapers/article/view/107121/102444
26. Notes on Unencoded Speech: Clicks and Their ... - ASSTA, accessed March 15, 2026, https://assta.org/proceedings/sst/SST-96/cache/SST-96-Chapter3-p21.pdf
27. [Title not recovered] - ICPhS 1991 Proceedings, Vol. 4, accessed March 15, 2026, https://www.coli.uni-saarland.de/groups/FK/speech_science/icphs/ICPhS1991/12_ICPhS_1991_Vol_4/p12.4_130.pdf
28. [Title not recovered] - ICPhS 1995 (Stockholm) Proceedings, Vol. 2, accessed March 15, 2026, https://www.coli.uni-saarland.de/groups/FK/speech_science/icphs/ICPhS1995/13_ICPhS_1995_Vol_2/p13.2_574.pdf
29. Contrastive Lateral Clicks and Variation in Click Types - ISCA Archive, accessed March 15, 2026, https://www.isca-archive.org/icslp_2000/millerockhuizen00_icslp.pdf
30. Phonetic Reduction of Clicks: Evidence from Nǀuu - ICPhS 2015, accessed March 15, 2026, https://www.internationalphoneticassociation.org/icphs-proceedings/ICPhS2015/Papers/ICPHS0240.pdf
31. How to do the Xhosa clicks - Linguistics Stack Exchange, accessed March 15, 2026, https://linguistics.stackexchange.com/questions/29431/how-to-do-the-xhosa-clicks
32. IPA - Idea Wiki - Fandom, accessed March 15, 2026, https://ideas.fandom.com/wiki/IPA
33. Typology and acoustic strategies of whistled languages: Phonetic comparison and perceptual cues of whistled vowels - Journal of the International Phonetic Association, Cambridge Core, accessed March 15, 2026, https://www.cambridge.org/core/journals/journal-of-the-international-phonetic-association/article/typology-and-acoustic-strategies-of-whistled-languages-phonetic-comparison-and-perceptual-cues-of-whistled-vowels/03100623102D31723CA6BE881E765958
34. Acoustic and Linguistic Properties of Turkish Whistle Language - Scirp.org, accessed March 15, 2026, https://www.scirp.org/journal/paperinformation?paperid=86259
35. The Culture and Language of Whistle of Turkish People (Giresun), accessed March 15, 2026, https://www.ijscl.com/article_241830_f72c2559962fc43ddeda3036463cb5f3.pdf
36. Comparative Analysis of Early Studies on Turkish Whistle Language and a Case Study on Test Conditions - SCIRP, accessed March 15, 2026, https://www.scirp.org/journal/paperinformation?paperid=86507
37. Acoustic and Linguistic Properties of Turkish Whistle Language - SciSpace, accessed March 15, 2026, https://scispace.com/pdf/acoustic-and-linguistic-properties-of-turkish-whistle-3u8w0tvduy.pdf
38. Talker normalization: Phonetic constancy as a cognitive process - ResearchGate, accessed March 15, 2026, https://www.researchgate.net/publication/248579975_Talker_normalization_Phonetic_constancy_as_a_cognitive_process
39. Speaker Normalization in Speech Perception (Keith Johnson and Matthias Sjerps) - eScholarship, accessed March 15, 2026, https://escholarship.org/content/qt2fc6x1ph/qt2fc6x1ph_noSplash_532045dda339e6de723dfdc32288e108.pdf
40. Vocal Tract Length Normalization for Speaker Independent Acoustic-to-Articulatory Speech Inversion - ISCA Archive, accessed March 15, 2026, https://www.isca-archive.org/interspeech_2016/sivaraman16_interspeech.html
41. Acoustic Parameters for the Automatic Detection of Vowel Nasalization - ISCA Archive, accessed March 15, 2026, https://www.isca-archive.org/interspeech_2007/pruthi07_interspeech.pdf
42. An investigation of the dynamics of vowel nasalization in Arabana using machine learning of acoustic features - Laboratory Phonology, accessed March 15, 2026, https://www.journal-labphon.org/article/9152/galley/22803/view/
43. An investigation of the dynamics of vowel nasalization in Arabana using machine learning of acoustic features - Laboratory Phonology, accessed March 15, 2026, https://www.journal-labphon.org/article/id/9152/
44. Evaluating acoustic representations and normalization for rhoticity classification in children with speech sound disorders - Montclair State University Digital Commons, accessed March 15, 2026, https://digitalcommons.montclair.edu/cgi/viewcontent.cgi?article=1168&context=communcsci-disorders-facpubs
45. International Phonetic Alphabet - Wikipedia, accessed March 15, 2026, https://en.wikipedia.org/wiki/International_Phonetic_Alphabet
46. Here is why we can't adopt IPA as a writing system - The Language Nerds, accessed March 15, 2026, https://thelanguagenerds.com/2019/what-if-we-decided-to-write-english-in-ipa-instead-of-the-current-alphabet/