Speaker Identification in Cyberspace: Examining the Role of Voiceless Fricatives in WhatsApp Audio Data

Davoudi, Rahil; Asadi, Homa

doi:10.22059/jolr.2025.389287.666909

Language Research (پژوهشهای زبانی)
Article 5, Volume 16, Issue 1 - Serial Number 30, Shahrivar 1404 (August-September 2025), Pages 99-118
Article type: Research article
Digital Object Identifier (DOI): 10.22059/jolr.2025.389287.666909
Authors
Rahil Davoudi (راحیل داودی)¹; Homa Asadi (هما اسدی)*²
¹M.A. student in Computational Linguistics, Department of Linguistics, University of Isfahan, Isfahan, Iran.
²Assistant Professor, Department of Linguistics, University of Isfahan, Isfahan, Iran.
Abstract
In the digital age, accurate speaker identification is of particular importance in forensic and security investigations. However, the spread of internet-based communication and the widespread use of messaging apps such as WhatsApp have created new challenges in this field. Variable microphone quality, background noise, network disruptions, and audio compression are among the factors that can affect a speaker's acoustic characteristics and reduce the accuracy of identification systems. Despite these limitations, examining how acoustic features perform under such conditions is essential for advancing forensic phonetics and improving its practical applications in real-world settings. This study examines the role of voiceless fricatives in capturing between-speaker variation in audio data recorded through the WhatsApp messenger. Its novelty lies in assessing how well the voiceless fricatives of Persian identify speakers under non-ideal recording conditions. To this end, audio data were collected from 100 male native speakers of Persian, and Mel-frequency cepstral coefficients (MFCCs) were extracted from the voiceless fricative sequences and fed as input to a support vector machine (SVM) model. The results showed that the model's speaker-identification accuracy was 69% when all voiceless fricatives were considered together. However, examining each fricative separately increased the model's accuracy. Among them, /s/ had the greatest effect, with an accuracy of 77%, followed by /ʃ/, /x/, and /f/ with accuracies of 75%, 74%, and 73%, respectively. These results indicate that even under non-ideal recording conditions, such as data recorded through WhatsApp, voiceless fricatives can provide valuable information for distinguishing between speakers. However, this study addressed only one instance of non-ideal recording conditions, and examining other potential confounding factors requires further research. The findings demonstrate the high potential of voiceless fricatives in speaker-identification applications, especially in informal, uncontrolled, real-world scenarios that lack high-quality recording equipment.
Keywords
Acoustic phonetics; speaker identification; fricatives; Mel-frequency cepstral coefficients; support vector machine algorithm
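
The abstract names Mel-frequency cepstral coefficients as the acoustic features but does not give the extraction settings. As a rough illustration of the mel scale that underlies such features, the sketch below converts between Hz and mels and spaces filter center frequencies evenly on the mel scale. The conversion formula is the commonly used HTK-style one, and the filter count and frequency range are hypothetical values chosen for illustration; none of these are taken from the study.

```python
import math
from typing import List


def hz_to_mel(f_hz: float) -> float:
    """Convert Hz to mels (HTK-style formula; an assumption, not the paper's stated setting)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_center_frequencies(n_filters: int, f_min: float, f_max: float) -> List[float]:
    """Center frequencies (Hz) of n_filters filters equally spaced on the mel scale."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_filters + 1)
    return [mel_to_hz(m_min + step * (i + 1)) for i in range(n_filters)]


# Hypothetical settings for illustration only: 10 filters between 0 and 8000 Hz.
centers = mel_center_frequencies(n_filters=10, f_min=0.0, f_max=8000.0)
for c in centers:
    print(round(c, 1))
```

Because the mel scale compresses high frequencies, the centers are packed closely at low frequencies and spread out toward the top of the range, so the high-frequency noise typical of fricatives is summarized by comparatively few, wide bands.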
