Effective Learning to Rank for the Persian Web Content

Keyhanipour, Amir Hosein

doi:10.22059/jitm.2019.284726.2377

Journal of Information Technology Management
Article 7, Volume 11, Issue 4, 2019, Pages 92-109
Article type: Research Paper
DOI: 10.22059/jitm.2019.284726.2377
Author
Amir Hosein Keyhanipour*
Assistant Professor, Computer Engineering Department, Faculty of Engineering, College of Farabi, University of Tehran, Iran.
Abstract
The Persian language is one of the most widely used languages on the Web. Hence, the Persian Web contains invaluable information that must be retrieved effectively. As with other languages, ranking algorithms for Persian Web content face several challenges, such as limited applicability in real-world settings and the lack of user modeling. CF-Rank, a recently proposed learning to rank algorithm, addresses these issues through classifier fusion. CF-Rank generates a small number of click-through features that provide a compact representation of a given primitive dataset. By building primitive classifiers on each category of click-through features and aggregating their decisions with information fusion techniques, CF-Rank has proven to be a successful ranking algorithm on English datasets. In this paper, CF-Rank is customized for Persian Web content. Evaluation results on the dotIR dataset indicate that the customized CF-Rank outperforms the baseline ranking algorithms. The improvement is especially noticeable at the top of the ranked lists, which Web users view most often. Compared with the best-performing baseline algorithm on the dotIR dataset, CF-Rank improves NDCG@1 by 30 percent and MAP by 16.5 percent.
Keywords
Learning to rank; Persian language; CF-Rank algorithm; dotIR dataset; Information fusion
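To make the abstract's pipeline concrete, the Python sketch below shows how per-category classifier outputs could be fused into one ranking score per document, and how NDCG@1 and MAP are then computed over the resulting rankings. It is not the author's CF-Rank implementation: the choice of the ordered weighted averaging (OWA) operator as the fusion step, the weights (0.5, 0.3, 0.2), the three hypothetical feature categories, and all scores and relevance labels are illustrative assumptions. The metrics follow common definitions (gain 2^rel - 1 with a log2 discount for NDCG; binary relevance, label > 0, for MAP); other variants exist, and the paper's exact evaluation settings may differ.

```python
import math
from typing import Dict, List, Sequence


def owa(values: Sequence[float], weights: Sequence[float]) -> float:
    """Ordered weighted averaging: the i-th weight applies to the i-th largest value."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("OWA weights must sum to 1")
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


def rank_query(doc_scores: Dict[str, List[float]], labels: Dict[str, int],
               weights: Sequence[float]) -> List[int]:
    """Fuse each document's per-category scores, sort by fused score, return labels in rank order."""
    fused = {doc: owa(scores, weights) for doc, scores in doc_scores.items()}
    order = sorted(fused, key=fused.get, reverse=True)
    return [labels[doc] for doc in order]


def ndcg_at_k(labels: Sequence[int], k: int) -> float:
    """NDCG@k with gain 2^rel - 1 and discount log2(rank + 1)."""
    def dcg(rels: Sequence[int]) -> float:
        return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(rels[:k]))
    ideal = dcg(sorted(labels, reverse=True))
    return dcg(labels) / ideal if ideal > 0 else 0.0  # assumption: 0.0 when nothing is relevant


def average_precision(labels: Sequence[int]) -> float:
    """AP with binary relevance (label > 0 counts as relevant)."""
    hits, precision_sum = 0, 0.0
    for rank, rel in enumerate(labels, start=1):
        if rel > 0:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant position
    return precision_sum / hits if hits else 0.0


# Hypothetical data: per document, one relevance probability from each of three
# primitive classifiers (one per assumed click-through feature category), plus a graded label.
QUERIES = {
    "q1": ({"d1": [0.9, 0.2, 0.4], "d2": [0.3, 0.8, 0.7], "d3": [0.1, 0.2, 0.1]},
           {"d1": 1, "d2": 2, "d3": 0}),
    "q2": ({"d4": [0.6, 0.6, 0.1], "d5": [0.7, 0.1, 0.2], "d6": [0.2, 0.3, 0.9]},
           {"d4": 0, "d5": 2, "d6": 1}),
}
WEIGHTS = (0.5, 0.3, 0.2)  # hypothetical OWA weights

ranked = {q: rank_query(scores, labels, WEIGHTS) for q, (scores, labels) in QUERIES.items()}
mean_ndcg1 = sum(ndcg_at_k(r, 1) for r in ranked.values()) / len(ranked)
map_score = sum(average_precision(r) for r in ranked.values()) / len(ranked)
print(f"NDCG@1 = {mean_ndcg1:.3f}, MAP = {map_score:.3f}")  # NDCG@1 = 0.667, MAP = 0.917
```

Because NDCG@1 looks only at the top-ranked document, it is the measure most sensitive to the top-of-list improvements the abstract highlights, whereas MAP averages precision over every relevant document in each list.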
