Using Support Vector Machines Classifier to Improve the Performance of Reinforcement Learning based Web Crawlers

Publication year: 1382 (Solar Hijri)
Document type: Conference paper
Language: English

The full text of this paper is available as an 8-page PDF.

National scientific document ID: ACCSI09_098

Indexed on: 4 Bahman 1386

Abstract:

The main contribution of this paper is an approach for extending the crawling methods of the Cora spider, a reinforcement learning (RL) based spider. We introduce novel methods for calculating the Q-value in the spider's reinforcement learning module. The proposed crawlers find target pages faster and earn more reward over the course of a crawl than Cora’s crawlers. We also use a Support Vector Machine (SVM) classifier, for the first time, as the text learner in a Web crawler, and compare the results with crawlers that use a Naïve Bayes (NB) classifier for this purpose. The results show that crawlers using SVMs outperform crawlers using NB during the first half of crawling a Web site and find the target pages more quickly. The test bed for the evaluation was the Web sites of the computer science departments of four universities, which have been made available offline.
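
To make the notion of a Q-value for a hyperlink concrete, the following Python sketch scores each link by the discounted reward reachable after following it. It is an illustration only and not the paper's method: the abstract does not specify the proposed Q-value calculations. The toy site graph `SITE`, the target set `TARGETS`, the discount factor `GAMMA`, and the function names are all hypothetical, and the value is simplified to the best single path through an acyclic site rather than the full reward of an entire crawl.

```python
from functools import lru_cache

# Hypothetical toy site: page -> pages it links to (kept acyclic for simplicity).
SITE = {
    "home": ["people", "research"],
    "people": ["alice", "bob"],
    "research": ["paper1"],
    "alice": ["paper2"],
    "bob": [],
    "paper1": [],
    "paper2": [],
}
TARGETS = {"paper1", "paper2"}  # assumed target pages, each worth a reward of 1
GAMMA = 0.5                     # assumed discount factor


@lru_cache(maxsize=None)
def q_value(page):
    """Discounted reward for following a link that leads to `page`.

    Simplification: only the best onward path is counted, not every
    target page a full crawl could still collect.
    """
    immediate = 1.0 if page in TARGETS else 0.0
    future = max((q_value(nxt) for nxt in SITE[page]), default=0.0)
    return immediate + GAMMA * future


def rank_links(page):
    """Order the out-links of `page` by decreasing Q-value."""
    return sorted(SITE[page], key=q_value, reverse=True)


if __name__ == "__main__":
    for link in rank_links("home"):
        print(link, q_value(link))
```

In an RL-based spider, a text learner such as an NB or SVM classifier would estimate these values from the text around a link, because the site graph is not known before crawling; this sketch assumes the graph is known so that the values can be computed exactly.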

Authors

Ahmad Abdollahzadeh Barfourosh

Computer Eng. & IT Faculty, Amirkabir University of Technology, Tehran, Iran

Hamid Reza Motahari Nezhad

Computer Eng. & IT Faculty, Amirkabir University of Technology, Tehran, Iran
