Design and Implementation of Search Engine System
CHAPTER ONE
AIM AND OBJECTIVES OF STUDY
The aim of this study is to design and implement a student research search engine. The objective is to develop an application that eases the stress associated with the existing manual system, as stated earlier, with a focus on a research search system. Specifically, the study sets out to:
- Create a database management system (DBMS) that stores research material and secures the stored data.
- Provide a quick and efficient means of retrieving student research and project materials through a search system.
- Improve the management of a research library, so that the completed system can be used as a model for building a digital research library.
CHAPTER TWO
LITERATURE REVIEW
INTRODUCTION
Site-specific search engines have become increasingly important for the websites of educational institutions, government bodies and private companies because they provide more detailed information than a general search engine can usually offer.
To improve the browsing experience on their websites, many institutions either license search tools from general search engine companies such as Google or build their own basic search engines. Neither option is ideal in every case: licensing a search engine is usually expensive, and the search quality of a self-built basic engine is often unsatisfactory because its developers lack expertise in modern search engine technologies, even though these technologies are publicly available thanks to several decades of work by information retrieval researchers. Research into building a low-cost or free site-specific search engine that returns detailed and satisfactory results for user queries therefore has wide practical application as more and more information is placed on the internet.
As an experiment in building a robust, reliable and capable site-specific search engine, a simple search engine for the website of Ogun State Institute of Technology, Igbesa has been built using technologies described in journals and in materials downloadable from the internet. This section reviews those sources.
HISTORY OF SEARCH ENGINE AND DEVELOPMENT OF SRS
In the summer of 1993, no search engine existed for the web, though numerous specialized catalogues were maintained by hand. Oscar Nierstrasz at the University of Geneva wrote a series of Perl scripts that periodically mirrored these pages and rewrote them into a standard format. This formed the basis for W3Catalog, the web’s first primitive search engine, released on September 2, 1993. The web’s second search engine, Aliweb, appeared in November 1993. One of the first “all text” crawler-based search engines was WebCrawler, which came out in 1994. In 1998, Google adopted the idea of selling search terms from a small search engine company named goto.com. Around 2000, Google’s search engine rose to prominence (Chakkrit, 2007).
The company achieved better results for many searches with an innovation called PageRank. By 2000, Yahoo! was providing search services based on Inktomi’s search engine. Yahoo! acquired Inktomi in 2002, and Overture (which owned AlltheWeb and AltaVista) in 2003. Yahoo! used Google’s search engine until 2004, when it launched its own search engine based on the combined technologies of its acquisitions. Microsoft’s rebranded search engine, Bing, was launched on June 1, 2009. On July 29, 2009, Yahoo! and Microsoft finalized a deal in which Yahoo! Search would be powered by Microsoft Bing technology.
The use of search engines has grown steadily over time, and systems have been developed to help users find information. A person who wants to find something simply types words into a search engine, which returns information it judges relevant to those words according to many criteria. The user must still analyse the results to extract what is actually needed, because search engines cannot deliver the exact information directly. This makes searching time-consuming. The search reporter system (SRS) was therefore developed so that users can search for and obtain information without this time cost. Search engines weigh many criteria when searching and returning information, including factors targeted by SEO (Search Engine Optimization), but the SRS relies primarily on the words given in the search.
To build the SRS data, an administrator first enters a keyword in the field provided. The system, which is connected to a search engine such as Google, retrieves the titles, URLs and descriptions of the results, then checks the web page at each URL and counts the occurrences of the keyword. The titles, URLs, descriptions, match counts and an ID associated with the keyword are then stored in the database.
When a user searches for a keyword, the search reporter loads the matching results from the database and ranks the pages by the number of keyword matches, highest first. Because search engines update their information daily, the administrator needs to update the SRS database regularly so that users receive up-to-date information.
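The counting and ranking steps described above can be sketched as follows. This is a minimal illustration in Python rather than the project's PHP code. The StoredResult record, its field names, and the functions count_keyword and rank_results are assumptions made for the example, and matches are counted by simple case-insensitive substring matching.

```python
from dataclasses import dataclass


@dataclass
class StoredResult:
    """One search result saved by the administrator (field names are hypothetical)."""
    title: str
    url: str
    description: str
    page_text: str  # text of the page behind the URL


def count_keyword(text: str, keyword: str) -> int:
    """Count case-insensitive, non-overlapping occurrences of keyword in text."""
    if not keyword:
        return 0
    return text.lower().count(keyword.lower())


def rank_results(results: list[StoredResult], keyword: str) -> list[tuple[int, StoredResult]]:
    """Return (match_count, result) pairs, highest match count first."""
    scored = [(count_keyword(r.page_text, keyword), r) for r in results]
    # sorted() is stable, so results with equal counts keep their stored order.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

In a deployment like the one described, the match counts would be computed once when the administrator stores the results, so that a user query only needs to read and sort the saved counts.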
CHAPTER THREE
RESEARCH METHODOLOGY
INTRODUCTION
Widely used tools and technologies, namely HTML, PHP, CSS, JavaScript, the MySQL database and the Apache web server, were used to develop the SRS. In addition, a phonetic algorithm, the Metaphone algorithm, is used for searching in the SRS so that spelling errors in search keywords can be tolerated.
Metaphone is an algorithm for indexing words by their pronunciation. The main advantage of a phonetic algorithm is that it reduces the effect of misspelled words. When a user searches for a keyword, he or she may misspell it. To address this problem, the system uses the Metaphone algorithm, which maps letters or groups of letters that sound alike to the same code so that relevant results are not lost. Users enter search keywords, and the stored data are matched against them by phonetic similarity.
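The idea of matching by sound rather than by spelling can be illustrated with a short sketch. The code below is not the Metaphone algorithm: it is a deliberately simplified phonetic key, written in Python for illustration, that applies only a handful of Metaphone-like rules. The function names phonetic_key and phonetic_search are hypothetical. PHP itself ships a built-in metaphone() function, which a PHP implementation would more likely use.

```python
import re


def phonetic_key(word: str) -> str:
    """Build a simplified phonetic key (NOT full Metaphone) for an English word."""
    w = re.sub(r"[^A-Z]", "", word.upper())
    if not w:
        return ""
    # Replace a few multi-letter sounds with a single symbol.
    for pattern, repl in (("PH", "F"), ("CK", "K"), ("SH", "X"), ("TH", "0")):
        w = w.replace(pattern, repl)
    # C sounds like S before E, I or Y, and like K elsewhere.
    w = re.sub(r"C(?=[EIY])", "S", w)
    w = w.replace("C", "K").replace("Q", "K").replace("Z", "S")
    # Collapse runs of the same symbol (e.g. "EE" -> "E", "TT" -> "T").
    w = re.sub(r"(.)\1+", r"\1", w)
    # Keep the first symbol; drop vowels and H, W, Y everywhere else.
    return w[0] + re.sub(r"[AEIOUHWY]", "", w[1:])


def phonetic_search(query: str, keywords: list[str]) -> list[str]:
    """Return the stored keywords whose phonetic key equals that of the query."""
    target = phonetic_key(query)
    return [k for k in keywords if phonetic_key(k) == target]
```

With these rules, a misspelled query such as "serch" produces the same key as the stored keyword "search", so the stored entry is still found. The real Metaphone rule set is considerably larger and handles many cases this sketch ignores.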
ANALYSIS OF THE EXISTING SYSTEM
Students currently go online to search for research materials for their projects. This process takes time, and students sometimes pay for research materials that may not even suit the intended purpose. Information on some websites is often incomplete or does not follow standards, which may force the student to look elsewhere for usable materials. The large volume of contrasting details leads to inconsistencies in the data gathered and to delays in completing projects before the deadline.
CHAPTER FOUR
SYSTEM TESTING AND IMPLEMENTATION
INTRODUCTION
System testing and implementation is the last step in software development. It involves putting a formulated plan into action. Before implementation, plans must have been completed and objectives must be clear.
CHOICE OF PROGRAMMING LANGUAGE AND DATABASE
The programming language used in this research work is PHP, a scripting language, with a MySQL database as the back end. PHP was chosen because it is simple, runs on the server side, is modern and modular, and is powerful and flexible.
MySQL was selected over other database tools because it is a free, open-source database management system. For example, a MySQL database can serve as the data store for a website or other software.
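To make the role of the database concrete, the sketch below stores and retrieves research material records. It is an illustration under stated assumptions, not the project's schema: it is written in Python and uses the standard-library sqlite3 module as a stand-in for MySQL so that it runs without a database server, and the table name research_material, its columns, and the function names are all hypothetical.

```python
import sqlite3


def create_store() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical research_material table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE research_material ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " title TEXT NOT NULL,"
        " author TEXT NOT NULL,"
        " abstract TEXT NOT NULL)"
    )
    return conn


def add_material(conn: sqlite3.Connection, title: str, author: str, abstract: str) -> int:
    """Insert one record and return its generated id."""
    cur = conn.execute(
        "INSERT INTO research_material (title, author, abstract) VALUES (?, ?, ?)",
        (title, author, abstract),
    )
    conn.commit()
    return cur.lastrowid


def search_titles(conn: sqlite3.Connection, term: str) -> list[tuple[int, str]]:
    """Return (id, title) rows whose title contains the term, ordered by title."""
    rows = conn.execute(
        "SELECT id, title FROM research_material WHERE title LIKE ? ORDER BY title",
        (f"%{term}%",),  # bound parameter, so user input is never spliced into the SQL
    )
    return rows.fetchall()
```

Passing the search term as a bound parameter rather than concatenating it into the query string is the main design point here, since it guards against SQL injection. The same practice is available in PHP through prepared statements.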
CHAPTER FIVE
SUMMARY, CONCLUSION AND RECOMMENDATIONS
SUMMARY
This work presented a simple web search engine for indexing and searching web documents, written in the PHP programming language. Because PHP is well known for its simple syntax and strong support for the main operating systems, we hope it will be beneficial for learning information retrieval techniques, especially web search engine technology.
CONCLUSION
A web search engine has been presented that is suitable for research and learning purposes because of its simplicity, portability and modifiability. The strength of the program lies in its search component, which uses several scoring functions to rank pages by relevance to user queries; in particular, the inclusion of anchor text analysis allows the program to find relevant pages that do not themselves contain the query terms.
In the crawler component, only a small modification was made, but it can improve crawling reliability significantly. Readers of the documentation will notice that the crawling method is breadth-first search without politeness policies (e.g., obeying robots.txt and controlling the rate of access to servers), spam page detection, a priority URL queue, or memory management to divide the load of the crawling process between disk and RAM. Without good memory management, every URL-seen check is performed by searching the urllist table on disk, which is very time-consuming. We will address these crawler design problems in future research.
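The breadth-first crawling order and the cost of the URL-seen check can be illustrated with a small sketch. It is written in Python for illustration and is not the project's crawler: the function crawl_breadth_first, the fetch_links callable and the max_pages limit are assumptions, and no network access or politeness policy is included. The sketch keeps the set of seen URLs in memory, which is one simple alternative to querying a table on disk for every URL.

```python
from collections import deque
from typing import Callable, Iterable


def crawl_breadth_first(
    seed: str,
    fetch_links: Callable[[str], Iterable[str]],
    max_pages: int = 100,
) -> list[str]:
    """Visit pages in breadth-first order, starting at seed.

    fetch_links(url) is a hypothetical callable returning the links found on url.
    """
    seen = {seed}          # in-memory URL-seen set: average O(1) membership test
    queue = deque([seed])  # FIFO queue gives breadth-first order
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


if __name__ == "__main__":
    # A tiny made-up link graph standing in for real pages.
    graph = {"a": ["b", "c"], "b": ["d", "a"], "c": ["d"], "d": []}
    print(crawl_breadth_first("a", lambda url: graph.get(url, [])))
```

An in-memory set only works while the set of seen URLs fits in RAM. For larger crawls, the division of work between disk and RAM mentioned above would still be needed.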
RECOMMENDATIONS
No software is entirely free from error, so the search engine may contain errors. The SRS is also fully dependent on an external search engine such as Google. In future, the following features will be integrated into the SRS:
- Searching based on search analyzer
- Mailing facility
- Chatting facility
REFERENCES
- Chakkrit (2007). Design and Implementation of a High-Performance Distributed Web Crawler. Retrieved from https://www.researchgate.com/high_performance_crawler on July 24, 2018
- Naushad & Mumit (2004). A Bangla Encoding for Better Spelling Suggestions. Retrieved from https://researchgate.net/ABanglaEncodingforBetterSpellingSuggestions on July 23, 2018
- Boldi, M. Santini, & S. Vigna (n.d.). A Scalable Fully Distributed Web Crawler. Retrieved from https://www.researchgate.net/AScalableFullyDistributedWebCrawler on July 25, 2018
- Richard (2000). Design and Implementation of Research Search Engine. Retrieved from https://researchgate.net/DesignandImplementationofResearchSearchEngine on July 23, 2018
- Wikipedia (n.d.). Search Engine Requirement. Retrieved from https://www.wikipedia.com/Search_Engine_Requirement on July 24, 2018