Ian H. Witten, Alistair Moffat, Timothy C. Bell
In this fully updated second edition of the highly acclaimed Managing Gigabytes, authors Witten, Moffat, and Bell continue to provide unparalleled coverage of state-of-the-art techniques for compressing and indexing data. Whatever your field, if you work with large quantities of information, this book is essential reading--an authoritative theoretical resource and a practical guide to meeting the toughest storage and access challenges. It covers the latest developments in compression and indexing and their application on the Web and in digital libraries. It also details dozens of powerful techniques supported by mg, the authors' own system for compressing, storing, and retrieving text, images, and textual images. mg's source code is freely available on the Web.
* Up-to-date coverage of new text compression algorithms such as block sorting, approximate arithmetic coding, and fat Huffman coding
* New sections on content-based index compression and distributed querying, with 2 new data structures for fast indexing
* New coverage of image coding, including descriptions of de facto standards in use on the Web (GIF and PNG), information on CALIC, the new proposed JPEG Lossless standard, and JBIG2
* New information on the Internet and WWW, digital libraries, web search engines, and agent-based retrieval
* Accompanied by a public domain system called MG which is a fully worked-out operational example of the advanced techniques developed and explained in the book
* New appendix on an existing digital library system that uses the MG software
Michael McCandless, Erik Hatcher, Otis Gospodnetić
Lucene remains an indispensable part of most enterprise applications. This search engine now powers search at diverse organizations, including Netflix, LinkedIn, and the Mayo Clinic. This updated edition is the definitive guide to developing with Lucene.
Ricardo Baeza-Yates, Berthier Ribeiro-Neto
This is a rigorous and complete textbook for a first course on information retrieval from the computer science perspective (as opposed to a user-centred one). The advent of the Internet and the enormous increase in the volume of electronically stored information have led to substantial work on IR from the computer science perspective; this book provides an up-to-date, student-oriented treatment of the subject.
William Bruce Frakes, Ricardo Baeza-Yates
Information retrieval is a sub-field of computer science that deals with the automated storage and retrieval of documents. Presenting the latest information retrieval techniques, this guide discusses the associated data structures and algorithms, including implementations in C. Aimed at software engineers building systems with book processing components, it provides a descriptive and evaluative explanation of storage and retrieval systems, file structures, term and query operations, document operations, and hardware. Contains techniques for handling inverted files, signature files, and file organizations for optical disks. Discusses such operations as lexical analysis and stoplists, stemming algorithms, thesaurus construction, and relevance feedback and other query modification techniques. Provides information on Boolean operations, hashing algorithms, ranking algorithms, and clustering algorithms. In addition to being of interest to software engineering professionals, this book will be useful to information science and library science professionals who are interested in text retrieval technology.
W. Bruce Croft, Donald Metzler, Trevor Strohman
Search Engines: Information Retrieval in Practice is ideal for introductory information retrieval courses at the undergraduate and graduate level in computer science, information science, and computer engineering departments. It is also a valuable tool for search engine and information retrieval professionals. Written by a leader in the field of information retrieval, Search Engines: Information Retrieval in Practice is designed to give undergraduate students the understanding and tools they need to evaluate, compare, and modify search engines. Coverage of the underlying IR and mathematical models reinforces key concepts. The book's numerous programming exercises make extensive use of Galago, a Java-based open source search engine.
Serge Linckels, Christoph Meinel
This book introduces a new approach to designing E-Librarian Services. With the help of this system, users will be able to retrieve multimedia resources from digital libraries more efficiently than they would by browsing through an index or by using a simple keyword search. E-Librarian Services combine recent advances in multimedia information retrieval with aspects of human-machine interfaces, such as the ability to ask questions in natural language; they simulate a human librarian by finding and delivering the most relevant documents that offer users potential answers to their queries. The premise is that more pertinent results can be retrieved if the search engine understands the meaning of the query; the returned results are therefore logical consequences of an inference rather than of keyword matches. Moreover, E-Librarian Services always provide users with a solution, even in situations where they are unable to offer a comprehensive answer.
Martin Scott White
Search must work -- How search works -- The search business -- Making a business case for search -- Specifying and selecting a search engine -- Optimizing search performance -- Search usability -- Desktop search -- Implementing web search -- Implementing search for an intranet -- Enterprise search -- Multilingual search -- Future directions?