Mini-test #1 study guide

Master in Informatics and Computing Engineering
Information Description, Storage and Retrieval
Instance: 2020/2021
—

The mini-test is planned to be answered in Moodle, but this is still uncertain. Either way, this guide should help you organize your study.

This is a set of recommendations on the topics, available materials, and references for Mini-test #1.
The mini-test has an estimated duration of 90 minutes, and some reference materials are available on the desktop machine. It is answered in Moodle and includes multiple-choice and short-answer questions. Some questions concern the student projects, specifically Milestone #2.

The subject of the mini-test is Information Retrieval.

Topics

Some topics for which there will be questions:

Concepts: information need, search task, collection, query, results list;
Search engines, indexing and retrieval;
Building inverted indexes;
Vector model: tf and idf calculations to compose the weight of a term in a document;
Vector model: calculate the score of a document for a query;
Evaluation: calculate recall and precision, draw P versus R curves (with average interpolated precision), calculate MAP;
Web retrieval and link analysis: PageRank, hubs and authorities.

Some detailed references

In the following, BY refers to Modern Information Retrieval, by Baeza-Yates and Ribeiro-Neto; Manning refers to Introduction to Information Retrieval, by Manning et al.

IR tasks and systems

Information retrieval vs data retrieval, modules in an IR system

Questions:

What is the difference between information retrieval and data retrieval?
Give examples of IR and data retrieval systems.
Give some examples of retrieval tasks evaluated in TREC.
What are the modules of an IR system?

Ref: BY, Chap. 1 (Intro)
Ref: TREC tracks
Ref: The Anatomy..., Brin & Page

IR concepts

Concepts: document, information need, relevance, bag of words, inverted index, postings list, term pre-processing.

Questions:

What is… a document, a collection, a term, a bag of words?
Define stemming.
What is… an inverted index, a vocabulary, a postings list?
What is… an information need, a query, a results list?
What is a relevant result in a results list?
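To make the inverted-index vocabulary concrete, here is a minimal Python sketch that builds an inverted index for a made-up three-document collection and answers a conjunctive Boolean query by intersecting postings lists. It assumes a deliberately naive pre-processing step (lowercasing and keeping alphabetic tokens, with no stemming or stop-word removal); the function names and documents are hypothetical, for illustration only.

```python
import re
from collections import defaultdict


def preprocess(text):
    """Naive term pre-processing: lowercase and keep runs of letters a-z.

    No stemming or stop-word removal is applied here.
    """
    return re.findall(r"[a-z]+", text.lower())


def build_inverted_index(docs):
    """Map each vocabulary term to its postings list (sorted document IDs)."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in preprocess(text):
            postings[term].add(doc_id)
    # Vocabulary in alphabetical order, each postings list in increasing doc ID.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


def boolean_and(index, *terms):
    """Conjunctive Boolean query: intersect the postings lists of all terms."""
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical toy collection.
docs = {
    1: "new home sales top forecasts",
    2: "home sales rise in july",
    3: "increase in home sales in july",
}
index = build_inverted_index(docs)
print(index["home"])                        # [1, 2, 3]
print(index["july"])                        # [2, 3]
print(boolean_and(index, "sales", "july"))  # [2, 3]
```

The dictionary keys form the vocabulary and each value is a postings list. Keeping postings sorted is what allows the linear-time merge described in Manning, Chap. 1; the sketch uses Python set intersection instead, for brevity.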

Ref: Manning, Chap. 1 (Boolean Retrieval)

Vector model

Term weighting, tf, df, cf, idf, vector model, ranking in the vector model

Questions:

What is the bag of words model for a document?
What is… term frequency, collection frequency, document frequency, inverse document frequency?
How do you calculate tf-idf weights?
How do you rank documents in the vector model?
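As a way to check hand calculations, here is a minimal Python sketch of tf-idf weighting and cosine ranking. It assumes the basic scheme: raw term frequency tf(t,d), idf(t) = log10(N / df(t)) with base-10 logarithms, weight tf × idf, and documents ranked by the cosine similarity between the query and document vectors. Exercises often use other variants (for example log-scaled tf, or different weighting for queries and documents), so check which scheme a question asks for. The collection and query are invented.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (no stemming or stop-word removal)."""
    return text.lower().split()


def idf_table(docs):
    """idf_t = log10(N / df_t), where df_t is the number of documents containing t."""
    n = len(docs)
    df = Counter()
    for text in docs.values():
        df.update(set(tokenize(text)))
    return {term: math.log10(n / count) for term, count in df.items()}


def tfidf_vector(text, idf):
    """tf-idf weight of each term: raw tf in this text times idf.

    Terms that do not occur in the collection get weight 0.
    """
    tf = Counter(tokenize(text))
    return {term: freq * idf.get(term, 0.0) for term, freq in tf.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Return (doc_id, score) pairs sorted by decreasing cosine score."""
    idf = idf_table(docs)
    q = tfidf_vector(query, idf)
    scores = {d: cosine(q, tfidf_vector(text, idf)) for d, text in docs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical collection and query.
docs = {
    "d1": "car insurance auto insurance",
    "d2": "best car deals",
    "d3": "auto repair shop",
}
# d1 ranks first, then d2; d3 shares no term with the query and scores 0.0.
for doc_id, score in rank("car insurance", docs):
    print(doc_id, round(score, 3))
```

Here "insurance" occurs in only one of the three documents, so its idf is log10(3), higher than that of "car" (log10(1.5)); this is why d1, which contains "insurance" twice, dominates the ranking.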

Exercises: look at Exercises 6.8, 6.9, 6.10, 6.11, 6.15, 6.16, 6.17 and Examples 6.2, 6.3, 6.4

Ref: Manning, Chap. 2 (The term vocabulary and postings lists), Section 2.2, and Chap. 6 (Scoring, term weighting and the vector space model), Sections 6.2 and 6.3

Evaluation

Precision, recall, P-R curves, MAP, reference collections, relevance judgements

Questions:

What is… precision, recall, interpolated precision?
What is… precision at k, R-precision?
Name the components of a test collection.
Why is a set of relevance judgements considered a “ground truth” for IR?
Draw a precision-recall curve that captures how precision evolves along the ranked list of results for a query.
What is an average 11-point precision-recall graph for a set of queries?
What is MAP, and how do you calculate it for a set of queries in a test collection?
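The evaluation measures above can be checked against a small Python sketch, assuming each query's run is given as a ranked list of document IDs plus the set of relevant IDs from the relevance judgements. Interpolated precision at recall level r is taken as the highest precision at any point with recall ≥ r, and a relevant document that is never retrieved contributes precision 0 to average precision. The two example queries and all function names are hypothetical.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def r_precision(ranked, relevant):
    """Precision at rank |R|, where |R| is the number of relevant documents."""
    return precision_at_k(ranked, relevant, len(relevant))


def pr_points(ranked, relevant):
    """(recall, precision) after each position of the ranked list."""
    points, hits = [], 0
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / k))
    return points


def interpolated_11pt(ranked, relevant):
    """Interpolated precision at recall levels 0.0, 0.1, ..., 1.0.

    p_interp(r) = max precision over all points with recall >= r (0 if none).
    """
    points = pr_points(ranked, relevant)
    levels = [i / 10 for i in range(11)]
    return [max((p for r, p in points if r >= level), default=0.0) for level in levels]


def average_precision(ranked, relevant):
    """Mean of the precision values at the rank of each relevant document;
    relevant documents that are never retrieved contribute precision 0."""
    hits, total = 0, 0.0
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant)


def mean_average_precision(runs):
    """MAP over (ranked, relevant) pairs, one pair per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


def average_11pt(runs):
    """Average 11-point interpolated precision over a set of queries."""
    curves = [interpolated_11pt(r, rel) for r, rel in runs]
    return [sum(c[i] for c in curves) / len(curves) for i in range(11)]


# Hypothetical runs: (ranked results, set of relevant documents).
q1 = (["d1", "d2", "d3", "d4", "d5"], {"d1", "d3", "d6"})
q2 = (["d2", "d4", "d1"], {"d4"})
print(precision_at_k(*q1, 3))            # 2 of the top 3 are relevant
print(r_precision(*q1))                  # |R| = 3, so same as P@3
print(average_precision(*q1))            # (1/1 + 2/3 + 0) / 3
print(interpolated_11pt(*q1))            # 1.0 up to recall 0.3, 2/3 for 0.4-0.6, then 0.0
print(mean_average_precision([q1, q2]))  # mean of 5/9 and 1/2
```

For q1, the relevant documents d1 and d3 appear at ranks 1 and 3 and d6 is never retrieved, so AP = (1/1 + 2/3 + 0) / 3 = 5/9; with AP = 1/2 for q2, MAP = 19/36.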

Exercises: look at Exercises 8.1, 8.4, 8.8, 8.9

Ref: Manning, Chap. 8
Ref: TREC pages: http://trec.nist.gov/

Web search

Web information needs, the bowtie model, web search vs enterprise search, multimedia content, ranking functions and ranking signals

Questions:

What are informational, transactional and navigational information needs?
Name some differences between web search and enterprise search.
How do you index images?
Give examples of ranking signals used by search engines.
What are the SCC, IN and OUT components in the view of the web as a bowtie?

Examples: look at the examples in Manning, Chap. 19.

Ref: Manning, Chap. 19

Link analysis

Web ranking, anchor text, PageRank, hubs and authorities

Questions:

What are in-links and out-links for a web page?
How is anchor text used in web search?
Calculate PageRank values for a set of linked documents.
Calculate Hub and Authority values for a set of linked documents.
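Both link-analysis computations can be sketched in a few lines of Python, assuming a small hypothetical four-page graph given as adjacency lists. `pagerank` uses power iteration with teleport probability `alpha` (the default 0.1 is an arbitrary choice here), and a page with no out-links always teleports to a uniformly random page. `hits` alternates the authority update a = A^T h and the hub update h = A a, rescaling after each step; the sum-to-1 scaling and the fixed iteration count are assumptions, since only relative scores matter. For hand calculations, check which teleport probability and scaling a question specifies.

```python
def all_pages(links):
    """Every page that appears as a source or a target of a link."""
    return sorted(set(links) | {v for outs in links.values() for v in outs})


def pagerank(links, alpha=0.1, tol=1e-10, max_iter=1000):
    """PageRank by power iteration.

    links: dict mapping each page to the list of pages it links to.
    alpha: teleport probability; with probability 1 - alpha the surfer
    follows a random out-link. A page without out-links always teleports.
    """
    nodes = all_pages(links)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        new = {u: alpha / n for u in nodes}  # teleport share (ranks sum to 1)
        for u in nodes:
            outs = links.get(u, [])
            targets = outs if outs else nodes  # dangling page: spread uniformly
            share = (1 - alpha) * rank[u] / len(targets)
            for v in targets:
                new[v] += share
        converged = sum(abs(new[u] - rank[u]) for u in nodes) < tol
        rank = new
        if converged:
            break
    return rank


def hits(links, iterations=50):
    """Hub and authority scores, iterating a = A^T h and h = A a.

    Scores are rescaled to sum to 1 after each update (an arbitrary choice
    here; only the relative values matter).
    """
    nodes = all_pages(links)
    hub = {u: 1.0 for u in nodes}
    auth = {u: 0.0 for u in nodes}
    for _ in range(iterations):
        # Authority of v: sum of hub scores of the pages that link to v.
        auth = {v: 0.0 for v in nodes}
        for u, outs in links.items():
            for v in outs:
                auth[v] += hub[u]
        total = sum(auth.values()) or 1.0
        auth = {v: s / total for v, s in auth.items()}
        # Hub of u: sum of authority scores of the pages u links to.
        hub = {u: sum(auth[v] for v in links.get(u, [])) for u in nodes}
        total = sum(hub.values()) or 1.0
        hub = {u: s / total for u, s in hub.items()}
    return hub, auth


# Hypothetical four-page graph: A->B, A->C, B->C, C->A, D->C.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
pr = pagerank(links)
hub, auth = hits(links)
print(sorted(pr, key=pr.get, reverse=True))            # ['C', 'A', 'B', 'D']
print(max(auth, key=auth.get), max(hub, key=hub.get))  # C A
```

In this graph C has the most in-links and becomes both the top PageRank page and the top authority, while A, which links to both B and C, is the best hub. D has no in-links, so its PageRank is just the teleport share alpha / N.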

Ref: Manning, Chap. 21

– MCR + JCL + SSN
