
Mini-test #1 study guide §

Master in Informatics and Computing Engineering
Information Description, Storage and Retrieval
Instance: 2020/2021


 

The mini-test is planned to be answered in Moodle, but this is not yet confirmed. Either way, this guide should help you organize your study.

This is a set of recommendations about the topics, available materials, and references for Mini-test #1.
The mini-test has an estimated duration of 90 minutes, and some reference materials are available on the desktop machine. It is planned to be answered in Moodle and includes multiple-choice and short-answer questions. Some questions concern the student projects, specifically Milestone #2.

The subject of the mini-test is Information Retrieval.  

Topics §

Some topics for which there will be questions: 

  • Concepts: information need, search task, collection, query, results list; 
  • Search engines, indexing and retrieval; 
  • Building inverted indexes; 
  • Vector model: tf and idf calculations to compose the weight of a term in a document; 
  • Vector model: calculate the score of a document for a query; 
  • Evaluation: calculate recall and precision, draw P versus R curves (with average interpolated precision), calculate MAP; 
  • Web retrieval and link analysis: PageRank, hubs and authorities. 

Some detailed references §

In the following, BY refers to Modern Information Retrieval, by Baeza-Yates and Ribeiro-Neto; Manning refers to Introduction to Information Retrieval, by Manning et al.

IR tasks and systems §

Information retrieval vs data retrieval, modules in an IR system

Questions: 

  • What is the difference between information retrieval and data retrieval? 
  • Give examples of IR and data retrieval systems. 
  • Give some examples of retrieval tasks evaluated in TREC. 
  • What are the modules of an IR system? 

Ref: BY, Chap. 1 (Intro)
Ref: TREC tracks
Ref: The Anatomy..., Brin & Page 

IR concepts §

Concepts: document, information need, relevance, bag of words, inverted index, postings list, term pre-processing. 

Questions: 

  • What is… a document, a collection, a term, a bag of words? 
  • Define stemming. 
  • What is… an inverted index, a vocabulary, a postings list? 
  • What is… an information need, a query, a results list? 
  • What is a relevant result in a results list? 

Ref: Manning, Chap. 1 (Boolean Retrieval) 
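
As a study aid for the inverted index, vocabulary and postings list concepts above, here is a minimal Python sketch. The three-document toy collection, the whitespace tokenizer, and the function names (tokenize, build_inverted_index, intersect) are illustrative assumptions, not part of the course materials. Real term pre-processing would also cover steps such as stemming.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real pre-processing)."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each vocabulary term to a sorted postings list of document IDs."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def intersect(p1, p2):
    """Merge two sorted postings lists, keeping IDs present in both (Boolean AND)."""
    i = j = 0
    result = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result


# Toy collection (hypothetical data for illustration only).
docs = {
    1: "new home sales top forecasts",
    2: "home sales rise in july",
    3: "increase in home sales in july",
}
index = build_inverted_index(docs)
print(index["july"])                             # postings list for "july"
print(intersect(index["sales"], index["july"]))  # documents matching: sales AND july
```

The vocabulary is the set of keys of index; each value is the postings list for that term.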

Vector model §

Term weighting, tf, df, cf, idf, vector model, ranking in the vector model 

Questions: 

  • What is the bag of words model for a document? 
  • What is… term frequency, collection frequency, document frequency, inverse document frequency? 
  • How do you calculate tf-idf weights? 
  • How do you rank documents in the vector model? 

Exercises: look at Exercises 6.8, 6.9, 6.10, 6.11, 6.15, 6.16, 6.17 and Examples 6.2, 6.3, 6.4 

Ref: Manning, Chap. 2 (The term vocabulary and postings lists) (2.2) and Chap. 6 (Scoring, term weighting and the vector space model) (6.2, 6.3) 
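
To practice the tf-idf and ranking questions above, the following Python sketch computes tf-idf weights and ranks documents by cosine similarity. It assumes raw term frequency, idf = log10(N/df), and the same weighting for the query and the documents; this is only one of several weighting variants. The toy documents and function names are hypothetical.

```python
import math
from collections import Counter


def idf(term, docs):
    """Inverse document frequency: log10(N / df), or 0 if the term is unseen."""
    df = sum(1 for doc in docs if term in doc)
    return math.log10(len(docs) / df) if df else 0.0


def tfidf_vector(tokens, docs):
    """Sparse tf-idf vector (term -> weight) using raw term frequency."""
    tf = Counter(tokens)
    return {term: count * idf(term, docs) for term, count in tf.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy collection of already-tokenized documents (hypothetical data).
docs = [
    ["auto", "insurance", "insurance"],
    ["car", "car", "best"],
    ["car", "insurance"],
    ["best", "auto", "car"],
]
query = ["car", "insurance"]

query_vec = tfidf_vector(query, docs)
scores = [cosine(query_vec, tfidf_vector(doc, docs)) for doc in docs]
# Document indexes sorted by decreasing score.
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

Document 2 contains exactly the query terms with the same frequencies, so its vector is parallel to the query vector and it ranks first.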

Evaluation §

Precision, recall, P-R curves, MAP, reference collections, relevance judgements 

Questions: 

  • What is… precision, recall, interpolated precision? 
  • What is… precision at k, R-precision? 
  • Name the components of a test collection. 
  • Why is a set of relevance judgements considered a “ground truth” for IR? 
  • Draw a precision-recall curve that captures how precision evolves down the ranked list of results for a query. 
  • What is an average 11-point precision-recall graph for a set of queries? 
  • What is MAP, and how do you calculate it for a set of queries in a test collection? 

Exercises: look at Exercises 8.1, 8.4, 8.8, 8.9  

Ref: Manning, Chap. 8
Ref: TREC pages: http://trec.nist.gov/ 
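
The evaluation measures above can be checked by hand against a small script. The Python sketch below computes the (recall, precision) points down a ranked list, the 11-point interpolated precision, average precision, and MAP. It assumes binary relevance judgements and counts relevant documents that are never retrieved as contributing zero precision; the rankings, judgements, and function names are made up for illustration.

```python
def precision_recall_points(ranking, relevant):
    """Return (recall, precision) after each rank position of the results list."""
    points = []
    hits = 0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / k))
    return points


def interpolated_precision_11pt(ranking, relevant):
    """Interpolated precision at recall levels 0.0, 0.1, ..., 1.0.

    The interpolated value at a level is the highest precision found
    at any recall greater than or equal to that level.
    """
    points = precision_recall_points(ranking, relevant)
    levels = [i / 10 for i in range(11)]
    return [max((p for r, p in points if r >= level), default=0.0)
            for level in levels]


def average_precision(ranking, relevant):
    """Mean of the precision values at the rank of each relevant document."""
    hits = 0
    total = 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """MAP over a list of (ranking, relevant_set) pairs, one pair per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Two hypothetical queries with their ranked results and relevance judgements.
ranking1, relevant1 = ["d1", "d2", "d3", "d4", "d5"], {"d1", "d3"}
ranking2, relevant2 = ["a", "b", "c"], {"b"}

interp = interpolated_precision_11pt(ranking1, relevant1)
ap1 = average_precision(ranking1, relevant1)
map_score = mean_average_precision([(ranking1, relevant1), (ranking2, relevant2)])
print(interp, ap1, map_score)
```

An average 11-point graph for a set of queries would average the interp lists position by position across queries.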

Web information needs, the bowtie model, web search vs enterprise search, multimedia content, ranking functions and ranking signals 

Questions: 

  • What are informational, transactional and navigational information needs? 
  • Name some differences between web search and enterprise search. 
  • How do you index images? 
  • Give examples of ranking signals used by search engines. 
  • What are the SCC, IN and OUT components in the view of the web as a bowtie? 

Examples: look at Manning. 

Ref: Manning, Chap. 19
 

Web ranking, anchor text, PageRank, hubs and authorities 

Questions: 

  • What are in-links and out-links for a web page? 
  • How is anchor text used in web search? 
  • Calculate PageRank values for a set of linked documents. 
  • Calculate Hub and Authority values for a set of linked documents. 

Ref: Manning, Chap. 21 
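
For the two calculation questions above, the Python sketch below iterates PageRank and hub/authority scores on a small link graph. It assumes a teleport probability alpha of 0.15, uniform jumps from pages without out-links, a fixed number of iterations instead of a convergence test, and sum-to-one normalization for the HITS scores. The three-page graph and the function names are hypothetical, and every link target must also appear as a key of the graph.

```python
def pagerank(links, alpha=0.15, iterations=100):
    """Power iteration for PageRank with teleport probability alpha."""
    nodes = sorted(links)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        new_rank = {u: 0.0 for u in nodes}
        for u in nodes:
            out = links[u]
            if not out:
                # Page without out-links: jump to any page uniformly.
                for v in nodes:
                    new_rank[v] += rank[u] / n
            else:
                # Teleport with probability alpha, else follow a random out-link.
                for v in nodes:
                    new_rank[v] += alpha * rank[u] / n
                for v in out:
                    new_rank[v] += (1 - alpha) * rank[u] / len(out)
        rank = new_rank
    return rank


def hits(links, iterations=50):
    """Iterative hub and authority scores, normalized to sum to 1."""
    nodes = sorted(links)
    hub = {u: 1.0 for u in nodes}
    auth = {u: 1.0 for u in nodes}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking in.
        auth = {v: sum(hub[u] for u in nodes if v in links[u]) for v in nodes}
        # Hub score: sum of authority scores of the pages linked to.
        hub = {u: sum(auth[v] for v in links[u]) for u in nodes}
        auth_total = sum(auth.values()) or 1.0
        hub_total = sum(hub.values()) or 1.0
        auth = {u: a / auth_total for u, a in auth.items()}
        hub = {u: h / hub_total for u, h in hub.items()}
    return hub, auth


# Hypothetical link graph: page -> list of pages it links to (its out-links).
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}

ranks = pagerank(links)
hub, auth = hits(links)
print(ranks)
print(hub, auth)
```

In this graph, C has the most in-links and ends up with the highest PageRank and authority score, while A, which links to both other pages, is the strongest hub.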

MCR + JCL + SSN 

teach/dapi/202021/guide-mini1.txt · Last modified: 2020/11/06 19:23 by ssn
