Project Rules §
Master in Informatics and Computing Engineering
Information Description, Storage and Retrieval
Instance: 2020/2021
I. Introduction §
The course project runs for the whole semester and has three milestones, each with assigned deliverables. ¶
The project starts with students choosing their team-mates and a topic. All projects focus on data and are structured in three main tasks: data representation, description, and retrieval. ¶
Groups have 4 students and have to be settled by the end of Week 2. ¶
Due to COVID-19 management rules, groups must include students from both shifts (i.e. even and odd student codes). ¶
Deliveries and oral presentations §
Each project delivery has a corresponding presentation and discussion. In the weeks assigned to project presentations the class meeting will take a workshop format, with project presentations and discussions according to the schedule. ¶
The grade for each project delivery considers the written report, the presentation and discussion, and the product, if applicable. The grade for deliveries has an individual component: it is positive if the student contributes to the workshop with questions or comments, and negative if the student is absent or shows unprofessional behaviour. ¶
Reports follow the format of short scientific papers in two columns (4 pages maximum in each delivery; a final report with up to 12 pages is expected). The submission deadline is posted in the course plan. Each paper is a self-contained work in progress that builds on the previous deliveries. Students can use the standard two-column template from ACM (Word and LaTeX templates available at ACM and also on Overleaf) or another two-column template. ¶
Electronic submissions of the project deliverables are accepted up to 18:00 on the day before the presentation day.
Moodle is used for submissions, which include: ¶
II. Course Project §
The project runs for the whole semester, starting with group assembly and ending with the final presentation. The project is reported and evaluated at three milestones, namely: Dataset preparation, Information Retrieval, and Semantic Web. The expected results depend to some extent on the selected project, but some details are provided below. ¶
Milestone #1: Dataset Preparation §
The first milestone is achieved with the preparation and characterisation of the datasets proposed for the project. The datasets are the foundation for the project and the goal of the first task is to prepare and explore them, using the recommendations in the project description. This task is heavily dependent on the datasets, which may require some extraction actions such as crawling or scraping. ¶
Work on this task depends on the nature, volume, organisation and accessibility of the proposed datasets. ¶
Each group must select datasets of two different types: (1) an unstructured dataset rich in textual data, and (2) a semantic dataset rich in structured and annotated data. ¶
The following list has a sample of the actions which may be required: ¶
- search repositories for datasets; ¶
- select convenient data subsets; ¶
- assess the authority of the data source and data quality; ¶
- perform exploratory data analysis; ¶
- characterise the datasets, identifying and describing some of their properties; ¶
- identify the conceptual model for the data domain; ¶
- identify interesting retrieval tasks in the data domain. ¶
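As an illustration of the characterisation step, the sketch below computes a few descriptive statistics for a collection of text documents. It is a minimal example under stated assumptions, not a required procedure: the function names (`tokenize`, `characterise`), the chosen statistics, and the `SAMPLE` documents are all hypothetical, and a real project would read its own dataset and probably use richer tooling.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def characterise(documents):
    """Return basic descriptive statistics for a list of text documents."""
    token_lists = [tokenize(doc) for doc in documents]
    lengths = [len(tokens) for tokens in token_lists]
    vocabulary = Counter(token for tokens in token_lists for token in tokens)
    return {
        "documents": len(documents),
        # Empty documents are a common data-quality issue worth reporting.
        "empty_documents": sum(1 for n in lengths if n == 0),
        "total_tokens": sum(lengths),
        "mean_tokens": sum(lengths) / len(lengths) if lengths else 0.0,
        "vocabulary_size": len(vocabulary),
        "top_terms": vocabulary.most_common(3),
    }


# Hypothetical sample standing in for a real collection.
SAMPLE = [
    "Solr is a search platform built on Lucene.",
    "Lucene is a search library.",
    "",
]

if __name__ == "__main__":
    for key, value in characterise(SAMPLE).items():
        print(f"{key}: {value}")
```

Counts such as the number of empty documents or the vocabulary size are the kind of properties that can be reported when describing a dataset, and they also hint at which retrieval tasks are feasible.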
Milestone #2: Information Retrieval §
The second milestone is achieved with the use of an information retrieval tool on the project datasets and their exploitation with free-text queries. ¶
This task makes use of state-of-the-art retrieval tools and involves viewing the datasets as collections of documents, identifying a document model for indexing, and designing queries to be executed on the indexed information. ¶
The following list has a sample of the actions which may be required: ¶
- choose the information retrieval tool (Solr, Lucene, Terrier, Elasticsearch, …); ¶
- analyse the documents and identify their indexable components; ¶
- use the tool API to generate indexes; ¶
- use the tool API to configure the answer to queries; ¶
- demonstrate the indexing and retrieval processes; ¶
- evaluate the technologies used in this task. ¶
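To make the ideas of a document model, an index and a free-text query concrete, the sketch below builds a toy inverted index and ranks documents with a simple TF-IDF score. It is only an illustration: it does not reproduce the API or the scoring of Solr, Lucene, Terrier or Elasticsearch, and the class `TinyIndex`, the field names and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """A toy inverted index: term -> {doc_id: term frequency}."""

    def __init__(self):
        self.postings = defaultdict(dict)
        self.doc_count = 0

    def add(self, doc_id, fields):
        """Index a document given as a dict of field name -> text.

        Assumes each doc_id is added only once.
        """
        text = " ".join(fields.values())
        for term, freq in Counter(tokenize(text)).items():
            self.postings[term][doc_id] = freq
        self.doc_count += 1

    def search(self, query, limit=10):
        """Rank documents by summed TF-IDF over the query terms."""
        scores = defaultdict(float)
        for term in tokenize(query):
            matches = self.postings.get(term, {})
            if not matches:
                continue
            # Terms that occur in fewer documents get a higher weight.
            idf = math.log(1 + self.doc_count / len(matches))
            for doc_id, freq in matches.items():
                scores[doc_id] += freq * idf
        # Highest score first; ties are broken by document id.
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        return ranked[:limit]


def build_sample_index():
    """Build an index over three hypothetical documents."""
    index = TinyIndex()
    index.add("d1", {"title": "Lucene in practice",
                     "body": "Lucene builds inverted indexes."})
    index.add("d2", {"title": "Cooking basics",
                     "body": "How to boil an egg."})
    index.add("d3", {"title": "Search engines",
                     "body": "Solr builds on Lucene."})
    return index


if __name__ == "__main__":
    for doc_id, score in build_sample_index().search("lucene indexes"):
        print(doc_id, round(score, 3))
```

In this sketch all fields are merged into one text before indexing. Deciding which fields to index separately, and how to weight them, is part of identifying the indexable components of the documents.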
Milestone #3: Semantic Web §
The third milestone is achieved with the design, representation and exploitation of an ontology for the domain of the project datasets. An ontology design tool, such as Protégé, is recommended for the task. ¶
The following list has a sample of the actions which may be required: ¶
- analyse the data domain and its main concepts; ¶
- evaluate existing vocabularies or ontologies for the domain; ¶
- identify functionalities of the ontology tool; ¶
- build the ontology with the tool; ¶
- populate the ontology; ¶
- explore the ontology with appropriate queries; ¶
- qualitatively evaluate the adopted technologies for applications in the project domain. ¶
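The steps of populating and exploring an ontology can be sketched with plain subject-predicate-object triples. The example below is a simplified illustration, not a substitute for an ontology tool such as Protégé or for a standard query language such as SPARQL: the vocabulary (`subClassOf`, `type`, `directedBy`), the individuals and the helper functions `match` and `instances_of` are all hypothetical.

```python
# Triples are (subject, predicate, object); all names below are hypothetical.
TRIPLES = {
    ("Film", "subClassOf", "CreativeWork"),
    ("Book", "subClassOf", "CreativeWork"),
    ("Alien", "type", "Film"),
    ("Dune", "type", "Book"),
    ("Alien", "directedBy", "RidleyScott"),
}


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


def instances_of(triples, cls):
    """Return instances of cls, including instances of its subclasses."""
    classes, frontier = {cls}, [cls]
    while frontier:
        current = frontier.pop()
        # Follow subClassOf links downwards to collect every subclass.
        for sub, _, _ in match(triples, predicate="subClassOf", obj=current):
            if sub not in classes:
                classes.add(sub)
                frontier.append(sub)
    return sorted(
        {s for s, _, o in match(triples, predicate="type") if o in classes}
    )


if __name__ == "__main__":
    print(instances_of(TRIPLES, "CreativeWork"))
    print(match(TRIPLES, subject="Alien", predicate="directedBy"))
```

The `instances_of` query returns individuals that are not directly typed with the requested class, which is the kind of inference over the class hierarchy that distinguishes exploring an ontology from a plain keyword search.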
III. Student Project §
The course project may take the form of a personal project. If a student or a group of students is pursuing a personal project, or wants to start one, the course may provide the context for this initiative. The evaluation of student projects has the same milestones as the regular course projects, but the deliverables are customised to the project. They are described generally as Proposal, Solution and Prototype. ¶
Step #1: Proposal §
This step involves the definition of the topic and goals of the project. ¶
The proposal includes a study of the state-of-the-art with the analysis of related work and the definition of a provisional project schedule. ¶
If the proposal is not accepted, the students will take one of the course topics and continue according to the regular project milestones. ¶
Step #2: Solution §
In this step, the project proceeds to the identification of user requirements and the definition of the architecture for the solution. The delivery includes a comparison with existing solutions for the same class of problems, a description of the work in progress and a preliminary evaluation of expected results. ¶