Master in Informatics and Computing Engineering
Information Description, Storage and Retrieval
Instance: 2020/2021
Lecture #1 :: 26/09/2020
Goals
By the end of this class, the student should be able to:
- Describe the content, evaluation and bibliography of the course.
- Identify the key problems in the harvesting, organization, processing and storage of large data collections.
- Describe the scope of the projects to be done in the course.
- List projects that are good examples of using and making data available.
- List some data sources suitable for the practical work.
- Select the right tools to collect and store the datasets.
- Characterize the datasets, identifying some of their properties.
- Select datasets suitable for the project theme.
Topics
Bibliography
- Project Rules, September 2020
- Figure 1.2: High-level software architecture of an IR system, in Ricardo Baeza-Yates, Berthier Ribeiro-Neto, Modern Information Retrieval: The Concepts and Technology behind Search, Addison-Wesley Professional, 2nd edition, 2011
- The Semantic Web stack, in Tim Berners-Lee, Semantic Web on XML, W3C, 2000
- Luís Torgo, Data Mining with R: Learning with Case Studies, Chapman & Hall/CRC, 2010, ISBN: 9781439810187
Materials
- Conta-me Histórias Arquivo.pt (Prémio Arquivo.pt 2018)
- Desarquivo (Prémio Arquivo.pt 2020)
- Facebook, Facebook for developers
- Twitter, Twitter Developer Documentation
- Google Developers, Google Sheets API v4
- The Apache Software Foundation, Apache Tika - a content analysis toolkit
- The R Foundation, The R Project for Statistical Computing
Tasks
Summary
- Introduction to the course: goals, content, bibliography, assessment, practical work and plan.
- Identification of the main problems in the search, organization, processing and storage of large datasets.
- Scope of the practical work and project groups.
- Exploration of data sources for the practical work.
- Datasets. Data sources and formats.
- Data collection and processing using OpenRefine, R, Python, MySQL and Excel.
- Obtaining datasets from the domain chosen for the practical work.
- Exploratory analysis.
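As a rough illustration of what characterizing a dataset can involve, the Python sketch below counts rows and, for each column, the filled, missing and distinct values of a small CSV. It is only a minimal example under stated assumptions: the function name `profile_csv` and the sample data are hypothetical and are not part of the course materials.

```python
import csv
import io


def profile_csv(text):
    """Return the row count and per-column filled/missing/distinct counts."""
    reader = csv.DictReader(io.StringIO(text))
    profile = {name: {"filled": 0, "missing": 0, "distinct": set()}
               for name in reader.fieldnames}
    rows = 0
    for row in reader:
        rows += 1
        for name in reader.fieldnames:
            # Short rows yield None for absent fields; treat them as empty.
            value = (row[name] or "").strip()
            if value:
                profile[name]["filled"] += 1
                profile[name]["distinct"].add(value)
            else:
                profile[name]["missing"] += 1
    # Replace each set with its size so the result is easy to print.
    for stats in profile.values():
        stats["distinct"] = len(stats["distinct"])
    return rows, profile


# Hypothetical sample data, invented for illustration only.
SAMPLE = """title,year,source
Page A,2018,archive
Page B,,archive
Page C,2020,
"""

rows, profile = profile_csv(SAMPLE)
print(rows)
for name, stats in profile.items():
    print(name, stats)
```

A profile like this is a first pass only; it can help spot columns with many missing values or very few distinct values before the dataset is loaded into a tool such as OpenRefine, R or MySQL.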
— MCR, JCL, SSN
teach/dapi/202021/lectures/01.txt · Last modified: 2020/10/01 22:45 by ssn