====== L: 02/10/2020 ======

**Master in Informatics and Computing Engineering\\
Information Description, Storage and Retrieval\\
Instance: 2020/2021** \\

---
\\

====== Lecture #2 :: 02/10/2020 ======

===== Goals =====

By the end of this class, the student should be able to:

  * Characterize the project datasets, identifying some of their properties;
  * Obtain the conceptual model of the domain;
  * Identify suitable tools for the collection and storage of datasets;
  * Identify some retrieval tasks using the datasets.

===== Topics =====

  - Project datasets, tasks and tools
    * Datasets
    * Formats
    * Conceptual models for the data
    * Dataset storage

===== Bibliography =====

  * Scott Ambler, //The Object Primer, Chapter 8: Conceptual Domain Modeling//, Cambridge University Press, 3rd Edition, 2004 (Section 8.4)
  * OpenRefine, [[http://github.com/OpenRefine/OpenRefine/wiki/Getting-Started|OpenRefine User Guide]], last accessed September 2020
  * [[..:project]], September 2020

===== Materials =====

  * {{ dapi2021-data.pdf |Data Collection and Preparation}}
  * OpenRefine, [[http://github.com/OpenRefine/OpenRefine/wiki/Getting-Started|OpenRefine Getting Started]], last accessed September 2020
  * The Apache Software Foundation, [[http://tika.apache.org/|Apache Tika - a content analysis toolkit]], last accessed September 2020
  * The R Foundation, [[https://www.r-project.org/|The R Project for Statistical Computing]], last accessed September 2020

===== Tasks =====

  * Characterize the datasets
  * Obtain the conceptual model of the domain
  * Try available tools to work with datasets
  * Discuss the storage of datasets
  * Identify retrieval tasks using the datasets

===== Summary =====

  * Characterization of the datasets of the project domain.
  * Conceptual domain modelling.
  * Dataset storage.
  * Identification of retrieval tasks.

---

//MCR, JCL, SSN//

[[01|« Previous]] | [[index|Index]] | [[03|Next »]]