Ontologie-basierte Schlagwortextraktion zur Verbesserung des Korpus für die automatische Dokumentklassifizierung des Discovery Services LIVIVO von ZB MED Abschluss

Ullrich, Melanie

ZB MED offers the literature database LIVIVO for straightforward searching of specialised information. To enable thematic search, this bachelor thesis addresses the topic classification of the publications held in the database. The aim is to prepare the corpus for automated classification so that the resulting class assignment is relevant. Based on the assumption that text classification delivers more targeted and meaningful results when it is given specific terms and keywords as input, a topic-specific preprocessing step using knowledge organisation systems (thesauri) is integrated. For this purpose, automated language detection of the publications is implemented beforehand. After the keywords have been indexed in the documents, two statistical models are applied for classification: Latent Dirichlet Allocation and the Stochastic Gradient Descent Classifier. Finally, the automatic keyword extraction is compared with an intellectual (manual) topic analysis, and the classification performance on the prepared input data is analysed for improvement.
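The thesaurus-based keyword indexing mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation, which the abstract does not describe in detail. It assumes a simple dictionary lookup: every label of a concept is mapped to its preferred term, and whole-word matches in the text are counted. The mini-thesaurus `THESAURUS` and the function names `build_label_index` and `extract_keywords` are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical mini-thesaurus (illustration only, not taken from the thesis):
# preferred term -> labels (synonyms and variants) that should map to it.
THESAURUS = {
    "diabetes mellitus": ["diabetes", "diabetes mellitus"],
    "hypertension": ["hypertension", "high blood pressure"],
    "soil erosion": ["soil erosion"],
}


def build_label_index(thesaurus):
    """Map every lower-cased label to its preferred term."""
    index = {}
    for preferred, labels in thesaurus.items():
        for label in labels:
            index[label.lower()] = preferred
    return index


def extract_keywords(text, label_index):
    """Count thesaurus concepts whose labels occur in the text as whole words."""
    remaining = text.lower()
    counts = Counter()
    # Try longer labels first so that "diabetes mellitus" is matched
    # before its shorter variant "diabetes".
    for label in sorted(label_index, key=len, reverse=True):
        pattern = r"\b" + re.escape(label) + r"\b"
        hits = len(re.findall(pattern, remaining))
        if hits:
            counts[label_index[label]] += hits
            # Mask matched spans so shorter labels do not count them again.
            remaining = re.sub(pattern, " ", remaining)
    return counts


if __name__ == "__main__":
    index = build_label_index(THESAURUS)
    sample = "Diabetes mellitus and high blood pressure often co-occur."
    print(extract_keywords(sample, index))
```

Concept counts of this kind could then serve as input features for the two models named in the abstract, in place of or alongside the raw text. A real thesaurus would additionally need language-specific labels, which is presumably why language detection precedes the indexing step.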

Metadata
Author:	Melanie Ullrich
Document Type:	Bachelor Thesis
Year of first Publication:	2023
Date of final exam:	2023/08/09
First Referee:	Konrad Förstner
Advisor:	Klaus Lippert
Degree Program:	Data and Information Science
Language:	German
Page Number:	40
Tag:	Keyword extraction; Text classification; Topic modeling
GND Keyword:	Indexierung <Inhaltserschließung>; Thesaurus
URN:	urn:nbn:de:hbz:79pbc-opus-23911
Licence:	Creative Commons - Attribution-ShareAlike

Open Access
