The lecture covers topics in data-driven text analysis. It discusses supervised and unsupervised machine learning methods as well as the evaluation and assessment of quantitative results. The lecture takes a methodological look at various problems in language processing and discusses how they can be, and are, addressed. Most approaches can be understood at several levels, all of which will be covered: What is the idea or intuition? How can it be formalized, for example with the help of mathematical models? How can the formal model then be implemented (efficiently)? Where necessary, the basics of formal models or programming concepts are also covered in the lecture.

Please note that the class language is German, while the material will (mostly) be in English. Questions in English during class are of course also welcome.

Lecture and Exercise

Lecture (Thursdays) and tutorial (Tuesdays) are closely related in terms of content. Formally, they are two separate courses, namely "Computerlinguistik Übung" and "Sprachverarbeitung". If you do not want to or cannot attend both courses, you are strongly advised to consult with the instructors. Please bring a computer to the tutorial.

Computational Linguistics Modules

Since the winter semester 2022/2023, we have had a new concept for computational linguistics teaching in the BA Informationsverarbeitung degree programme.

  • Module Grundlagen der Computerlinguistik (old study regulations: "Computerlinguistische Grundlagen")
    • Seminar Computerlinguistische Grundlagen (always in the winter semester, instructor Hermes, content: linguistic foundations, annotation)
    • Lecture Sprachverarbeitung (always in the summer semester, instructor Reiter, content: quantitative properties of language, machine learning)
    • Tutorial Sprachverarbeitung (always in the summer semester, instructor Pagel, accompanies the lecture, formerly Seminar II)
    • Module examination: written exam (always in the summer semester, 90 minutes; partial examination possible in the winter semester, 30 minutes)
  • Module Anwendungen der Computerlinguistik (old study regulations: "Angewandte Linguistische Datenverarbeitung")
    • Tutorial Deep Learning (always in the winter semester, instructor Nester, content: deep learning methods)
    • Advanced seminar Experimentelles Arbeiten in der Sprachverarbeitung (always in the winter semester, instructor Reiter, content: experiments in computational linguistics; where do progress and insight come from?)
    • Module examination: term paper with a computational linguistics experiment

Coursework (Studienleistung) and Module Examination

There will be eight exercise sheets during the semester. If you solve and upload five of them via Ilias, the Studienleistung is considered fulfilled.

The examination is a written exam in the final week of the course. Depending on your study programme and module, the exam will last 60 or 90 minutes (see the slides from April 16).

Material and Resources

The following literature is recommended background reading:

  • Dan Jurafsky/James H. Martin (2023). Speech and Language Processing. 3rd ed. Draft of January 7, 2023. Prentice Hall. Available online: https://web.stanford.edu/~jurafsky/slp3/
  • Christopher D. Manning/Hinrich Schütze (1999). Foundations of Statistical Natural Language Processing. Cambridge, Massachusetts and London, England: MIT Press. Selected chapters will be uploaded to Ilias.
  • Ian H. Witten/Eibe Frank (2005). Data Mining: Practical Machine Learning Tools and Techniques. 2nd ed. Elsevier. Selected chapters will be uploaded to Ilias.
  • Melanie Andresen (2024). Computerlinguistische Methoden für die Digital Humanities. Narr Studienbücher. Publisher's website.
  • Al Sweigart (2025). Automate the Boring Stuff with Python. 3rd ed. No Starch Press. Free to read online.

In addition to this page (which is the central hub), we will make use of the following platforms:

  • Ilias, to provide you with non-public materials and to upload your solutions for the exercises
  • A Jupyter Server for running Python code on https://jupyter.spinfo.uni-koeln.de/
  • Klips, to register for the module exam

Agenda

Week 1

  • Tuesday, April 14: Introduction, organisational matters, Python crash course part 1 (slides, notebook, exercise 1)
  • Thursday, April 16: Introduction to Computational Linguistics (slides)

Week 2

Week 3

Week 4

  • Tuesday, May 5: Cancelled
  • Thursday, May 7: Evaluating Machine Learning Systems (slides)

Week 5

Week 6

  • Tuesday, May 19:
  • Thursday, May 21:

Pentecost (no lectures)

Week 7

  • Tuesday, June 2:
  • Thursday, June 4: Holiday

Week 8

  • Tuesday, June 9:
  • Thursday, June 11:

Week 9

  • Tuesday, June 16:
  • Thursday, June 18:

Week 10

  • Tuesday, June 23:
  • Thursday, June 25:

Week 11

  • Tuesday, June 30:
  • Thursday, July 2:

Week 12

  • Tuesday, July 7:
  • Thursday, July 9:

Week 13

  • Tuesday, July 14:
  • Thursday, July 16: Q&A session before the exam

Week 14

  • Tuesday, July 21:
  • Thursday, July 23: Exam