Institute for Computer Science I
Prof. Dr. Christian Sohler
                      

General
Home
Contact
People

Research
Projects

Teaching
Winter 08/09
Summer 08
Topics for Diploma Thesis

Links
Bonn Research Platform on Algorithmics
Institute of Computer Science
International MSc Program
Algorithmen und Berechnungskomplexität I (BA-INF 032) [oder Informatik III ]:

Zeit, Ort Mo, Mi 11-13, HS D (Prof. Dr. Christian Sohler)
Semesterwochenstunden 4V+2Ü
Übungen Übungen werden am 20.10.08 beginnen: (Prof. Dr. Christian Sohler, Morteza Monemizadeh)
  • Montag 13-15 (3x),    [ A6b, A6c, A7a ]
  • Montag 15-17 (2x),    [A6b, A6c]
  • Dienstag 9-11 ,    [A7b]
  • Dienstag 11-13,    [A6c]
  • Mittwoch 13-15 (2x),    [A6b, A6c]
  • Donnestag 11-13 ,    [A6c]
  • Donnestag 15-17 ,    [A6b]
Sprechstunde Montag 15-16 Uhr

Diese Fragen werden wir im ersten Teil von Algorithmen und Berechnungskomplexität I beantworten:

  • Was können Maschinen überhaupt berechnen?
  • Gibt es eine universelle Maschine?
  • Wie kann man Maschinen formal beschreiben?
  • Wie kann man ihre Mächtigkeit studieren?
  • Welche Probleme können effizient gelöst werden?
  • Wie misst man Effizienz?
  • Welche algorithmischen Methoden gibt es, Probleme zu lösen?

Diese Themen werden wir im ersten Teil von Algorithmen und Berechnungskomplexität I abdecken:

  1. Die Idee des Algorithmus
  2. Endliche Automaten und reguläre Ausdrücke
  3. Turingmaschine
  4. Universelle Turingmaschine
  5. Die Grenzen der Berechenbarkeit einschliesslisch der berechenbaren Funktionen
  6. effiziente Berechnungen (die O-Notation, effiziente Datenstrukturen, und effiziente Algorithmen)

Vorlesungen und Übungen:

  1. Montag, 13-10-08
    Vorlesung  ,   Übung : ps;pdf (Abgabetermin : Fri, 24.10.08)
  2. Mittwoch, 15-10-08
    Vorlesung
  3. Montag, 20-10-08
    Vorlesung
  4. Mittwoch, 22-10-08
    Vorlesung
  5. Montag, 27-10-08
    Vorlesung  ,   Übung : ps;pdf (Abgabetermin : Mit, 05.11.08)
  6. Mittwoch, 29-10-08
    Vorlesung
  7. Montag, 03-11-08
    Vorlesung
  8. Mittwoch, 05-11-08
    Vorlesung
  9. Montag, 10-11-08
    Vorlesung  ,   Übung : ps;pdf (Abgabetermin : Mit, 19.11.08)
  10. Mittwoch, 12-11-08 : Der Erste Test
  11. Montag, 17-11-08
    Vorlesung
  12. Mittwoch, 19-11-08
    Vorlesung
  13. Montag, 24-11-08
    Vorlesung  ,   Übung : ps;pdf (Abgabetermin : Mit, 03.12.08)
  14. Mittwoch, 26-11-08
    Vorlesung
  15. Montag, 1-12-08
    Vorlesung
  16. Montag, 8-12-08
    Vorlesung  ,   Übung : ps;pdf (Abgabetermin : Mit, 17.12.08)
  17. Mittwoch, 10-12-08
    Vorlesung
  18. Montag, 15-12-08
    Vorlesung
  19. Mittwoch, 17-12-08
    Vorlesung
  20. Montag, 05-01-09 : Der Zweite Test
    Test und die Lösungen: ps;pdf
    Übung : ps;pdf (Abgabetermin : Fri, 16.01.09, um 11:00 Uhr)
  21. Mittwoch, 07-01-09
    Vorlesung
  22. Montag, 12-01-09
    Vorlesung
  23. Mittwoch, 14-01-09
    Vorlesung
  24. Montag, 19-01-09
    Vorlesung
    Übung : ps;pdf (Abgabetermin : Mit, 28.01.09)
  25. Mittwoch, 21-01-09
    Vorlesung
  26. Montag, 26-01-09
    Vorlesung
  27. Mittwoch, 28-01-09
    Vorlesung
  28. Montag, 2-02-09
    Vorlesung
  29. Mittwoch, 4-02-09
    Vorlesung
  30. 16-02-09, um 10:30 Uhr : Die erste Klausur -> Hörsaal B und D
  31. 05-03-09, 16-17 Uhr : Die Klausureinsicht -> Seminarraum N1
  32. 20-03-09, um 10:00 Uhr : Die zweite Klausur -> Audimax
  33. 23-04-09, 10-11 Uhr : Klausureinsicht (2. Klausur) -> Hörsaal B

Zum weiterlesen:

  1. Einfürung in Formale Sprachen, Berechenbarkeit, Informations und Lerntheorie
    Nobert Blum.
  2. Algorithmen und Datenstrukturen
    Nobert Blum.
  3. Introduction to Automata and Language Theory
    J.E. Hopcroft, Rajeev Motwani, and J.D. Ullman
    Verlag: Addison-Wesley, 2000.
  4. Introduction to Algorithms
    Thomas H. Cormen, Stein Clifford, Charles E. Leiserson, Robert L. Rivest
  5. Elements of the Theory of Computation, 2nd edition
    Harry R. Lewis, Christos H. Papadimitriou
    Verlag: Prentice Hall PTR Upper Saddle River, NJ, USA
  6. Algorithm Design
    Jon Kleinberg and With Eva Tardos
    Verlag: Addison-Wesley, 2005
    Jon Kleinberg, Computer Science 482 (Spring 2004)
  7. Computational Complexity
    Christos H. Papadimitriou
    Verlag: Addison Wesley, 1994
  8. Algorithms
    Christos H. Papadimitriou, Sanjoy Dasgupta, and Umesh Vazirani
    Verlag: McGraw-Hill 2006
  9. Informatik III (Skript zur Vorlesung, WS 04/05)
    Dorothea Wagner
  10. Foundations of Computer Science
    Al Aho and Jeff Ullman
  11. Introduction to the Theory of Computation, Second Edition
    Michael Sipser
    Verlag: Course Technology
  12. 6.046/18.410 Introduction to Algorithms (Fall 2006),
    Professors: Erik D Demaine, Madhu Sudan
  13. ACM Programming Contest
Projektgruppe Clusteringalgorithmen: Theorie und Implementierung
ANNS via Clustering:

Approximate Nearest Neighbor Search (ANNS):Given n points P={p_1,...,p_n} in d-dimensional euclidean space, the distance measure l2 norm or cosine similarity measure and a query point p, in the approximate nearest neighbor problem we are asked to return a point q such that Dist(p,q)< c min_{r in P} Dist(p,r), where Dist(p,q) is the euclidean distance between p and q or one minus cosine similarity measure for text documents and c is a constant.

In "Finding Near Neighbors Through Cluster Pruning", F. Chierichetti, A. Panconesi, P. Raghavan, M. Soyio, A. Tiberi, and E. Upfal (PoDS 2007), the authors propose a new way of solving ANNS for text documents by first clustering the points in P and then searching inside one of the cluster to reduce the time complexity of this problem in high dimensional spaces where d is not a constant.

  • Preprocessing:

    Roughly speaking we first choose n^(1/2) sample points, call them representatives, and then we cluster the points in P with respect to their distances to the representatives and finally for each cluster i we compute the center of mass (or mean) of i. One can do this clustering algorithm hierarichally of a constant depth.
  • Query Processing:

    For a new query point p, first we find the mean i which is the nearest mean to p and then search the nearest point to p among points in the cluster i. Interrestingly, they showed theoretically this simple algorithm for the low-dimensional spaces and when d is a constant workes pretty good but its performance deteriorates quite rapidly when d>log n. For low-dimensional spaces they showed that with constant probability, each cluster contains n^(1/2) points so the query time will be bounded by n^(1/2) which is quite better than comparing p with all the points in P.
  • Real Data Modeling:

    Unfortunately most of databases nowedays are mapped into sparse and high dimensional spaces where d>log n and in this case their algorithm does not work well. In order to show that this anamoly is not adherence to high dimensional spaces they define a hierarichal model of mixture of gaussians (like a topic and subtopics of this topic) and analyze their algorithm in this framework. They show that if a distribution of a real data set obeys from such a framework then each cluster can have n^(1/2) points in expectation and this means the search time of this framework is n^(1/2). So for a data set all we need is to find means and covariance matrix of the distributions where the data set drawn from if the date set fits into this framework which is quite appealing.
  • Further Research:

    As a further research two questions come into mind first what happens if we extract dense features or coordinates for the clusters and do the ANNS for those features. Usually dense features can be found by means of linear combinations methods like Johnson-Lindenstrauss or coresets. Can we show with the help of these tools this simple algorithm works well for high dimensional spaces? The second question is what if a query is a line l and we are asked to return ANNS point to this line? One can imagine that a line query can be one or two query words at google and a returned nearest point can be a text document relevant to the query words.
  • Applications:

    1. Online and offline ranking web pages like what is done at Google, Yahoo,...
    2. Finding papers in one area of research at CiteSeer, Microsoft Libra,...
    3. Exploring and extracting hidden web communities
    4. studying long genes in Bioinformatics
    5. Detecting new types of cancers, in this type of applications we can precise and clearly determine new types of cancers which are not known or it takes a long time through long term clinical courses to detect them.