 |
Institute for
Computer Science I Prof. Dr. Christian
Sohler |
General Home Contact People
Research Projects
Teaching Winter
08/09 Summer
08 Topics
for Diploma Thesis
Links Bonn Research Platform on
Algorithmics Institute
of Computer Science International MSc
Program |
Algorithmen und Berechnungskomplexität I (BA-INF 032) [oder Informatik III ]: |
|
| Zeit, Ort |
Mo, Mi 11-13, HS D (Prof. Dr. Christian Sohler) |
| Semesterwochenstunden |
4V+2Ü |
| Übungen |
Übungen werden am 20.10.08 beginnen: (Prof. Dr. Christian Sohler, Morteza Monemizadeh)
- Montag 13-15 (3x), [ A6b, A6c, A7a ]
- Montag 15-17 (2x), [A6b, A6c]
- Dienstag 9-11 , [A7b]
- Dienstag 11-13, [A6c]
- Mittwoch 13-15 (2x), [A6b, A6c]
- Donnestag 11-13 , [A6c]
- Donnestag 15-17 , [A6b]
|
| Sprechstunde |
Montag 15-16 Uhr
|
|
|
Diese Fragen werden wir im ersten Teil von Algorithmen und Berechnungskomplexität I beantworten:
- Was können Maschinen überhaupt berechnen?
- Gibt es eine universelle Maschine?
- Wie kann man Maschinen formal beschreiben?
- Wie kann man ihre Mächtigkeit studieren?
- Welche Probleme können effizient gelöst werden?
- Wie misst man Effizienz?
- Welche algorithmischen Methoden gibt es, Probleme zu lösen?
|
|
Diese Themen werden wir im ersten Teil von Algorithmen und Berechnungskomplexität I abdecken:
- Die Idee des Algorithmus
- Endliche Automaten und reguläre Ausdrücke
- Turingmaschine
- Universelle Turingmaschine
- Die Grenzen der Berechenbarkeit einschliesslisch der berechenbaren Funktionen
- effiziente Berechnungen (die O-Notation, effiziente Datenstrukturen, und effiziente Algorithmen)
|
|
Vorlesungen und Übungen:
- Montag, 13-10-08
Vorlesung , Übung : ps;pdf (Abgabetermin : Fri, 24.10.08)
- Mittwoch, 15-10-08
Vorlesung
- Montag, 20-10-08
Vorlesung
- Mittwoch, 22-10-08
Vorlesung
- Montag, 27-10-08
Vorlesung , Übung : ps;pdf (Abgabetermin : Mit, 05.11.08)
- Mittwoch, 29-10-08
Vorlesung
- Montag, 03-11-08
Vorlesung
- Mittwoch, 05-11-08
Vorlesung
- Montag, 10-11-08
Vorlesung , Übung : ps;pdf (Abgabetermin : Mit, 19.11.08)
- Mittwoch, 12-11-08 : Der Erste Test
- Montag, 17-11-08
Vorlesung
- Mittwoch, 19-11-08
Vorlesung
- Montag, 24-11-08
Vorlesung , Übung : ps;pdf (Abgabetermin : Mit, 03.12.08)
- Mittwoch, 26-11-08
Vorlesung
- Montag, 1-12-08
Vorlesung
- Montag, 8-12-08
Vorlesung , Übung : ps;pdf (Abgabetermin : Mit, 17.12.08)
- Mittwoch, 10-12-08
Vorlesung
- Montag, 15-12-08
Vorlesung
- Mittwoch, 17-12-08
Vorlesung
- Montag, 05-01-09 : Der Zweite Test
Test und die Lösungen: ps;pdf Übung : ps;pdf (Abgabetermin : Fri, 16.01.09, um 11:00 Uhr)
- Mittwoch, 07-01-09
Vorlesung
- Montag, 12-01-09
Vorlesung
- Mittwoch, 14-01-09
Vorlesung
- Montag, 19-01-09
Vorlesung Übung : ps;pdf (Abgabetermin : Mit, 28.01.09)
- Mittwoch, 21-01-09
Vorlesung
- Montag, 26-01-09
Vorlesung
- Mittwoch, 28-01-09
Vorlesung
- Montag, 2-02-09
Vorlesung
- Mittwoch, 4-02-09
Vorlesung
- 16-02-09, um 10:30 Uhr : Die erste Klausur -> Hörsaal B und D
- 05-03-09, 16-17 Uhr : Die Klausureinsicht -> Seminarraum N1
- 20-03-09, um 10:00 Uhr : Die zweite Klausur -> Audimax
- 23-04-09, 10-11 Uhr : Klausureinsicht (2. Klausur) -> Hörsaal B
|
|
Zum weiterlesen:
- Einfürung in Formale Sprachen, Berechenbarkeit, Informations und Lerntheorie
Nobert Blum.
- Algorithmen und Datenstrukturen
Nobert Blum.
- Introduction to Automata and Language Theory
J.E. Hopcroft, Rajeev Motwani, and J.D. Ullman
Verlag: Addison-Wesley, 2000.
- Introduction to Algorithms
Thomas H. Cormen, Stein Clifford, Charles E. Leiserson, Robert L. Rivest
- Elements of the Theory of Computation, 2nd edition
Harry R. Lewis,
Christos H. Papadimitriou
Verlag: Prentice Hall PTR Upper Saddle River, NJ, USA
- Algorithm Design
Jon Kleinberg and With Eva Tardos
Verlag: Addison-Wesley, 2005
Jon Kleinberg, Computer Science 482 (Spring 2004)
- Computational Complexity
Christos H. Papadimitriou
Verlag: Addison Wesley, 1994
- Algorithms
Christos H. Papadimitriou, Sanjoy Dasgupta, and Umesh Vazirani
Verlag: McGraw-Hill 2006
- Informatik III (Skript zur Vorlesung, WS 04/05)
Dorothea Wagner
- Foundations of Computer Science
Al Aho and Jeff Ullman
- Introduction to the Theory of Computation, Second Edition
Michael Sipser Verlag: Course Technology
- 6.046/18.410 Introduction to Algorithms (Fall 2006),
Professors: Erik D Demaine, Madhu Sudan
- ACM Programming Contest
|
| Projektgruppe Clusteringalgorithmen: Theorie und Implementierung |
|
ANNS via Clustering:
Approximate Nearest Neighbor Search (ANNS):Given n points P={p_1,...,p_n} in d-dimensional euclidean space, the distance measure l2 norm or cosine similarity measure and a query point p, in the approximate nearest neighbor problem we are asked to return a point q such that Dist(p,q)< c min_{r in P} Dist(p,r), where Dist(p,q) is the euclidean distance between p and q or one minus cosine similarity measure for text documents and c is a constant.
In "Finding Near Neighbors Through Cluster Pruning", F. Chierichetti, A. Panconesi, P. Raghavan, M. Soyio, A. Tiberi, and E. Upfal (PoDS 2007), the authors propose a new way of solving ANNS for text documents by first clustering the points in P and then searching inside one of the cluster to reduce the time complexity of this problem in high dimensional spaces where d is not a constant.
Preprocessing: Roughly speaking we first choose n^(1/2) sample points, call them representatives, and then we cluster the points in P with respect to their distances to the representatives and finally for each cluster i we compute the center of mass (or mean) of i. One can do this clustering algorithm hierarichally of a constant depth.
Query Processing: For a new query point p, first we find the mean i which is the nearest mean to p and then search the nearest point to p among points in the cluster i. Interrestingly, they showed theoretically this simple algorithm for the low-dimensional spaces and when d is a constant workes pretty good but its performance deteriorates quite rapidly when d>log n. For low-dimensional spaces they showed that with constant probability, each cluster contains n^(1/2) points so the query time will be bounded by n^(1/2) which is quite better than comparing p with all the points in P.
Real Data Modeling: Unfortunately most of databases nowedays are mapped into sparse and high dimensional spaces where d>log n and in this case their algorithm does not work well. In order to show that this anamoly is not adherence to high dimensional spaces they define a hierarichal model of mixture of gaussians (like a topic and subtopics of this topic) and analyze their algorithm in this framework. They show that if a distribution of a real data set obeys from such a framework then each cluster can have n^(1/2) points in expectation and this means the search time of this framework is n^(1/2). So for a data set all we need is to find means and covariance matrix of the distributions where the data set drawn from if the date set fits into this framework which is quite appealing.
Further Research: As a further research two questions come into mind first what happens if we extract dense features or coordinates for the clusters and do the ANNS for those features. Usually dense features can be found by means of linear combinations methods like Johnson-Lindenstrauss or coresets. Can we show with the help of these tools this simple algorithm works well for high dimensional spaces? The second question is what if a query is a line l and we are asked to return ANNS point to this line? One can imagine that a line query can be one or two query words at google and a returned nearest point can be a text document relevant to the query words.
Applications:
- Online and offline ranking web pages like what is done at Google, Yahoo,...
- Finding papers in one area of research at CiteSeer, Microsoft Libra,...
- Exploring and extracting hidden web communities
- studying long genes in Bioinformatics
- Detecting new types of cancers, in this type of applications we can precise and clearly determine new types of cancers which are not known or it takes a long time through long term clinical courses to detect them.
|
|