Skip to main content

CSC 591 615 Algorithms on Stings

3 Credit Hours

This course is a gentle introduction to string algorithms from a theoretical point of view. We will start with basic [exact] string matching algorithms like the Boyer-Moore algorithm, Knuth-Morris-Pratt algorithm, Aho-Corasick algorithm, etc., and move on to their advanced versions (string matching under edits/mismatches) and related lower bounds. We will then cover basic data structures (e.g., suffix trees and suffix arrays), their construction algorithms, and advanced data structures based on the Burrows-Wheeler Transform for compressed text indexing (e.g., FM-index) and applications. Finally, we will study some (NP) hard problems on strings, such as the shortest common superstring, center/median string problems, etc, and efficient approximation algorithms.

Prerequisite

CSC 505.

Course Objectives

By the end of the semester, students will learn about several fundamental algorithms and data structures for processing string data. They will also be able to design and analyze new algorithms. Students will also receive training on reading and understanding research papers in theoretical computer science. Finally, more enthusiastic students will receive an opportunity to work on some of the ongoing research projects with the lecturer.  

Course Requirements

There are no exams for this course; however, there will be homeworks, presentations and (surprise) quizzes. The focus will mostly be on reading and summarizing some of the research papers, solving some of the basic string algorithmic problems, designing data structures, etc. There will also be a couple of research projects, which can be done in small groups. A final presentation and report will be required.

Course Outline

Before each class, the upcoming topics and relevant study materials will be posted online. The students must review these materials before coming to the class. We may review these materials quickly in class and work on problems based on that. The tentative topics are as follows: we will start with some of the basic string algorithms, such as the Knuth-Morris-Pratt algorithm, Aho-Corasick algorithm, etc, and will see fundamental data structures like suffix arrays/trees, their construction algorithms and applications in solving many exciting problems. We will also see advanced algorithms for solving edit distance and related problems. This can be viewed as the first module. The second module focuses on compressed data structures such as compressed suffix arrays, FM-index, etc. We will see lower bounds and hardness results in the third module—for example, NP-completeness of shortest superstring problem, center and median string problem, etc. The final project will start roughly mid-semester (i.e., as soon as we finish covering the relevant materials). 

Created: 4/18/2024