MSc in Bioinformatics
Extended Course Syllabus
Semester | Course | Coordinators | Enrolment
Fall 2016 | BIO101: Introduction to Mathematics | Petrantonakis | Life Sciences students
Fall 2016 | BIO102: Introduction to Programming | Pavlidis | Life Sciences students
Fall 2016 | COMP101: Principles of Cellular and Molecular Biology | Kafetzopoulos | Physical Sciences, Computer Sciences and Engineers
Fall 2016 | COMP102: Introduction to Genetics and Evolutionary Biology | Iliopoulos | Physical Sciences, Computer Sciences and Engineers
Spring 2017 | BC201: Advanced Statistics | Tsagris | All students
Spring 2017 | BC202: Big Data Biomedical Databases | Topalis | All students
Spring 2017 | BC203: Introduction to R for Bioinformatics | Lagani/Pavlidis | All students
Spring 2017 | BC204: Methods in Bioinformatics | Poirazi/Tsamardinos | All students
Spring 2017 | BC205: Algorithms in Bioinformatics | Nikolaou/Tsamardinos | All students
Course Name: Introduction to Mathematics
Course Code: BIO101
Semester: Fall
Coordinator: Panagiotis Petrantonakis
Instructors: Panagiotis Petrantonakis, Georgios Potamias
Summary
This course teaches two fundamental mathematical topics: linear algebra and probability theory. The course is divided equally into two parts. The first part covers linear algebra, including systems of equations, vector spaces, determinants, eigenvalues, linear transformations and their applications. The second part covers probability theory: basic notions of probability, conditional probability and inference, random variables and their distribution functions, expectations (mean and variance), and covariance and correlation. Finally, the entropy of, and between, random variables is presented, along with its application to Markov chains.
Target audience
The course is addressed to students of medicine, biology, computer science, chemistry and mathematics who wish to understand basic and advanced concepts of linear algebra and probability theory.
Basic Prerequisites
Basic knowledge of mathematics. Knowledge of programming in R or Matlab would be beneficial.
Course Goals
· Understanding and solving linear algebra and related applications.
· Understanding and interpreting fundamental concepts of probability theory.
Assessment
Midterm exam (written) (30%).
Final exam (written) (70%).
Suggested Reading
Linear Algebra:
Linear Algebra and its Applications, David Lay
Probability Theory:
(1) Christina Goldschmidt. Prelims Probability. https://www0.maths.ox.ac.uk/system/files/coursematerial/2015/2635/43/ProbNotes2015.pdf
https://www.dropbox.com/s/tn2sn2szw2dam2r/Prelims_Probability.pdf?dl=0
(2) Joseph C Watkins. An Introduction to the Science of Statistics: From Theory to Implementation.
http://math.arizona.edu/~hzhang/math574m/statbook.pdf
https://www.dropbox.com/s/p9fmav7po4pgik3/An_Introduction_to_the_Theory_of_Statistics.pdf?dl=0
(3) Mai Vu. Entropy and mutual information.
http://www.info612.ece.mcgill.ca/lecture_02.pdf
https://www.dropbox.com/s/3i3srcpatdh88cs/Entropy.pdf?dl=0
Tentative Program
Linear Algebra
Weeks | Syllabus | Chapters
1-2 | Systems of Linear Equations | 1
3-4 | Matrix Algebra | 2
5-6 | Determinants | 3
7-8 | Vector Spaces | 4
9-10 | Eigenvalues and Eigenvectors | 5
11-12 | Orthogonality and Least Squares | 6
13 | Revision |
Probability Theory
Weeks | Syllabus | Chapters (1) | Chapters (2)
1-2 | Introduction: Motivating examples – data & probabilities; Sample space, events, outcomes, Venn diagrams, Inclusion-Exclusion principle; Empirical and axiomatic definition of probability | 1.1 – 1.3 | 4.1, 4.2.1, 4.2.2, 4.3.2 [Principles of Inheritance], 5.1 – 5.6
3-4 | Counting: Arrangements: permutations, combinations, binomial coefficients/Pascal's triangle; Axioms/Laws of probability; set theory and probabilities | |
5-6 | Conditional probability: Definition, multiplication principle, law of total probability, Bayes formula, Independence of events | 1.4 – 1.7 | 6.1 – 6.5
7-8 | Recap and midterm exam. | |
9-10 | Discrete random variables & Distribution functions (discrete): Definitions, properties, mass function, classical distributions (Bernoulli, Binomial, geometric, Poisson), Joint distributions, Conditional distribution. Expected values: mean, variance, independence, covariance/correlation | pp. 16 – 17, 2.1 – 2.4 | 7.1 – 7.4, 7.6 – 7.7/7.7.1, 9.1, 8.1, 8.7
11-12 | Entropy: Definition, Joint, Conditional, Relative Entropy (Kullback-Leibler distance), Mutual Information, Data processing inequality (application in Markov chains) | Notes (3) |
13 | Revision | |
Course Name: Introduction to Programming
Course Code: BIO102
Semester: Fall
Coordinator: Pavlos Pavlidis
Instructors: Pavlos Pavlidis, Anastasis Oulas, Evaggelos Pafilis, Alexandros Kanterakis
Part A: Python
1. Introduction: Motivating algorithmic examples in bioinformatics; data representation in computers; what is software; operating systems; programming languages (compiled, interpreted)
2. Data types: Variables, assignments; immutable variables, e.g. strings; different numerical types; operators (mathematical, in strings, e.g. concatenation); comments in programs; error messages and understanding them
3. Conditions and Boolean logic: logical operators; ranges; control statements: if-else, for and while loops
4. Strings and text files, basic Input Output: manipulating files; reading/writing to text files; generating a formatted file
5. Data Structures I: Lists, tuples, basic list operators, replacing, inserting, removing elements; searching and sorting lists
6. Data Structures II: dictionaries: adding and removing keys, accessing and replacing values; traversing dictionaries; specific cases where dictionaries are superior to lists and vice versa
7. Functions and libraries: design with functions, hiding redundancy; arguments and return values; library packages, searching for packages, recursive functions
8. Dynamic programming I: theoretical background
9. Dynamic programming II: applications in bioinformatics
10. Drawing using Python: simple 2D drawing: colors, shapes; the 'image' module
11. Object Oriented programming I: classes, objects, methods
12. Object Oriented programming II: inheritance, polymorphism
13. Object Oriented programming III: multithreading
14. String algorithms I: regular expressions: searching for motifs
15. String algorithms II: Hamming distance, edit distance, trie search
16. Specific algorithms in bioinformatics: sorting, searching, complexity analysis, Smith-Waterman algorithm, clustering
Part B: Introduction to Unix and Linux
1. Introduction to the UNIX Operating System
2. The Command line and the directory structure, file manipulation
3. Ubuntu and Debian Linux Distributions: Installation and virtual boxes
4. The vi editor, emacs editor
5. Unix Communications
6. Utilities and Filters
7. I/O redirection
8. Shells
9. Intro to shell Programming
10. Variables in shell Programming
11. Conditionals in shell Programming
12. Loops in shell Programming
13. Pipes
14. Stream Editing: sed
15. Grep, count, uniq, sort
16. gawk, data file merging, filtering and statistics calculation
17. Program compiling and interpreters (Python, Perl, R)
18. System Administration Intro
19. Text processing: LaTeX, compared to nroff/troff
Course Name: Principles of Cellular and Molecular Biology
Course Code: COMP101
Semester: Fall
Coordinator: Dimitris Kafetzopoulos
Instructor(s): Dimitris Kafetzopoulos
Summary
The course covers the fundamental concepts, topics and techniques of Molecular and Cellular Biology, from the chemical basis of life to the central dogma of Molecular Biology and the cellular organization of biological systems. It presents the functional specialization of cells and tissues, and concludes with the molecular events underlying complex traits and diseases. Throughout the course, advances in analytical technologies and the challenges of quantitative and computational approaches in biomedical sciences and research are highlighted.
Basic Prerequisites
None
Course Goals
The course is designed specifically for graduate students who had no biological education in their previous studies, in particular computer and physical scientists, mathematicians and statisticians. It is a fast and intense means of covering the basic biological concepts and technologies, and of providing adequate background knowledge of Molecular and Cellular Biology to understand and address the cutting-edge challenges and questions of biomedical and life science research in general.
Assessment
Exercise(s): 50%
Final Exam: 50%
Suggested Textbooks
QuickStart Molecular Biology: An Introductory Course for Mathematicians, Physicists, and Engineers, by Philip Benfey
Molecular Biology of the Cell, by Bruce Alberts & Alexander Johnson
Campbell Biology (International Edition), by Neil Campbell & Jane Reece.
Lehninger Principles of Biochemistry, by David Nelson & Michael Cox
Lewin's Genes XI, by Jocelyn Krebs et al.
Basic Biotechnology, by Colin Ratledge & Bjorn Kristiansen
Week | Module
1 | The chemical basis of life: Atoms, Molecules, Energy, Reactions, Interactions, Amino acids, Nucleotides, Lipids, Sugars, Proteins, Nucleic acids, Membranes, Polysaccharides and their Physicochemical parameters.
2 | The storage and maintenance of genetic information: The central dogma of Molecular Biology, DNA structure, properties and function. DNA replication and repair. Genomics technologies.
3 | The process of genetic information: Genetic code, Gene expression regulation, Transcription, RNA structure and function, Transcriptomics technologies.
4 | The use of genetic information: Translation, Protein structures and functional diversity, Proteomics technologies.
5 | Cellular organization of life: Membranes, Subcellular organelles, Molecular trafficking, Cell communication, Cell specialization. Imaging technologies.
6 | The cell factory: Metabolism and Metabolomics technologies.
7 | Regulation of genetic information: Cell Signaling, Gene expression regulation, Gene regulatory networks, Spatiotemporal control of gene expression.
8 | Cell growth and division
9 | Understanding complex traits: Selected examples of healthy and disease phenotypes.
10 | Specialized tissues: Neuronal and Immune systems
11 | Experimental and computational approaches in Biology: Model animal systems and population approaches
12 | Drug discovery processes, Biotechnological applications
13 | Exercise: Paper presentation and/or Seat-in Exam
Course Name: Introduction to Genetics and Evolutionary Biology
Course Code: COMP102
Semester: Fall
Coordinator: Yiannis Iliopoulos
Instructors: Yiannis Iliopoulos, Pantelis Topalis, Tereza Vogiatzi
Summary
This course discusses the principles of genetics and evolution with application to the study of biological function at the level of molecules, cells, and multicellular organisms, including humans. The topics include: structure and function of genes, chromosomes and genomes; biological variation resulting from recombination, mutation, and selection; population genetics; and the use of genetic methods to analyze protein function, gene regulation and inherited disease. Students are required to attend an exercise/problem-solving session (joint class with the sophomores of the Department of Biology).
Basic Prerequisites
None
Course Goals
To provide basic knowledge in genetics and evolution.
To be able to understand biological function at the level of molecules, cells and multicellular organisms.
Assessment
Exercises: 10%
Final Exam: 90%
Tentative Program
Week | Module
1-2 | Introduction: Mendelian analysis and its extensions. Discussed Topics: Mendelian inheritance (one locus, two loci), dominance, sex-linked inheritance, epistasis, genealogical trees
3-4 | Chromosomal theory of inheritance – Mutations. Molecular Genetic Tools. Understanding the Genetic Basis of Cancer. Discussed Topics: Linked genes, genetic mapping, point mutations, chromosomal rearrangements, chromosomal aberrations
5-6 | Molecular basis of genetic diseases – Regulation of gene expression. DNA Fingerprinting. Discussed Topics: Loss of function mutations, Gain of function mutations, Prokaryotic gene regulation, eukaryotic gene regulation
7-8 | Human genome – Human cytogenetics. Discussed Topics: Linkage analysis, WGS analysis, Exome analysis
9-10 | Introduction to the theory of evolution – Origin of life. Discussed Topics: Scientific theories
11-12 | Population genetics. Discussed Topics: Genetic structure of natural populations, Hardy-Weinberg equilibrium, Genetic drift, Founder effect, Fitness
13 | Molecular evolution and speciation
Course Name: Advanced Statistics
Course Code: BC201
Semester: Spring
Coordinators: Michail Tsagris and Ioannis Tsamardinos
Instructor: Michail Tsagris
Summary
The course covers basic areas of statistics, such as graphical representations of data, random variables, types of sampling and design, and estimation via maximum likelihood. It then covers hypothesis testing and confidence intervals (for means and proportions), type I and II errors and p-values, as well as hypothesis testing via computational techniques (bootstrap and permutation). In a second phase, it addresses correlations for continuous variables (Pearson and Spearman coefficients), association of categorical variables (G² test of independence) and linear regression. Finally, false discovery rate methods in multiple hypothesis testing will be mentioned. Demonstrations and exercises using the R statistical package are included as well.
Target audience
The course is addressed to students of medicine, biology, computer science, chemistry and mathematics who have some basic knowledge of statistics and wish to understand certain statistical terms in more depth.
Prerequisites
A basic knowledge of mathematics and statistics. Knowledge of programming in R would be beneficial.
Course learning objectives
Understanding of terms like sampling techniques, types of studies, hypothesis testing, type I and II error, confidence intervals, bootstrap, permutation, relationship between two variables.
Ability to interpret some basic statistical results.
Acquiring the foundations for later statistical analyses.
Ability to implement hypothesis testing in R.
Assessment
Exercises (10%).
Midterm exam (written) (20%).
Final exam (written) (70%).
Suggested Reading
Biostatistics with R: An Introduction to Statistics Through Biological Data, Babak Shahbaba, 2012, Springer.
Weekly program
Week | Material | Chapters
1-2 | Introduction: Basic principles of probability and statistics, graphical representation of data, random variables, sampling techniques, types of studies. | 1, 2, 4
3-4 | Estimation: Maximum likelihood estimation of parameters (mean, median, proportion) and confidence intervals (for mean and proportion). | 6
5-6 | Hypothesis testing: one and two means with and without computational techniques, relationship with confidence intervals, explanation of concepts like type I and II errors and p-values + demonstration with R. | 7, 11
7 | Recap and midterm exam. |
8-10 | Associations: Relationship between pairs of continuous and discrete variables (Pearson and Spearman correlation coefficients, G² test of independence), linear regression + demonstration with R. | 3, 8, 10
11 | Extensive demonstration of the covered material in R. |
12 | False discovery rate methods: Bonferroni, Benjamini–Hochberg and Storey-Tibshirani corrections + demonstration with R. |
13 | Revision. |
Course Name: Big Data Biomedical Databases
Course Code: BC202
Semester: Spring
Coordinator: Pantelis Topalis
Instructor: Pantelis Topalis
Summary
This course aims to describe some of the most popular biomedical databases, covering a wide range of different datatypes. It will present the different ways in which the same information is often stored in different databases and the problems this causes. Proper use of controlled vocabularies and ontologies can provide a solution and can promote data integration.
Basic Prerequisites
A preparatory course on molecular biology. Basic knowledge of a scripting language would be useful.
Course Goals
Becoming familiarized with the various kinds of biological datatypes and formats.
Being able to extract relevant information for a project from different sources.
Learning how to access in bulk the data stored in databases via their APIs.
Assessment
Exercises: 50%
Final Exam: 50%
Tentative Program
Week | Module
1-2 | Introduction: Biological datatypes and their formats. Discussed Topics: Biological sequences and their annotations (fasta, fastq, fast5, gtf, gff), Variation calls (vcf, gvf), Sequence alignments (sam, bam), Other formats (bed, wig, bedgraph, bigwig)
3-4 | Data integration. Discussed Topics: Metadata and how they can be organized. Controlled vocabularies and ontologies. Upper level ontologies, reference ontologies and application ontologies. Basic Formal Ontology. OBO Foundry. NCBO BioPortal. Ontology development.
5-6 | Non-species-specific databases. Discussed Topics: NCBI and Entrez query system, ENA, Pfam and motif databases, SwissProt, UniProt, STRING, PDB
7-8 | Species-specific databases. Discussed Topics: Human databases, MGI, FlyBase, Yeast, VectorBase
9-10 | Pathway databases. Discussed Topics: KEGG, Reactome, Pathway Commons, Pathway Interaction Database, MetaCyc, PantherDB
11-12 | Genome viewers. Discussed Topics: Ensembl, NCBI, UCSC
13 | Data Repositories
Course Name: Introduction to R for Bioinformatics
Course Code: BC203
Semester: Spring
Coordinators: Vincenzo Lagani, Pavlos Pavlidis
Instructors: Vincenzo Lagani, Pavlos Pavlidis
Summary
The course will introduce the R statistical software as a tool for performing data analysis tasks in the bioinformatics field. At the beginning, the basics of the R language will be explained, along with the main concepts related to the R software and its modular architecture. More advanced concepts will then be introduced, such as data structures in R, functional programming, graphical visualization and the creation of R packages. The second part of the course will focus on the Bioconductor initiative and its repository of R packages for bioinformatics. In particular, functionalities for analyzing RNA-seq and microarray data will be explored in detail.
Basic Prerequisites
Elementary knowledge of programming and statistics.
Course Goals
By the end of the course, students are expected to:
· know the capabilities of the R software and its possible uses;
· master the R language and be able to use it for writing scripts and simple data analysis pipelines;
· know the scope and characteristics of the Bioconductor initiative;
· be able to identify and use the most suitable Bioconductor packages for a given data analysis task.
Assessment
Assignments (30%)
Midterm (30%)
Final project with oral presentation (40%)
Suggested Reading
“Applied Statistics for Bioinformatics using R”, freely available at:
https://cran.r-project.org/doc/contrib/Krijnen-IntroBioInfStatistics.pdf
Tentative Program
Week | Module
1-2 | The R statistical environment: the R software and its characteristics; the basics of the R language (syntax, control flow statements, data structures); matrix manipulations; functional programming
3-4 | Advanced R, part 1
5-6 | Advanced R, part 2
7 | Examination week
8-10 | Bioconductor, part 1: Bioconductor overview; microarray data in R
11-12 | Bioconductor, part 2: RNA-seq data in R; other applications
13 | Examination week. Recapitulation of the previous lessons and assignment of the final project
Course Name: Methods in Bioinformatics
Course Code: BC204
Semester: Spring
Coordinators: Panayiota Poirazi, Ioannis Tsamardinos
Instructors: Panayiota Poirazi, Ioannis Tsamardinos
Summary
This course aims to describe some of the most prominent and most widely used methods for the analysis of biological data, with emphasis on different large-scale data sets (e.g. microarray gene expression data, RNA-seq data, metagenomics, biological networks etc). The course focuses on different methodologies for dimensionality reduction, feature selection, model selection, clustering, classification and network inference. The main goal of the course is not to describe in detail the most sophisticated implementations but to present the features and rationale behind each method and its appropriateness for solving specific problems. Classes will include theoretical lectures as well as practical exercises, where students will be required to use existing software tools that implement the presented methods to solve selected problems.
Basic Prerequisites
A preparatory course on molecular biology. Basic knowledge of math and statistics.
Course Goals
· Becoming familiarized with the most common problems in the analysis of different types of large-scale biological data.
· Being able to categorize the most popular methods (dimensionality reduction, regression, clustering, classification, supervised/unsupervised, probabilistic, deterministic, sequential learning etc).
· Achieving a high level of competence in the use of several data analysis methods.
Assessment
Exercises (algorithm implementations and/or quiz question after each module): 50%
Final Exam: 50%
Suggested Reading
Introduction to Bioinformatics Algorithms. Pevzner and Jones
Tentative Program
Week | Module
1-2 | Introduction: Types of data and analysis problems. Discussed Topics: microarray gene expression data, RNA-seq data, protein-protein interaction data, DNA-protein interaction data, mass-spectrometry data. Biological Problem: Identifying differentially expressed genes in microarray data
3-4 | Dimensionality Reduction: SVD, PCA. Discussed Topics: The need for and applications of DR methods in biological data. Biological Problem: Identifying categories of disease in gene expression data
5-6 | Clustering methods: Hierarchical clustering, k-means. Discussed Topics: Presentation of the algorithms and initialization of conditions. Discussions regarding different distance metrics that can be used, ways to optimize cluster size etc. Biological Problem: Identifying categories of patients using expression data in control and cancer patient groups
7-8 | Classification methods: Kernel methods, SVMs. Discussed Topics: Description of methods and issues regarding feature extraction, model selection, avoiding overfitting. Biological Problem: Finding miRNA genes in the human genome using various existing classification tools
9-10 | HMM: Hidden Markov Models
11-12 | Bayesian Networks
13 | Revision
Course Name: Algorithms in Bioinformatics
Course Code: BC205
Semester: Spring
Coordinators: Christoforos Nikolaou, Ioannis Tsamardinos
Instructor: Christoforos Nikolaou
Summary
This course aims to describe some of the most prominent and most widely used algorithms for the analysis of biological data, with emphasis on handling and analyzing biological sequences. The course focuses on a detailed description of algorithms for alignment, the rapid search of short sequences, tracing patterns and finding motifs in sequences, sequence assembly and phylogenetic analysis among others. The main goal of the course is not to describe in detail the most sophisticated implementations but to present the rationale behind the design of algorithms in a constructive and educational manner from both theoretical and practical viewpoints. Classes will include theoretical lectures as well as practical exercises, where students will be required to implement algorithms in the language of their choice.
Basic Prerequisites
A preparatory course on molecular biology. Basic knowledge of math and statistics.
Course Goals
· Becoming familiarized with the most common problems in the analysis of biological sequences.
· Being able to categorize the algorithms that find wide use in bioinformatics (dynamic programming, randomized algorithms, divide and conquer algorithms etc).
· Achieving a high level of competence in the performance of alignment and BLAST searches.
· Acquiring the ability to implement simple algorithms from the blackboard to the keyboard.
Assessment
Exercises (algorithm implementations and/or quiz question after each module): 50%
Final Exam: 50%
Suggested Reading
Introduction to Bioinformatics Algorithms. Pevzner and Jones
Bioinformatics Algorithms. A Practical Approach. Compeau and Pevzner
Tentative Program
Week | Module
1-2 | Introduction: Analysis of Sequence Composition. Discussed Topics: k-mer analysis, overrepresentations in sequences, sequence segmentation. Biological Problem: Locating the origin of replication in a bacterial genome
3-4 | Motif Discovery: Randomized Algorithms. Discussed Topics: Motifs in biological sequences, sequence logos, Shannon Entropy calculations in motifs, de novo motif discovery. Biological Problem: Locating transcription factor binding sites in DNA sequences
5-6 | Sequence Alignment: Dynamic Programming. Discussed Topics: Sequence comparisons, local and global alignment algorithms, scoring matrices. Biological Problem: Pairwise alignment of two protein sequences with various scoring matrices
7-8 | String Matching and Rapid Searches. Discussed Topics: Rapid sequence comparisons, the BLAST algorithm, statistics of BLAST, rapid searches in complete genomes with BLAT. Biological Problem: Annotating the human genome with BLAT
9-10 | Data Structures and Transformation for NGS data analysis. Discussed Topics: NGS data, big data biology, suffix trees, Burrows-Wheeler Transformation. Biological Problem: Constructing and parsing a suffix tree for fast/accurate motif finding in a genomic sequence
11-12 | Phylogenetic Analysis. Discussed Topics: Sequence comparison and clustering, distance methods, phylogenetic trees, tree combinatorics. Biological Problem: Building an NJ-tree for a DNA sequence distance matrix
13 | Anomaly Detection for NGS data analysis. Discussed Topics: NGS data, working with files and coordinates. Biological Problem: Implementing algorithms for Peak detection in NGS data
List of Suggested Elective Courses*
*The list of Electives is currently being updated and remains under consideration. Elective courses will be finalized by the end of the Spring 2017 semester.
Course Name | Existing/New | Instructor(s) | Semester | ECTS | Suggested by
Computational Neuroscience | New | Yiota Poirazi, Athanasia Papoutsi | TBD | | Yiota Poirazi
Text Mining in Bioinformatics | New | Ioannis Iliopoulos, Nikolas Papanikolaou, Evangelos Pafilis | TBD | | Ioannis Iliopoulos