NumPy and Pandas fundamentals for handling biological datasets
Prerequisites
Programming Fundamentals in Python
Basic Python syntax and data structures
Functions and control flow
File handling in Python
Experience with Python IDEs and Jupyter notebooks
Basic Biology Knowledge
Basic genomics terminology
Familiarity with common bioinformatics file formats (FASTA, FASTQ)
Who is the course for?
Bioinformaticians and genomics researchers who want to enhance their data analysis capabilities by mastering NumPy and Pandas for efficient processing of genomic datasets
About the course
Overall Course Objective
By the end of this course, students will be able to use the NumPy and Pandas libraries to manipulate, analyze, and process numerical and tabular data in Python, including array operations, core data structures, and data manipulation techniques. Students will also apply these skills to real-world bioinformatics problems, gaining practical experience in handling and analyzing genomics data.
Specific Learning Objectives
After completing the NumPy section and hands-on exercises, students will be able to:
Explain the purpose and advantages of using NumPy in scientific computing and data analysis
Create, manipulate, and efficiently implement NumPy arrays through advanced techniques including indexing, sorting, splitting, vectorized operations, and broadcasting
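As a rough illustration of the array techniques named in these objectives, the sketch below applies indexing, vectorized operations, broadcasting, sorting, and splitting to a small matrix. The count values and the genes-by-samples layout are made up for this example and are not taken from the course datasets.

```python
import numpy as np

# Hypothetical expression counts: rows are genes, columns are samples.
counts = np.array([
    [10, 20, 30],
    [0, 5, 15],
    [100, 200, 300],
], dtype=float)

# Indexing: one gene across all samples, and a boolean mask over all values.
gene_b = counts[1, :]
high = counts[counts > 50]

# Vectorized operation: log2 transform with a pseudocount, no Python loop.
log_counts = np.log2(counts + 1)

# Broadcasting: divide every row by the per-sample totals.
library_sizes = counts.sum(axis=0)      # shape (3,)
fractions = counts / library_sizes      # (3, 3) / (3,) broadcasts across rows

# Sorting: gene indices ordered by mean expression, highest first.
order = np.argsort(counts.mean(axis=1))[::-1]

# Splitting: separate the first sample column from the remaining two.
first, rest = np.split(counts, [1], axis=1)
```

Here `fractions` has columns that each sum to 1, which is one simple way to compare samples with different sequencing depths.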
After completing the Pandas section and hands-on exercises, students will be able to:
Understand the relationship between Pandas and NumPy, and effectively use Pandas Series and DataFrames for data analysis
Perform advanced data manipulation techniques including indexing, filtering, handling missing data, and combining DataFrames through merging and concatenation
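The Pandas objectives above can be sketched in a few lines. The two tables, their column names, and the expression values are hypothetical and chosen only to show label-based indexing, filtering, missing-data handling, merging, and concatenation on one small example.

```python
import numpy as np
import pandas as pd

# Hypothetical expression table with one missing measurement.
expr = pd.DataFrame({
    "gene": ["BRCA1", "TP53", "MYC"],
    "sample_1": [12.0, np.nan, 30.5],
    "sample_2": [10.0, 8.0, 28.0],
})

# Hypothetical annotation table; MYC is deliberately absent.
annot = pd.DataFrame({
    "gene": ["BRCA1", "TP53", "EGFR"],
    "chromosome": ["17", "17", "7"],
})

# Indexing: label-based lookup after setting the gene name as the index.
by_gene = expr.set_index("gene")
tp53_s2 = by_gene.loc["TP53", "sample_2"]

# Filtering: keep rows where a column passes a threshold.
high = expr[expr["sample_2"] > 9]

# Missing data: count NaN values, then fill them with a chosen default.
n_missing = int(expr["sample_1"].isna().sum())
filled = expr.fillna({"sample_1": 0.0})

# Merging: a left join keeps every expression row, annotated where possible.
merged = expr.merge(annot, on="gene", how="left")

# Concatenation: stack rows from a second table with the same columns.
extra = pd.DataFrame({"gene": ["EGFR"], "sample_1": [5.0], "sample_2": [6.0]})
combined = pd.concat([expr, extra], ignore_index=True)
```

Filling missing values with zero is only one option; whether to fill, drop, or impute depends on the analysis.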
Overall time schedule
| Module | Duration |
| --- | --- |
| NumPy for Bioinformatics | 3 Hours |
| Pandas for Bioinformatics | 3 Hours |
NumPy for handling biological datasets
Pandas for handling biological datasets
- Lesson plan
- Introduction to Pandas
- Data Import and Export in Pandas
- DataFrame Manipulation & Sorting
- Indexing, Selection & Slicing in Pandas
- Summary Statistics & Aggregations in Pandas
- Hands-on: RNA Expression Analysis - alternative method
- Bonus Lesson 1: Handling Missing Data in Pandas
- Bonus Lesson 2: Merging DataFrames in Pandas
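To give a feel for how the import, sorting, and aggregation lessons fit together, here is a minimal sketch. It reads CSV text from an in-memory buffer rather than a real file, and the gene names, conditions, and expression values are invented for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = """gene,condition,expression
GENE_A,control,10.0
GENE_A,treated,30.0
GENE_B,control,8.0
GENE_B,treated,4.0
"""

# Import: read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Sorting: highest expression first.
sorted_df = df.sort_values("expression", ascending=False)

# Summary statistics for a single column.
stats = df["expression"].describe()

# Aggregation: mean expression per condition.
mean_by_condition = df.groupby("condition")["expression"].mean()

# Export: write the aggregated result back out as CSV text.
buffer = io.StringIO()
mean_by_condition.to_csv(buffer)
exported = buffer.getvalue()
```

With a real dataset, the `io.StringIO` objects would be replaced by file paths passed to `pd.read_csv` and `to_csv`.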
Datasets
Dependencies
All Python dependencies are listed in the `requirements.txt` file.
Setup Python environment
Follow the installation instructions in the document linked here.