NumPy and Pandas fundamentals for handling biological datasets

Prerequisites

  • Programming Fundamentals in Python

    • Basic Python syntax and data structures

    • Functions and control flow

    • File handling in Python

    • Experience with Python IDEs and Jupyter notebooks

  • Basic Biology Knowledge

    • Basic genomics terminology

    • Familiarity with common bioinformatics file formats (FASTA, FASTQ)

Who is the course for?

Bioinformaticians and genomics researchers who want to strengthen their data analysis skills by learning to use NumPy and Pandas for efficient processing of genomic datasets.

About the course

Overall Course Objective

By the end of this course, students will be able to use NumPy and Pandas to manipulate, analyze, and process numerical and tabular data in Python, including array operations, the core Pandas data structures, and data manipulation techniques. They will also apply these skills to real-world bioinformatics problems, gaining practical experience in handling and analyzing genomics data.

Specific Learning Objectives

  1. After completing the NumPy section and hands-on exercises, students will be able to:

    • Explain the purpose and advantages of using NumPy in scientific computing and data analysis

    • Create and manipulate NumPy arrays efficiently using indexing, sorting, splitting, vectorized operations, and broadcasting

  2. After completing the Pandas section and hands-on exercises, students will be able to:

    • Explain the relationship between Pandas and NumPy, and use Pandas Series and DataFrames for data analysis

    • Perform advanced data manipulation techniques including indexing, filtering, handling missing data, and combining DataFrames through merging and concatenation
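To give a flavour of the techniques named in these objectives, the sketch below applies each of them to a tiny, made-up gene expression example. The gene names, counts, and chromosome labels are hypothetical and are not taken from the course datasets; the snippet is only an illustration of the kind of operations covered, not the course material itself.

```python
import numpy as np
import pandas as pd

# --- NumPy: hypothetical read-count matrix, 3 genes (rows) x 4 samples (columns) ---
counts = np.array([
    [10, 20, 30, 40],
    [5, 0, 15, 20],
    [100, 200, 300, 400],
])

# Indexing: all counts for the second gene, and the last two samples of every gene.
second_gene = counts[1]
last_two_samples = counts[:, 2:]

# Broadcasting: the per-sample totals (shape (4,)) are stretched across all rows,
# so each count is divided by the total of its own column.
library_sizes = counts.sum(axis=0)
fractions = counts / library_sizes

# Vectorized operation: log-transform every element without an explicit loop.
log_counts = np.log2(counts + 1)

# Sorting: row indices ordered from highest to lowest total count.
order = np.argsort(counts.sum(axis=1))[::-1]

# Splitting: divide the samples into two equal groups of columns.
first_half, second_half = np.split(counts, 2, axis=1)

# --- Pandas: hypothetical expression table and annotation table ---
expression = pd.DataFrame({
    "gene": ["geneA", "geneB", "geneC"],
    "mean_count": [25.0, np.nan, 250.0],
})
annotation = pd.DataFrame({
    "gene": ["geneA", "geneB", "geneD"],
    "chromosome": ["chr1", "chr2", "chrX"],
})

# Handling missing data: replace the missing mean count with 0.
filled = expression.fillna({"mean_count": 0.0})

# Filtering: keep only genes whose mean count exceeds 20.
high = filled[filled["mean_count"] > 20]

# Merging: a left join keeps every gene in `filled`, annotated where possible.
merged = filled.merge(annotation, on="gene", how="left")

# Concatenation: append rows from another table with the same columns.
extra = pd.DataFrame({"gene": ["geneE"], "mean_count": [7.0]})
combined = pd.concat([filled, extra], ignore_index=True)
```

Replacing a missing value with 0 is only one possible choice, made here to keep the example short; whether to fill, drop, or impute missing values depends on the dataset and the analysis.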

Overall time schedule

NumPy for Bioinformatics

3 Hours

Pandas for Bioinformatics

3 Hours

Datasets

Dependencies

Setup Python environment

Follow the installation instructions in the document linked here.

Credits