Hands-on: RNA Expression Analysis - alternative method
Objectives
Examine differential expression of immune-related genes between patient groups previously classified as immunologically strong (‘istrong’) and immunologically weak (‘iweak’)
Apply an alternative analytical approach using Z-ratio methodology to complement standard differential expression tools like DESeq
Rank immune-related genes based on their relative expression differences between the patient groups
RNA Expression Analysis Steps:
Data Loading and visualization
Load sample group information (iweak vs istrong)
Load gene expression count matrix
View first few rows/columns
View basic info
Sample Identification
Filter samples by group (iweak/istrong)
Match count matrix columns with sample IDs
Data Preprocessing
Convert count matrix to numeric values
Apply log2 transformation: log2(counts + 1)
Statistical Analysis
Calculate mean and std for each gene within each group
Compute Z-scores within each sample group
Calculate Z-score differences between groups
Compute standard deviation of all differences
Ranking Genes
Calculate Z-ratio: difference / std_difference
Rank genes by Z-ratio (highest to lowest)
This workflow standardizes the comparison between sample groups by accounting for the overall variability in gene expression across the entire experiment.
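The ranking statistic from steps 4 and 5 can be written compactly. The notation here is an assumption introduced for illustration, not taken from the original: $z_{g,s}$ is the Z-score of gene $g$ in sample $s$, $n_k$ is the number of samples in group $k$, and $G$ is the number of genes.

```latex
\[
\bar{z}_{g}^{(k)} = \frac{1}{n_k} \sum_{s \in k} z_{g,s},
\qquad
\Delta_g = \bar{z}_{g}^{(\mathrm{istrong})} - \bar{z}_{g}^{(\mathrm{iweak})},
\qquad
\text{Z-ratio}_g = \frac{\Delta_g}{\mathrm{SD}(\Delta_1, \dots, \Delta_G)}
\]
```

Dividing by the standard deviation of all differences puts every gene on a common scale, so a Z-ratio of 2 means the gene's difference lies two standard deviations from the typical difference in this experiment.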
import pandas as pd
import numpy as np
1. Data Loading and visualization
Load sample group information (iweak vs istrong)
Load gene expression count matrix
View first few rows/columns
View basic info
Load sample group information (iweak vs istrong)
sample_info = pd.read_csv(
"test_data/Sample_group_info.csv", header=None, names=["Sample", "Group"]
)
print("Samples and Groups:\n", sample_info.head())
# info() prints its summary itself and returns None, so it is called on its own
print("Dataframe info:")
sample_info.info()
print("\nNumber of samples in each group:")
print(sample_info.groupby(by="Group").size())
Samples and Groups:
Sample Group
0 SH_TS_BC111 iweak
1 SH_TS_BC112 iweak
2 SH_TS_BC113 iweak
3 SH_TS_BC119 istrong
4 SH_TS_BC133 iweak
Dataframe info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 303 entries, 0 to 302
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Sample 303 non-null object
1 Group 303 non-null object
dtypes: object(2)
memory usage: 4.9+ KB
Number of samples in each group:
Group
istrong 154
iweak 149
dtype: int64
Load gene expression count matrix
count_matrix = pd.read_csv(
"test_data/count_matrix_with_row_indices.csv", header=0, index_col=0, sep=";"
)
print("Count matrix:\n", count_matrix.iloc[:, :5].head())
# info() prints its summary itself and returns None, so it is called on its own
print("Dataframe info:")
count_matrix.info()
print(
"Descriptive statistics (First 5 samples):\n", count_matrix.iloc[:, :5].describe()
)
Count matrix:
SH_TS_BC_C1 SH_TS_BC_C11 SH_TS_BC_C15 SH_TS_BC_C3 SH_TS_BC01
Gene
ACTR3B 25 559 231 44 23
ANLN 173 2475 886 320 6
APOBEC3G 114 8806 2781 537 47
AURKA 626 7492 2829 564 14
BAG1 317 5949 2357 275 26
Dataframe info:
<class 'pandas.core.frame.DataFrame'>
Index: 80 entries, ACTR3B to VEGFA
Columns: 483 entries, SH_TS_BC_C1 to UNC_TGS_BC_Y90_R1
dtypes: int64(483)
memory usage: 302.5+ KB
Descriptive statistics (First 5 samples):
SH_TS_BC_C1 SH_TS_BC_C11 SH_TS_BC_C15 SH_TS_BC_C3 SH_TS_BC01
count 80.000000 80.00000 80.000000 80.000000 80.000000
mean 1118.700000 20114.17500 6846.137500 1403.150000 126.212500
std 2627.440095 42620.73209 13895.968032 2411.549117 329.326881
min 1.000000 13.00000 6.000000 0.000000 0.000000
25% 58.500000 1758.50000 692.500000 207.000000 3.750000
50% 265.000000 5481.00000 1903.500000 529.000000 22.000000
75% 849.500000 15620.50000 5396.750000 1142.000000 100.250000
max 15912.000000 239031.00000 79955.000000 12397.000000 2352.000000
print("Number of NaN values in each column:", count_matrix.isna().sum(0))
print("Number of NaN values in the dataframe:", count_matrix.isna().sum(0).sum())
Number of NaN values in each column: SH_TS_BC_C1 0
SH_TS_BC_C11 0
SH_TS_BC_C15 0
SH_TS_BC_C3 0
SH_TS_BC01 0
..
UNC_TGS_BC_9m 0
UNC_TGS_BC_Y23 0
UNC_TGS_BC_Y23_R1 0
UNC_TGS_BC_Y90 0
UNC_TGS_BC_Y90_R1 0
Length: 483, dtype: int64
Number of NaN values in the dataframe: 0
2. Sample Identification
Filter samples by group (iweak/istrong)
Match count matrix columns with sample IDs
Filter samples and match count matrix - iweak
# Select the iweak samples
iweak_samples = sample_info[sample_info["Group"] == "iweak"]
print("iweak samples:")
print(iweak_samples.head())
print("Number of iweak samples:", len(iweak_samples))
# Display info about iweak samples (info() prints directly and returns None)
print("iweak samples:")
iweak_samples.info()
iweak samples:
Sample Group
0 SH_TS_BC111 iweak
1 SH_TS_BC112 iweak
2 SH_TS_BC113 iweak
4 SH_TS_BC133 iweak
5 SH_TS_BC134 iweak
Number of iweak samples: 149
iweak samples:
<class 'pandas.core.frame.DataFrame'>
Index: 149 entries, 0 to 302
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Sample 149 non-null object
1 Group 149 non-null object
dtypes: object(2)
memory usage: 3.5+ KB
# Identify columns that match iweak sample IDs
print("Samples in count matrix (first 10):\n", count_matrix.columns[:10])
print("Data Type of count_matrix.columns:", type(count_matrix.columns))
## A pandas Index (`pandas.core.indexes.base.Index`) is not a NumPy ndarray, although it is typically backed by one.
## It is a separate object that adds label-based lookup and alignment on top of the stored values.
Samples in count matrix (first 10):
Index(['SH_TS_BC_C1', 'SH_TS_BC_C11', 'SH_TS_BC_C15', 'SH_TS_BC_C3',
'SH_TS_BC01', 'SH_TS_BC010_1', 'SH_TS_BC010_2', 'SH_TS_BC02',
'SH_TS_BC04', 'SH_TS_BC05'],
dtype='object')
Data Type of count_matrix.columns: <class 'pandas.core.indexes.base.Index'>
iweak_cols = count_matrix.columns.isin(iweak_samples["Sample"])
print("iweak column mask (first 10):")
print(iweak_cols[:10])
print("Number of iweak columns in iweak column mask:", iweak_cols.sum())
# print("Number of iweak columns in count matrix:", len(iweak_cols[iweak_cols]))
print("\niweak column mask (first 30):", iweak_cols[:30])
print(
f"First 30 columns of iweak: {count_matrix.columns[iweak_cols][:30]} \
\n Total number of iweak columns: {len(count_matrix.columns[iweak_cols])}"
)
iweak column mask (first 10):
[False False False False False False False False False False]
Number of iweak columns in iweak column mask: 54
iweak column mask (first 30): [False False False False False False False False False False False False
False False False False False False False False False False False False
False True True True False False]
First 30 columns of iweak: Index(['SH_TS_BC111', 'SH_TS_BC112', 'SH_TS_BC113', 'SH_TS_BC133',
'SH_TS_BC134', 'SH_TS_BC139', 'SH_TS_BC141', 'SH_TS_BC146',
'SH_TS_BC147', 'SH_TS_BC152', 'SH_TS_BC154', 'SH_TS_BC155',
'SH_TS_BC160', 'SH_TS_BC161', 'SH_TS_BC163', 'SH_TS_BC169',
'SH_TS_BC172', 'SH_TS_BC173', 'SH_TS_BC176', 'SH_TS_BC181',
'SH_TS_BC183', 'SH_TS_BC184', 'SH_TS_BC185', 'SH_TS_BC196',
'SH_TS_BC198', 'SH_TS_BC200', 'SH_TS_BC203', 'SH_TS_BC207',
'SH_TS_BC210', 'SH_TS_BC212'],
dtype='object')
Total number of iweak columns: 54
Filter samples and match count matrix - istrong
# Select the istrong samples
istrong_samples = sample_info[sample_info["Group"] == "istrong"]
print("\nistrong samples:")
print(istrong_samples.head())
print("Number of istrong samples:", len(istrong_samples))
# Display info about istrong samples (info() prints directly and returns None)
print("istrong samples:")
istrong_samples.info()
istrong samples:
Sample Group
3 SH_TS_BC119 istrong
10 SH_TS_BC150 istrong
11 SH_TS_BC151 istrong
13 SH_TS_BC153 istrong
19 SH_TS_BC165 istrong
Number of istrong samples: 154
istrong samples:
<class 'pandas.core.frame.DataFrame'>
Index: 154 entries, 3 to 301
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Sample 154 non-null object
1 Group 154 non-null object
dtypes: object(2)
memory usage: 3.6+ KB
istrong_cols = count_matrix.columns.isin(istrong_samples["Sample"])
print("istrong column mask (first 10):")
print(istrong_cols[:10])
print("Number of istrong columns in istrong column mask:", istrong_cols.sum())
print("\nistrong column mask (first 30):", istrong_cols[:30])
print(
f"First 30 columns of istrong: {count_matrix.columns[istrong_cols][:30]} \
\n Total number of istrong columns: {len(count_matrix.columns[istrong_cols])}"
)
istrong column mask (first 10):
[False False False False False False False False False False]
Number of istrong columns in istrong column mask: 37
istrong column mask (first 30): [False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False]
First 30 columns of istrong: Index(['SH_TS_BC119', 'SH_TS_BC150', 'SH_TS_BC151', 'SH_TS_BC153',
'SH_TS_BC165', 'SH_TS_BC166', 'SH_TS_BC170', 'SH_TS_BC171',
'SH_TS_BC175', 'SH_TS_BC177', 'SH_TS_BC178', 'SH_TS_BC180',
'SH_TS_BC182', 'SH_TS_BC188', 'SH_TS_BC193', 'SH_TS_BC199',
'SH_TS_BC202', 'SH_TS_BC204', 'SH_TS_BC209', 'SH_TS_BC211',
'SH_TS_BC219', 'SH_TS_BC233', 'SH_TS_BC235', 'SH_TS_BC240',
'SH_TS_BC249', 'SH_TS_BC252', 'SH_TS_BC255', 'SH_TS_BC265',
'SH_TS_BC266', 'SH_TS_BC272'],
dtype='object')
Total number of istrong columns: 37
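In the outputs above, only 54 of the 149 iweak IDs and 37 of the 154 istrong IDs match a column of the count matrix, so it may be worth listing which sample IDs have no count column before continuing. The sketch below shows one way to do that on small stand-in tables; the sample names, gene names and values are hypothetical, and only the variable names `sample_info` and `count_matrix` mirror the notebook.

```python
import pandas as pd

# Hypothetical stand-ins for the notebook's sample_info and count_matrix
sample_info = pd.DataFrame(
    {
        "Sample": ["S1", "S2", "S3", "S4"],
        "Group": ["iweak", "iweak", "istrong", "istrong"],
    }
)
count_matrix = pd.DataFrame(
    {"S1": [5, 0], "S3": [7, 2], "CTRL": [1, 1]},
    index=["GENE_A", "GENE_B"],
)

# True where the sample ID has a matching column in the count matrix
in_matrix = sample_info["Sample"].isin(count_matrix.columns)

# Listed vs. matched samples per group
summary = pd.DataFrame(
    {
        "listed": sample_info.groupby("Group").size(),
        "matched": sample_info[in_matrix].groupby("Group").size(),
    }
)
print(summary)

# Sample IDs that appear in the group table but have no count column
missing_ids = sample_info.loc[~in_matrix, "Sample"].tolist()
print("Sample IDs without a count column:", missing_ids)
```

A large gap between listed and matched samples is not necessarily an error (the count matrix may simply cover a different cohort), but it does change the effective group sizes used in every later step.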
3. Data Preprocessing
Convert count matrix to numeric values
Apply log2 transformation: log2(counts + 1)
Convert the count matrix to log scale:
Gene expression count data often contains zeros (genes that weren’t detected)
Since log₂(0) is undefined (it tends to negative infinity), we add 1 to every value before taking the logarithm
This is called a “pseudo-count” approach, and the result is known as “log₂(counts + 1)”
Differences in log space correspond to fold changes in the original space (strictly, in counts + 1, which is nearly identical to the raw fold change for large counts)
A difference of 1 in log₂ space = a 2-fold change in original counts
A difference of 2 in log₂ space = a 4-fold change in original counts
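As a small worked example of the points above, the sketch below applies log₂(counts + 1) to a handful of made-up counts and converts a log-space difference back into a fold change. The values are hypothetical and chosen so that counts + 1 is a power of two.

```python
import numpy as np

# Hypothetical raw counts, including a zero
counts = np.array([0, 1, 3, 7, 1023])

# The pseudo-count keeps zero counts finite: log2(0 + 1) = 0
log2_counts = np.log2(counts + 1)
print("log2(counts + 1):", log2_counts)

# Difference in log2 space between the counts 7 and 3
log2_diff = log2_counts[3] - log2_counts[2]

# Back-transform: 2 ** difference is the fold change of (counts + 1)
fold_change_pseudo = 2 ** log2_diff
fold_change_raw = counts[3] / counts[2]
print("log2 difference:", log2_diff)
print("fold change of counts + 1:", fold_change_pseudo)
print("fold change of raw counts:", fold_change_raw)
```

For small counts the two fold changes differ noticeably (8/4 versus 7/3 here); the gap shrinks as counts grow.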
Convert count matrix to numeric values
# Convert count matrix to numeric values
count_matrix = count_matrix.astype(float, errors="raise")
count_matrix.info()
<class 'pandas.core.frame.DataFrame'>
Index: 80 entries, ACTR3B to VEGFA
Columns: 483 entries, SH_TS_BC_C1 to UNC_TGS_BC_Y90_R1
dtypes: float64(483)
memory usage: 302.5+ KB
print("Counts of zeros in each column:")
print((count_matrix == 0).sum(axis=0))
print("Total zeros in count matrix:", (count_matrix == 0).sum().sum())
Counts of zeros in each column:
SH_TS_BC_C1 0
SH_TS_BC_C11 0
SH_TS_BC_C15 0
SH_TS_BC_C3 3
SH_TS_BC01 5
..
UNC_TGS_BC_9m 1
UNC_TGS_BC_Y23 4
UNC_TGS_BC_Y23_R1 9
UNC_TGS_BC_Y90 2
UNC_TGS_BC_Y90_R1 3
Length: 483, dtype: int64
Total zeros in count matrix: 3150
np_matrix = count_matrix.to_numpy()
print("NumPy matrix shape:", np_matrix.shape)
print("Counts of zeros in each column:", (np_matrix == 0).sum(axis=0))
print("Total Counts of zeros in matrix:", (np_matrix == 0).sum(axis=0).sum())
NumPy matrix shape: (80, 483)
Counts of zeros in each column: [ 0 0 0 3 5 4 3 2 6 6 2 4 4 4 1 4 0 0 8 0 16 1 0 1
3 37 2 4 2 4 4 1 2 2 3 3 2 6 17 10 11 5 8 2 3 0 1 1
5 3 4 2 2 4 1 1 18 2 4 1 8 0 0 3 1 1 5 6 2 3 6 4
6 24 15 4 7 36 3 18 10 28 2 20 2 7 3 3 2 9 18 3 3 4 8 1
11 3 5 13 6 20 6 16 17 3 17 2 6 2 5 18 9 3 14 2 13 2 0 0
0 1 3 8 11 1 8 17 2 4 6 17 19 12 5 21 3 36 2 1 2 5 32 1
0 1 0 1 0 7 2 1 1 6 11 1 1 0 3 7 0 2 9 11 6 5 4 1
16 6 6 1 1 1 2 2 3 1 3 0 2 1 3 1 1 2 3 6 4 2 34 1
1 2 2 12 24 20 3 2 13 2 2 21 10 1 8 3 18 19 4 23 35 1 2 3
10 1 0 0 6 8 0 0 1 1 0 1 1 1 1 1 2 15 4 6 2 12 3 11
3 3 17 8 8 1 14 6 18 7 0 1 16 8 16 3 7 6 6 13 24 15 3 6
15 1 0 2 1 16 1 9 2 1 1 0 0 2 9 0 0 4 1 0 11 1 10 3
0 7 0 3 0 2 24 2 8 1 2 0 18 4 17 26 22 9 20 21 1 7 10 2
13 1 23 5 17 8 7 19 13 28 25 24 21 11 4 4 3 15 9 2 12 11 16 1
0 8 7 9 11 12 6 5 0 7 14 12 7 22 4 7 13 19 13 6 0 3 2 15
11 6 1 11 0 8 9 3 9 19 1 5 5 7 1 4 6 5 7 2 3 2 5 7
13 17 0 1 6 4 0 0 1 2 3 9 2 8 11 2 36 7 1 0 3 8 2 0
2 24 4 6 10 2 1 3 12 8 7 6 23 1 10 3 0 15 3 12 3 6 2 8
8 26 21 13 8 5 16 18 19 9 2 3 2 4 3 2 2 9 1 1 0 21 1 1
1 0 0 1 2 1 1 6 1 1 0 6 5 2 2 8 1 4 2 4 2 4 1 4
9 2 3]
Total Counts of zeros in matrix: 3150
Apply log2 transformation: log2(counts + 1)
# Convert count_matrix to log2(counts + 1); np.log2 is applied element-wise to the DataFrame
count_matrix_log2 = np.log2(count_matrix + 1)
print(
"Log2 transformed count matrix (first 5 rows & columns):\n",
count_matrix_log2.iloc[:5, :5],
)
# info() prints its summary itself and returns None, so it is called on its own
print("Log2 transformed count matrix info:")
count_matrix_log2.info()
print(
"Log2 transformed count matrix descriptive statistics (first 5 samples):\n",
count_matrix_log2.iloc[:, :5].describe(),
)
Log2 transformed count matrix (first 5 rows & columns):
SH_TS_BC_C1 SH_TS_BC_C11 SH_TS_BC_C15 SH_TS_BC_C3 SH_TS_BC01
Gene
ACTR3B 4.700440 9.129283 7.857981 5.491853 4.584963
ANLN 7.442943 11.273796 9.792790 8.326429 2.807355
APOBEC3G 6.845490 13.104435 11.441907 9.071462 5.584963
AURKA 9.292322 12.871328 11.466586 9.142107 3.906891
BAG1 8.312883 12.538674 11.203348 8.108524 4.754888
Log2 transformed count matrix info:
<class 'pandas.core.frame.DataFrame'>
Index: 80 entries, ACTR3B to VEGFA
Columns: 483 entries, SH_TS_BC_C1 to UNC_TGS_BC_Y90_R1
dtypes: float64(483)
memory usage: 302.5+ KB
Log2 transformed count matrix descriptive statistics (first 5 samples):
SH_TS_BC_C1 SH_TS_BC_C11 SH_TS_BC_C15 SH_TS_BC_C3 SH_TS_BC01
count 80.000000 80.000000 80.000000 80.000000 80.000000
mean 7.777790 12.376603 10.938512 8.774008 4.519537
std 2.917182 2.540082 2.458104 2.686284 2.755204
min 1.000000 3.807355 2.807355 0.000000 0.000000
25% 5.894663 10.780910 9.436778 7.698564 2.241446
50% 8.049386 12.419880 10.895191 9.049684 4.523562
75% 9.732161 13.931187 12.398038 10.158550 6.661454
max 13.957918 17.866844 16.286919 13.597820 11.200286
4. Statistical Analysis
Calculate mean and std for each gene within each group
Compute Z-scores within each sample group
Calculate Z-score differences between groups & SD of the Z-score difference
print(
f"iweak_cols mask: {iweak_cols[:10]} \
\nistrong_cols mask: {istrong_cols[:10]} \
\nTotal number of iweak columns: {len(count_matrix_log2.columns[iweak_cols])} \
\nTotal number of istrong columns: {len(count_matrix_log2.columns[istrong_cols])}"
)
iweak_cols mask: [False False False False False False False False False False]
istrong_cols mask: [False False False False False False False False False False]
Total number of iweak columns: 54
Total number of istrong columns: 37
Calculate mean and std for each gene within each group
# Mean log2 value for iweak and istrong samples
mean_iweak = count_matrix_log2.iloc[:, iweak_cols].mean(axis=1)
mean_istrong = count_matrix_log2.iloc[:, istrong_cols].mean(axis=1)
print("Mean log2 value for iweak samples (first 5 rows):\n", mean_iweak.head())
print("Mean log2 value for istrong samples (first 5 rows):\n", mean_istrong.head())
Mean log2 value for iweak samples (first 5 rows):
Gene
ACTR3B 7.860318
ANLN 8.870121
APOBEC3G 8.839295
AURKA 9.873015
BAG1 8.818064
dtype: float64
Mean log2 value for istrong samples (first 5 rows):
Gene
ACTR3B 6.994971
ANLN 6.953521
APOBEC3G 10.527763
AURKA 9.192108
BAG1 9.029261
dtype: float64
# Standard deviation of log2 values for iweak and istrong samples
std_iweak = count_matrix_log2.iloc[:, iweak_cols].std(axis=1)
std_istrong = count_matrix_log2.iloc[:, istrong_cols].std(axis=1)
print(
"Standard deviation log2 value for iweak samples (first 5 rows):\n",
std_iweak.head(),
)
print(
"Standard deviation log2 value for istrong samples (first 5 rows):\n",
std_istrong.head(),
)
Standard deviation log2 value for iweak samples (first 5 rows):
Gene
ACTR3B 1.995958
ANLN 1.554415
APOBEC3G 2.074605
AURKA 1.191852
BAG1 2.199874
dtype: float64
Standard deviation log2 value for istrong samples (first 5 rows):
Gene
ACTR3B 2.319413
ANLN 2.843593
APOBEC3G 1.321013
AURKA 2.242906
BAG1 2.019663
dtype: float64
print(count_matrix_log2.shape, mean_iweak.shape)
(80, 483) (80,)
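One detail worth keeping in mind for the standard deviations above: pandas' `.std()` defaults to the sample standard deviation (`ddof=1`), whereas NumPy's `np.std` defaults to the population form (`ddof=0`). The sketch below compares the two on a short made-up series; the values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical values with mean 5 and population variance 4
values = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

sample_sd = values.std()              # pandas default: ddof=1 (divide by n - 1)
population_sd = values.std(ddof=0)    # divide by n
numpy_sd = np.std(values.to_numpy())  # NumPy default: ddof=0

print("pandas std (ddof=1):", sample_sd)
print("pandas std (ddof=0):", population_sd)
print("numpy std (default):", numpy_sd)
```

With dozens of samples per group the difference is small, but mixing the two conventions between the pandas and NumPy versions of a calculation would give slightly different Z-scores.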
Compute Z-scores within each sample group
# Calculate Z-scores for iweak samples
## NumPy-style broadcasting: reshape the per-gene mean and std into column vectors of shape (n_genes, 1)
z_iweak = (
count_matrix_log2.iloc[:, iweak_cols] - mean_iweak.values.reshape(-1, 1)
) / std_iweak.values.reshape(-1, 1)
print("Z-scores for iweak samples (first 5 rows):\n")
print(z_iweak.iloc[:5, :5])
Z-scores for iweak samples (first 5 rows):
SH_TS_BC111 SH_TS_BC112 SH_TS_BC113 SH_TS_BC133 SH_TS_BC134
Gene
ACTR3B -0.798379 -0.735487 -0.920836 0.458160 -0.943426
ANLN -1.188713 -2.057170 0.161741 0.227662 -5.063077
APOBEC3G -4.260712 0.307752 -0.569305 0.264495 -1.499743
AURKA -1.109424 -1.367487 0.168751 1.121313 -2.346097
BAG1 -4.008440 -0.285580 -0.185771 0.582435 -1.021972
Z-score calculation using pandas built-in sub() and div() functions:
.sub(): pandas’ method for element-wise subtraction
.div(): pandas’ method for element-wise division
Both accept a scalar, a Series, or a DataFrame as the operand
axis=: specifies the axis whose labels are matched against the operand (axis=0 matches a Series against the row index)
This is the preferred method, as it is more readable and less error-prone
Note: sub() and div() align the operands on their index labels rather than on position. If the labels of the Series or DataFrames don’t match, the unmatched rows or columns are filled with NaN instead of raising an error, which can go unnoticed.
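To make the alignment behaviour concrete, the sketch below runs `sub()` and `div()` on a tiny made-up table: once with matching labels, once with the Series reordered, and once with a mismatched label. All names and values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical log2 expression: 2 genes x 2 samples
df = pd.DataFrame(
    {"s1": [1.0, 4.0], "s2": [3.0, 8.0]},
    index=["GENE_A", "GENE_B"],
)
gene_mean = df.mean(axis=1)
gene_std = df.std(axis=1)

# axis=0 matches the Series labels against the row index (genes)
z = df.sub(gene_mean, axis=0).div(gene_std, axis=0)
print(z)

# Reordering the Series does not change the result: alignment is by label
shuffled_mean = gene_mean.iloc[::-1]
z_shuffled = df.sub(shuffled_mean, axis=0).div(gene_std, axis=0)
print(z_shuffled.loc[z.index])

# A mismatched label silently produces NaN rows instead of an error
wrong_mean = pd.Series([2.0, 6.0], index=["GENE_A", "GENE_X"])
mismatched = df.sub(wrong_mean, axis=0)
print(mismatched)
```

The NumPy-style version with `.values.reshape(-1, 1)` subtracts by position instead, so it would return numbers even when the gene order of the two objects differs.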
# Calculate Z-scores for iweak samples, using `sub` and `div`
z_iweak = (
count_matrix_log2.iloc[:, iweak_cols].sub(mean_iweak, axis=0).div(std_iweak, axis=0)
)
print("Z-scores for iweak samples (first 5 rows):\n", z_iweak.iloc[:5, :5])
print(
"\n Z-scores for iweak samples Rows (Genes) and Columns (samples):", z_iweak.shape
)
Z-scores for iweak samples (first 5 rows):
SH_TS_BC111 SH_TS_BC112 SH_TS_BC113 SH_TS_BC133 SH_TS_BC134
Gene
ACTR3B -0.798379 -0.735487 -0.920836 0.458160 -0.943426
ANLN -1.188713 -2.057170 0.161741 0.227662 -5.063077
APOBEC3G -4.260712 0.307752 -0.569305 0.264495 -1.499743
AURKA -1.109424 -1.367487 0.168751 1.121313 -2.346097
BAG1 -4.008440 -0.285580 -0.185771 0.582435 -1.021972
Z-scores for iweak samples Rows (Genes) and Columns (samples): (80, 54)
# Calculate Z-scores for istrong samples, using `sub` and `div`
z_istrong = (
count_matrix_log2.iloc[:, istrong_cols]
.sub(mean_istrong, axis=0)
.div(std_istrong, axis=0)
)
print("Z-scores for istrong samples (first 5 rows):\n", z_istrong.iloc[:5, :5])
print(
"\n Z-scores for istrong samples Rows (Genes) and Columns (samples):", z_istrong.shape
)
Z-scores for istrong samples (first 5 rows):
SH_TS_BC119 SH_TS_BC150 SH_TS_BC151 SH_TS_BC153 SH_TS_BC165
Gene
ACTR3B 0.251123 0.745681 0.473303 1.138150 0.310111
ANLN 0.963558 -0.046892 0.846965 1.049066 0.320252
APOBEC3G 0.135300 0.285290 1.097819 1.213074 0.762314
AURKA 0.438189 0.594938 0.731543 0.743076 0.704167
BAG1 0.293509 1.148990 0.992325 0.364224 -0.072595
Z-scores for istrong samples Rows (Genes) and Columns (samples): (80, 37)
Calculate Z-score differences between groups & SD of the Z-score difference
Calculate the mean Z-score of each gene in each group
Calculate the difference between the two group means (istrong - iweak)
Calculate the SD of these differences across all genes
# Calculate the mean z-score of each gene in each group
# Calculate the difference between the two group means (istrong - iweak)
z_diff = z_istrong.mean(axis=1) - z_iweak.mean(axis=1)
print("Shape of z_diff:", z_diff.shape)
print("Z-score difference (istrong - iweak) (first 5 rows):\n", z_diff.head())
Shape of z_diff: (80,)
Z-score difference (istrong - iweak) (first 5 rows):
Gene
ACTR3B 1.345857e-15
ANLN -1.994845e-17
APOBEC3G -7.123653e-16
AURKA -4.685386e-16
BAG1 1.760576e-15
dtype: float64
# SD of z-score difference
z_diff_std = z_diff.std()
print("Type of z_diff_std:", type(z_diff_std))
print("Standard deviation of z-score difference:", z_diff_std)
Type of z_diff_std: <class 'numpy.float64'>
Standard deviation of z-score difference: 1.4956431674223958e-15
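The differences printed above are on the order of 1e-15. When each gene is standardized with the mean and SD of its own group, its mean Z-score within that group is zero by construction, so the group difference reflects floating-point rounding rather than biology. The Z-ratio is usually described with Z-scores computed within each sample (each column scaled by the mean and SD of all genes in that sample); treat that reading as an assumption, since the source steps only say "within each sample group". The sketch below contrasts the two variants on a small hypothetical matrix and is not the original analysis.

```python
import pandas as pd

# Hypothetical log2 expression: 4 genes x (3 iweak + 3 istrong) samples
log2_expr = pd.DataFrame(
    {
        "w1": [5.0, 7.0, 9.0, 6.0],
        "w2": [6.0, 7.5, 8.0, 6.5],
        "w3": [5.5, 6.5, 8.5, 7.0],
        "s1": [9.0, 7.0, 8.5, 6.0],
        "s2": [10.0, 6.5, 9.0, 6.5],
        "s3": [9.5, 7.5, 8.0, 7.0],
    },
    index=["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
)
weak_cols = ["w1", "w2", "w3"]
strong_cols = ["s1", "s2", "s3"]


def gene_z(block):
    """Standardize each gene (row) with the mean and SD of the given block."""
    return block.sub(block.mean(axis=1), axis=0).div(block.std(axis=1), axis=0)


# (a) Gene-wise Z-scores within each group: group means cancel to ~0
diff_within = (
    gene_z(log2_expr[strong_cols]).mean(axis=1)
    - gene_z(log2_expr[weak_cols]).mean(axis=1)
)
print("Largest |difference|, gene-wise within group:", diff_within.abs().max())

# (b) Sample-wise Z-scores: each column uses its own mean and SD over all genes
z_sample = (log2_expr - log2_expr.mean(axis=0)) / log2_expr.std(axis=0)
z_diff = z_sample[strong_cols].mean(axis=1) - z_sample[weak_cols].mean(axis=1)
z_ratio = z_diff / z_diff.std()
print(z_ratio.sort_values(ascending=False))
```

In variant (b), GENE_A (made clearly higher in the hypothetical istrong samples) receives the largest Z-ratio, whereas variant (a) yields differences indistinguishable from zero for every gene.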
5. Ranking Genes
Calculate Z-ratio: difference / std_difference
Rank genes by Z-ratio (highest to lowest)
Calculate Z-ratio: Z-score difference / std_difference
z_score_ratios = z_diff / z_diff_std
print("Shape of z_score_ratios:", z_score_ratios.shape)
print("Z-score ratios (istrong - iweak) (first 5 rows):\n", z_score_ratios.head())
Shape of z_score_ratios: (80,)
Z-score ratios (istrong - iweak) (first 5 rows):
Gene
ACTR3B 0.899852
ANLN -0.013338
APOBEC3G -0.476294
AURKA -0.313269
BAG1 1.177136
dtype: float64
Rank genes by Z-ratio (highest to lowest)
z_score_ratios.sort_values(ascending=False)
Gene
GAPDH 3.143055
CCL5 2.748459
CD68 2.222521
HLA-DMA 2.007418
MKI67 1.917063
...
MDM2 -1.469024
UBE2C -1.492727
PSMC4 -1.776938
TYMS -2.403611
NUF2 -2.414980
Length: 80, dtype: float64
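Since the sorted output is truncated in the middle, it can be convenient to pull out only the extremes. The sketch below uses `Series.nlargest`, `Series.nsmallest` and an absolute-value filter on a few values copied from the ranking above; the cutoff of 1.96 is an illustrative assumption, not a threshold stated in this notebook.

```python
import pandas as pd

# A few Z-ratios copied from the ranking above, for illustration only
z_score_ratios = pd.Series(
    {
        "GAPDH": 3.143055,
        "CCL5": 2.748459,
        "CD68": 2.222521,
        "MDM2": -1.469024,
        "TYMS": -2.403611,
        "NUF2": -2.414980,
    }
)

top_genes = z_score_ratios.nlargest(2)      # highest Z-ratios
bottom_genes = z_score_ratios.nsmallest(2)  # lowest Z-ratios

# Hypothetical cutoff: keep genes whose |Z-ratio| is at least 1.96
cutoff = 1.96
flagged = z_score_ratios[z_score_ratios.abs() >= cutoff]

print("Top genes:\n", top_genes)
print("Bottom genes:\n", bottom_genes)
print("Genes beyond the cutoff:\n", flagged)
```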
import matplotlib.pyplot as plt
z_score_ratios.sort_values(ascending=False).plot(
kind="bar",
figsize=(20, 5),
title="Z-score ratios (istrong - iweak)",
xlabel="Genes",
ylabel="Z-score ratios",
)
<Axes: title={'center': 'Z-score ratios (istrong - iweak)'}, xlabel='Genes', ylabel='Z-score ratios'>
