import rodin
The Rodin
class requires two data files: a features table and a class labels file.
feat_stat
must be set to 'ann'.)To create an object of the Rodin
class using these files, use the following command:
obj=rodin.create_object_csv("./data/features.csv","./data/class_labels.csv",
feat_sep=',',class_sep=',')
Once the Rodin
object is created, it has three attributes:
X
: This attribute holds the intensities data from the samples.
features
: This contains the metabolites features data, including 'mass to charge' ratios and 'retention times'.
samples
: This attribute includes the classes data, which correspond to various characteristics of the samples such as treatment groups, age, etc.
print(obj.samples)
Sample ID Dose Sex 0 N10 Control Female 1 H3 High Female 2 N23 Control Female 3 N21 Control Female 4 H9 High Female .. ... ... ... 66 H15 High Male 67 H16 High Male 68 H21 High Male 69 H27 High Male 70 H22 High Male [71 rows x 3 columns]
The transform
function offers a convenient method for preprocessing data, comprising several key steps:
Imputation: Fills in missing data points within the dataset.
Filtering Features (thresh): Eliminates features that have more than a specified threshold of missing values. This step ensures the quality and reliability of the dataset by removing less informative features.
Data Normalization (norm): Normalizes the data using one of two methods:
Log2 Transformation (log): Applies a log2 transformation to the normalized data. This transformation is useful for stabilizing variance across the dataset and making the data more suitable for linear models and other statistical analyses.
These preprocessing steps are crucial for preparing the data for further analysis, ensuring that it is clean, normalized, and ready for accurate and meaningful interpretation. However, you are able to skip particular step by setting its parameter to None.
obj.transform(thresh=0.5,norm='q')
Number of features filtered: 1972
< Rodin object > dim: 24233 X 71
After the data in the X
attribute has been transformed, statistical tests can be performed. In this basic guide, we will focus on using one-way ANOVA (Analysis of Variance).
To perform one-way ANOVA using the Rodin
class:
samples
attribute as an argument. This column should represent the different groups or classes among your samples.Following the execution of one-way ANOVA, the features
attribute of the Rodin
object will be updated. It will now include data about the statistical tests performed. The p_adj
column displays the p-values adjusted using the Benjamini-Hochberg method.
obj.oneway_anova('Dose')
mz | time | p_value(owa) Dose | p_adj(owa) Dose | |
---|---|---|---|---|
0 | 85.0284 | 38.3 | 0.867775 | 0.929819 |
1 | 85.0285 | 213.1 | 0.003990 | 0.036638 |
2 | 85.0285 | 213.1 | 0.817019 | 0.900804 |
3 | 85.0285 | 69.4 | 0.960532 | 0.979629 |
4 | 85.0285 | 69.4 | 0.006463 | 0.049960 |
... | ... | ... | ... | ... |
26200 | 1245.6035 | 276.0 | 0.329061 | 0.536328 |
26201 | 1253.8981 | 29.2 | 0.000134 | 0.003188 |
26202 | 1255.9144 | 29.4 | 0.000485 | 0.008308 |
26203 | 1274.1295 | 40.1 | 0.154725 | 0.343972 |
26204 | 1274.1842 | 39.9 | 0.008926 | 0.060915 |
24233 rows × 4 columns
The fold_change
function is recommended for pathway analysis as it calculates the log fold change difference between classes in the dataset. The function uses the first class appearing in the specified column of the samples
attribute as the reference
, if it is not provided as parameter.
obj.fold_change('Dose')
mz | time | p_value(owa) Dose | p_adj(owa) Dose | lfc (High vs Control) | lfc (Low vs Control) | lfc (others vs Control) | |
---|---|---|---|---|---|---|---|
0 | 85.0284 | 38.3 | 0.867775 | 0.929819 | -0.092725 | -0.046480 | -0.069603 |
1 | 85.0285 | 213.1 | 0.003990 | 0.036638 | 0.128757 | -0.255070 | -0.063157 |
2 | 85.0285 | 213.1 | 0.817019 | 0.900804 | -0.052215 | 0.080238 | 0.014011 |
3 | 85.0285 | 69.4 | 0.960532 | 0.979629 | -0.042041 | 0.012479 | -0.014781 |
4 | 85.0285 | 69.4 | 0.006463 | 0.049960 | -0.140702 | 0.161327 | 0.010312 |
... | ... | ... | ... | ... | ... | ... | ... |
26200 | 1245.6035 | 276.0 | 0.329061 | 0.536328 | 0.253545 | 0.079541 | 0.166543 |
26201 | 1253.8981 | 29.2 | 0.000134 | 0.003188 | 0.394721 | 0.792212 | 0.593466 |
26202 | 1255.9144 | 29.4 | 0.000485 | 0.008308 | 0.220599 | 0.734591 | 0.477595 |
26203 | 1274.1295 | 40.1 | 0.154725 | 0.343972 | 0.286580 | 0.063715 | 0.175147 |
26204 | 1274.1842 | 39.9 | 0.008926 | 0.060915 | 0.185690 | -0.249075 | -0.031692 |
24233 rows × 7 columns
With the same approach the following tests are available:
ttest('categorical')
pls_da('categorical')
twoway_anova(['cat1','cat2'])
sf_lg('binary') - Logistic Regression
sf_lr('continuis') - Linear Regression
rf_class('categorical') - Random Forest Classifier
rf_regress('continuis') - Random Forest Regressor
Additonal parameteres as degree, moderator or n_estimators could be provided.
Plotly library is used by default, to get static plots set interactive
parameter to False
(except of volcano plot)
Results of statistical tests could be visualized using volcano
function.
obj.volcano(p='p_adj(owa) Dose', effect_size='lfc (High vs Control)', sign_line=0.00001, effect_size_line=[-1,1])
Metabolites of interest could be plotted using boxplot
, violinplot
and regplot
providing the raw index numbers (indexes from the original data are saved in the Rodin
object)
obj.boxplot(rows=[9999,4561],
hue='Dose',category_order=['Control','Low','High'])
The Rodin
class also offers a convenient way to slice data, enhancing the flexibility and specificity of data analysis. When slicing, for example, by a certain criterion such as p-value in the features table, all main attributes of the Rodin
object are updated correspondingly. This means that if you slice the features table based on p-value, the X
attribute (intensities data) will be updated to reflect this change. This feature is particularly useful as it allows for the calculation of principal components on the updated dataset and facilitates the visualization of results with selected features.
Key Analysis Options Post-Slicing:
X
attribute is updated, you can perform PCA on this refined dataset to uncover patterns and relationships.Rodin
class supports other visualization techniques like UMAP (Uniform Manifold Approximation and Projection) and t-SNE (t-Distributed Stochastic Neighbor Embedding).obj_2 = obj[obj.features[obj.features['p_adj(owa) Dose']<0.01]]
obj_2.run_pca()
obj_2.run_umap()
After running dimenional reduction methods n-components will be stored in the class as well in dr attribute
obj_2
< Rodin object > dim: 1518 X 71 dr: pca, umap
obj_2.plot(dr_name='umap',hue='Dose')
obj_2.plot(dr_name='pca',hue='Dose')
To understand the patterns in the data - clustergram
function could be utilitized:
obj_2.clustergram(hue='Dose',standardize='row',width=1050)
The Rodin
class incorporates the mummichog algorithm for pathway analysis, which is particularly useful for identifying metabolic pathways in mass spectrometry data.
Input Requirements:
p_adj(owa)
.lfc (others vs Control)
.positive
or negative
(our lab uses custom modes f_positive
| f_negative
).Output:
analyze_pathways
produces a detailed table containing information about the identified pathways and compounds.uns
attribute of the Rodin
object. This allows for convenient access and further analysis of the pathway data.For a comprehensive understanding of the parameters and functionality of the analyze_pathways
function refer to the dedicated pathways guide.
obj.analyze_pathways('p_adj(owa) Dose','lfc (others vs Control)',mode='f_positive')
Started @ Fri Aug 30 14:54:57 2024 Read 24233 features as reference list. Automatically choosing (p < 0.000100) as significant cutoff. Using 415 features (p < 0.000100) as significant list. Pathway Analysis... Query number of significant compounds = 76 compounds Resampling, 100 permutations to estimate background ...
Got 1692 cpd2mzFeatures Got 3449 ListOfEmpiricalCompounds Got 2198 merged ListOfEmpiricalCompounds Got 1132 final ListOfEmpiricalCompounds 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
Pathway background is estimated on 11900 random pathway values
pathway | overlap_size | pathway_size | p-value | overlap_EmpiricalCompounds (id) | |
---|---|---|---|---|---|
0 | Aspartate and asparagine metabolism | 8 | 57 | 0.01646920426854886 | E245,E1139,E468,E442,E430,E281,E300,E101 |
1 | Glutamate metabolism | 3 | 11 | 0.016637257373329974 | E245,E442,E101 |
2 | Vitamin H (biotin) metabolism | 2 | 5 | 0.019662213259389966 | E468,E1752 |
3 | Lysine metabolism | 5 | 29 | 0.019914292916561632 | E468,E1054,E667,E281,E1442 |
4 | Nitrogen metabolism | 2 | 6 | 0.027056549869758844 | E442,E300 |
... | ... | ... | ... | ... | ... |
114 | Starch and Sucrose Metabolism | 0 | 18 | 0.3344256785144106 | |
115 | Trihydroxycoprostanoyl-CoA beta-oxidation | 0 | 1 | 0.3344256785144106 | |
116 | Omega-6 fatty acid metabolism | 0 | 8 | 0.3344256785144106 | |
117 | Phytanic acid peroxisomal oxidation | 0 | 8 | 0.3344256785144106 | |
118 | Selenoamino acid metabolism | 0 | 20 | 0.3344256785144106 |
119 rows × 5 columns
obj
< Rodin object > dim: 24233 X 71 dr: pca, umap uns: pathways, compounds
obj.uns['compounds']
input_row | EID | str_row_ion | compounds | compound_names | m/z | retention_time | p_value | statistic | |
---|---|---|---|---|---|---|---|---|---|
0 | 51 | E11 | M+H[1+] | C02218 | 2-Aminoacrylate; Dehydroalanine | 88.0394 | 74.1 | 0.602135 | 0.029470 |
1 | 51 | E14 | M+2H[2+] | C00439 | N-Formimidoyl-L-glutamate | 88.0394 | 74.1 | 0.602135 | 0.029470 |
2 | 52 | E16 | M+2H[2+] | C00437 | N2-Acetyl-L-ornithine | 88.0579 | 68.0 | 0.627157 | 0.106719 |
3 | 53 | E20 | M+H[1+] | C00555 | 4-Aminobutanal; 4-Aminobutyraldehyde; Butyrald... | 88.0758 | 23.8 | 0.640468 | -0.023287 |
4 | 54 | E20 | M+H[1+] | C00555 | 4-Aminobutanal; 4-Aminobutyraldehyde; Butyrald... | 88.0758 | 23.8 | 0.839262 | 0.065401 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1756 | 26030 | E2193 | M+Na[1+] | lnlccoa;CE5162;C16163;lneldccoa | linoleic coenzyme A$trans,cis-2,9-octadecadien... | 1048.3058 | 129.2 | 0.000440 | -0.725758 |
1757 | 26044 | E2190 | M+Na[1+] | C16218;C00510 | trans-Octadec-2-enoyl-CoA$Octadecenoyl-CoA (n-... | 1054.3533 | 287.6 | 0.155874 | -0.213352 |
1758 | 26044 | E2194 | M+H[1+] | CE4843;C02249;CE5150 | cis,cis-11,14-eicosadienoyl-CoA$C20:4-CoA$tran... | 1054.3533 | 287.6 | 0.155874 | -0.213352 |
1759 | 26149 | E2018 | M+Na[1+] | CE4817 | 3(S)-hydroxy-docosa-7,10,13,16,19-all-cis-pent... | 1114.3193 | 269.9 | 0.936672 | 0.045092 |
1760 | 26173 | E2198 | M+Na[1+] | C15613;C17343;C05448 | 3alpha,7alpha,12alpha-Trihydroxy-5beta-cholest... | 1222.4244 | 290.2 | 0.162603 | 0.358422 |
1761 rows × 9 columns
Compounds could be aggregated by pathways using function show_compounds
. Provide cutoff_eids
to reflect the same eids as in pathways table.cutoff_path
is used to filter the pathways of interest.
obj.show_compounds(cutoff_eids=0.0001,cutoff_path=0.017)
str_row_ion | compounds | compound_names | input_row | m/z | retention_time | p_value | statistic | ||
---|---|---|---|---|---|---|---|---|---|
pathway | compound_id | ||||||||
Aspartate and asparagine metabolism | E101 | M+H[1+] | C02356;C01026;C00334;C05145;C01205;C03284 | (S)-2-Aminobutanoate; (S)-2-Aminobutanoic acid... | 354 | 104.0707 | 57.2 | 1.797939e-07 | 0.187342 |
E281 | M+H[1+] | C05936;C00408 | N4-Acetylaminobutanal$L-Pipecolate; Pipecolini... | 1032 | 130.0863 | 72.4 | 6.369287e-15 | -0.161556 | |
E281 | M(C13)+H[1+] | C05936;C00408 | N4-Acetylaminobutanal$L-Pipecolate; Pipecolini... | 1073 | 131.0897 | 76.7 | 7.092228e-11 | -0.158624 | |
E300 | M+H[1+] | C02642;C00152 | N-Carbamoyl-beta-alanine$L-Asparagine | 1122 | 133.0608 | 79.8 | 9.416820e-05 | -0.020385 | |
E430 | M+H[1+] | C01035 | 4-Guanidinobutanoate | 1583 | 146.0923 | 30.8 | 6.932411e-18 | 0.361329 | |
E442 | M+H[1+] | C00064;C05100;C00819 | L-Glutamine$3-Ureidoisobutyrate$D-Glutamine; D... | 1616 | 147.0765 | 76.9 | 1.029164e-06 | 0.053800 | |
E468 | M+H[1+] | C00047 | L-Lysine | 1623 | 147.1127 | 79.6 | 3.517658e-14 | -0.136833 | |
E468 | M(C13)+H[1+] | C00047 | L-Lysine | 1658 | 148.1161 | 77.7 | 8.028457e-12 | -0.159557 | |
E1139 | M+H[1+] | C01239 | N-Acetyl-beta-D-glucosaminylamine | 4248 | 221.1133 | 164.1 | 1.263207e-22 | 0.278673 | |
E245 | M+H[1+] | C00669 | gamma-L-Glutamyl-L-cysteine | 5327 | 251.0697 | 69.1 | 8.833216e-05 | -0.125672 | |
Glutamate metabolism | E101 | M+H[1+] | C02356;C01026;C00334;C05145;C01205;C03284 | (S)-2-Aminobutanoate; (S)-2-Aminobutanoic acid... | 354 | 104.0707 | 57.2 | 1.797939e-07 | 0.187342 |
E442 | M+H[1+] | C00064;C05100;C00819 | L-Glutamine$3-Ureidoisobutyrate$D-Glutamine; D... | 1616 | 147.0765 | 76.9 | 1.029164e-06 | 0.053800 | |
E245 | M+H[1+] | C00669 | gamma-L-Glutamyl-L-cysteine | 5327 | 251.0697 | 69.1 | 8.833216e-05 | -0.125672 |
After conducting pathway analysis in the Rodin
class, users can visualize the results through barplots, violin or reg plots based on Entity IDs (EIDs) of metabolites or pathways names. By default, these plots will only display statistically significant metabolites (0.05 by default), omitting those not meeting the significance threshold. This focused approach ensures clearer and more meaningful visualizations, aiding in the interpretation of the pathway analysis outcomes.
obj.violinplot(eids=['E468','E1752 '],hue='Dose',significant=0.0001,category_order=['Control','Low','High'])
obj.boxplot(pathways=['Vitamin H (biotin) metabolism'],
hue='Dose',significant=0.001,category_order=['Control','Low','High'],grid_dim=(1,3),figsize=(10,5))
Saving and Importing Objects in Rodin
Saving the Object:
To save the current state of a Rodin
object, use the save
method. This method allows you to store the object for future use, preserving all its data and attributes.
obj.save('path_to_save.pkl')
Importing the Object: If you need to continue working with a previously saved Rodin object, you can easily import it back into your environment using the import_object method. This method reloads the object with all its data and attributes intact.
obj = rodin.import_object('path_to_saved_object.pkl')