This is what you need for getting PAM50 prediction for each sample. It takes RNA-seq values (raw or CPM) and returns PAM50 subtype for each sample.
PAM50(df, cutoff = 0)
df | dataframe, rows are genes columns are samples. Row names has to be EntrezID. |
---|---|
cutoff | numeric (0-1, default = 0), threshold value for calling the prediction. 0 means no cutoff, predicted subtype is whichever subtype gives the highest probability. 0.8 means the predicted subtype has to have probability > 0.8, otherwise 'NA' is the label. |
library(Biobase) library(tidyverse) library(pamr) df <- data.frame(assayData(es)$exprs) df <- df[,1:10] ## select 10 samples annotation <- fData(es) rownames(df) <- annotation$EntrezID df_pred <- PAM50(df) knitr::kable(df_pred)#> #> #> |PAM50 | Basal| Her2| LumA| LumB| Normal| #> |:------|---------:|---------:|---------:|---------:|---------:| #> |LumA | 0.0000000| 0.0000022| 0.9915946| 0.0003167| 0.0080865| #> |LumA | 0.0000000| 0.0027289| 0.7926934| 0.0013944| 0.2031832| #> |LumA | 0.0000000| 0.0000287| 0.9973988| 0.0016445| 0.0009280| #> |LumB | 0.0000000| 0.0010133| 0.0497278| 0.9492588| 0.0000001| #> |LumB | 0.0000000| 0.0026159| 0.0502811| 0.9471006| 0.0000024| #> |Basal | 1.0000000| 0.0000000| 0.0000000| 0.0000000| 0.0000000| #> |Normal | 0.0044746| 0.0000000| 0.0000031| 0.0000000| 0.9955223| #> |LumA | 0.0000000| 0.0000032| 0.9806685| 0.0000561| 0.0192723| #> |LumA | 0.0000000| 0.0000008| 0.9879260| 0.0001809| 0.0118923| #> |LumA | 0.0000000| 0.0000004| 0.9924206| 0.0001046| 0.0074743|