I have been using Seurat v4 Reference Mapping to align some query scRNA-seq datasets from iPSC-derived cells that were subjected to several directed cortical differentiation protocols and sampled at multiple timepoints. I built the reference dataset by merging several individual fetal cortical sample datasets that I had annotated based on their unsupervised cluster DEGs (following this vignette using the default parameters).
I want to see which protocol produces cells most similar to those in the fetal datasets, and which fetal timepoints the query datasets tend to map to. I understand that the MappingScore() function can flag query cells that aren't well represented in the reference dataset, so I figured these scores could tell me which query datasets are most similar to the reference. However, when I compare violin plots of the mapping scores for a query dataset from one of the differentiation protocols against a query dataset that contains only pluripotent cells, both contain cells with high mapping scores (see attached images), even though only the differentiated cells should closely resemble fetal cortical cells. I attached the code as a .txt file.
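For context, the scoring step described above looks roughly like the following. This is a minimal sketch rather than the exact attached code: the object names `reference` and `query` are placeholders, and it assumes both objects are already normalized and that the reference has a PCA reduction named "pca".

```r
library(Seurat)

# Placeholder objects (assumptions, not the names in the attached code):
#   reference: merged, annotated fetal cortical datasets with a "pca" reduction
#   query:     one iPSC-derived dataset (differentiated or pluripotent)
anchors <- FindTransferAnchors(
  reference = reference,
  query = query,
  reference.reduction = "pca",
  dims = 1:50
)

# Compute one mapping score per query cell and store it in the metadata
query$mapping.score <- MappingScore(anchors = anchors, ndim = 50)

# Violin plot of the per-cell scores, grouped by sample identity
VlnPlot(query, features = "mapping.score", group.by = "orig.ident")
```

I ran the same steps separately for each query dataset and compared the resulting violin plots side by side.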
My question is whether the mapping score can be used as an absolute measure of query-to-reference similarity, or whether it is always a relative measure whose high and low ends are set by the query dataset itself. If the latter, what alternative functions could I use to get information about absolute similarity?
Thanks.
Attachments: Pluripotent Cell Mapping Score; Differentiated Cell Mapping Score; Code Used For Mapping