Every day I feel like we’re getting closer to the answer of “WTF are all of these unmatched spectra?!?!” The answers may not be the most fun in the world, and here is a great new example!
I passed by this paper a couple of times because the title just didn’t catch my attention, but then it popped up in a different web interface with the abstract graphic. In that graphic you see they did N-terminal enrichment and RiboSeq (!?!) and then proteomics.
RiboSeq is another complicated way for genomics people to get to protein level data without dealing with frustrating mass spectrometrists. Basically, you stop ribosomes in their tracks, then degrade all of the transcripts in a cell EXCEPT the pieces that are stuck inside the frozen ribosomes. Then you get a picture of what was actively being translated into protein at that moment. There are actually smart workflows out there for combining proteomics and RiboSeq data, and that’s what this team used.
To simplify their overall matrix, they used N-terminal protein enrichment and they justify their reasoning better than I could summarize here.
What they find is a ton of weird stuff that can only be meaningfully attributed to regions of the genome that are annotated as “noncoding”. No one likes a peptide match that comes from the vast majority of the genome, so they go above and beyond to rule out other things that might be better matches. (Reminder: supposedly only 1% of the human genome codes for proteins, because that totally makes sense. Why wouldn’t billions of years of selective evolutionary pressure result in 99% stuff that organisms won’t use? Though, strangely, something like 1/2 is complete duplicates and 80% is considered regulatory.)
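For a feel of what “ruling out a better match” can look like at its very simplest, here is a minimal Python sketch. To be clear, this is my own toy illustration and not the paper’s pipeline: the function names (`parse_fasta`, `canonical_hits`) and the two-protein FASTA are made up, and a real workflow would also consider things like variants, modifications, and alternative enzyme cleavage. The only idea shown is that a peptide should fail to appear anywhere in the canonical protein database (treating I and L as the same, since they have identical mass) before you even think about calling it noncoding-derived.

```python
def normalize(seq):
    # I and L have identical mass, so treat them as the same residue.
    return seq.upper().replace("I", "L")


def parse_fasta(text):
    # Return {accession: sequence} from FASTA-formatted text.
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].split()[0]  # keep only the first token of the header
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def canonical_hits(peptide, proteins):
    # Names of canonical proteins that contain the peptide (I/L-insensitive).
    target = normalize(peptide)
    return [name for name, seq in proteins.items() if target in normalize(seq)]


# Hypothetical toy database, purely for illustration.
TOY_FASTA = """>protA toy canonical protein
MKTLLVAGGSPEPTIDEK
>protB another toy entry
MSSQWERTYK
"""

proteins = parse_fasta(TOY_FASTA)

# A peptide that differs from protA only by I -> L still counts as canonical.
print(canonical_hits("PEPTLDEK", proteins))

# A peptide with no canonical explanation is the only kind worth chasing further.
print(canonical_hits("NQVELPEPK", proteins))
```

Anything that comes back with a non-empty list already has a boring explanation; only the empty-list peptides are candidates for the weird stuff.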
What they come up with is a great big pile of things that definitely seem like they are misannotated as noncoding in this context. Now, it’s fair to mention that this is an old cancer cell line and these are strange things, but this still points at some fundamental flaws in the upstream processes that result in those nice and concise FASTA databases that we use.