Fig. 3
From: Predicting MHC-I ligands across alleles and species: how far can we go?

Binding site similarity determines MHC-I ligand prediction accuracy. A Relationship between the accuracy of MHC-I ligand predictions (AUC in the LOA cross-validation) and binding site distance to the closest allele with known ligands. Regression line and Pearson correlation coefficients were added to the plot. B Boxplots of binding site distances to the closest allele with known ligands for all known alleles in different species groups. The numbers above each boxplot show the percentage of MHC-I sequences in each group with a binding site distance lower than 0.1 (the blue dashed line). Numbers in parentheses indicate the total number of MHC-I alleles with available sequences in each species group. Cyan dots indicate MHC-I alleles with known ligands. C Examples of different scenarios characterizing alleles with binding site distances larger than 0.1. The amino acid frequencies at the binding site positions among alleles with known ligands are shown in the middle. An example of an allele without known ligands and having new amino acids (i.e., unseen among alleles with known ligands, written in light blue) in its binding site is shown above. An example of an allele without known ligands and having a different arrangement of amino acids is shown below (with amino acids not conserved in its closest allele with known ligands indicated in green). B-pocket positions are marked in dark blue and F-pocket positions in green. D Stacked barplots showing the percentage of alleles with binding site distance < 0.1 (orange), alleles with binding site distance ≥ 0.1 and new amino acids at some binding site positions (light blue), and alleles with binding site distance ≥ 0.1 and new arrangements of amino acids in their binding site (green). E Frequency of the new amino acids in species where all MHC-I alleles have new amino acids compared to MHC-I alleles with known ligands (i.e., Salmonids, Gallus, and Suids).
F Representative 3D structure of the MHC-I binding site (HLA-A*01:01 in gray in complex with EADPTGHSY in yellow, PDB: 1W72), highlighting several positions with low conservation across species (see panel E); black dashed lines show the distances between these positions on the MHC structure and their closest residue on the peptide.
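
The three categories in panel D can be read as a simple decision rule. The sketch below is only an illustration of that rule as the caption describes it, not the authors' code: the function name, the category labels, and the representation of a binding site as a string of aligned amino acids are assumptions, and the binding site distance is taken as a precomputed input because the caption does not define how it is calculated.

```python
from typing import Sequence

THRESHOLD = 0.1  # binding site distance cutoff (blue dashed line in panel B)


def classify_allele(
    site: str,
    distance: float,
    known_sites: Sequence[str],
    threshold: float = THRESHOLD,
) -> str:
    """Assign an allele to one of the three panel D categories.

    site        -- amino acids at the aligned binding site positions (hypothetical encoding)
    distance    -- precomputed distance to the closest allele with known ligands
    known_sites -- binding sites of alleles with known ligands, same length as `site`
    """
    if not known_sites or any(len(k) != len(site) for k in known_sites):
        raise ValueError("known_sites must be non-empty and aligned with site")

    if distance < threshold:
        return "close"  # orange in panel D

    # Amino acids observed at each position among alleles with known ligands.
    seen = [set(column) for column in zip(*known_sites)]

    # Any residue unseen at its position counts as a "new amino acid".
    if any(aa not in seen[i] for i, aa in enumerate(site)):
        return "new_amino_acids"  # light blue in panel D

    # Every residue has been seen at its position, but not in this combination.
    return "new_arrangement"  # green in panel D
```

With toy four-residue binding sites `["YFAM", "YSAM", "HFTM"]` as the alleles with known ligands, a site `"WFAM"` at distance 0.4 falls in the new amino acids category (W is unseen at the first position), while `"HSTM"` at distance 0.2 falls in the new arrangement category.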