
VarSome API Recommendations

By Richard Meyer on May 4, 2021



The VarSome API is an incredibly simple and powerful tool allowing a developer to instantly access the 100+ databases integrated in VarSome along with automated ACMG or AMP classifications.


It is possible to annotate whole exomes or even whole genomes extremely efficiently and at low cost using the VarSome API, with options to add VarSome’s automated ACMG & AMP classifications, or pull in additional data from specific databases of interest.


The most cost-effective results are achieved by annotating only a subset of variants: coding variants, or those close to a canonical splice site (for example, flanking within 10 bp). You can use tools such as bedtools combined with a GFF3 file for your genome of interest (
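The coding/flanking pre-filter can also be approximated without bedtools. The pure-Python sketch below keeps VCF records whose position falls inside a GFF3 `CDS` feature padded by 10 bp on each side. It is an illustration under stated assumptions, not a replacement for bedtools: it only looks at `CDS` features (so splice sites next to untranslated exons are not covered), it assumes the VCF and GFF3 use the same chromosome names, and the function names are hypothetical.

```python
import bisect
from collections import defaultdict


def load_cds_intervals(gff3_lines, flank=10):
    """Collect CDS features from GFF3 lines, padded by `flank` bp on each side.

    Returns {seqid: (starts, ends)}, sorted and merged so that a binary
    search can answer "is this position covered?".
    """
    raw = defaultdict(list)
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "CDS":
            continue
        seqid, start, end = fields[0], int(fields[3]), int(fields[4])
        # GFF3 coordinates are 1-based and inclusive.
        raw[seqid].append((max(1, start - flank), end + flank))

    merged = {}
    for seqid, intervals in raw.items():
        intervals.sort()
        starts, ends = [], []
        for start, end in intervals:
            if ends and start <= ends[-1] + 1:
                ends[-1] = max(ends[-1], end)  # overlapping or adjacent: extend
            else:
                starts.append(start)
                ends.append(end)
        merged[seqid] = (starts, ends)
    return merged


def is_covered(intervals, chrom, pos):
    """True if the 1-based position `pos` on `chrom` lies inside a padded CDS."""
    if chrom not in intervals:
        return False
    starts, ends = intervals[chrom]
    i = bisect.bisect_right(starts, pos) - 1  # last interval starting <= pos
    return i >= 0 and pos <= ends[i]


def filter_vcf(vcf_lines, intervals):
    """Yield header lines plus the records in coding or flanking regions."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        chrom, pos = line.split("\t", 2)[:2]
        if is_covered(intervals, chrom, int(pos)):
            yield line
```

For example, `filter_vcf(open("sample.vcf"), load_cds_intervals(open("genes.gff3")))` would yield the VCF header followed by the retained records (both file names are placeholders). Merging the padded intervals once up front keeps each per-variant check to a single binary search.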


The VarSome API includes a filter to automatically remove high-frequency variants.


In our tests, we annotated a sample exome, restricted the annotation to coding/splicing variants only, and finally filtered the remaining variants using the ACMG BA1 benign frequency threshold of 5%.

Variant set                           Variants   ACMG              ACMG + AMP        All data
Coding or flanking                    24,954     13.6 KB/variant   21.1 KB/variant   131 KB/variant
Coding or flanking, frequency < 5%    4,205      11.5 KB/variant   17.9 KB/variant   161 KB/variant

Variants outside these coding and flanking regions usually have substantially less data each, and intergenic variants even less.
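To see what these per-variant figures mean for a whole exome, the short calculation below multiplies each one by the corresponding variant count. It uses only the numbers from the table above and treats 1 MB as 1,000 KB; real volumes will differ from sample to sample.

```python
# Per-variant response sizes (KB) and variant counts, copied from the table above.
COUNTS = {"coding_or_flanking": 24_954, "coding_af_below_5pct": 4_205}
SIZES_KB = {
    "ACMG": {"coding_or_flanking": 13.6, "coding_af_below_5pct": 11.5},
    "ACMG + AMP": {"coding_or_flanking": 21.1, "coding_af_below_5pct": 17.9},
    "All data": {"coding_or_flanking": 131.0, "coding_af_below_5pct": 161.0},
}


def total_mb(option: str, variant_set: str) -> float:
    """Estimated total JSON volume in MB (1 MB = 1,000 KB) for one exome."""
    return COUNTS[variant_set] * SIZES_KB[option][variant_set] / 1000


for option in SIZES_KB:
    before = total_mb(option, "coding_or_flanking")
    after = total_mb(option, "coding_af_below_5pct")
    print(f"{option:<10}  {before:7.1f} MB -> {after:6.1f} MB "
          f"({before / after:.1f}-fold reduction)")
```

With ACMG alone, this comes to about 339 MB for the coding/flanking variants and about 48 MB once the 5% frequency filter is applied.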


Results will vary depending on your pipeline and specific use case: for example, which additional databases you require, and whether you are annotating tumour samples or multiple family members. You can, of course, cache previous results to avoid re-annotating common variants across a family group or cohort.


Please contact your Saphetor sales consultant for exact pricing details, which may vary over time. As of March 2021, annotating a whole exome using ACMG would cost approximately $15 (€12, CHF 13). Furthermore, API pricing is degressive: the unit price decreases the more you use it.


Currently, the API is priced based on the amount of data exchanged, and all data is returned in JSON format. This document explains some of the intricacies and how to keep costs extremely low.
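Because pricing follows the volume of data exchanged, it can help to keep your own tally of response sizes. The sketch below assumes that the figure that matters is the length in bytes of each raw response body; whether Saphetor meters compressed or uncompressed traffic, or counts request bodies too, is an assumption to confirm with them. The `UsageMeter` class is hypothetical.

```python
import json


class UsageMeter:
    """Hypothetical client-side tally of JSON data received from the API."""

    def __init__(self) -> None:
        self.total_bytes = 0
        self.variant_count = 0

    def record(self, body: bytes, n_variants: int = 1) -> None:
        """Add one raw response body that covered `n_variants` variants."""
        self.total_bytes += len(body)
        self.variant_count += n_variants

    def kb_per_variant(self) -> float:
        """Average response size per variant in KB (1 KB = 1,000 bytes)."""
        if self.variant_count == 0:
            return 0.0
        return self.total_bytes / self.variant_count / 1000


# Example with a placeholder body; in practice pass the raw HTTP response bytes.
meter = UsageMeter()
meter.record(json.dumps({"placeholder": "response"}).encode("utf-8"))
print(f"{meter.total_bytes} bytes, {meter.kb_per_variant():.3f} KB/variant")
```

Tracking KB/variant this way makes it easy to compare your own figures with the data sizes reported in this post.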


The following are some recommendations on how to use the API efficiently. The full reference guide is available at


The API gives full control to the user: in order to keep costs reasonable or very low, you need to decide which data you actually need. This is controlled by the following options:


  • add-ACMG-annotation: if True, the responses will include the minimum set of databases required for our ACMG annotator.
  • add-AMP-annotation: if this option is enabled, it will add all the cancer databases required for AMP, on top of those used by ACMG.
  • add-all-data: use this flag sparingly as it will add all possible annotations from all sources. This can be useful to find out which sources you might like to include (or exclude) but will incur higher costs.
  • expand-pubmed-articles: if set, this will add a dictionary containing all the publications referenced in the annotation, including title, abstract, authors, journal, identifiers, etc.
  • allele-frequency-threshold: this is a filter that can dramatically reduce the volume of annotations: any variant whose gnomAD genomes allele frequency is greater than the provided threshold will not be annotated.
  • add-source-databases: this takes a comma-separated list of database names as its argument and adds these databases to the annotation. It is best used in conjunction with the preceding flags when you require some additional detail.
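A minimal sketch of combining these options into a single lookup request. The option names come from the list above; the base URL and path layout (`https://api.varsome.com/lookup/<variant>/<genome>`), the `hg38` default, the `chrom:pos:ref:alt` variant notation and the use of `1` as the value for boolean flags are assumptions for illustration, so check them against the full reference guide. The function only builds the URL and sends nothing.

```python
from typing import Optional, Sequence
from urllib.parse import quote, urlencode

# Assumed endpoint layout: <base>/<variant>/<reference genome>.
# Verify it, and the accepted flag values, against the full reference guide.
BASE_URL = "https://api.varsome.com/lookup"


def build_lookup_url(variant: str, genome: str = "hg38", *, acmg: bool = True,
                     amp: bool = False, all_data: bool = False,
                     expand_pubmed: bool = False,
                     af_threshold: Optional[float] = None,
                     source_databases: Sequence[str] = ()) -> str:
    """Return a lookup URL that requests only the data you actually need."""
    params = {}
    if acmg:
        params["add-ACMG-annotation"] = 1  # minimum databases for ACMG
    if amp:
        params["add-AMP-annotation"] = 1  # cancer databases on top of ACMG
    if all_data:
        params["add-all-data"] = 1  # much larger responses: use sparingly
    if expand_pubmed:
        params["expand-pubmed-articles"] = 1  # very large: avoid in bulk
    if af_threshold is not None:
        # Variants whose gnomAD genomes AF exceeds this value are not annotated.
        params["allele-frequency-threshold"] = af_threshold
    if source_databases:
        params["add-source-databases"] = ",".join(source_databases)
    path = f"{BASE_URL}/{quote(variant, safe=':')}/{quote(genome)}"
    return f"{path}?{urlencode(params)}" if params else path


# Example variant in an assumed chrom:pos:ref:alt notation.
print(build_lookup_url("chr7:140453136:A:T", af_threshold=0.05))
```

Every option except ACMG defaults to off, so each additional (and more expensive) category of data has to be requested explicitly.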


PubMed Articles

Only use “expand-pubmed-articles” on a very small set of variants: the large amount of text in abstracts will rapidly make this option unviable. We recommend using the “pubmed_info” API call instead, on a case-by-case basis, and caching the results in your own database, for example:
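A minimal sketch of such a cache, using SQLite from the Python standard library. The `fetch` callable stands in for however you call `pubmed_info`; its request format and response fields are not covered here, so the sketch simply stores whatever JSON-serialisable dictionary the callable returns. `PubMedCache` and its table layout are hypothetical. Because publication data does not change, entries never expire.

```python
import json
import sqlite3
from typing import Callable


class PubMedCache:
    """Cache publication records by PubMed ID so each one is fetched only once."""

    def __init__(self, fetch: Callable[[str], dict], path: str = ":memory:"):
        self.fetch = fetch  # e.g. your own wrapper around the pubmed_info call
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS pubmed (pmid TEXT PRIMARY KEY, record TEXT)"
        )

    def get(self, pmid: str) -> dict:
        row = self.db.execute(
            "SELECT record FROM pubmed WHERE pmid = ?", (pmid,)
        ).fetchone()
        if row is not None:
            return json.loads(row[0])  # cache hit: no API traffic
        record = self.fetch(pmid)  # cache miss: exactly one API call
        with self.db:  # commits the insert on success
            self.db.execute(
                "INSERT INTO pubmed (pmid, record) VALUES (?, ?)",
                (pmid, json.dumps(record)),
            )
        return record
```

Passing a file path instead of `":memory:"` makes the cache persist between runs, so a publication referenced by many variants or samples is only ever requested once.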

Allele Frequency Threshold

This is a very useful filter: setting it to 0.05, for example, removes all variants meeting the ACMG BA1 criterion (allele frequency above 5%). We find that, on average, this reduces the amount of data by a factor of 4 to 6.
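The threshold rule from the option list (a variant is skipped when its gnomAD genomes allele frequency is greater than the threshold) can be mirrored on your side, for example to estimate how many variants will actually be annotated before submitting them. In this sketch a frequency exactly equal to the threshold is kept, since it is not greater than it; treating variants absent from gnomAD (`None`) as annotated is an assumption about the API's behaviour, and the example frequencies are made up.

```python
from typing import Iterable, Optional


def will_be_annotated(af: Optional[float], threshold: float = 0.05) -> bool:
    """Client-side mirror of the allele-frequency-threshold rule.

    Assumption: variants with no gnomAD genomes frequency (None) are kept.
    """
    return af is None or af <= threshold


def count_annotated(afs: Iterable[Optional[float]], threshold: float = 0.05) -> int:
    """Number of variants that would still be annotated under `threshold`."""
    return sum(will_be_annotated(af, threshold) for af in afs)


# Made-up gnomAD genomes frequencies for five variants.
example_afs = [0.31, 0.0002, None, 0.05, 0.12]
print(count_annotated(example_afs))  # 3 of the 5 would be annotated
```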

Data Sizes

We annotated a whole exome to measure the current data sizes returned by the API:

  • Coding variants return on average 2x as much data as non-coding variants.
  • Enabling AMP adds roughly 55% to 60% more data than ACMG alone, as many additional cancer databases are included.
  • Enabling “all data” increases the JSON by a factor of roughly 10 or more, so it should be used sparingly and only on the subset of variants of interest.
  • Expanding publications to retrieve titles and abstracts should be done separately from annotation, ideally only for the subset of variants of interest. The publication data should also be cached, as it will not change.
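The relative costs in these bullets can be checked against the per-variant sizes in the data-size table. The short calculation below uses only those figures; it gives an AMP overhead of about 55% to 56% and an “all data” factor of about 9.6x to 14x relative to ACMG alone.

```python
# Per-variant sizes in KB from the data-size table:
# (variant set, ACMG, ACMG + AMP, all data)
ROWS = [
    ("coding or flanking", 13.6, 21.1, 131.0),
    ("coding or flanking, frequency < 5%", 11.5, 17.9, 161.0),
]


def amp_overhead_pct(acmg_kb: float, amp_kb: float) -> float:
    """Extra data from enabling AMP, as a percentage of the ACMG-only size."""
    return (amp_kb / acmg_kb - 1) * 100


def all_data_factor(acmg_kb: float, all_kb: float) -> float:
    """How many times larger an 'all data' response is than an ACMG-only one."""
    return all_kb / acmg_kb


for label, acmg, amp, all_kb in ROWS:
    print(f"{label}: AMP +{amp_overhead_pct(acmg, amp):.0f}%, "
          f"all data {all_data_factor(acmg, all_kb):.1f}x")
```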


We recommend that you cache frequently used data:

  • Annotations: caching full variant annotations is probably less useful, since new information or improvements to our classification algorithms will be missed unless the cache is refreshed periodically.
  • Variants: it may be useful to know whether you have already seen a variant in another sample. For this you can use the unique 64-bit “variant_id” that VarSome assigns to each variant (equivalent variants are given the same variant_id).
  • Publications: it is definitely worthwhile building your own cache of publication information. You can either download this directly from NCBI PubMed or use the expanded data from VarSome.
  • Genes: when storing annotations for a given sample, you may want to extract the gene annotations, which are identical for every variant in a given gene. We do this in our own VarSome Clinical platform, as it can reduce storage space by 75%.
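A minimal sketch of the variant and gene caching ideas: a record of which samples each `variant_id` has been seen in, and gene-level annotations stored once per gene rather than copied into every variant. The annotation dictionaries here (a `variant_id` key, a `gene_symbol` key and a `gene_annotation` sub-dictionary) are a hypothetical simplification, not VarSome's actual JSON layout, and the `AnnotationStore` class is illustrative only.

```python
from typing import Any, Dict, Set


class AnnotationStore:
    """Illustrative cohort store; the annotation field names are hypothetical."""

    def __init__(self) -> None:
        self.samples_by_variant: Dict[int, Set[str]] = {}  # variant_id -> samples
        self.genes: Dict[str, Any] = {}  # gene symbol -> gene annotation (one copy)
        self.variants: Dict[int, dict] = {}  # variant_id -> variant-level data only

    def seen_before(self, variant_id: int, sample: str) -> bool:
        """True if this variant_id was already recorded in a different sample."""
        return any(s != sample for s in self.samples_by_variant.get(variant_id, ()))

    def add(self, sample: str, annotation: dict) -> None:
        """Store one variant annotation, splitting out the shared gene data."""
        variant_id = annotation["variant_id"]
        self.samples_by_variant.setdefault(variant_id, set()).add(sample)
        record = dict(annotation)  # shallow copy: the caller's dict is untouched
        gene_info = record.pop("gene_annotation", None)
        symbol = record.get("gene_symbol")
        if symbol is not None and gene_info is not None:
            # Identical for every variant in the gene, so keep a single copy.
            self.genes.setdefault(symbol, gene_info)
        self.variants[variant_id] = record

    def full_annotation(self, variant_id: int) -> dict:
        """Rebuild the complete annotation by re-attaching the gene data."""
        record = dict(self.variants[variant_id])
        symbol = record.get("gene_symbol")
        if symbol in self.genes:
            record["gene_annotation"] = self.genes[symbol]
        return record
```

Because the gene data is stored once and re-attached on read, the saving grows with the number of variants per gene in each sample.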


We hope these guidelines help you extract maximum value from the hugely powerful VarSome API. Do let us know if there’s any further information we can provide; please address any questions to


Kind regards,

The VarSome Team.
