Sampling, DNA preparation and sequencing

Stockholm

Samples LOW002, LOW003, LOW006, LOW007, LOW008 and PON012 were processed at the Archaeological Research Laboratory at Stockholm University, Sweden, following methods previously described8. In brief, this involved extracting DNA by incubating the bone powder for 24 h at 37 °C in 1.5 ml of digestion buffer (0.45 M EDTA (pH 8.0) and 0.25 mg ml–1 proteinase K), concentrating supernatant on Amicon Ultra-4 (30-kDa molecular weight cut-off (MWCO)) filter columns (MerckMillipore) and purifying on Qiagen MinElute columns. Double-stranded Illumina libraries were prepared using the protocol outlined in ref. 48, with the inclusion of USER enzyme and the modifications described in ref. 49.

Samples 367, PDM100, Taimyr-1 and Yana-1 were processed at the Swedish Museum of Natural History in Stockholm, Sweden, following previously described methods8. In brief, this involved extracting DNA using a silica-based method with concentration on Vivaspin filters (Sartorius), according to a protocol optimized for recovery of ancient DNA50. Double-stranded Illumina libraries were prepared using the protocol outlined in ref. 48, with the inclusion of USER enzyme.

Samples ALAS_024, VAL_033, ALAS_016, VAL_008, HMNH_007, HMNH_011, VAL_050, VAL_005, DS04, VAL_037, VAL_012, VAL_011, VAL_18A, IN18_016 and IN18_005 were processed at the Swedish Museum of Natural History in Stockholm, Sweden, following previously described methods for permafrost bone and tooth samples51. In brief, this involved DNA extraction using the methodology of ref. 52 and double-stranded Illumina library preparation as described in ref. 48, with dual unique indexes and the inclusion of USER enzyme. Between eight and ten separate PCR reactions with unique indexes were carried out for each sample to maximize library complexity. The libraries were sequenced alongside samples HOV4, AL2242, AL2370, AL2893, AL3272 and AL3284 across three Illumina NovaSeq 6000 lanes with an S4 100-bp paired-end set-up at SciLifeLab in Stockholm.

Potsdam

Samples JAL48, JAL65, JAL69, JAL358, AH574, AH575 and AH577 were processed at the University of Potsdam. Pre-amplification steps (DNA extraction and library preparation) were conducted in separated laboratory rooms specially equipped for the processing of ancient DNA. Amplification and post-amplification steps were performed in different laboratory rooms. DNA was extracted from bone powder (29–54 mg) following a protocol specially adapted to recover short DNA fragments52. Single-stranded double-indexed libraries were built from 20 µl of DNA extract according to the protocol in ref. 53. The libraries were sequenced on an HiSeq X platform at SciLifeLab in Stockholm.

Tübingen/Jena

Samples JK2174, JK2175, JK2179, JK2181, JK2183, TU144, TU148, TU839 and TU840 were processed at the University of Tübingen, with DNA extraction and pre-amplification steps undertaken in clean room facilities and post-amplification steps performed in a separate DNA laboratory. Both laboratories fulfil standards for work with ancient DNA54,55. All surfaces of tooth and bone samples were initially UV irradiated for 30 min, to minimize the potential risk of modern DNA contamination. Subsequently, DNA was extracted by applying a well-established guanidine silica-based protocol for ancient samples52. Illumina sequencing libraries were prepared by using 20 µl of DNA extract per library48; afterwards, dual barcodes (indexes) were chemically added to the prime ends of the libraries56. For the samples from Auneau (TU839 and TU840), five sequencing libraries each were prepared; for all other samples processed in Tübingen, three sequencing libraries each were prepared. To detect potential contamination of the chemicals, negative controls were conducted for extraction and library preparation. After preparation of the sequencing libraries, DNA concentration was measured with qPCR (Roche LightCycler) using corresponding primers48. The DNA concentration was given by the copy number of the DNA fragments in 1 µl of the sample.

Amplification of the indexed sequencing libraries was performed using Herculase II Fusion under the following conditions: 1× Herculase II buffer, 0.4 µM IS5 primer and 0.4 µM IS6 primer48, Herculase II Fusion DNA polymerase (Agilent Technologies), 0.25 mM dNTPs (100 mM; 25 mM each dNTP) and 0.5–4 µl barcoded library as template in a total reaction volume of 100 µl. The applied amplification thermal profile was processed as follows: initial denaturation for 2 min at 95 °C; denaturation for 30 s at 95 °C, annealing for 30 s at 60 °C and elongation for 30 s at 72 °C for 3 to 20 cycles; and a final elongation step for 5 min at 72 °C. Thereafter, the amplified DNA was purified using a MinElute purification step and DNA was eluted in 20 µl TET. The concentration of the amplified DNA sequencing libraries was measured using a Bioanalyzer (Agilent Technologies) and a DNA1000 lab chip from Agilent Technologies.

The sequencing libraries were sequenced on an Illumina HiSeq 4000 platform at the Max Planck Institute for Science of Human History in Jena. The samples from Auneau (TU839 and TU840) were paired-end sequenced applying 2 × 50 + 8 + 8 cycles. All other libraries prepared in Tübingen were single-end sequenced using 75 + 8 + 8 cycles.

Oxford

Samples AL2657, AL2541, AL2741, AL2744, AL3185, AL2350, CH1109, AL2370, AL3272 and AL3284 were processed at the dedicated ancient DNA facility at the PalaeoBARN laboratory at the University of Oxford, following methods described previously8. In brief, double-stranded libraries were constructed following the protocol in ref. 48. These libraries were sequenced on a HiSeq 2500 (AL2657, AL2541, AL2741, AL2744) or a HiSeq 4000 (AL3185, AL2350, CH1109) instrument at the Danish National Sequencing Center or on a NextSeq 550 instrument (AL2741) at the Natural History Museum of London. For samples AL2370, AL3272 and AL3284, between six and eight separate PCR reactions with unique indexes were carried out on their libraries and they were sequenced alongside samples HOV4, VAL_18A and IN18_016 on an Illumina NovaSeq 6000 lane with an S4 100-bp paired-end set-up at SciLifeLab in Stockholm.

Copenhagen

Samples CGG13, CGG17, CGG19, CGG20, CGG21, CGG25, CGG26, CGG27, CGG28, CGG34, Tumat1 and IRK were processed at the GLOBE Institute, University of Copenhagen. All pre-PCR work was performed in ancient DNA facilities following ancient DNA guidelines57. The details of extraction, library construction and sequencing for the samples with CGG codes are described in ref. 21, in relation to the publication of mitochondrial data from these specimens. The Tumat1 sample was processed following the exact same protocol. In brief, DNA extraction was performed using a buffer containing urea, EDTA and proteinase K50, double-stranded libraries were prepared with NEBNext DNA Sample Prep Master MixSet 2 (E6070S, New England Biolabs) and Illumina-specific adaptors48, and sequencing was performed on an Illumina HiSeq 2500 platform using 100-bp single-read chemistry. For the IRK sample, DNA was extracted from three subsamples and purified as described in ref. 21. The three DNA extracts and the purified pre-digest of one subsample were incorporated into double-stranded libraries following the BEST protocol58, with the modifications described in ref. 59, and sequenced on a BGISEQ-500 platform using 100-bp single-read chemistry.

Santa Cruz

Samples SC19.MCJ017, SC19.MCJ015, SC19.MCJ010 and SC19.MCJ014 were processed at the UCSC Paleogenomics Lab and were provided by the Yukon Government Paleontology program. All pre-PCR work was performed in a dedicated ancient DNA facility at the University of California, Santa Cruz, following standard ancient DNA methods60. Subsamples (250–350 mg) were sent to the UCI KECK AMS facility for radiocarbon dating, and the remaining amounts were powdered in a Retsch MM400 for extraction. For each sample, ~100 mg of powder was treated with a 0.5% sodium hypochlorite solution before extraction to remove surface contaminants61 and then combined with 1 ml lysis buffer for extraction, following the protocol in ref. 52. Samples were processed in parallel with a negative control. We quantified the extracts using a Qubit 1× dsDNA HS Assay kit (Q33231) before preparing libraries. We prepared single-stranded libraries following the protocol in ref. 62 and amplified the libraries for 9–16 cycles as informed by qPCR. After amplification, we cleaned the libraries using a 1.2× SPRI bead solution and pooled them to an equimolar ratio for in-house shallow quality-control sequencing on a NextSeq 550 paired-end 75-bp run. We then sent the libraries to Fulgent Genetics for deeper sequencing on two paired-end 150-bp lanes on a HiSeq X instrument.

Vienna

Sample HOV4 was processed at the Department of Anthropology, University of Vienna. The sample is a canine tooth, which after sequencing was determined to derive from a dhole (Cuon alpinus). DNA was extracted from its cementum using the methods described in ref. 63 with a modified incubation time of ~18 h. The library was prepared according to the protocol in ref. 48 with the modifications from ref. 64. Five separate PCR reactions with unique indexes were carried out on the library and were sequenced alongside samples VAL_18A, IN18_016, AL2242, AL2370, AL2893, AL3272 and AL3284 on an Illumina NovaSeq 6000 lane with an S4 100-bp paired-end set-up at SciLifeLab in Stockholm.

An overview of all samples and their associated metadata is available in Supplementary Data 1.

Genome sequence data processing

For paired-end data, read pairs were merged and adaptors were trimmed using SeqPrep (https://github.com/jstjohn/SeqPrep), discarding reads that could not be successfully merged. Reads were mapped to the dog reference genome canFam3.1 using BWA aln (v.0.7.17)65 with permissive parameters, including a disabled seed (-l 16500 -n 0.01 -o 2). Duplicates were removed by keeping only one read from any set of reads that had the same orientation, length and start and end coordinates. For sample Taimyr-1, previously published data13 were merged with newly generated data. Data from samples processed in Copenhagen were processed as described previously66 except that they were also mapped to canFam3.1. Post-mortem damage was quantified using PMDtools (v0.60)67 with the ‘–first’ and ‘–CpG’ arguments.

Genotyping and integration with previously published genomes

To construct a comparative dataset for population genetic analyses, we started from a published variant call set compiling 722 modern dog, wolf and other canid genomes from multiple previous studies (NCBI BioProject accession PRJNA448733)40. To this, we added additional modern whole genomes from other studies: 4 African golden wolves and 15 Nigerian village dogs (Genome Sequence Archive (http://gsa.big.ac.cn/), accession PRJCA000335)68, 12 Scandinavian wolves (European Nucleotide Archive accession PRJEB20635)69, 9 North American wolves and coyotes (European Nucleotide Archive accession PRJNA496590)25 and 8 other canids (African hunting dog, dhole, Ethiopian wolf, golden jackal, Middle Eastern grey wolves) (European Nucleotide Archive accession PRJNA494815)22. Reads from these genomes were mapped to the dog reference genome using bwa mem (version 0.7.15)70, marked for duplicates using Picard Tools (v2.21.4) (http://broadinstitute.github.io/picard), genotyped at the sites present in the above dataset using GATK HaplotypeCaller (v3.6)71 with the ‘-gt_mode GENOTYPE_GIVEN_ALLELES’ argument and then merged into the dataset using bcftools merge (http://www.htslib.org/). The following filters were then applied to sites and genotypes across the full dataset: sites with excess heterozygosity (bcftools fill-tags ‘ExcHet’ P value < 1 × 10−6) were removed; indel alleles were removed by setting the genotype of any individual carrying such an allele to missing; genotypes at sites with a depth (taken as the sum of the ‘AD’ VCF fields) less than a third of or more than twice the genome-wide average for the given genome or lower than 5 were set to missing; genotypes containing any allele other than the two highest-frequency alleles at the site were set to missing; allele representation was normalized using bcftools norm; and, finally, sites at which 130 or more individuals had a missing genotype were removed. This resulted in a final dataset of 67.8 million biallelic SNPs. In ancestry analyses (that is, those involving f-statistics), modern wolves were treated as individuals while for modern dogs up to four individuals with the highest sequencing coverage from a given breed were used and combined into populations. A list of the modern genomes used in analyses and their associated metadata is included in Supplementary Data 2.

All ancient genomes were assigned pseudo-haploid genotypes on the variant sites in the above dataset using htsbox pileup r345 (https://github.com/lh3/htsbox), requiring a minimum read length of 35 bp (‘-l 35’), mapping quality of 20 (‘-q 20’) and base quality of 30 (‘-Q 30’). If an ancient genome carried an allele not present in the dataset, its genotype was set to missing. Previously generated ancient and historical wolf and dog genomes mapped to the dog reference were obtained from the respective publications3,7,8,13,17,66,72,73 (European Nucleotide Archive study accessions PRJEB7788, PRJEB13070, PRJNA319283, PRJEB22026, PRJNA608847, PRJEB38079, PRJEB39580, PRJEB41490) and genotyped in the same way. A list of the ancient genomes used in analyses and their associated metadata is included in Supplementary Data 2.

Mitochondrial genome phylogenetic analysis and evolutionary dating

We extracted reads mapped to the mitochondrial genome for the ancient wolf samples using samtools (v1.9)74. We called consensus sequences using a 75% threshold, calling any sites with coverage less than 3 as ‘N’, using Geneious (v9.0.5) and removed any samples with greater than 10% missing data. We included a set of previously published mitochondrial genomes from ancient and modern wolves5,9,13,21,75,76,77,78,79,80, which led to a final dataset of 183 individuals (62 14C-dated ancient individuals, 24 undated ancient individuals of which 7 had infinite 14C dates, and 90 modern individuals). We also included three coyote-like sequences as outgroups (from one modern coyote and two ancient wolves with coyote-like mitochondrial sequences: SC19.MCJ015, 14C dated, and SC19.MCJ017, with an infinite 14C date). We aligned all sequences using Clustal Omega (v1.2.4)81. A Bayesian phylogeny was constructed using BEAST (v1.10.1)82, with an HKY + I + G substitution model chosen by JModelTest2 (v2.1.10)83, uncorrelated relaxed log-normal clock and coalescent constant size tree prior. We combined 20 MCMC chains (each run for 200 million iterations), after excluding the first 25% of values as a burn-in. For 14C-dated samples, we included tip date priors that corresponded to a normal distribution with the same mean and 95% confidence distribution as the 14C dates. We estimated the ages of undated samples from a prior distribution as follows: (1) for the n = 24 ancient samples with no 14C information, we used a uniform prior of 0 to 1,000,000 years before the present (bp); (2) for the n = 7 ancient samples with infinite 14C dates, we used a uniform prior as in (1), but with the lower limit as the minimum date given by the radiocarbon dating; (3) all n = 90 modern samples had already been published previously21, and the tip date priors for these samples were the same as the uniform priors used in the earlier study (either 0 to 100 or 0 to 500 bp). The mitochondrial consensus sequences for the wolf samples newly reported here (excluding those that were removed because they had too much missing data) are available as Supplementary Data 4.

f-statistics and admixture graphs

f3– and f4-statistics were calculated with ADMIXTOOLS (v5.0)84, using only transversion sites and with the ‘numchrom: 38’ argument. To overcome memory limitations when calculating large numbers of f4-statistics, block jackknifing was performed external to ADMIXTOOLS across 225 blocks of 10 Mb in size. Admixture graphs were fit using qpGraph, with arguments ‘outpop: NULL’, ‘useallsnps: NO’, ‘blgsize: 0.05’, ‘forcezmode: YES’, ‘lsqmode: NO’, ‘diag: 0.0001’, ‘bigiter: 6’, ‘hires: YES’ and ‘lambdascale: 1’. Outgroup f3-statistics were calculated using only sites ascertained to be heterozygous in the CoyoteCalifornia individual.

PCA was performed on outgroup f3-statistics by transforming the values to distances by taking 1 – f3 and then running the prcomp R function on the resulting distance matrix. Only ancient wolves were included in the calculation of PCs; present-day wolves and ancient and present-day dogs were then individually projected onto the PCs by re-running the analysis once for each of these individuals independently with that single individual added in and saving its coordinates. To avoid overloading the plot with dogs, only the following dogs were included: Basenji, Boxer, BullTerrier, NewGuineaSingingDog, SiberianHusky, Germany.HXH (7,000 bp), Germany.CTC (4.7 ka), Ireland.Newgrange (4,800 bp), Israel.THRZ02 (7,200 bp), Baikal.OL4223 (6,900 bp), Zhokhov.CGG6 (9,500 bp) and PortauChoix.AL3194 (4,000 bp).

PCA was performed on f4-statistics by transforming the values to pairwise distances by taking \(\sqrt2\times (1-r)\), where r is the Pearson correlation for a given pair of individuals, and then running the ppca function from the pcaMethods (v1.74.0) R package on the resulting distance matrix. For the ‘pre-LGM PCA’ (Fig. 4a and Extended Data Fig. 2), only all possible f4-statistics of the form f4(X,A;B,C) were included, where X was the post-25 ka and present-day individuals included in the plot and A, B and C were drawn from a reference set of ancient wolves that lived before 28 ka. For each X, the input was thus a vector of f4-statistics that quantified its relationships to pre-LGM wolves. Only wolves (post-25 ka and present day) were included in the calculation of PCs, and ancient and present-day dogs were then individually projected onto the PCs as described above.

Heterozygosity and F
ST estimates

Conditional heterozygosity was estimated at 1,250,173 transversion sites ascertained to be heterozygous in the CoyoteCalifornia individual, chosen because it is largely an outgroup to wolf diversity. For each individual, exactly two reads were sampled at each of these sites (if available), and the fraction of sites where these two reads displayed different alleles was calculated (alleles other than the two observed in the coyote were ignored). Standard errors were obtained by block jackknifing across the 38 chromosomes.

FST was calculated with smartpca from the EIGENSOFT (v7.2.1) package85, using the ‘inbreed: YES’ option to account for the pseudohaploid genotypes of the ancient genomes (this option was also applied to present-day diploid genomes). FST was calculated pairwise for pools of at least two genomes, formed from individuals selected for being close in time and space (Supplementary Table 1). A few pairs of individuals showed high similarity indicating possible relatedness, as assessed by comparing read mismatch rates across versus within individuals, and one individual from each of these pairs was excluded from these analyses (JK2174 was excluded because of high similarity to JK2183, TU839 because of high similarity to TU840, and CGG17 because of high similarity to Yana-1). FST values for pairs of pools with age midpoints separated by less than 12,500 years were included in the plot.

Divergence time and effective population size analyses with MSMC2

We used MSMC2 (v2.1.2)26 to infer population divergence times and effective population size histories. Input genotypes for this were called using GATK HaplotypeCaller (v3.6)86 on ancient and modern genomes with sequencing coverage >5.8×. For divergence time analyses, haploid X chromosomes from two different male genomes were combined and the point at which the inferred effective population size for this ‘pseudodiploid’ chromosome increased sharply upwards was taken to correspond to a population divergence. Results were scaled using a mutation rate of 0.4 × 10−8 mutations per site per generation13,87 (with a 25% lower rate for X-chromosome analyses) and a mean generational interval of 3 years13. For effective population size inferences, transition variants were ignored and results were scaled using a transversions-only mutation rate inferred from results on modern genomes. For more details on the MSMC2 analyses, see Supplementary Information section 3.

Selection analyses

Selection analysis was performed using PLINK (v1.90b5.2)88. This analysis used the 72 ancient wolf genomes and 68 modern wolf genomes (with the latter including a historical Japanese wolf genome73 treated as ancient for analysis purposes, with its age set to 200 bp). A list of the genomes used for this analysis is available in Supplementary Data 2 (“Used for selection scan” column). All SNPs, not only transversions, were used for this analysis. The age of each wolf was set as the phenotype, with values of 0 for modern wolves, and the ‘–linear’ argument was used to test for an association between SNP genotypes and age, also applying the ‘–adjust’ argument to correct P values using genomic control. The application of genomic control34 here aimed to use the magnitude of temporal allele frequency variance observed across the genome to account for what was observed from genetic drift alone given wolf demographic history. Only results for the following sets of sites were retained and included in the Manhattan plot: sites where at least 40 ancient genomes had a genotype call, sites with a minor allele frequency among the ancient wolves of ≥5% and sites that had at least 7 neighbouring sites within a 50-kb window with a P value that was at least 90% as large (on a log10 scale) as the P value of the site itself. The last ‘neighbourhood filter’ aimed to reduce false positives by requiring similar evidence across multiple nearby sites. As a P-value significance cut-off to correct for the genome-wide testing, we used 5 × 10−8, which is commonly used in genome-wide association studies in humans and also in dogs89. We excluded 15 regions where only a single variant reached significance. A detailed table with the 24 detected regions is available in Supplementary Data 3. To test the robustness of this analysis to false positives arising from genetic drift alone, we applied the same analysis to data from neutral coalescent simulations generated using ms90 and found no false positives. For more details, see Supplementary Information section 4.

Ancestry modelling with qpAdm and qpWave

We used the qpAdm and qpWave methods43 from ADMIXTOOLS (v5.0)84 to test ancestry models for wolf and dog targets postdating 23 ka. For the primary analyses, we used the following set of candidate source populations (age estimate in brackets, years bp): Armenia_Hovk1.HOV4 (ancient dhole), Siberia_UlakhanSular.LOW008 (70,772), Germany_Aufhausener.AH575 (57,233), Siberia_BungeToll.CGG29 (48,210), Germany_HohleFels.JK2183 (32,366), Siberia_BelayaGora.IN18_016 (32,020), Yukon_QuartzCreek.SC19.MCJ010 (29,943), Altai_Razboinichya.AL2744 (28,345), Siberia_BelayaGora.IN18_005 (18,148) and Germany_HohleFels.JK2179 (13,229). We used a rotating approach in which, for each target, we tested all possible one-, two- and three-source models that could be enumerated from the above set. Individuals from the set that were not used as a source in a given model served as thereference set (or the ‘right’ population in the qpAdm framework). This means that, in every model, each of the above individuals was always either in the source list or in the reference list. We ranked models on the basis of their P values, but prioritized models with fewer sources using a P-value threshold of 0.01: if a simpler model (meaning a model with fewer sources) had a P value above this threshold, it ranked above a more complex model (meaning a model with more sources) regardless of the P value of the latter. We also failed models with inferred ancestry proportions larger than 1.1 or smaller than −0.1. For single-source models, qpWave was run instead of qpAdm. Both programs were run with the ‘allsnps: YES’ option (without this option, there was very little power to reject models). We describe ancestry assigned to the ancient dhole source (Armenia_Hovk1.HOV4) as ‘unsampled’ ancestry; note that this does not imply that such ancestry is of non-wolf origin, only that it is not represented by (that is, diverged early from and lacks shared genetic drift with) the ancient wolf genomes in the reference set.

To test whether any post-23 ka or modern wolf genome available might be a good proxy for the western Eurasian wolf-related ancestry identified in Near Eastern and African dogs, we added the 9,500-year-old Zhokhov dog17 to the rotating set of candidate source populations. Chosen for its high coverage, early date and easterly location, this makes the assumption that the Zhokhov dog is a good representative for the eastern dog ancestry component. Using the African Basenji dog as a target, models involving the Zhokhov dog plus another given wolf thus allowed us to test whether that wolf was a good match for the additional component of ancestry. For more details on the qpAdm and qpWave analyses, see Supplementary Information sections 2 (wolf targets) and  5 (dog targets).

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

By AKDSEO