Ethical approval was given by the Institutional Review Board at Columbia University for both the pilot study and the full study. For the full study, all countries involved had to provide attestations of cultural and linguistic appropriateness for each version of the instrument. Because this was not possible for the pilot study, ethical approval for the pilot covered only checking the quality, flow and appropriateness of the survey instrument, not analysing or reporting data. For all data, all participants provided informed consent at the start of the survey; no form of deception or hidden purpose was involved, and all aspects of the study were fully explained.
The materials and methods followed our pre-registered plan (https://osf.io/jfvh4). Substantive deviations from the original plan are highlighted in each corresponding section, alongside the justification for the deviation. All details on the countries included, translation, testing and sampling are included in the Supplementary Information.
Participants
The final dataset comprised 13,629 responses from 61 countries. The original sample size was 25,877, which was reduced almost by half after we performed the pre-registered data exclusions. We removed 6,141 participants (23.7%) who did not pass our attention check (a choice between receiving 10% of monthly income now or paying the same amount in one year). We removed 69 participants who gave nonsensical responses to open-text items (for example, ‘helicopter’ as gender) and 13 participants claiming to be over 100 years old. We added filters beyond our original exclusion criteria. Regarding response times, we removed individuals who responded faster than three median absolute deviations below the median time or who took less than 120 seconds in total; this criterion identified 5,870 inappropriate responses. We further removed responses from IP addresses identified as either ‘tests’ or ‘spam’ by the Qualtrics service (264 answers). Lastly, we excluded individuals who did not complete over 90% of the survey (9,434 responses). Note that these counts sum to more than the total number of exclusions because participants could fail multiple criteria.
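For illustration, the response-time rule could be implemented as in the following minimal sketch, assuming a data frame `responses` with a `duration_secs` column (both names hypothetical) and an unscaled median absolute deviation:

```r
# Minimal sketch of the response-time exclusion; `responses` and
# `duration_secs` are hypothetical names, and the unscaled median
# absolute deviation (constant = 1) is an assumption.
library(dplyr)

med_time <- median(responses$duration_secs, na.rm = TRUE)
mad_time <- mad(responses$duration_secs, constant = 1, na.rm = TRUE)

responses_kept <- responses %>%
  filter(duration_secs >= med_time - 3 * mad_time,  # not >3 MADs below the median
         duration_secs >= 120)                      # at least 120 seconds
```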
For analyses including income, assets and debt, we conducted additional quality checks. We first removed 38 responses with extreme income, debt or asset values (larger than 1 × 10⁸). Next, we removed extreme outliers: incomes more than 100 median absolute deviations above the country median, and assets more than 1,000 median absolute deviations above the country median. We further removed anyone who simultaneously claimed no income while reporting full-time employment. These quality checks identified 54 problematic responses, which were removed from the data. The final and target sample sizes are presented in Supplementary Table 2. We provide descriptive information on the full and by-country samples in Supplementary Table 3 and on the main variables in Supplementary Table 4.
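A sketch of these checks is below, assuming columns `country`, `income`, `debt` and `assets` (hypothetical names), with the hard cap applied before the per-country outlier rules:

```r
# Sketch of the financial quality checks; column names are hypothetical
# and the unscaled median absolute deviation is an assumption.
library(dplyr)

responses_fin <- responses_kept %>%
  filter(is.na(income) | income < 1e8,
         is.na(debt)   | debt   < 1e8,
         is.na(assets) | assets < 1e8) %>%
  group_by(country) %>%
  filter(is.na(income) |
           income <= median(income, na.rm = TRUE) +
                     100 * mad(income, constant = 1, na.rm = TRUE),
         is.na(assets) |
           assets <= median(assets, na.rm = TRUE) +
                     1000 * mad(assets, constant = 1, na.rm = TRUE)) %>%
  ungroup()
```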
Instrument
The instrument was designed by evaluating methods used in similar research, particularly studies with a multi-country focus8,21,29 or covering multiple dimensions of intertemporal choice13,28. On the basis of the response quality and participation rates observed in two recent studies of a similar nature6,49, we implemented an approach that incorporated these features while remaining brief, increasing the likelihood of reliable and complete responses.
To confirm the viability of our design, we assessed the overall variability of pilot study data from 360 participants in the United States, Australia and Canada. The items elicited reasonable answers, and the three sets of baseline measures yielded the response patterns expected for those countries: participants more often chose smaller, earlier gains over larger, later ones at the smaller magnitude, with choices closer to 50–50 at the larger magnitude and in the payment (loss) set. The subsequent anomaly items also yielded variability within items and some variability between countries. These results confirmed that using baseline choices to set trade-off values in the anomaly items was appropriate and would capture relevant differences. In line with our Institutional Review Board approval, we did not analyse these data in full, as the approval did not cover detailed analysis of the subsequent bias decisions. The pilot was completed in April 2021 with participants on the Prolific platform (compensated for participation, not for choices made).
The final version of the instrument required participants to respond to between 10 and 13 choice items, all binary. In each of the first three (baseline) sets, a participant who chose immediate and then delayed (or vice versa) proceeded to the next set, so only two questions were required; a participant who chose immediate–immediate or delayed–delayed saw a third item. After the anomaly items, participants answered ten questions about financial preferences, circumstances and outlook (most of these will be analysed in independent research). Finally, participants provided age, race/ethnicity/immigration status, gender, education, employment and region of residence. Supplementary Table 1 presents all possible values for each set of items used in the final version of the instrument.
All materials associated with the method are available in the pre-registration repository.
Selection of countries
By design, there was no systematic approach to country inclusion. Through a network of early-career researchers worldwide, we sent and posted multiple invitations to collaborate. We explicitly emphasized including countries that are not typically represented in behavioural research, and in almost every location we had at least one local collaborator engaged. All contributors are named authors.
Following data collection, 61 countries were fully included, using 40 languages. All countries also had an English version to include non-native speakers who were uncomfortable responding in the local language. Of the 61 countries, 11 were from Asia, 8 from the Americas, 5 from sub-Saharan Africa, 6 from the Middle East and North Africa, 2 from Oceania, and 29 from Europe (19 from the European Union). Data collection was attempted in several additional countries, but these were unable to fulfil certain tasks or were removed owing to ethical concerns.
Translation of survey items
All instruments went through forward-and-back translation for all languages used, involving at least one native speaker in each case. All versions were also available in English, with local currencies applied and local reporting standards used for aspects such as race and education. A third reviewer was brought in when discrepancies could not be resolved through simple discussion. Wording also followed that used in similar research. Relevant details on issues that arose are included in the Supplementary Information. For cultural and ethical appropriateness, demographic measures varied heavily: in some countries, tribal or religious categories are the standard; others, such as the US, have federal guidelines for race and ethnicity; France disallows measures of racial identity. Country-by-country details are posted on the pre-registration page associated with this project.
All data were collected through Qualtrics survey links. For all countries, an initial convenience sample of five to ten participants was required to ensure that comprehension, instrument flow and data capture were functional. Minor issues were corrected before proceeding to ‘open’ collection. Countries aimed to recruit approximately 30 participants before pausing to confirm functionality and that all questions were visible. We also checked that currency values had been set appropriately by inspecting the variability of responses (that is, poorly chosen values would be visible as all participants making the same choices across items). Minimal issues arose; these are outlined in the Supplementary Information.
For data circulation, all collaborators were allowed a small number of convenience participants. Because multiple collaborators in each country used different networks, this ensured the readiness of measures and instruments while limiting bias. Once assurances were in place, we implemented what we refer to as the Demić–Većkalov method, developed by two collaborators in recent prior studies. This method involves finding news articles online (on social media, popular forums, news websites, discussion threads, sports team supporter discussion groups/pages and so on) and posting in active discussions, encouraging anyone interested in the subject to participate. Circulation also included direct contact with local organizations (non-governmental organizations and non-profits, often with thematic interests in financial literacy, microcredit and so on) to circulate among stakeholders and staff, email circulars, generic social media posts, informal snowballing and paid samples (in Japan only; no other participants were compensated). This generally loose structure was intentional, to avoid producing a common bias across countries. As in recent, successful multi-country trials30,55, it generates more heterogeneous participant backgrounds, though it still skews toward populations with direct internet access (that is, younger, more educated and somewhat higher-income).
As described in the pre-registration (https://osf.io/jfvh4), the minimum sample needed to achieve a power of 0.95 for the models presented was 30 participants per country. However, to produce a more robust sample, we used three tiers of sample targets: 120 participants for populations of up to 10 million; 240 participants for populations above 10 million and up to 100 million; and 360 participants for populations above 100 million.
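For illustration, the tiers map onto targets as in this small helper (a sketch; the function name is ours):

```r
# Sketch of the tiered sample targets; population is in millions.
target_n <- function(population_millions) {
  if (population_millions <= 10) 120
  else if (population_millions <= 100) 240
  else 360
}
target_n(38)   # a country of 38 million -> 240 participants
```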
Comprehensive details about methods, guidelines, measurement building and instruments are available in the Supplementary Information and on the pre-registration site.
Procedure
For the full study, all participants began by choosing between two gains: approximately 10% of the national average household income (median or mean, depending on the local standard) received immediately, or 110% of that value in 12 months. For US participants, this translated into US$500 immediately or US$550 in one year. Participants who chose the immediate option were shown the same option set, but with the delayed value raised to 120% (US$600); if they again preferred the immediate prospect, a final option offered 150% (US$750) as the delayed reward. Participants who chose the delayed option initially were subsequently offered 102% (US$510) and then 101% (US$505). This progression was then inverted for losses, with identical values presented as payments, increasing when participants chose the delayed option and decreasing when they chose the immediate one. Finally, the original gain set was repeated using 100% of monthly income to represent higher-magnitude choices.
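The gain staircase can be summarized in the following sketch, using the US values from the text; `choice()` is a hypothetical function that returns "immediate" or "delayed" for a given pair of amounts:

```r
# Sketch of the baseline gain staircase (US values); `choice` is a
# hypothetical function returning "immediate" or "delayed".
run_gain_staircase <- function(choice, immediate = 500) {
  shown <- 550                                   # 110% of the immediate amount
  if (choice(immediate, 550) == "immediate") {
    shown <- c(shown, 600)                       # 120%
    if (choice(immediate, 600) == "immediate") {
      shown <- c(shown, 750)                     # 150%
    }
  } else {
    shown <- c(shown, 510)                       # 102%
    if (choice(immediate, 510) == "delayed") {
      shown <- c(shown, 505)                     # 101%
    }
  }
  shown                                          # delayed values presented
}
```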
Following the baseline scenarios, the anomaly scenarios incorporated the simplified indifference point: the largest delayed value at which the participant chose the delayed option in the baseline items. For example, if an individual chose US$500 immediately over US$550 in 12 months, but US$600 in 12 months over US$500 immediately, then US$600 was the indifference value for subsequent scenarios. Those choices were then between US$500 in 12 months and US$600 in 24 months (present bias), US$500 immediately and US$700 in 24 months (subadditivity), and either being willing to wait 12 months for an additional US$100 or being willing to forgo US$100 to receive a reward now rather than in 12 months (delay–speedup). For consistency, the values were initially derived from local average income (in local currency) and then from constant proportions of those initial values (Supplementary Information). This approach was chosen over directly converting fixed amounts into each currency because of the substantial differences in currencies and income standards.
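Following the worked example above, the simplified indifference point is the largest delayed value the participant accepted; a minimal sketch:

```r
# Sketch of the simplified indifference point for the example above:
# the participant declined US$550 but accepted US$600 in 12 months.
delayed_values <- c(550, 600)                         # delayed amounts shown, in order
chose_delayed  <- c(FALSE, TRUE)                      # participant's choices
indifference   <- max(delayed_values[chose_delayed])  # 600
```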
Participants answered four additional questions related to the choice anomalies (the gain–loss and magnitude effects were already captured by the first three sets). Owing to the contingencies in the instrument, all participants were first shown a present bias scenario (a choice between 12 months and 24 months), followed by a subadditivity scenario (a choice between immediate and 24 months). They were then randomly presented with one of two delay–speedup scenarios (one framed as a bonus for waiting, the other as a reduction for receiving the gain earlier). After two similar but general choice and risk measures, they were presented with the second delay–speedup scenario. Because of the similarity in their wording, these scenarios were anticipated to have the lowest rates of anomalous choice. Finally, participants answered ten questions on financial circumstances, (simplified) risk preference, outlook and demographics. Participants could choose between the local official language (or languages) and English. By completion, 61 countries (representing approximately 76% of the world population) had participated.
We assessed temporal choice patterns in three ways. First, we tested discounting patterns from the three baseline scenarios to determine preferences for immediate or delayed options for gains (at two magnitudes) and losses (at one). Second, we analysed the prevalence of all choice anomalies using the additional items. Finally, with this information, we computed a discounting score based on responses to all choice items and anomalies, ranging from 0 (always preferring delayed gains and earlier losses) to 19 (always preferring immediate gains and delayed losses).
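A minimal sketch of the score is below, assuming each binary item is coded 1 for the more-discounting option (immediate gain or delayed loss) and 0 otherwise; the column names, and how the 19 possible points accumulate across branching items, are assumptions rather than the exact coding:

```r
# Sketch of the discounting score; `choice_` columns are hypothetical and
# assumed coded 1 = immediate gain/delayed loss, 0 = otherwise.
library(dplyr)

responses_fin <- responses_fin %>%
  mutate(discount_score = rowSums(across(starts_with("choice_")), na.rm = TRUE))
```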
Deviations from the pre-registered method
There were minor deviations from the pre-registered method in terms of procedure. First, we did include an attention check; the pre-registered statement that we would not do so should have been removed and was an error. Second, we had initially not planned to include students in the main analyses, but our recruitment processes turned out to engage both students (16%) and non-students (84%) appropriately. We are therefore not concerned about skew and instead consider students a critical population. The impact of these deviations on the analyses is explained in the Supplementary Information.
Statistical analysis
Hierarchical generalized additive models36 were estimated using fast restricted maximum likelihood and penalized cubic splines56. We selected the shrinkage version of cubic splines to avoid overfitting and to foster the selection of only the most relevant nonlinear smooths57. Robustness checks were performed for the selection of knots (Supplementary Fig. 10) and the spline basis (Supplementary Table 7), leaving the results unchanged. In these models, we estimated the effects of all continuous variables as smooths to identify potentially nonlinear relationships, and included country of residence as a random effect.
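A sketch of such a model in mgcv is below, with illustrative predictor names (`age`, `log_income`); shrinkage cubic splines use the `"cs"` basis, and fast restricted maximum likelihood is requested via `method = "fREML"`:

```r
# Sketch of a hierarchical generalized additive model; predictor names
# are illustrative, not the exact specification.
library(mgcv)

m_gam <- bam(
  discount_score ~ s(age, bs = "cs") +    # shrinkage cubic spline
    s(log_income, bs = "cs") +
    s(country, bs = "re"),                # country random effect (must be a factor)
  data = responses_fin,
  method = "fREML"                        # fast restricted maximum likelihood
)
summary(m_gam)
```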
Relevant nonlinear effects were incorporated into our main linear and generalized mixed models, which were fitted using restricted maximum likelihood. Model convergence and assumptions were visually inspected. Bayesian versions of these models were estimated using four chains with 500 warmup and 1,000 sampling iterations each (4,000 total samples). We confirmed that all parameters presented \(\hat R\) values at or below 1.01 and tail effective sample sizes above 1,000. To avoid divergent transitions, we set the average proposal acceptance probability (delta) to 0.90 and the maximum tree depth to 15 (ref. 58). We employed weakly informative priors: t distributions with three degrees of freedom and a standard deviation of 10 for the model intercept and the random-effect standard deviations; a normal distribution with zero mean and a standard deviation of 3 for the fixed-effect regression coefficients; and an exponential distribution with a rate parameter of 1 for the standard deviation of the smooth parameters59.
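A sketch of the Bayesian model in brms with the priors and sampler settings described above follows; the model formula is illustrative:

```r
# Sketch of the Bayesian mixed model in brms; the formula is illustrative.
library(brms)

priors <- c(
  prior(student_t(3, 0, 10), class = "Intercept"),
  prior(student_t(3, 0, 10), class = "sd"),   # random-effect SDs
  prior(normal(0, 3), class = "b"),           # fixed-effect coefficients
  prior(exponential(1), class = "sds")        # smooth-term SDs
)

m_brm <- brm(
  discount_score ~ s(age) + log_income + (1 | country),
  data = responses_fin, prior = priors,
  chains = 4, warmup = 500, iter = 1500,      # 1,000 post-warmup draws per chain
  control = list(adapt_delta = 0.90, max_treedepth = 15)
)
```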
For smooth terms, we analysed whether each term was significant in the generalized additive model and presented substantial variance in the final models. We explored 95% confidence/credibility intervals for fixed effects58 and examined support for potential null effects. All reported tests were two-tailed. Our power estimation considered unstandardized fixed regression effects of |0.15| (categorical variables) and |0.07| (continuous variables) as ultra-low effect sizes. Thus, assuming a null effect of similar or lower magnitude (|0.10|), we computed log Bayes factors to quantify the evidence favouring null effects in this range60. To understand the sensitivity of our results, we also explored support for narrower null ranges (|0.05| and |0.01|). As Bayes factors depend on prior specification, we additionally estimated the percentage of posterior samples within these regions (which can be understood as a region of practical equivalence analysis61). Both statistics provide sensitive, complementary evidence on whether null effects were supported60,61. Such analyses could not be conducted for smooth effects, as no single parameter can summarize the relationship between the predictor and the dependent variable.
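One way to obtain both statistics, assuming the bayestestR and posterior packages (the exact tooling is not specified here) and an illustrative coefficient name `b_log_income`:

```r
# Sketch of the null-region checks: a Savage-Dickey Bayes factor against
# |b| < 0.10 and the share of posterior samples inside that region.
# bayestestR is an assumed tool; the parameter name is illustrative.
library(bayestestR)
library(posterior)

bayesfactor_parameters(m_brm, null = c(-0.10, 0.10))

draws <- as_draws_df(m_brm)
mean(abs(draws$b_log_income) < 0.10)   # proportion of draws in the region
```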
The analyses were conducted in R v.4.0.2 (ref. 62) using the Microsoft R Open distribution63. The meta-analyses were conducted using the meta package. Nonlinear effects were studied using the mgcv64 package, with the main models being estimated using the gamm4 (ref. 65) and the brms58 packages for frequentist and Bayesian estimation, respectively. All graphs were created using the ggplot2 (ref. 66) (v.3.3.3) package. Data manipulations were conducted using the tidyverse67 family of packages (v.1.3.0).
Deviation from the pre-registered plan
We aimed to follow our pre-registered analyses as closely as possible. On certain occasions, we expanded the scope of the analyses and present robustness checks for the results by employing alternative estimation and inference techniques.
There was only one substantive deviation from our pre-registered analyses aside from the delay–speedup calculation. In the original plan, we intended to explore the role of financial status; in the final analysis, we employed individual assets and debts to this end. Assets and debts were included as raw indicators rather than inequality measures because we did not find reliable sources for national average assets or individual debt.
One minor adaptation from our pre-registration involved our plan to test for nonlinear effects and use Bayesian estimation only as part of our exploratory analyses. However, as we identified several relevant nonlinear effects, we modified our workflow to accommodate those as follows: (1) we initially explored nonlinear effects using hierarchical generalized additive (mixed) models, (2) we included relevant nonlinear effects in our main pre-registered models and (3) we estimated Bayesian versions of these same models to test whether null effects could be supported in certain cases.
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.