Precision Medicine for Everyone: All of Us Research Program Initiative

In This Article
- The future of medicine lies not in averages, but in understanding each person’s unique biology.
- All of Us is not just a study; it's a movement toward equitable, individualized care.
- By embracing diversity in data, we unlock the potential to treat disease more effectively for everyone.

In 2024, I attended a conference on Alzheimer’s disease that brought together over 8,000 participants to share the latest advancements. One of the key topics at the conference was the discovery of biomarkers (medical measurements, such as blood test results, that aid in diagnosing diseases) for the early detection of Alzheimer’s disease. Several speakers emphasized tau, a protein that accumulates abnormally in the brains of individuals with Alzheimer’s disease, as a significant biomarker for predicting the disease in its early stages, and presented convincing datasets from multiple clinical studies to support their findings. However, later in the conference, one scientist presented a study suggesting that this biomarker worked effectively for White populations but not for Hispanic or African American groups. This finding was striking, underscoring a longstanding challenge in medicine: the inequities in health outcomes and disease risk among different racial and social groups, commonly referred to as health disparities.
Although humans share about 99% of the same DNA, the remaining 1% accounts for a remarkable diversity in our traits and characteristics. Environmental factors—such as lifestyle, nutrition, and geographic location—also shape who we are. From a faith perspective, this diversity is not a flaw but a reflection of God’s wisdom in creation. Human variation, whether genetic, cultural, or environmental, is meant to be a source of mutual learning and enrichment, fostering understanding, respect, and collaboration rather than division.
Diversity is also evident in how susceptible people are to different diseases, which can stem from genetic factors, environmental influences, or a combination of both. For instance, Huntington’s disease is a genetic disorder caused by a mutation in the HTT gene, and sickle cell anemia arises from specific mutations in the HBB gene. On the other hand, Alzheimer’s disease risk is associated with a combination of genetics (e.g., APOE4 allele) as well as external factors such as exercise, diet, and education. As a result, certain populations may face a higher risk for particular diseases.
Socioeconomic status (SES) is another important determinant of disease risk, and differences in SES can exacerbate health disparities. One way to assess this is through the “deprivation index,” which summarizes the level of socioeconomic disadvantage in a region. Individuals living in areas with a high deprivation index often face reduced access to education, healthcare, clean water, and clean air, all of which negatively impact health outcomes. Moreover, external factors linked to SES can increase disease-related mortality. For instance, in certain racial and ethnic groups, socioeconomic barriers have historically limited access to routine screenings and checkups for breast cancer. As a result, diagnosis often occurs at later stages of the disease, when treatment options may be less effective. Even when effective treatments are available, low income may put them out of reach for those living in high-deprivation areas.
Historically, many datasets used in biomedical research have been predominantly composed of White participants, for several reasons. One is that socioeconomic disparities can limit access to healthcare for certain groups, resulting in fewer opportunities to collect comprehensive data on diseases affecting these populations. This lack of representation can lead to incomplete or biased conclusions about diseases, creating a vicious cycle that deepens inequity in both healthcare outcomes and biomedical research participation.
Stigma
Another factor contributing to the underrepresentation of certain groups in health data is the legacy of stigmatizing research practices. Historically, individuals from certain groups have been subjected to unethical research studies. For instance, at the Ohio State Penitentiary in the 1950s and 1960s, Dr. Chester M. Southam, a prominent oncologist, injected inmates with live cancer cells to study how the human immune system would respond [1]. The inmates, often enticed by promises of reduced sentences, were not informed about the nature of the injections or the risks involved. Southam also replicated his experiments on terminally ill patients at the Jewish Chronic Disease Hospital in New York, again without proper informed consent. Over a much longer period, the infamous Tuskegee Syphilis Study, conducted by the U.S. Public Health Service from 1932 to 1972, deliberately misled 600 African American men in Alabama into believing they were receiving treatment for "bad blood" [2]. Instead, researchers observed the devastating progression of untreated syphilis, even after penicillin became the standard treatment in the 1940s. These violations of trust and ethics have created long-lasting apprehension toward participating in research studies, further limiting representation in health data.
Polygenic risk score
Without sufficient representation from diverse groups, research findings risk being less applicable to the broader population. For example, methods used to calculate disease risk, such as statistical models or computational techniques, can be biased if they are based on data from predominantly White populations. One such method is the polygenic risk score, which combines the small effects of many genetic variants to estimate an individual’s risk for a specific disease. While this technique holds great promise, the problem lies in the list of variants, and the weights assigned to them, used for these calculations. These lists are often derived from studies of populations of European ancestry, meaning the risk scores may not apply accurately to other groups around the world.
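To make the idea concrete, a polygenic risk score is commonly computed as a weighted sum: for each variant, the number of risk alleles a person carries (0, 1, or 2) is multiplied by that variant's estimated effect size, and the products are added up. The sketch below is illustrative only. The variant names and weights are invented for this example and do not come from any real study or from All of Us data.

```python
# Hypothetical per-allele effect sizes, of the kind estimated in a
# genome-wide association study. Names and values are made up.
EFFECT_SIZES = {
    "variant_A": 0.30,
    "variant_B": 0.10,
    "variant_C": -0.20,
}


def polygenic_risk_score(dosages, effect_sizes=EFFECT_SIZES):
    """Add up effect size times risk-allele count (0, 1, or 2) per variant.

    Variants absent from `dosages` are skipped rather than guessed.
    """
    return sum(
        weight * dosages[variant]
        for variant, weight in effect_sizes.items()
        if variant in dosages
    )


# One hypothetical person's risk-allele counts.
person = {"variant_A": 2, "variant_B": 1, "variant_C": 0}
score = polygenic_risk_score(person)
print(round(score, 2))  # 0.30*2 + 0.10*1 + (-0.20)*0 = 0.7
```

Because the weights are estimated from a particular study population, a score built this way inherits that population's characteristics. If the same variants carry different effects, or are not the relevant ones, in another group, the sum can be misleading for that group.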
Machine learning models
Similarly, machine learning models are increasingly used to predict disease risk by analyzing large datasets. However, if a machine learning model is trained on data primarily from the majority population, it will struggle to make accurate predictions for underrepresented populations. As a result, research findings tend to disproportionately benefit majority populations, further increasing health disparities and leaving vulnerable groups at a disadvantage. Therefore, broadening representation in research is critical to creating equitable healthcare solutions that work for everyone.
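A toy illustration of this effect, under loudly stated assumptions: the data below are synthetic, the two groups and their cutoffs are invented, and the "model" is only a single learned threshold rather than anything used in practice. The biomarker signals disease above 5 in group A but only above 8 in group B, and group A outnumbers group B nine to one in the training set.

```python
def fit_threshold(samples):
    """Choose the cutoff t with the fewest training errors for the rule
    'predict disease when biomarker > t'."""
    candidates = sorted({x for x, _ in samples})

    def errors(t):
        return sum((x > t) != label for x, label in samples)

    return min(candidates, key=errors)


def accuracy(samples, t):
    """Fraction of samples the rule 'biomarker > t' classifies correctly."""
    return sum((x > t) == label for x, label in samples) / len(samples)


# Synthetic (biomarker value, has_disease) pairs; the cutoffs are made up.
group_a = [(x, x > 5) for x in range(12)]
group_b = [(x, x > 8) for x in range(12)]

# Training data in which group A outnumbers group B nine to one.
training = group_a * 9 + group_b

t = fit_threshold(training)
acc_a = accuracy(group_a, t)
acc_b = accuracy(group_b, t)
print(t, acc_a, acc_b)  # 5 1.0 0.75
```

The learned cutoff matches the majority group exactly, so every error falls on the smaller group. Rebalancing the training data or evaluating accuracy separately for each group are two common ways to surface this kind of gap.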
All of Us (AoU)
To tackle these challenges, the National Institutes of Health (NIH) launched the All of Us (AoU) Research Program in 2018 [3]. AoU is an ambitious initiative designed to collect biomedical and lifestyle-related data from one million or more people across the United States. Its goal is to build a diverse dataset by including individuals from all walks of life, with a particular focus on increasing representation among groups historically underrepresented in biomedical research. The program collects a wide range of data, including clinical information, lifestyle habits, data from wearable devices, laboratory measurements, and whole genome sequencing.
As of January 2025, nearly 850,000 individuals have joined the study, with about 45% representing racial and ethnic minorities and 80% belonging to groups historically underrepresented in biomedical research. The genomes of roughly 250,000 participants have been sequenced and shared with both the individuals themselves and the researchers who obtained permission to access AoU data.
The rich, diverse data in the All of Us (AoU) program enables a wide range of research opportunities. With its large participant base and extensive range of collected information, researchers can develop more equitable machine learning models to predict disease risk. They can also create cohorts based on socioeconomic status (SES) to examine disease prevalence and assess how SES impacts health outcomes. In addition, integrating whole-genome sequencing data with other clinical information may help identify novel genetic variants linked to disease.
Several aspects make AoU different from other existing biobanks. For instance, AoU participants come from diverse backgrounds, whereas many traditional biobanks, such as the UK Biobank and the Estonian Biobank, have participants of primarily European descent. AoU participants are regarded as active partners in the research and therefore have access to their own data. In addition, AoU follows strong privacy principles. For instance, the data is stored on a cloud platform, and researchers who have access to the data are not allowed to download the datasets. This cloud platform also helps democratize research opportunities for a broad range of researchers, including professional scientists and citizen scientists.
The AoU Research Program has implemented several measures to safeguard participant privacy and ensure ethical use of its data, minimizing the risk of unintended harm. Researchers seeking access to the AoU datasets must first complete a specialized ethical training, similar to the training required for handling human subject data. This training includes an overview of historical research misconduct, such as instances where studies stigmatized underrepresented groups. By educating researchers on these past missteps, AoU fosters greater awareness and encourages ethical behavior to avoid repeating these mistakes. To promote transparency, AoU publicly shares brief descriptions of each research project. If any project violates ethical standards, it is removed from the program, and the names of the researchers responsible are made public. These measures act as a deterrent, encouraging caution and accountability in the research process. Additionally, AoU has established a dedicated board to evaluate projects that might carry the potential for stigma—for instance, studies that aim to link a disease to a specific racial or minority group. This board thoroughly reviews the goals of such projects to ensure they align with ethical principles, further safeguarding against harmful outcomes.
One million participants
The AoU program is approaching its milestone of collecting data from one million participants. Despite this progress, several limitations and challenges persist. There is an urgent need for scientists to develop effective and scalable methods to harness these extensive datasets. One significant challenge lies in addressing missing data across various modalities. Two complementary approaches could help: computational methods that impute missing values or explicitly account for incomplete data, and further data collection efforts. Additionally, while a vast array of data types has already been generated, many intricate factors potentially related to disease or serving as risk factors remain unexplored. As such, the findings from AoU are likely to represent only a partial understanding of complex health phenomena. Finally, the program will need sustained financial support to continue collecting data from its participants over the long term.
Human beings, created with remarkable complexity and diversity, present unique challenges that must be addressed when tackling problems such as disease prevention and treatment. In this regard, the AoU Research Program represents a pivotal effort to advance our understanding of human health. Its research outcomes and translational impact have the potential to improve healthcare accessibility and quality for individuals from all walks of life, reduce healthcare costs, and prevent diseases before their onset. Furthermore, the program’s findings could inspire healthier lifestyles, emphasizing exercise and better diets. By highlighting key social determinants of health, AoU could also empower policymakers to create opportunities that address disparities and improve the lives of underprivileged communities.
References
[1] Skloot, Rebecca (2010). The Immortal Life of Henrietta Lacks. New York: Crown/Archetype. pp. 127–135. ISBN 9780307589385.
[2] Brandt, Allan M. (December 1978). "Racism and Research: The Case of the Tuskegee Syphilis Study". The Hastings Center Report. 8 (6). Garrison, New York: Wiley-Blackwell: 21–29. doi:10.2307/3561468
[3] All of Us Research Program, National Institutes of Health. https://allofus.nih.gov/