AWS Public Sector Blog

Frugal architecture in action: The Urban Institute innovates with R and Serverless on AWS


“In architecture, every decision comes with a trade-off. Cost, resilience, and performance are non-functional requirements that are often at tension with each other.” — Werner Vogels, The Frugal Architect, Law III

Nonprofit organizations are typically frugal and responsible. They strive to improve the human condition in innumerable ways, yet they cannot raise capital like a commercial organization, so they have to make the most of the resources they have. They often rely on donor and grant funding, so they have to justify every expense to their constituents. Going over budget is not an option. They apply that frugal approach to IT: they build and operate only what they need to pursue their mission, and constantly innovate both to meet mission objectives and optimize cost. To expand their reach, they often collaborate with academia and other nonprofits, and they engage early-career talent.

Even with these constraints, nonprofits aspire to solve some of the world’s biggest problems, and often, they use innovative IT architectures on Amazon Web Services (AWS) to do it.

Frugal architecture at the Urban Institute

The Urban Institute has been producing rigorous data and evidence to support policy solutions since President Lyndon Johnson founded the organization in 1968. Through their long history, Urban has adopted traditional and innovative architectures.

“Systems that Last Align Cost to Business.” — The Frugal Architect, Law II

In the early days, they coded in Fortran, which was the computer programming language of choice for science at the time. They still occasionally update their Fortran development practices. In a 2018 Medium post, Jessica Kelly, Urban’s senior director of research and web technology, wrote, “Organizations like Urban often have many complex models in older programming languages that would be time-consuming to rewrite in a more ‘modern’ language (e.g., Python). These models are frequently referred to as legacy systems. When these systems are stable, well-documented, and still actively developed, there is no reason to read the word ‘legacy’ as a pejorative.”

Adopting the R programming language

Urban also adopts new tools and methods when appropriate. Over the decades, Urban has used SAS and Stata and eventually adopted R. In another Medium post from 2019, Graham MacDonald, Urban CIO, wrote, “Researchers should always choose the right programming language for the right job. R is now the best programming language for innovation at Urban.” He gave a number of reasons, including capabilities for data visualization and geospatial analysis, big data and big processing efficiency, and accessing and analyzing nontraditional data.

Adopting R has reduced training time and cost for both experienced and early-career staff members. Urban has found learning R to be a relatively smooth transition for staff across the organization. They have also observed that university programs that focus on public policy and statistics often teach R programming, so using R helps new graduates get up to speed quickly.

Urban has also invested in programs that foster proficiency and productivity with R. They built an R community by offering R Lunch Labs, created an R Users Group that provides on-call support so researchers can get same-day help with R, and incentivized junior staff to modernize legacy code with R. They also built tools and infrastructure to accelerate adoption: packages that produce visualizations in R that adhere to Urban’s style guide, a user-friendly interface that lets staff run computationally intensive R code on powerful Amazon Elastic Compute Cloud (Amazon EC2) instances, and a custom R runtime for AWS Lambda so they can use R for serverless compute in the AWS Cloud.

Figure 1. The Education Data Portal summary endpoint functionality lets users compute aggregated statistics against millions of rows within seconds using different programming languages. This example shows how to generate Stata and R syntax to summarize K-12 student enrollment data.

“Unchallenged Success Leads to Assumptions.” — The Frugal Architect, Law VII

Frugality does not imply rigidity. Graham wrote, “R is not the best language for everything. We support projects in all languages and choose the right language for the job. Often that’s SAS, Python, Stata or Mata, or something else, like Fortran, and we have no problem with that. But I have found that R is both the easiest language to learn and the quickest and most effective path to getting researchers to adopt new tools.”

Results

With help from the AWS IMAGINE Grant program, Urban built a proof of concept that extends the functionality of its Education Data Portal (EDP) API. The EDP averages more than 3,000 unique users and 12,000-plus requests monthly.
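To make the idea of a summary endpoint concrete, here is a minimal sketch of a Lambda-style handler that groups rows by one field and aggregates another. It is an illustration only, not Urban's implementation: the sample rows, the field names (`year`, `fips`, `grade`, `enrollment`), and the request keys (`var`, `by`, `stat`) are all assumptions made for this example, and the real EDP works against millions of rows in its own data store.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows, invented for this sketch.
ROWS = [
    {"year": 2020, "fips": 11, "grade": 9, "enrollment": 120},
    {"year": 2020, "fips": 11, "grade": 10, "enrollment": 100},
    {"year": 2020, "fips": 24, "grade": 9, "enrollment": 300},
]

# Supported summary statistics, mapped to the functions that compute them.
STATS = {"sum": sum, "avg": mean, "min": min, "max": max, "count": len}


def handler(event, context=None):
    """Lambda-style entry point: aggregate event['var'] grouped by event['by']."""
    var, by, stat = event["var"], event["by"], event["stat"]
    if stat not in STATS:
        raise ValueError(f"unsupported stat: {stat}")

    # Collect the values of `var` for each distinct value of `by`.
    groups = defaultdict(list)
    for row in ROWS:
        groups[row[by]].append(row[var])

    # Apply the requested statistic to each group, in sorted key order.
    results = [
        {by: key, f"{var}_{stat}": STATS[stat](values)}
        for key, values in sorted(groups.items())
    ]
    return {"results": results}
```

Calling `handler({"var": "enrollment", "by": "fips", "stat": "sum"})` returns one summary record per `fips` value. Because each request is independent and short-lived, this kind of work maps naturally onto a pay-per-request compute model.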

Urban also built a Spatial Equity Data Tool (SEDT) to measure resource disparities at the national, state, county, and city level. Rather than assuming that R is the best tool for every job, they built the SEDT in Python and used R to process the data that the SEDT uses. For programmers, Urban also built an SEDT API and R package. The SEDT has processed more than 2,600 requests since Urban launched it in September 2020.

Figure 2. The SEDT lets users see how different demographic groups are over- and under-represented in a dataset. This example displays demographic disparity scores for a dataset of electric vehicle charging stations from the U.S. Department of Energy.
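One simple way to express over- and under-representation is to compare each group's share of the dataset with its share of the population. The sketch below uses that difference as a score. It is an illustrative formula chosen for this example, not necessarily how the SEDT computes its disparity scores, and the function name and inputs are hypothetical.

```python
def disparity_scores(data_counts, population_counts):
    """Return (share of data) minus (share of population) for each group.

    Positive scores mean a group is over-represented in the data;
    negative scores mean it is under-represented.
    """
    total_data = sum(data_counts.values())
    total_pop = sum(population_counts.values())
    if total_data == 0 or total_pop == 0:
        raise ValueError("counts must sum to a positive number")

    return {
        group: data_counts.get(group, 0) / total_data
        - population_counts[group] / total_pop
        for group in population_counts
    }


# Hypothetical counts: group A receives 30% of the resource but is 50% of residents.
scores = disparity_scores({"A": 30, "B": 70}, {"A": 50, "B": 50})
```

In this made-up example, group A scores about -0.2 (under-represented) and group B about +0.2 (over-represented).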

“Cost Optimization is Incremental.” — The Frugal Architect, Law VI

Urban has been adopting serverless computing to optimize cost incrementally. In a post highlighting how Urban integrates R and Lambda across projects, Erika Tyagi, lead data engineer at Urban, explains, “AWS Lambda enables Urban’s developers to quickly build and deploy scalable applications in a highly cost-effective manner… and is essential to several tools that Urban has developed.” In addition to the EDP and SEDT, Urban uses Lambda to implement a number of capabilities, including automating data quality checks, running microsimulation models, and expanding access to confidential data. As a team of data scientists notes, Urban also uses AWS Step Functions to manage increasingly complex serverless applications. To simplify the user experience for the SEDT, Urban used Step Functions to orchestrate and coordinate workflows.
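For readers unfamiliar with Step Functions, a workflow is described as a state machine in the Amazon States Language, where each `Task` state can invoke a Lambda function. The sketch below builds such a definition in Python. The state names, the three-step flow, and the Lambda ARNs are hypothetical placeholders, not Urban's actual workflow.

```python
import json

# Hypothetical ARN prefix; substitute the ARNs of real Lambda functions.
ARN = "arn:aws:lambda:us-east-1:123456789012:function:"

definition = {
    "Comment": "Illustrative workflow: validate an upload, compute scores, publish results",
    "StartAt": "ValidateUpload",
    "States": {
        "ValidateUpload": {
            "Type": "Task",
            "Resource": ARN + "validate-upload",
            "Next": "ComputeScores",
        },
        "ComputeScores": {
            "Type": "Task",
            "Resource": ARN + "compute-scores",
            # Retry failed tasks with exponential backoff.
            "Retry": [
                {
                    "ErrorEquals": ["States.TaskFailed"],
                    "IntervalSeconds": 2,
                    "MaxAttempts": 2,
                    "BackoffRate": 2.0,
                }
            ],
            "Next": "PublishResults",
        },
        "PublishResults": {
            "Type": "Task",
            "Resource": ARN + "publish-results",
            "End": True,
        },
    },
}


def check_definition(defn):
    """Basic sanity check: every transition must point at a defined state."""
    states = defn["States"]
    if defn["StartAt"] not in states:
        raise ValueError("StartAt refers to an undefined state")
    for name, state in states.items():
        nxt = state.get("Next")
        if nxt is None and not state.get("End"):
            raise ValueError(f"state {name} has neither Next nor End")
        if nxt is not None and nxt not in states:
            raise ValueError(f"state {name} points to undefined state {nxt}")
    return True


check_definition(definition)
definition_json = json.dumps(definition, indent=2)  # JSON text for the state machine
```

Keeping retries and sequencing in the state machine, rather than inside each function, lets the individual Lambda functions stay small and single-purpose.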

Conclusion

With tools like EDP and SEDT, Urban supports a diverse set of builders and users, including researchers, government agencies, policymakers, and community advocates. They encourage and support novel applications for public policy research. By adopting a frugal discipline, they continue to innovate sustainably. You can follow their Data@Urban feed on Medium and their Urban Wire blog.

For more information about frugal architecture, see AWS CTO Werner Vogels’s website, The Frugal Architect, his witty YouTube video short, and his deep dive in the 2023 re:Invent Keynote (or the keynote recap). If you haven’t tried AWS Lambda and AWS Step Functions, this AWS tutorial is a great place to start.