From Bottleneck to Breakthrough: How We Empowered Clinical Research, Fueling Large-Scale Cell Data with AWS

From Bottleneck to Breakthrough: A Case Study in Research Data Modernization

One-liner summary:
Ideas2IT helped Saint Louis University modernize its research data infrastructure on AWS, cutting processing time by 97%, slashing costs by 60%, and enabling scalable, researcher-friendly access to 450 TB of mobility data.

The Problem with the Status Quo

SLU’s researchers needed to analyze 450 TB of anonymized cell phone data to study healthcare access, economic mobility, and homelessness, but legacy ingestion pipelines and fragmented storage slowed everything down. Ideas2IT re-architected the platform using Amazon EMR, Spark, S3, and EC2, cutting query times from 15 minutes to 30 seconds and reducing monthly cloud costs to $28K. The result is a fast, cost-efficient, and researcher-friendly analytics engine that delivers high-impact insights in seconds rather than minutes.

Where the Gaps Were

SLU’s research hinged on large-scale analysis of Veraset-provided mobility data, but their platform couldn’t keep up:

  • ~450 TB of anonymized cell phone data
  • Data fragmented into thousands of micro-files
  • Queries took 15+ minutes due to S3 and Spark inefficiencies
  • Inconsistent data formats hindered ingestion and transformation
  • Infrastructure sprawl led to spiraling AWS costs
  • Researchers depended on IT teams to run even simple analyses

Their setup wasn’t built for scale, speed, or ease of access. The infrastructure bottleneck was turning cutting-edge research into a slow, expensive process.

What We Delivered

Ideas2IT rebuilt SLU’s research data platform from the ground up using AWS-native services with performance, cost, and researcher usability as core goals.

  • Amazon EMR for distributed processing with Spark
  • Amazon S3 as the unified data lake with structured, validated formats
  • Amazon EC2 for dynamic, right-sized compute provisioning
  • Apache Spark for parallelized, high-speed transformation

We eliminated the fragmented intermediate S3 layer holding micro-files, redesigned the ingestion logic for batch-friendly structuring, and set up secure, role-based researcher access, removing the dependency on IT teams.
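To make the "batch-friendly structuring" idea concrete, here is a minimal sketch of how micro-files might be grouped into larger batches before processing. It is not the pipeline that was actually built: the 128 MiB target, the `plan_compaction` helper, and the file names are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field

# Assumed target size for each compacted output; not a figure from the project.
TARGET_BYTES = 128 * 1024 * 1024


@dataclass
class Batch:
    keys: list = field(default_factory=list)
    total_bytes: int = 0


def plan_compaction(files, target_bytes=TARGET_BYTES):
    """Group (key, size_bytes) pairs into batches of roughly target_bytes.

    Greedy and order-preserving: a batch is closed as soon as adding the
    next file would push it past the target. A file larger than the target
    ends up in a batch of its own.
    """
    batches = []
    current = Batch()
    for key, size in files:
        if current.keys and current.total_bytes + size > target_bytes:
            batches.append(current)
            current = Batch()
        current.keys.append(key)
        current.total_bytes += size
    if current.keys:
        batches.append(current)
    return batches


# Hypothetical listing: 1,000 micro-files of 1 MiB each.
micro_files = [(f"raw/part-{i:05d}.csv", 1024 * 1024) for i in range(1000)]
plan = plan_compaction(micro_files)
print(len(micro_files), "files ->", len(plan), "batches")  # 1000 files -> 8 batches
```

Each batch would then be read and rewritten as a single larger file, so a query opens a handful of objects instead of thousands.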

All infrastructure and pipeline tuning was done in close collaboration with SLU’s research computing team and AWS, ensuring alignment with grant requirements, data protection policies, and compliance mandates.

Outcomes We Achieved

Outcome                | Impact
Processing Time        | Reduced from 15 minutes to 30 seconds
Monthly Cloud Costs    | Cut to $28K, a 60% savings
Researcher Access      | Self-service querying, no cloud training required
Architecture           | Scalable, performant, and cost-governed
Developer Productivity | Improved through service isolation and modular code
Scalability            | Enabled growth from solo clinics to multi-site setups
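
As a quick sanity check on the headline figures, the arithmetic can be reproduced in a few lines. The prior monthly cost is not stated in this case study; the value below is only what a 60% savings ending at $28K would imply, and the helper names are illustrative.

```python
def percent_reduction(before, after):
    """Percentage drop from `before` to `after` (same units)."""
    return (before - after) / before * 100


def implied_baseline(current, savings_percent):
    """Baseline that yields `current` after a `savings_percent` reduction."""
    return current * 100 / (100 - savings_percent)


# 15 minutes down to 30 seconds, both expressed in seconds.
query_cut = percent_reduction(15 * 60, 30)

# Implied (not reported) prior monthly spend for "$28K, a 60% savings".
prior = implied_baseline(28_000, 60)

print(f"Processing time reduction: {query_cut:.1f}%")   # 96.7%
print(f"Implied prior monthly cost: ${prior:,.0f}")     # $70,000
```

The 96.7% result matches the roughly 97% reduction quoted in the summary.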

This modernization transformed SLU’s research capability from slow and brittle to fast, cost-efficient, and future-proof.

Key Lessons

  • Small files, big pain: Removing micro-file fragmentation unlocked massive performance gains
  • Spark + EMR is potent, but needs tuning: Default configs weren’t enough; optimization made the difference
  • UX matters even for researchers: Simplified access led to higher adoption and less IT bottlenecking
  • Performance and cost aren’t a trade-off: With the right setup, you get both
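
The case study does not say which settings were tuned. As a hedged illustration of the kind of knobs that typically matter when many small files are involved, the sketch below assembles a few real Spark SQL configuration keys with assumed values; the sizing rule and the 64 GiB stage are hypothetical, not SLU's actual configuration.

```python
import math

MIB = 1024 * 1024


def spark_conf_sketch(shuffle_input_bytes, target_partition_bytes=128 * MIB):
    """Return illustrative Spark SQL settings. All values are assumptions."""
    # Rule of thumb (assumed): one shuffle partition per ~target_partition_bytes.
    shuffle_partitions = max(1, math.ceil(shuffle_input_bytes / target_partition_bytes))
    return {
        # Upper bound on bytes packed into one input partition when reading files.
        "spark.sql.files.maxPartitionBytes": str(target_partition_bytes),
        # Estimated cost of opening a file; affects how small files are packed.
        "spark.sql.files.openCostInBytes": str(4 * MIB),
        # Adaptive query execution can coalesce small shuffle partitions at runtime.
        "spark.sql.adaptive.enabled": "true",
        "spark.sql.adaptive.coalescePartitions.enabled": "true",
        "spark.sql.shuffle.partitions": str(shuffle_partitions),
    }


def as_submit_args(conf):
    """Render settings as `--conf key=value` arguments for spark-submit."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


# Hypothetical stage that shuffles 64 GiB of data.
conf = spark_conf_sketch(shuffle_input_bytes=64 * 1024 * MIB)
print(conf["spark.sql.shuffle.partitions"])  # 512
```

Appropriate values depend on cluster size and data layout, so any such settings would need to be validated against real workloads.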

Industry: Education
Location: USA
Tech Stacks
Challenge

The Sinquefield Center for Applied Economic Research (SCAER) at Saint Louis University needed a scalable, cost-effective way to manage 450 TB of anonymized cell phone data for social research.

Key Takeaways

Big data infrastructure doesn’t have to be big-budget or big-effort. For SLU, the key was aligning cloud-native architecture with actual research workflows. We didn’t just modernize their stack; we turned data friction into discovery velocity.
