How a $100 SMS Billing Bug Became an Open-Source Unicode Developer Toolkit Used Across Three Countries

Invisible Unicode characters were inflating SMS billing with no tooling to detect or stop them. Ideas2IT built a two-layer sanitisation engine to fix the immediate problem, then productised it into an open-source developer toolkit covering 30 or more Unicode confusables for any team dealing with encoding edge cases.

Industry

All Industries

Service

App Development

Release

Open-source Chrome extension

Adoption

40+ users across US, India, Japan

01 Challenge

A cloud communication platform was billing customers $100 more per month than volume justified. Non-GSM Unicode characters in outbound messages triggered Unicode mode, cutting SMS segment capacity from 160 to 70 characters. No tooling existed to detect, visualise, or stop the characters before they reached the billing layer.

02 Solution

Ideas2IT built a two-layer sanitisation engine: a frontend Chrome extension flagging non-GSM characters in real time with visual indicators, and a backend Java sanitiser injected into the SMS pipeline before ingestion. Recognising the problem was not platform-specific, the team expanded the extension into a full-featured Unicode toolkit covering detection, conversion, and cross-format translation for any engineering workflow.

03 Outcome

SMS billing inflation eliminated at $100 per customer monthly. The Unicode Toolkit Chrome extension detected 30 or more confusables, shipped with zero setup friction, and reached 40 or more users across the US, India, and Japan within two weeks of open-source release.

Phase 01

Tracing an invisible billing bug to its character-level source

Root cause diagnosis and two-layer sanitisation patch: fixing billing inflation without touching the message layer

The billing overage had no surface-level explanation. Volume was stable, usage patterns were normal, but segment counts were inflating consistently.

Ideas2IT traced the cause to non-GSM Unicode characters, specifically smart quotes, em dashes, and non-breaking spaces entering messages through copy-paste from emails, documents, and chat tools. These characters triggered the platform's Unicode billing mode, reducing segment capacity from 160 to 70 characters per message and multiplying send costs without any visible change to the message. 

The fix required two layers operating simultaneously: a frontend Chrome extension surfacing the offending characters to users before they sent, and a backend Java sanitiser auto-converting non-GSM characters inside the ingestion pipeline before they reached billing. The pipeline patch alone eliminated the $100 monthly overage per customer.

DELIVERABLES

  • Frontend Chrome extension — Real-time detection of non-GSM Unicode characters with visual indicators and suggested fixes
  • Backend Java sanitisation layer — Auto-conversion of non-GSM characters injected into the SMS ingestion pipeline before billing
  • Visual indicator and fix-suggestion UI — Character-level flagging surfaced to message authors at the point of composition
  • Non-GSM character classification map — Enumeration of character classes triggering Unicode billing mode across the platform

Phase 02

Turning a pipeline fix into a reusable tool for any encoding problem

Unicode Toolkit: from a single-platform patch to an open-source developer tool adopted across three countries

The billing fix worked. The harder recognition was that the encoding problem was not specific to the SMS platform. The same class of invisible characters caused failing tests, broken string comparisons, and data integrity issues in any system where text moved between tools, and no publicly available tooling addressed it. 

Ideas2IT expanded the Chrome extension into the Unicode Toolkit: a full-featured developer utility covering visual detection of 30 or more confusables including zero-width spaces, smart quotes, and em dashes, alongside a text conversion engine handling Unicode to ASCII, characters to decimal, hex, binary, and Unicode code points, and string to code point translation in both directions. 

The toolkit shipped with context menu integration, one-click clipboard copy, and a fully offline architecture with no tracking. It was released publicly as a free Chrome extension and reached 40 or more users across the US, India, and Japan within two weeks.

DELIVERABLES

  • Unicode Toolkit Chrome extension (public release) — Open-source developer utility for detection and conversion of Unicode encoding edge cases
  • Visual detection layer — Highlights 30+ Unicode confusables including zero-width spaces, smart quotes, and non-breaking spaces
  • Unicode to ASCII text converter — Bidirectional conversion between Unicode and ASCII character representations
  • Decimal, hex, and binary converter — Character-to-encoding conversion across decimal, hexadecimal, and binary formats
  • Code point translation engine — Bidirectional string-to-code-point and code-point-to-readable-text conversion
  • Context menu integration — Right-click workflow access for instant character detection and conversion
  • Offline-first architecture — Zero backend dependency, no login, no tracking, installs and runs immediately

The Outcome

From a billing investigation to a developer tool with international adoption

Category Metric Description
Billing impact ~$100/mo
per client
Eliminated segment inflation caused by non-GSM Unicode character ingestion
Detection coverage 30+ Including zero-width spaces, smart quotes, em dashes, and non-breaking spaces
Adoption 40+ Organic adoption across US, India, and Japan
Geographies 3 US, India, Japan
Setup friction Zero Offline, no login, no backend, no tracking
Release Open source Free Chrome extension available publicly
The billing fix required a single pipeline injection. The toolkit required a different decision: treat the character encoding problem as a class of bug, not a one-time incident, and build something that addressed it permanently. Visual detection, cross-format conversion, and offline-first distribution let any developer, QA engineer, or data specialist resolve encoding edge cases at the point they appear rather than after they reach production. The two-week adoption across three countries was the result of that architectural discipline, not of the promotional effort behind the release.