Beyond JSON: Achieving Sub-millisecond Latency with Go, NATS, and Protobuf
In the era of instant messaging and real-time financial transactions, “fast” is no longer a feature—it’s a baseline requirement. For years, the industry default has been RESTful APIs over HTTP communicating via JSON. While excellent for compatibility and ease of debugging, this stack creates significant friction when milliseconds matter.
When building systems for secure telecommunications or high-frequency fintech rails, I’ve found that the overhead of text-based protocol parsing and connection handshakes becomes the primary bottleneck. To break the sub-millisecond barrier, we need to look beyond JSON.
The Mathematical Perspective on Efficiency
Let’s start with the numbers. Consider a simple user profile update flowing through your system:
JSON Payload (183 bytes):
```json
{
  "user_id": "550e8400-e29b-41d4-a716-446655440000",
  "username": "john_doe",
  "email": "john@example.com",
  "age": 32,
  "created_at": 1704398400,
  "is_active": true
}
```
At 1 million messages per second, that’s 183 MB/s of bandwidth—before TCP/IP overhead. The real cost, however, isn’t the bytes on the wire. It’s the CPU cycles spent parsing text, allocating strings, and handling schema validation at runtime.
JSON parsing in Go typically takes 2-5 microseconds per message. Multiply that by a million, and you’re burning 2-5 full CPU cores just on deserialization. This is the hidden tax of human-readable protocols.
Protocol Buffers: The Binary Advantage
Protocol Buffers (Protobuf) takes a fundamentally different approach. Instead of parsing text, it works with binary data that maps directly to memory structures. The schema is defined once, compiled ahead of time, and the runtime cost is minimal.
Here’s the same user profile in Protobuf schema format:
```proto
syntax = "proto3";

package user;

option go_package = "github.com/yourorg/yourapp/proto/user";

message UserProfile {
  string user_id = 1;
  string username = 2;
  string email = 3;
  int32 age = 4;
  int64 created_at = 5;
  bool is_active = 6;
}
```
After compilation with protoc --go_out=. user.proto, you get strongly-typed Go structs with optimized serialization methods. The binary representation is approximately 85 bytes—less than half the JSON size—and serialization takes 100-300 nanoseconds instead of microseconds.
That’s a 10-20x performance improvement on serialization alone.
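To see where the smaller payload comes from, you can reproduce the proto3 wire format by hand with the stdlib: every field is a varint tag (field number shifted left three bits, ORed with the wire type), followed by either a varint value or a length-prefixed byte string. This is a simplified illustration, not the official encoder:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// appendVarintField encodes a varint field (wire type 0): tag, then value.
func appendVarintField(b []byte, fieldNum int, v uint64) []byte {
	b = binary.AppendUvarint(b, uint64(fieldNum)<<3|0)
	return binary.AppendUvarint(b, v)
}

// appendStringField encodes a length-delimited field (wire type 2):
// tag, then length, then the raw bytes.
func appendStringField(b []byte, fieldNum int, s string) []byte {
	b = binary.AppendUvarint(b, uint64(fieldNum)<<3|2)
	b = binary.AppendUvarint(b, uint64(len(s)))
	return append(b, s...)
}

func main() {
	var b []byte
	b = appendStringField(b, 1, "550e8400-e29b-41d4-a716-446655440000") // user_id
	b = appendStringField(b, 2, "john_doe")                             // username
	b = appendStringField(b, 3, "john@example.com")                     // email
	b = appendVarintField(b, 4, 32)                                     // age
	b = appendVarintField(b, 5, 1704398400)                             // created_at
	b = appendVarintField(b, 6, 1)                                      // is_active
	// Prints the encoded size: a few dozen bytes, well under half the JSON.
	fmt.Printf("wire size: %d bytes\n", len(b))
}
```

Strings dominate the payload; the integers and the boolean collapse to a handful of varint bytes, which is where most of the savings over JSON's quoted keys and decimal digits come from.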
Why NATS Completes the Picture
HTTP is a request-response protocol built on TCP, which means every interaction involves:
- TCP handshake (3-way, ~1-2ms on typical networks)
- TLS negotiation (another 1-2ms for HTTPS)
- HTTP header parsing
- Connection pooling complexity
NATS, by contrast, is a lightweight pub-sub messaging system that maintains persistent connections. Once connected, message delivery is pure payload routing with microsecond-level latency. It’s purpose-built for exactly this use case: high-throughput, low-latency message passing between services.
Key advantages:
- Persistent connections: No handshake tax per message
- Simple text protocol: Minimal parsing overhead
- At-most-once delivery by default: No ack overhead for fire-and-forget scenarios
- Built-in patterns: Pub-sub, request-reply, and queue groups
- JetStream: Optional persistence layer when needed
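The "simple text protocol" point is worth seeing concretely. A NATS publish is a single PUB frame: subject and payload length in plain text, followed by the raw payload. The helper below builds such a frame by hand purely for illustration; the client library constructs and buffers these frames for you:

```go
package main

import "fmt"

// pubFrame builds a NATS PUB frame: "PUB <subject> <#bytes>\r\n<payload>\r\n".
// For illustration only — nats.go does this internally.
func pubFrame(subject string, payload []byte) []byte {
	head := fmt.Sprintf("PUB %s %d\r\n", subject, len(payload))
	frame := append([]byte(head), payload...)
	return append(frame, '\r', '\n')
}

func main() {
	frame := pubFrame("user.profile.update", []byte("hello"))
	fmt.Printf("%q\n", frame) // "PUB user.profile.update 5\r\nhello\r\n"
}
```

Parsing that header is a single line scan, and the payload is passed through untouched, which is why per-message server overhead stays in the microsecond range.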
Building a Sub-millisecond Service
Let’s build a practical example: a user profile update service that consistently processes messages in under 1ms.
Step 1: Generate Protobuf Code
Save the schema above as user.proto and generate Go code:
```sh
protoc --go_out=. --go_opt=paths=source_relative user.proto
```
Step 2: Publisher Service
```go
package main

import (
	"log"
	"time"

	"github.com/nats-io/nats.go"
	"google.golang.org/protobuf/proto"

	pb "github.com/yourorg/yourapp/proto/user"
)

type ProfilePublisher struct {
	nc *nats.Conn
}

func NewPublisher(url string) (*ProfilePublisher, error) {
	nc, err := nats.Connect(url,
		nats.Timeout(10*time.Second),
		nats.ReconnectWait(1*time.Second),
	)
	if err != nil {
		return nil, err
	}
	return &ProfilePublisher{nc: nc}, nil
}

func (p *ProfilePublisher) PublishUpdate(profile *pb.UserProfile) error {
	data, err := proto.Marshal(profile)
	if err != nil {
		return err
	}
	return p.nc.Publish("user.profile.update", data)
}

func main() {
	pub, err := NewPublisher(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer pub.nc.Close()

	profile := &pb.UserProfile{
		UserId:    "550e8400-e29b-41d4-a716-446655440000",
		Username:  "john_doe",
		Email:     "john@example.com",
		Age:       32,
		CreatedAt: time.Now().Unix(),
		IsActive:  true,
	}

	start := time.Now()
	if err := pub.PublishUpdate(profile); err != nil {
		log.Fatal(err)
	}
	log.Printf("Published in %v", time.Since(start))
}
```
Step 3: Subscriber Service
```go
package main

import (
	"log"
	"sync/atomic"
	"time"

	"github.com/nats-io/nats.go"
	"google.golang.org/protobuf/proto"

	pb "github.com/yourorg/yourapp/proto/user"
)

type ProfileSubscriber struct {
	nc       *nats.Conn
	msgCount uint64
}

func NewSubscriber(url string) (*ProfileSubscriber, error) {
	nc, err := nats.Connect(url)
	if err != nil {
		return nil, err
	}
	return &ProfileSubscriber{nc: nc}, nil
}

func (s *ProfileSubscriber) handleUpdate(m *nats.Msg) {
	start := time.Now()

	profile := &pb.UserProfile{}
	if err := proto.Unmarshal(m.Data, profile); err != nil {
		log.Printf("Unmarshal error: %v", err)
		return
	}

	// Simulate processing
	s.processProfile(profile)

	latency := time.Since(start)
	atomic.AddUint64(&s.msgCount, 1)
	if latency > 1*time.Millisecond {
		log.Printf("⚠️ Slow processing: %v for user %s", latency, profile.Username)
	}
}

func (s *ProfileSubscriber) processProfile(profile *pb.UserProfile) {
	// Your business logic here
	// Example: update cache, trigger workflows, etc.
}

func (s *ProfileSubscriber) Start() error {
	_, err := s.nc.Subscribe("user.profile.update", s.handleUpdate)
	return err
}

func main() {
	sub, err := NewSubscriber(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer sub.nc.Close()

	if err := sub.Start(); err != nil {
		log.Fatal(err)
	}

	// Report stats every second
	ticker := time.NewTicker(1 * time.Second)
	defer ticker.Stop()
	for range ticker.C {
		// Swap reads and resets atomically, so messages arriving between
		// a separate Load and Store are never lost from the count.
		count := atomic.SwapUint64(&sub.msgCount, 0)
		log.Printf("Processed %d messages", count)
	}
}
```
Real-World Performance Numbers
In production environments, this stack consistently delivers:
| Metric | JSON/HTTP | Protobuf/NATS | Improvement |
|---|---|---|---|
| Serialization | 2-5 μs | 100-300 ns | 10-20x |
| Deserialization | 3-7 μs | 150-400 ns | 10-20x |
| Message Size | 183 bytes | 85 bytes | 2.2x smaller |
| End-to-end Latency | 10-50 ms | 200-800 μs | 20-60x |
These numbers are from a test cluster with NATS on the same network segment. Your mileage will vary based on network topology, but the relative improvements hold.
Optimizations for the Last Mile
Getting from “fast” to “ridiculously fast” requires attention to detail:
1. Connection Reuse and Pooling
Never create connections per message. NATS connections are thread-safe and designed for reuse:
```go
// Bad: Creates connection overhead per message
func publishMessage(data []byte) error {
	nc, _ := nats.Connect(nats.DefaultURL)
	defer nc.Close()
	return nc.Publish("subject", data)
}

// Good: Reuse connection
type Publisher struct {
	nc *nats.Conn
}

func (p *Publisher) Publish(subject string, data []byte) error {
	return p.nc.Publish(subject, data)
}
```
2. Buffer Pooling
Reduce GC pressure by reusing byte buffers:
```go
import "sync"

var bufferPool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, 0, 1024)
		return &b
	},
}

func (p *ProfilePublisher) publishPooled(profile *pb.UserProfile) error {
	bufPtr := bufferPool.Get().(*[]byte)
	defer func() {
		bufferPool.Put(bufPtr)
	}()

	// MarshalAppend serializes into the pooled buffer instead of
	// allocating a fresh slice on every call.
	data, err := proto.MarshalOptions{}.MarshalAppend((*bufPtr)[:0], profile)
	if err != nil {
		return err
	}
	*bufPtr = data[:0] // keep any capacity growth for the next caller

	// nc.Publish copies the payload into its write buffer, so the
	// pooled slice is safe to reuse once this returns.
	return p.nc.Publish("user.profile.update", data)
}
```
3. Batch Publishing
For high-throughput scenarios, batch messages to amortize NATS protocol overhead:
```go
func (p *Publisher) PublishBatch(profiles []*pb.UserProfile) error {
	for _, profile := range profiles {
		data, err := proto.Marshal(profile)
		if err != nil {
			return err
		}
		if err := p.nc.Publish("user.profile.update", data); err != nil {
			return err
		}
	}
	return p.nc.Flush() // Ensure all buffered messages are sent
}
```
4. JetStream for Guaranteed Delivery
When you need persistence without sacrificing much performance:
```go
js, err := nc.JetStream()
if err != nil {
	return err
}

// Create the stream (once, at startup)
if _, err := js.AddStream(&nats.StreamConfig{
	Name:     "USERS",
	Subjects: []string{"user.>"},
}); err != nil {
	return err
}

// Publish and wait for the ack
_, err = js.Publish("user.profile.update", data)
```
JetStream adds ~100-200μs of latency for the ack roundtrip, but you get durability and replay capabilities.
Monitoring and Observability
You can’t optimize what you don’t measure. Instrument your message pipeline:
```go
import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var (
	publishLatency = promauto.NewHistogram(prometheus.HistogramOpts{
		Name:    "nats_publish_duration_microseconds",
		Help:    "Time taken to publish message",
		Buckets: []float64{100, 200, 500, 1000, 2000, 5000},
	})
	processLatency = promauto.NewHistogram(prometheus.HistogramOpts{
		Name:    "message_process_duration_microseconds",
		Help:    "Time taken to process message",
		Buckets: []float64{100, 200, 500, 1000, 2000, 5000},
	})
)

func (p *ProfilePublisher) PublishWithMetrics(profile *pb.UserProfile) error {
	start := time.Now()
	defer func() {
		publishLatency.Observe(float64(time.Since(start).Microseconds()))
	}()
	return p.PublishUpdate(profile)
}
```
When This Stack Makes Sense
This architecture is ideal for:
- Trading systems where every microsecond impacts profitability
- Gaming backends requiring real-time state synchronization
- IoT platforms processing millions of device telemetry messages
- Event sourcing architectures with high event volumes
- Internal microservices where you control both ends
It’s probably overkill for:
- Public REST APIs where clients expect JSON
- Admin dashboards with human-in-the-loop workflows
- Prototyping where developer velocity matters more than performance
- Systems where debugging ease trumps efficiency
The Migration Path
You don’t need to rewrite everything overnight. Start with your most latency-sensitive paths:
- Identify bottlenecks: Use profiling to find services spending time on serialization
- Start internal: Convert service-to-service communication first
- Keep JSON at edges: Public APIs can still use JSON while internal services use Protobuf
- Gradual rollout: Run both protocols in parallel during migration
- Measure everything: Validate that the complexity is worth the gains
Conclusion
Breaking the millisecond barrier isn’t about micro-optimizations or assembly code. It’s about choosing the right architectural patterns for your performance requirements. JSON and HTTP are excellent defaults, but when you need to scale to millions of messages per second with sub-millisecond latency, binary protocols and persistent messaging systems become essential.
Go’s efficiency, NATS’s speed, and Protobuf’s compact binary format combine to create a stack capable of consistently delivering sub-millisecond message processing. The key is understanding your requirements and being willing to embrace some additional complexity in exchange for dramatic performance improvements.
For systems where latency is a competitive advantage—or a hard requirement—this combination is hard to beat. The question isn’t whether you can achieve sub-millisecond latency, but whether your architecture is holding you back from reaching it.
Ready to go beyond JSON? Your microseconds are waiting.