
Why Protobuf Beats JSON

January 16, 2026 · 3 minute read

  

Serialization is the process of converting objects into a binary stream. This binary stream is what we send over the network or store in files and databases.

At the end of the day, everything becomes bytes. The real question is, how many bytes, and how efficiently are they encoded?

JSON Serialization

JSON serialization is not very efficient. First, the object is converted into JSON. This JSON is then turned into a plain text string. Finally, each character in that string is converted into a byte using the ASCII mapping (JSON is normally sent as UTF-8, which is identical to ASCII for these characters).

User(id = 200)                 // object
{"id":200}                     // JSON conversion
7B 22 69 64 22 3A 32 30 30 7D  // ASCII mapping
JSON serialization steps (character-to-ASCII mapping)

Issues with JSON Serialization ❌

  • Delimiters like { } " : are also encoded
  • Everything is treated as characters and each ASCII character requires 1 byte, so the number 200 alone costs 3 bytes
  • Total bytes required to encode {"id":200} is 10
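The byte count above can be checked with a few lines of Kotlin. This is a minimal sketch, not part of the original post: the helper names `jsonBytes` and `toHex` are made up for illustration, and the JSON string is written by hand rather than produced by a JSON library.

```kotlin
// Hypothetical helper: turn a JSON string into the bytes that go on the wire.
fun jsonBytes(json: String): ByteArray = json.toByteArray(Charsets.UTF_8)

// Hypothetical helper: render bytes as space-separated upper-case hex.
fun toHex(bytes: ByteArray): String =
    bytes.joinToString(" ") { "%02X".format(it) }

fun main() {
    val bytes = jsonBytes("{\"id\":200}")
    println(toHex(bytes)) // 7B 22 69 64 22 3A 32 30 30 7D
    println(bytes.size)   // 10
}
```

Note that adding a single space after the colon would already push the payload to 11 bytes.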

Protobuf Serialization

Protobuf uses TLV (Tag-Length-Value) encoding, where each field in an object is encoded in a compact binary form. The Tag identifies the field's number and its wire type, the Length specifies how many bytes the value occupies (it is only present for length-delimited types such as strings, bytes, and nested messages), and the Value contains the actual data. This structured approach allows data to be parsed efficiently while keeping the payload size small.
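To see all three TLV parts at once, it helps to look at a length-delimited field such as a string (wire type 2), where the Length byte really is present. The sketch below is only an illustration under simplifying assumptions: `encodeStringField` is a hypothetical helper, not the Protobuf library API, and it only handles field numbers up to 15 and payloads shorter than 128 bytes, so that the tag and the length each fit in one byte. The tag byte is computed as (field_number << 3) | wire_type.

```kotlin
// Hypothetical helper, not the Protobuf API: encodes one string field as Tag, Length, Value.
fun encodeStringField(fieldNumber: Int, value: String): ByteArray {
    val payload = value.toByteArray(Charsets.UTF_8)
    // Simplifying assumption: tag and length must each fit in a single byte.
    require(fieldNumber in 1..15 && payload.size < 128) {
        "sketch only covers single-byte tag and length"
    }
    val tag = (fieldNumber shl 3) or 2 // wire type 2 = length-delimited
    return byteArrayOf(tag.toByte(), payload.size.toByte()) + payload
}

fun main() {
    // A hypothetical string field with number 2 and value "testing".
    val bytes = encodeStringField(2, "testing")
    println(bytes.joinToString(" ") { "%02X".format(it) })
    // 12 07 74 65 73 74 69 6E 67  (tag, length, then the 7 UTF-8 bytes)
}
```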

Let's take the example of a User object with id = 200.

// Proto schema
syntax = "proto3";
message User {
  int32 id = 1;
}

// Kotlin: build a User with the generated class
val user = User.newBuilder().setId(200).build()
  • The TAG of a record is encoded from the field number and the wire type using the formula (field_number << 3) | wire_type.
  • field number = 1 (the number assigned to id in the schema, not its position in the message)
  • wire type = 0 (VARINT); the Protobuf encoding documentation lists the other wire types
## TAG ENCODING

field_number << 3 | wire_type
1 << 3 | 0
1000 | 0000
1000 (binary)
08 (HEX)
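The same tag arithmetic, written as a small Kotlin sketch. `encodeTag` and `WIRE_TYPE_VARINT` are names chosen here for illustration and are not taken from the Protobuf library. The tag is itself stored as a varint on the wire; for field numbers 1 through 15 it fits in a single byte, which is the case shown.

```kotlin
// Wire type 0 is VARINT (illustrative constant name).
const val WIRE_TYPE_VARINT = 0

// Hypothetical helper: the field number goes in the upper bits, the wire type in the low 3 bits.
fun encodeTag(fieldNumber: Int, wireType: Int): Int = (fieldNumber shl 3) or wireType

fun main() {
    val tag = encodeTag(fieldNumber = 1, wireType = WIRE_TYPE_VARINT)
    println("%02X".format(tag)) // 08
}
```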
  • For the LENGTH encoding, VARINT wire types (int32, int64 etc.) do not need a separate length field. Instead, the value itself carries continuation bits that indicate whether the next byte is part of the same value.
  • VALUE encoding for VARINT works by first representing the number in binary and then splitting it into fixed-size groups of 7 bits. These groups are ordered in little-endian form, meaning the least significant group comes first. Each group is then placed into a byte, where an extra continuation bit (the most significant bit) indicates whether more bytes follow. All bytes except the last have this bit set.
## VALUE ENCODING
200 //original input in decimal
0000001 1001000 //binary conversion and group by 7 bits
1001000 0000001 //convert to little endian
11001000 00000001 //set continuation bits
C8 01 //hex representation
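The steps above can be sketched as a short loop. This is an illustrative implementation rather than the code the Protobuf runtime uses: `encodeVarint` is a hypothetical name, and the sketch deliberately rejects negative numbers, which real Protobuf encodes differently for int32.

```kotlin
// Hypothetical helper: encodes a non-negative number as a base-128 varint.
fun encodeVarint(value: Long): ByteArray {
    require(value >= 0) { "this sketch handles non-negative values only" }
    val out = mutableListOf<Byte>()
    var remaining = value
    while (true) {
        val group = (remaining and 0x7F).toInt() // lowest 7 bits come first
        remaining = remaining ushr 7
        if (remaining == 0L) {
            out.add(group.toByte())              // last byte: continuation bit clear
            break
        }
        out.add((group or 0x80).toByte())        // more bytes follow: set continuation bit
    }
    return out.toByteArray()
}

fun main() {
    println(encodeVarint(200L).joinToString(" ") { "%02X".format(it) }) // C8 01
}
```

Values below 128 fit in a single byte, which is why small ids and counters are so cheap in this format.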
  • Final Output of TLV encoding
08 C8 01

So Why Does Protobuf Beat JSON? ✅

  • No additional delimiters are encoded
  • Field names are not sent; the decoding end uses the .proto schema to map the field number in each TAG back to a field name
  • Total bytes required to encode id = 200 is 3 (70% less than the 10 bytes of JSON)
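To illustrate how the receiving side recovers field names without them being on the wire, here is a rough decoder sketch. It is not the generated Protobuf parser: `userSchema` is a hypothetical stand-in for the compiled .proto file, `decodeVarintFields` is a made-up name, and the code assumes single-byte tags (field numbers up to 15) and VARINT fields only.

```kotlin
// Hypothetical stand-in for the compiled schema: field number -> field name.
val userSchema = mapOf(1 to "id")

// Illustrative decoder: reads tag/value pairs and names them using the schema.
fun decodeVarintFields(bytes: ByteArray, schema: Map<Int, String>): Map<String, Long> {
    val result = LinkedHashMap<String, Long>()
    var pos = 0
    while (pos < bytes.size) {
        val tag = bytes[pos++].toInt() and 0xFF
        val fieldNumber = tag ushr 3 // upper bits of the tag
        val wireType = tag and 0x07  // lower 3 bits of the tag
        require(wireType == 0) { "this sketch only understands VARINT fields" }
        var value = 0L
        var shift = 0
        while (true) {
            val b = bytes[pos++].toInt() and 0xFF
            value = value or ((b and 0x7F).toLong() shl shift)
            shift += 7
            if ((b and 0x80) == 0) break // continuation bit clear: last byte
        }
        result[schema[fieldNumber] ?: "unknown_$fieldNumber"] = value
    }
    return result
}

fun main() {
    val decoded = decodeVarintFields(byteArrayOf(0x08, 0xC8.toByte(), 0x01), userSchema)
    println(decoded) // {id=200}
}
```

Without the schema map, the decoder would still recover field number 1 and the value 200, but it would not know that the field is called id. That is the inspection cost mentioned in the final section.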

Final Thoughts

Engineering is about trade-offs. Protobuf reduces payload size, and usually latency along with it, but makes debugging and inspection harder since decoding requires the schema. JSON is easier to work with and integrates well with existing tools. So optimize only when it matters.


Why Protobuf Beats JSON was originally published in ProAndroidDev on Medium.

 
