OpenPGP Message Format in Elixir
TL;DR; OpenPGP library in Elixir is here
Intro
Dealing with PGP/GPG encrypted files is not fun, especially when no tooling is around. That’s exactly where I ended up when I had to decrypt a PGP-encrypted file in Elixir. It took some time to get through all the moments of frustration, despair, gotchas, and eventually success. Do not ask why, but the decision was made to decrypt the PGP-encrypted file in Elixir. I googled for about 20 minutes and, surprisingly, did not find a PGP library in Elixir or Erlang. That’s not a good sign. Usually, if there is a library for a problem, or someone has succeeded in solving it, there is a record of it in Google’s search index.
I’ve asked folks around who’ve faced the same need, and they confirmed the absence of the PGP library in Elixir or Erlang. The proposed options for this problem were:
- Use NIFs
- Use the gpg CLI tool via ports or System.cmd
- Write your own PGP implementation
Options one and two did not work, as they are way too easy (read — did not work for us due to deployment process limitations). Option three seems like the right one, given that PGP, in a nutshell, is a combination of symmetric/asymmetric cipher algorithms, hashing algorithms, compression algorithms, various encodings, and a strict specification. I want to share my experience writing an Elixir library for OpenPGP Message Format and eventually decrypting the file. I hope that will help someone and maybe give them the confidence to dive into the unknown, hoping for the best.
Naive attempt
My first attempt was naive and did not work, but I have to share it, as I’ve seen some misleading threads and discussions on the internet. How I pictured it:
- Export a private key in the gpg tool (GnuPG)
- Load that key in Elixir
- Decrypt the PGP-encrypted file
The gpg --export-secret-key --armor john.doe@example.com > rsa2048-priv.armor.pgp command produced output that looked like a Base64-encoded RSA key, a certificate, or something equivalent:
-----BEGIN PGP PRIVATE KEY BLOCK-----
lQPGBGWUT9gBCADdsbW+4+TstPCgiFMRJ0wl+aNDtEWOGGJG48DeaeBDOvZVmy9I
aq8Oq9MgblG3hehBrFLMEjt/TegOzUdOWHNZhIHJMKh71l4btiL1sq1mY4xlCZfx
BP7HpCRrwA5GEVLROpw0NExgOMrmvtid9+Fco68RzLKQtTDgCnjAJskRGfxtqgbH
...
11OOuI3P6T9xJkKoyf2k4KVwCjb+47TAFcxauFYYlAp+Szfeqoi2
=66nu
-----END PGP PRIVATE KEY BLOCK-----
So I gave it a try in Elixir:
# Elixir
"rsa2048-priv.armor.pgp" |> File.read!() |> :public_key.pem_decode()
and got an empty list back. I googled for some hints and found a solution (that did not work after all)! Replace “BEGIN / END PGP PRIVATE KEY BLOCK” with “BEGIN / END RSA PRIVATE KEY.” Also, I had to remove the line at the end starting with the equal sign, =66nu. That's what I did, and surprisingly, :public_key.pem_decode() gave back a list with one item that looked promising (I had no clue what that data meant, but at least it did not raise an error):
# Elixir
[
{:RSAPrivateKey,
<<149, 3, 198, 4, 101, 148, 79, 216, 1, ...>>,
:not_encrypted}
]
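In hindsight, that binary was never DER. A minimal sketch, using only the four leading bytes shown above and the packet header layout from RFC4880 (explained later in this article), reads them as an OpenPGP packet header instead. The variable names are made up for illustration, and the pattern assumes an old-format header with a two-octet length.

```elixir
# The leading bytes of the "DER" binary returned by :public_key.pem_decode/1 above
leading_bytes = <<149, 3, 198, 4>>

# Old-format OpenPGP packet header: 1 marker bit, 1 format bit,
# 4 bits of packet tag, 2 bits of length-type, then (assumed here) a two-octet length.
# The octet that follows the header is the first octet of the packet body.
<<1::1, format_bit::1, tag_id::4, length_type::2, body_length::16, first_body_octet::8>> =
  leading_bytes

header = {format_bit, tag_id, length_type, body_length, first_body_octet}
# => {0, 5, 1, 966, 4}
# format_bit 0 = old format, tag_id 5 = Secret-Key Packet,
# length_type 1 = two-octet length, first_body_octet 4 = key packet version
IO.inspect(header)
```

Read this way, the renamed "RSA PRIVATE KEY" block simply wrapped an OpenPGP Secret-Key Packet, which would explain why none of the RSA functions below could make sense of it.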
The next step was to read the encrypted file and decrypt it. I’ve tried everything I could:
- :public_key.decrypt_private/2 and :public_key.decrypt_public/2
- :crypto.private_decrypt/4 and :crypto.public_decrypt/4
- :public_key.pem_entry_decode/1/2
- :public_key.der_decode(:RSAPrivateKey, der)
- :public_key.der_decode(:RSAPrivateKey, der2) where der2 is DER data with the first 26 bytes skipped (you'll probably find some discussion around this if you Google for a while)
- And probably many more that I can’t recall
I really want you to know that none of that worked! At least for me. That was frustrating, as I knew little about RSA, AES, and other ciphers. What do I do now?
More thorough approach
Alright, nothing worked out of the box. That’s when I started reading RFC4880. This document describes all the details of the OpenPGP Message Format and should lead me to the action plan for decrypting the file. My initial understanding and action plan were:
- Load RSA private key
- Decrypt file
So little did I know… I will skip the part where I went down the rabbit hole, decoding one packet after another, having no confidence that I could see the decrypted data, etc. Luckily for me, I had a moment of great excitement when, eventually, after days of coding, I saw decoded data. Here is what the action plan turned out to be:
The PGP Encrypted file (the one that I need to decrypt):
- Find the Public Key ID — Decode the PGP encrypted file. The first packet, Public Key Encrypted Session Key Packet, will have the key ID used to encrypt the session key (more details below).
The exported secret key file (gpg --export-secret-key ...):
- Decode the file and find the Secret Key Packet by Public Key ID.
- Read the “String-to-Key Specifier” (S2K Specifier), which describes how to convert the passphrase into a symmetric-key encryption/decryption key.
- According to the S2K Specifier, convert the passphrase into a key.
- With the symmetric-key algorithm specified in the packet and the key derived from the passphrase, decrypt the Secret Key Packet body.
- The resulting Multi Precision Integers are RSA private key material.
Back to the PGP Encrypted file:
- Decrypt the Public Key Encrypted Session Key Packet with RSA private key material. It specifies a symmetric-key algorithm and has a key material for the following data packet.
- Decode and decrypt the “Symmetrically Encrypted and Integrity Protected Data Packet.” The body contains more packets.
- Decode the body and find the “Compressed Data Packet.”
- Decode and decompress/inflate the “Compressed Data Packet.” The body contains other packets.
- Decode the body and find the “Literal Data Packet.”
- And finally, the body of the “Literal Data Packet” is my plaintext data.
NOTE: This action plan is for my case, though probably the most common case as well. But keep in mind that PGP Encrypted files may have multiple Public Key Encrypted Session Key Packets (one packet per recipient).
If I had known all of this before I started messing around with RFC4880, I probably would not have gone for it. But I peeled layer after layer to find more packets to implement and decode. When all was said and done, I ended up with ~3,500 lines of code (probably 40% documentation, another 40% tests, and 20% actual code). I want to make it open-source someday, but it’s company property for now, and we are working hard to make it public.
UPDATE: With a team effort this project went public in March 2024 — https://github.com/DivvyPayHQ/open_pgp
Now, a bit of theory will help better understand what’s going on and explain why OpenPGP messages are formatted that way.
RFC4880 — excerpts, personal notes, and some explanations
The RFC4880 document is maintained in order to publish all necessary information needed to develop interoperable applications based on the OpenPGP format. It is not a step-by-step cookbook for writing an application. It describes only the format and methods needed to read, check, generate, and write conforming packets crossing any network.
OpenPGP provides data integrity services for messages and data files by using these core technologies:
- Symmetric encryption (such as AES, CAST5, 3DES)
- Asymmetric (public-key) algorithms (such as RSA and Elgamal for encryption, DSA for signatures)
- Digital signatures
- Compression
- Radix-64 conversion
OpenPGP combines symmetric-key encryption and public-key encryption to provide confidentiality.
The sequence is as follows:
- The sender creates a message (maybe a file).
- The sending OpenPGP generates a random number to be used as a session key for this message only.
- The session key is encrypted using each recipient’s public key. These “encrypted session keys” start the message.
- The sending OpenPGP encrypts the message using the session key, which forms the remainder of the message. Note that the message is also usually compressed.
- The receiving OpenPGP decrypts the session key using the recipient’s private key.
- The receiving OpenPGP decrypts the message using the session key. If the message was compressed, it will be decompressed.
The action plan that I outlined in the section “More thorough approach” follows this sequence. Also, the sender must have the recipient’s public key.
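The sequence above can be sketched with Erlang's :crypto module alone. This is only an illustration of the hybrid idea, not the OpenPGP wire format: there are no packets, no algorithm identifier, and no checksum. It also uses OAEP padding for the RSA step as an assumption made for this sketch, whereas RFC4880 specifies PKCS#1 v1.5 encoding. All variable names are hypothetical.

```elixir
# Recipient: generate an RSA key pair (2048-bit modulus, public exponent 65537).
# :crypto returns the public key as [e, n] and the private key as a longer list
# that starts with [e, n, d | ...]; every value is a binary.
{rsa_public, rsa_private} = :crypto.generate_key(:rsa, {2048, 65_537})

# Sender: a random 256-bit session key, used for this message only.
session_key = :crypto.strong_rand_bytes(32)
iv = <<0::128>>
message = "Hello, world!"

# Sender: encrypt the message with the session key (AES-256 in CFB mode) ...
encrypted_message = :crypto.crypto_one_time(:aes_256_cfb128, session_key, iv, message, true)

# ... and encrypt the session key with the recipient's public key.
encrypted_session_key =
  :crypto.public_encrypt(:rsa, session_key, rsa_public, rsa_padding: :rsa_pkcs1_oaep_padding)

# Recipient: recover the session key with the private key, then the message.
recovered_key =
  :crypto.private_decrypt(:rsa, encrypted_session_key, rsa_private,
    rsa_padding: :rsa_pkcs1_oaep_padding
  )

recovered_message =
  :crypto.crypto_one_time(:aes_256_cfb128, recovered_key, iv, encrypted_message, false)

IO.inspect(recovered_message == message)
```

The point of the split is that the slow asymmetric operation only ever touches the short session key, while the message itself goes through the fast symmetric cipher.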
To avoid confusion, note that the term “public key” has different meanings depending on context:
- Public key cryptography, sometimes called public key encryption, uses two cryptographic keys: a public key and a private key. Examples of public key algorithms today are RSA, DSA, and ECDSA.
- A public key is also a cryptographic key that anyone can obtain and use to encrypt messages intended for a particular recipient.
That might change how you read the name of the Erlang library :public_key.
Begin the implementation
With a bit of theory and RFC4880 at hand, I can make a first attempt at the first feature: reading packets from a file/message.
Per RFC4880: An OpenPGP message is constructed from a number of records that are traditionally called packets. A packet is a chunk of data that has a tag specifying its meaning. Some of those packets may contain other OpenPGP packets (for example, a compressed data packet, when uncompressed, contains OpenPGP packets). Each packet consists of a packet header, followed by the packet body.
For this example, I will decode a PGP file/message. The file is not armored (not Radix-64 encoded).
I’ve used the gpg tool to encrypt a file with my public key "john.doe@example.com":
# Bash
~$ gpg --encrypt -r john.doe@example.com ./hello-world.txt
I’ll need to perform calculations on bits. Therefore, I’ll import the Bitwise module right away. Also, I'll load the file contents.
# Elixir
import Bitwise

file_path =
  "/Users/pavel.tsiukhtsiayeu/Library/Application Support/livebook/autosaved/2024_01_11/04_09_hd3h/files/hello-world.txt.gpg"

file_data = File.read!(file_path)
Also worth noting is that since all data in a file represent sequential packets, all functions that read and decode data will have a similar return type — a tuple, where the first element is the result of the processing of a chunk of data and the second element is the remaining binary data. The function spec might look like this: @spec my_fun(data :: binary()) :: {result :: any(), rest :: binary()}
Read Packet Tag
I’ll try to decode the encrypted text file hello-world.txt.gpg into packets. But before doing that, I want to inspect the file with the gpg tool so that I can compare my results (I'll see ciphertext as packet content, which does not say much) and prove that I read bytes and bits correctly as per RFC4880.
# Bash
~$ gpg --list-packets --verbose ./hello-world.txt.gpg
gpg: enabled compatibility flags:
gpg: public key is B80510474E7B88FE
gpg: using subkey B80510474E7B88FE instead of primary key 052E8381B5C335DA
gpg: pinentry launched (93152 curses 1.2.1 /dev/ttys005 xterm-256color - 20620/502/4 502/20 0)
gpg: using subkey B80510474E7B88FE instead of primary key 052E8381B5C335DA
gpg: encrypted with 2048-bit RSA key, ID B80510474E7B88FE, created 2024-01-03
"John Doe (RSA2048) <john.doe@example.com>"
gpg: AES256 encrypted data
# off=0 ctb=85 tag=1 hlen=3 plen=268
:pubkey enc packet: version 3, algo 1, keyid B80510474E7B88FE
data: 1F604116DA3AEA67B6ACC1D7F4768E738EB82F680FD96C49CEB111771...
# off=271 ctb=d2 tag=18 hlen=2 plen=86 new-ctb
:encrypted data packet:
length: 86
mdc_method: 2
# off=292 ctb=a3 tag=8 hlen=1 plen=0 indeterminate
:compressed packet: algo=2
# off=294 ctb=ac tag=11 hlen=2 plen=37
:literal data packet:
mode b (62), created 1704946389, name="hello-world.txt",
raw data: 16 bytes
That output looks cryptic at first glance, but hey, we are doing some cryptography here. This is what is important for now:
- Each packet starts with a line that begins with a hash sign, for example # off=0 ctb=85 tag=1 hlen=3 plen=268:
  - off=0 indicates the offset of the packet within the stream in bytes (this may not be accurate if there are compressed packets)
  - ctb=85 (Cipher Type Byte, hex value 0x85 = 133) includes the type of the packet and some information about the length of the packet (if this is a new format packet, then new-ctb will appear towards the end of the line)
  - tag=1 is the type of the packet as extracted from the ctb
  - hlen=3 is the header length in bytes
  - plen=268 is the body length in bytes
- The next line starts with the packet abbreviation followed by packet-specific data:
  - :pubkey enc packet: version 3, algo 1, keyid B80510474E7B88FE
  - :encrypted data packet:
  - :compressed packet: algo=2
  - :literal data packet:
The output shows that there are four packets in this file, but two of them are nested within the encrypted data packet (you can tell that by looking at the offsets). The reason gpg showed :compressed packet and :literal data packet is that I have the private key in my keyring and provided a valid passphrase while executing the command. Otherwise, the output would include only two packets: :pubkey enc packet and :encrypted data packet. The structure looks like this:
├── :pubkey enc packet: version 3, algo 1, keyid B80510474E7B88FE
│
└── :encrypted data packet:
│
└── :compressed packet: algo=2
│
└── :literal data packet:
I’ll start decoding the hello-world.txt.gpg file contents with that information.
The first octet (byte) of the packet header is called the “Packet Tag.” It determines the format of the header and denotes the packet contents. The remainder of the packet header is the length of the packet.
+---------------+
PTag |7 6 5 4 3 2 1 0|
+---------------+
Bit 7 -- Always one
Bit 6 -- New packet format if set
Old format packets contain:
Bits 5-2 -- packet tag
Bits 1-0 -- length-type
New format packets contain:
Bits 5-0 -- packet tag
I’ll define a read_packet_tag/1 function, which will pattern match the input binary, decode the packet tag, and return a two-element tuple: the first element is a three-element tuple describing the decoded tag (format, tag ID, length-type), and the second element is the remaining binary. The remaining binary will start with the packet body length octet(s) followed by the packet body binary, and then the next packet tag will start the next packet, and so on.
# Elixir
read_packet_tag = fn
  <<1::1, 0::1, id::4, length_type::2, next::binary>> -> {{:old, id, length_type}, next}
  <<1::1, 1::1, id::6, next::binary>> -> {{:new, id, nil}, next}
end

{packet1_tag, next} = read_packet_tag.(file_data)
Please note, since I defined an anonymous function and bound it to a variable, there is a dot in the function call.
When the read_packet_tag/1 function is applied to the encrypted text file content, I get back the {:old, 1, 1} packet tag tuple, which denotes:
- The old format tag
- Tag ID 1 = “Public-Key Encrypted Session Key Packet”
- Packet length-type 1 = “The packet has a two-octet length” (more about length-type down below)
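As a cross-check, the same bit layout can be applied to every ctb value that gpg --list-packets printed earlier (85, d2, a3, ac). The sketch below uses a hypothetical decode_ctb helper that looks only at the single tag octet.

```elixir
# Decode a single ctb (packet tag octet), following the bit layout above
decode_ctb = fn
  <<1::1, 0::1, id::4, length_type::2>> -> {:old, id, length_type}
  <<1::1, 1::1, id::6>> -> {:new, id, nil}
end

# ctb values copied from the gpg --list-packets output
decoded = for ctb <- [0x85, 0xD2, 0xA3, 0xAC], do: decode_ctb.(<<ctb>>)
# => [{:old, 1, 1}, {:new, 18, nil}, {:old, 8, 3}, {:old, 11, 0}]
IO.inspect(decoded)
```

The tag IDs 1, 18, 8, and 11 match the tag= values in the gpg output, and length-type 3 on the compressed packet matches the "indeterminate" marker gpg printed for it.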
RFC4880 tag ID to name:
0 -- Reserved - a packet tag MUST NOT have this value
1 -- Public-Key Encrypted Session Key Packet
2 -- Signature Packet
3 -- Symmetric-Key Encrypted Session Key Packet
4 -- One-Pass Signature Packet
5 -- Secret-Key Packet
6 -- Public-Key Packet
7 -- Secret-Subkey Packet
8 -- Compressed Data Packet
9 -- Symmetrically Encrypted Data Packet
10 -- Marker Packet
11 -- Literal Data Packet
12 -- Trust Packet
13 -- User ID Packet
14 -- Public-Subkey Packet
17 -- User Attribute Packet
18 -- Sym. Encrypted and Integrity Protected Data Packet
19 -- Modification Detection Code Packet
60 to 63 -- Private or Experimental Values
Read Body Length
The meaning of the length-type in old format packets is:
- 0 - The packet has a one-octet length.
- 1 - The packet has a two-octet length.
- 2 - The packet has a four-octet length.
- 3 - The packet is of indeterminate length. If the packet is in a file, this means that the packet extends until the end of the file.
New format packets have four possible ways of encoding length:
- A one-octet Body Length header encodes packet lengths of up to 191 octets.
- A two-octet Body Length header encodes packet lengths of 192 to 8,383 octets.
- A five-octet Body Length header encodes packet lengths of up to 4,294,967,295 (0xFFFFFFFF) octets in length.
- When the length of the packet body is not known in advance by the issuer, Partial Body Length headers encode a packet of indeterminate length, effectively making it a stream.
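To get a feel for these ranges, here is a small sketch of the opposite direction: encoding a body length as a new format length header. The encode_new_length helper is hypothetical (it is not part of the walkthrough), and Partial Body Lengths are left out.

```elixir
import Bitwise

# Hypothetical helper: encode a body length as a new-format Body Length header
encode_new_length = fn
  len when len in 0..191 ->
    # One octet holds the length as-is
    <<len::8>>

  len when len in 192..8383 ->
    # Two octets: the first octet lands in 192..223, the value is offset by 192
    rest = len - 192
    <<((rest >>> 8) + 192)::8, (rest &&& 0xFF)::8>>

  len when len in 8384..0xFFFFFFFF ->
    # Five octets: a 255 marker followed by a four-octet length
    <<255::8, len::32>>
end

IO.inspect(encode_new_length.(100))   # => <<100>>
IO.inspect(encode_new_length.(192))   # => <<192, 0>>
IO.inspect(encode_new_length.(8383))  # => <<223, 255>>
IO.inspect(encode_new_length.(8384))  # => <<255, 0, 0, 32, 192>>
```

The first octet alone tells the reader which form follows: below 192 it is the length itself, 192 to 223 starts a two-octet length, and 255 starts a five-octet length. The values 224 to 254 are reserved for Partial Body Lengths.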
To decode the packet body length, I’ll define the function read_body_length/2. It will take the packet tag tuple and the binary data returned by the previous read_packet_tag/1 function call.
In this example I will not implement the Partial Body Length header, as it requires recursion and will make this example a bit more complex, which I try to avoid for the sake of clarity. The Partial Body Length header will raise an error but still calculate the body chunk size.
# Elixir
read_body_length = fn
  {:old, _, 0}, <<len::8, next::binary>> ->
    {len, 1, next}

  {:old, _, 1}, <<len::16, next::binary>> ->
    {len, 2, next}

  {:old, _, 2}, <<len::32, next::binary>> ->
    {len, 4, next}

  {:old, _, 3}, next ->
    {byte_size(next), 0, next}

  {:new, _, _}, <<len::8, next::binary>> when len < 192 ->
    {len, 1, next}

  {:new, _, _}, <<b1::8, b2::8, next::binary>> when b1 in 192..223 ->
    {((b1 - 192) <<< 8) + b2 + 192, 2, next}

  {:new, _, _}, <<255::8, len::32, next::binary>> ->
    {len, 5, next}

  {:new, _, _}, <<b1::8, _::binary>> when b1 in 224..254 ->
    raise(
      "Partial body length not implemented. Body Length header size: 1 byte. Chunk size: #{1 <<< (b1 &&& 0x1F)}"
    )
end

{packet1_len, _body_length_header_size, next} = read_body_length.(packet1_tag, next)
Great, the “Public-Key Encrypted Session Key Packet” has a body length of 268 bytes. The next step is to define a read_body/2 function that will read that many bytes from the input data and return a tuple with the packet body and the remaining binary.
Read Packet Body
# Elixir
read_body = fn len, data ->
  <<body::bytes-size(len), next::binary>> = data
  {body, next}
end

{packet1_body, next} = read_body.(packet1_len, next)
That looks very promising. I have the first packet decoded: it is the “Public-Key Encrypted Session Key Packet.” The results match the gpg output:
# off=0 ctb=85 tag=1 hlen=3 plen=268
:pubkey enc packet: version 3, algo 1, keyid B80510474E7B88FE
The contents still need to be processed, but this is for another time. Now, I want to read the second packet and expect it to take all the remaining data. I have all the functions in place! Let’s give it a try.
# Elixir
{packet2_tag, next} = read_packet_tag.(next)
OK. This packet has a new format and tag id 18 = “Sym. Encrypted and Integrity Protected Data Packet”.
# Elixir
{packet2_len, _body_length_header_size, next} = read_body_length.(packet2_tag, next)
The body length is 86 bytes. It should be all the remaining data in the encrypted text file, so I pattern match the remainder returned by the read_body/2 function call against an empty binary.
# Elixir
{packet2_body, <<>>} = read_body.(packet2_len, next)
That is awesome and thrilling! Seems like I’m on the right track. The next step will be to decrypt the “Public-Key Encrypted Session Key Packet” (packet1) that I've decoded previously, extract the session key, and decrypt the "Sym. Encrypted and Integrity Protected Data Packet" (packet2) with the session key.
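The three steps used above for each packet (tag, length, body) can also be folded into one recursive loop. The following is a minimal sketch under the same simplification as before (no Partial Body Lengths); the PacketWalker module and its function names are hypothetical and not taken from the library.

```elixir
defmodule PacketWalker do
  import Bitwise

  # Returns a list of {packet_tag, body} tuples for all top-level packets
  def read_all(<<>>), do: []

  def read_all(data) do
    {tag, after_tag} = read_tag(data)
    {len, after_length} = read_length(tag, after_tag)
    <<body::binary-size(len), rest::binary>> = after_length
    [{tag, body} | read_all(rest)]
  end

  defp read_tag(<<1::1, 0::1, id::4, lt::2, rest::binary>>), do: {{:old, id, lt}, rest}
  defp read_tag(<<1::1, 1::1, id::6, rest::binary>>), do: {{:new, id, nil}, rest}

  defp read_length({:old, _, 0}, <<len::8, rest::binary>>), do: {len, rest}
  defp read_length({:old, _, 1}, <<len::16, rest::binary>>), do: {len, rest}
  defp read_length({:old, _, 2}, <<len::32, rest::binary>>), do: {len, rest}
  # Indeterminate length: the packet extends to the end of the data
  defp read_length({:old, _, 3}, rest), do: {byte_size(rest), rest}
  defp read_length({:new, _, _}, <<len::8, rest::binary>>) when len < 192, do: {len, rest}

  defp read_length({:new, _, _}, <<b1::8, b2::8, rest::binary>>) when b1 in 192..223,
    do: {((b1 - 192) <<< 8) + b2 + 192, rest}

  defp read_length({:new, _, _}, <<255::8, len::32, rest::binary>>), do: {len, rest}
end

# A made-up two-packet stream: old-format tag 1 with a two-octet length,
# followed by new-format tag 18 with a one-octet length
sample = <<0x85, 0, 2, 1, 2, 0xD2, 3, 9, 9, 9>>
IO.inspect(PacketWalker.read_all(sample))
# => [{{:old, 1, 1}, <<1, 2>>}, {{:new, 18, nil}, <<9, 9, 9>>}]
```

Applied to file_data from this walkthrough, such a loop would be expected to return the same two top-level packets that were decoded step by step above.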
Decrypt Public-Key Encrypted Session Key Packet
The first packet in the encrypted PGP file is the Public-Key Encrypted Session Key Packet, as denoted by its Packet Tag ID=1. I’ll quote some RFC4880 excerpts related to the format of this packet:
A Public-Key Encrypted Session Key packet holds the session key used to encrypt a message. The message is encrypted with the session key, and the session key is itself encrypted and stored in the Encrypted Session Key packet(s). The Symmetrically Encrypted Data Packet is preceded by one Public-Key Encrypted Session Key packet for each OpenPGP key to which the message is encrypted. The recipient of the message finds a session key that is encrypted to their public key, decrypts the session key, and then uses the session key to decrypt the message.
The body of this packet consists of:
- A one-octet number giving the version number of the packet type. The currently defined value for packet version is 3.
- An eight-octet number that gives the Key ID of the public key to which the session key is encrypted.
- A one-octet number giving the public-key algorithm used.
- A string of octets that is the encrypted session key. This string takes up the remainder of the packet, and its contents are dependent on the public-key algorithm used.
Algorithm Specific Fields for RSA encryption
- multiprecision integer (MPI) of RSA encrypted value m**e mod n.
9.1. Public-Key Algorithms
ID Algorithm
-- ---------
1 - RSA (Encrypt or Sign) [HAC]
2 - RSA Encrypt-Only [HAC]
3 - RSA Sign-Only [HAC]
16 - Elgamal (Encrypt-Only) [ELGAMAL] [HAC]
17 - DSA (Digital Signature Algorithm) [FIPS186] [HAC]
18 - Reserved for Elliptic Curve
19 - Reserved for ECDSA
20 - Reserved (formerly Elgamal Encrypt or Sign)
21 - Reserved for Diffie-Hellman (X9.42,
as defined for IETF-S/MIME)
100 to 110 - Private/Experimental algorithm
With that information, I should be able to decrypt this packet body.
# Elixir
decrypt_pk_encrypted_session_key_packet = fn
  <<version::8, key_id::binary-size(8), pk_algo::8, ciphertext::binary>> ->
    {version, Base.encode16(key_id), pk_algo, ciphertext}
end

{_ver, _key_id, _algo, encrypted_session_key_mpi} =
  decrypt_pk_encrypted_session_key_packet.(packet1_body)
Alright. That data matches the gpg --list-packets output:
...
# off=0 ctb=85 tag=1 hlen=3 plen=268
:pubkey enc packet: version 3, algo 1, keyid B80510474E7B88FE
...
The next step is to decode the MPI value. RFC4880 describes this as:
Multiprecision integers (also called MPIs) are unsigned integers used to hold large integers such as the ones used in cryptographic calculations.
An MPI consists of two pieces: a two-octet scalar that is the length of the MPI in bits followed by a string of octets that contain the actual integer. The length field of an MPI describes the length starting from its most significant non-zero bit.
The size of an MPI is ((MPI.length + 7) / 8) + 2 octets.
# Elixir
decode_mpi = fn <<mpi_length::16, rest::binary>> ->
  # Integer division: the number of octets needed to hold mpi_length bits
  octets_count = div(mpi_length + 7, 8)
  <<mpi_value::bytes-size(octets_count), next::binary>> = rest
  {mpi_value, next}
end

{encrypted_session_key, <<>>} = decode_mpi.(encrypted_session_key_mpi)
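RFC4880 illustrates the MPI format with two tiny values: the octets [00 01 01] encode 1, and [00 09 01 FF] encode 511. A quick sketch with a hypothetical decode_mpi_integer helper (a variant of the function above that also converts the octets to an integer) reproduces both, plus the public exponent 010001 seen later in the gpg output.

```elixir
# Hypothetical helper: decode one MPI and return {integer_value, rest}
decode_mpi_integer = fn <<bit_length::16, rest::binary>> ->
  octets_count = div(bit_length + 7, 8)
  <<value::binary-size(octets_count), next::binary>> = rest
  {:binary.decode_unsigned(value), next}
end

# Examples from RFC4880, section 3.2
IO.inspect(decode_mpi_integer.(<<0x00, 0x01, 0x01>>))        # => {1, ""}
IO.inspect(decode_mpi_integer.(<<0x00, 0x09, 0x01, 0xFF>>))  # => {511, ""}

# 65537 needs 17 bits, so its MPI takes 2 + 3 octets
IO.inspect(decode_mpi_integer.(<<0x00, 0x11, 0x01, 0x00, 0x01>>))  # => {65537, ""}
```

Note that the length prefix counts bits, not octets, which is why the octet count has to be rounded up.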
To decrypt the ciphertext in the Public-Key Encrypted Session Key Packet, I need to decode and decrypt the Secret-Key Packet that is exported with gpg --export-secret-keys john.doe@example.com
Decrypt (shortcut) Secret-Key Packet
Decrypting a Secret-Key Packet is a whole other story, and I’m not going into all the complexities of decrypting it. Instead, I will briefly touch on a few interesting points later in this article. For now, I will inspect it with gpg and use as much data as possible to build the secret key material.
# Bash
~$ gpg --export-secret-keys john.doe@example.com | gpg --list-packets --verbose
...
# off=1349 ctb=9d tag=7 hlen=3 plen=966
:secret sub key packet:
version 4, algo 1, created 1704326291, expires 0
pkey[0]: A36DE0E9799AE16BACFE60119C0401613DA4A513DD596EADD49DCA7D864F15B43FE936DFEDB98F8C780BB072270ABBC1C2D48E0BCAF7C608CD07B14FCE7AB5ACF856ACBE7E1ADBEEC80873673BA0FAF6A79F92B895FDD45EC621DE60CB1E335A53440DDE7D7C618B2E3E5E6AF41C1D69255ACB033EEC20D72DEC6EF76E5EC6BD2E46DC9EFE00470FA63C256B2B1E324A4C5CCCB4EFC9C579DA8535043DE338D33831F9FB769CB5F5326E7E99FD99B9E46E682A1E481F13D0D0CB20FD9465D45E89B06B0A03E9028A8A7E91B23383AAE2A6F940C37BFD7CEC354B56EA1B22803024E3DDB6FB540504268ED7E867725C5D1BA25DC03D141D088519E9CAA23D28B7
pkey[1]: 010001
iter+salt S2K, algo: 7, SHA1 protection, hash: 2, salt: 4B7C3F131EC98212
protect count: 60817408 (253)
protect IV: 53 9e 8f 49 d9 71 07 4e 08 84 5e 34 94 5d e6 5a
skey[2]: [v4 protected]
keyid: B80510474E7B88FE
...
NOTE: I’ve omitted other packets in this message and shown the secret-subkey, which has the same format as a Secret-Key Packet. RFC4880 explains the difference between keys and subkeys and their common use cases.
The Packet Tag 7 identifies the Secret-Subkey Packet, and RFC4880 describes the format. I’ll make a shortcut here and say that pkey[0]: A36DE0E9799AE16BACFE... and pkey[1]: 010001 are the "public modulus n" and the "public encryption exponent e" of the RSA public key, respectively. To get the secret key material skey[2]: [v4 protected], I'll need to decode the S2K Specifier and decrypt the ciphertext.
The “String-to-Key Specifier” of the Secret-Key Packet describes how to convert passphrase strings into symmetric-key encryption/decryption keys to decrypt secret key material. For that, we’ll need:
- algo: 7, SHA1 protection - symmetric-key algo 7 = "AES with 128-bit key [AES]"
- salt: 4B7C3F131EC98212
- protect count: 60817408 (253)
- protect IV: 53 9e 8f 49 d9 71 07 4e 08 84 5e 34 94 5d e6 5a
It took me some time to figure out how to apply “Iterated and Salted S2K”. RFC describes it as follows:
Iterated-Salted S2K hashes the passphrase and salt data multiple times. The total number of octets to be hashed is specified in the encoded count in the S2K specifier. Note that the resulting count value is an octet count of how many octets will be hashed, not an iteration count.
Initially, one or more hash contexts are set up as with the other S2K algorithms, depending on how many octets of key data are needed. Then the salt, followed by the passphrase data, is repeatedly hashed until the number of octets specified by the octet count has been hashed. The one exception is that if the octet count is less than the size of the salt plus passphrase, the full salt plus passphrase will be hashed even though that is greater than the octet count.
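The gpg output above prints the count twice: protect count: 60817408 (253). The number in parentheses is the one-octet coded count stored in the packet, and RFC4880 (section 3.7.1.3) gives the formula to expand it. A short sketch, with a hypothetical function name:

```elixir
import Bitwise

# RFC4880, section 3.7.1.3: count = (16 + (c & 15)) << ((c >> 4) + 6)
decode_s2k_count = fn coded_count ->
  (16 + (coded_count &&& 15)) <<< ((coded_count >>> 4) + 6)
end

IO.inspect(decode_s2k_count.(253))
# => 60817408
```

For 253 the low four bits are 13 and the high four bits are 15, so the result is 29 shifted left by 21 bits, which is exactly the 60817408 octets that gpg reports.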
Here I’ll give some hints on how to derive the decryption key for the symmetric-key algorithm and how to apply it to the Secret-Key Packet ciphertext:
# Elixir
defmodule S2KSpecifier do
  @max_hash_contexts 100
  @zero_octet <<0::8>>

  def build_session_key(key_bit_size, "" <> _ = passphrase, "" <> _ = salt, protect_count) do
    salted_passphrase = salt <> passphrase
    iter_count = ceil(protect_count / byte_size(salted_passphrase))

    # Repeat salt + passphrase and keep exactly protect_count octets
    <<hash_input::bytes-size(protect_count), _::binary>> =
      Enum.reduce(1..iter_count, "", fn _, acc -> acc <> salted_passphrase end)

    # Each extra hash context is preloaded with one more zero octet
    iterated_s2k_hash =
      Enum.reduce_while(1..@max_hash_contexts, "", fn context_num, acc ->
        if bit_size(acc) < key_bit_size do
          prefix = String.pad_trailing("", context_num - 1, @zero_octet)
          {:cont, acc <> :crypto.hash(:sha, prefix <> hash_input)}
        else
          {:halt, acc}
        end
      end)

    <<key::size(key_bit_size), _::bits>> = iterated_s2k_hash
    <<key::size(key_bit_size)>>
  end
end

# algo 7 = "AES with 128-bit key [AES]"
key_size = 128
protect_count_decoded = 60_817_408
salt = Base.decode16!("4B7C3F131EC98212")
# "protect IV" from the gpg output above
iv = Base.decode16!("539E8F49D971074E08845E34945DE65A")
session_key = S2KSpecifier.build_session_key(key_size, "passphrase", salt, protect_count_decoded)
# NOTE: ciphertext is the encrypted secret key material taken from the
# Secret-Subkey Packet body; decoding that packet is not shown here
plaintext = :crypto.crypto_one_time(:aes_128_cfb128, session_key, iv, ciphertext, false)
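With a count of roughly 60 million octets, building the whole hash input in memory is wasteful. As an alternative sketch (not the approach used in the code above), the input can be fed into an incremental SHA-1 context chunk by chunk. The StreamingS2K module is hypothetical and assumes the key fits into a single SHA-1 digest (up to 160 bits), so no additional hash contexts with zero-octet prefixes are needed.

```elixir
defmodule StreamingS2K do
  # Hypothetical helper: Iterated and Salted S2K with SHA-1, single hash context only
  def derive_key(key_byte_size, passphrase, salt, octet_count) when key_byte_size <= 20 do
    chunk = salt <> passphrase
    # RFC4880: at least one full salt + passphrase is always hashed
    total = max(octet_count, byte_size(chunk))
    full_chunks = div(total, byte_size(chunk))
    # The last repetition is cut short so that exactly `total` octets are hashed
    tail = binary_part(chunk, 0, rem(total, byte_size(chunk)))

    digest =
      1..full_chunks
      |> Enum.reduce(:crypto.hash_init(:sha), fn _, ctx -> :crypto.hash_update(ctx, chunk) end)
      |> :crypto.hash_update(tail)
      |> :crypto.hash_final()

    # Take the leftmost octets of the digest as the key
    binary_part(digest, 0, key_byte_size)
  end
end

# Small made-up inputs, just to show the call shape
key = StreamingS2K.derive_key(16, "passphrase", "12345678", 1024)
IO.inspect(byte_size(key))
# => 16
```

For AES-128 the key is the first 16 octets of the digest. A 256-bit key would need a second hash context, which this sketch deliberately leaves out.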
Once the Secret-Key Packet is decoded, the plaintext will have the algorithm-specific fields for RSA:
- multiprecision integer (MPI) of RSA secret exponent d
- MPI of RSA secret prime value p
- MPI of RSA secret prime value q (p < q)
- MPI of u, the multiplicative inverse of p, mod q
Back to Decrypt Public-Key Encrypted Session Key Packet
I’ll use Erlang’s :crypto.private_decrypt/4 to decrypt the Public-Key Encrypted Session Key Packet. It takes algorithm, ciphertext, private_key, and options, and returns plaintext. The private_key is a list of three elements:
- Public exponent e (pkey[1]: 010001)
- Public modulus n (pkey[0]: A36DE0E9799AE16BACFE...)
- Secret exponent d (the first MPI in the secret packet, decoded behind the scenes)
# Elixir
pub_exp_e = Base.decode16!("010001")
pub_mod_n =
Base.decode16!(
"A36DE0E9799AE16BACFE60119C0401613DA4A513DD596EADD49DCA7D864F15B43FE936DFEDB98F8C780BB072270ABBC1C2D48E0BCAF7C608CD07B14FCE7AB5ACF856ACBE7E1ADBEEC80873673BA0FAF6A79F92B895FDD45EC621DE60CB1E335A53440DDE7D7C618B2E3E5E6AF41C1D69255ACB033EEC20D72DEC6EF76E5EC6BD2E46DC9EFE00470FA63C256B2B1E324A4C5CCCB4EFC9C579DA8535043DE338D33831F9FB769CB5F5326E7E99FD99B9E46E682A1E481F13D0D0CB20FD9465D45E89B06B0A03E9028A8A7E91B23383AAE2A6F940C37BFD7CEC354B56EA1B22803024E3DDB6FB540504268ED7E867725C5D1BA25DC03D141D088519E9CAA23D28B7"
)
sec_exp_d =
Base.decode16!(
"10137595AC849E655275B8A02D7CA760C2B38E197A2345F93624F4BA36EFFDEE912ADBB458AA1C2E1BD0EA8BBABE1A3761218F7CD17BB605EA45361D3CEE62834A9A716650F94B6672FED725F1D3A55C30A39DC727D9F97DFE776E6C8F0E6ACC19220F4B322F7E038C34F90CAEF3E50B66C54645B776D01EDA96F0AE1E33EC7B49EF74F6B5893DCF8818DCAB6C0AE8B7549F8D67CB52A423E6C1CEC54C8D3F828050198769351764279431CD4AFF9D813B6B6BB01BEC025F2AB3509D8ED13ECBC8DB7CB59C916B1BB7DCA8A2540C0803ECEEDABCAA3CBB24FE037E85ADDD9B42EEB35DF694A5755F402B2EE6AC1A76CAC65AC852D18346585322B637080C68A1"
)
priv_key = [pub_exp_e, pub_mod_n, sec_exp_d]
decrypted_session_key_payload = :crypto.private_decrypt(:rsa, encrypted_session_key, priv_key, [])
The decrypted_session_key_payload should be decoded as described in the RFC:
The value “m” in the above formulas is derived from the session key as follows. First, the session key is prefixed with a one-octet algorithm identifier that specifies the symmetric encryption algorithm used to encrypt the following Symmetrically Encrypted Data Packet. Then a two-octet checksum is appended, which is equal to the sum of the preceding session key octets, not including the algorithm identifier, modulo 65536.
# Elixir
algo_bitsize = 8
checksum_bitsize = 16
bsize = byte_size(decrypted_session_key_payload) - 2 - 1

<<
  sym_key_algo::size(algo_bitsize),
  session_key::bytes-size(bsize),
  expected_checksum::size(checksum_bitsize)
>> = decrypted_session_key_payload

{sym_key_algo, session_key, expected_checksum}
The symmetric-key algo 9 = “AES with 256-bit key”. Good to know for later.
Now, to verify the checksum:
# Elixir
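octets_sum =
  for <<b::8 <- session_key>>, reduce: 0 do
    acc -> acc + b
  end

# The checksum is the sum of the session key octets modulo 65536
actual_checksum = rem(octets_sum, 65_536)
^expected_checksum = actual_checksum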
OK. We have the symmetric-key algorithm and the session key, and now we have to decrypt the second packet (packet2) in our message, which has Packet Tag ID=18 - Sym. Encrypted and Integrity Protected Data Packet.
Decode Sym. Encrypted and Integrity Protected Data Packet and friends
Let’s look at RFC4880 to understand how to decode and decrypt Sym. Encrypted and Integrity Protected Data Packet:
The body of this packet consists of:
- A one-octet version number. The only currently defined value is 1.
- Encrypted data, the output of the selected symmetric-key cipher operating in Cipher Feedback mode with shift amount equal to the block size of the cipher (CFB-n where n is the block size).
The data is encrypted in CFB mode, with a CFB shift size equal to the cipher’s block size. The Initial Vector (IV) is specified as all zeros. Instead of using an IV, OpenPGP prefixes an octet string to the data before it is encrypted. The length of the octet string equals the block size of the cipher in octets, plus two. The first octets in the group, of length equal to the block size of the cipher, are random; the last two octets are each copies of their 2nd preceding octet.
The repetition of 16 bits in the random data prefixed to the message allows the receiver to immediately check whether the session key is incorrect.
# Elixir
<<protected_packet_version::8, ciphertext::binary>> = packet2_body
1 = protected_packet_version
# AES cipher
cipher_block_size_bits = 128
# All-zero IV, one cipher block long
null_iv = <<0::size(cipher_block_size_bits)>>
plaintext = :crypto.crypto_one_time(:aes_256_cfb128, session_key, null_iv, ciphertext, false)
Now we need to check the repetition of the last two octets in the prefix octet string to verify that decryption was done right:
# Elixir
checksum_octets_count = 2
prefix_byte_size = Kernel.div(cipher_block_size_bits, 8) - checksum_octets_count

<<
  _::bytes-size(prefix_byte_size),
  chsum1::checksum_octets_count*8,
  chsum2::checksum_octets_count*8,
  protected_packet_data::binary
>> = plaintext

^chsum1 = chsum2
{chsum1, protected_packet_data}
That’s great. Checksum passed, and now we have the plaintext of the Sym. Encrypted and Integrity Protected Data Packet. Now I will decode protected_packet_data with read_packet_tag/1, read_body_length/2, and read_body/2:
# Elixir
{ptag, next} = read_packet_tag.(protected_packet_data)

{
  # Packet Tag Format
  :old,
  # Packet Tag ID=8 - Compressed Data Packet
  8,
  # Length-type = 3 - The packet extends until the end of the file
  3
} = ptag

{66 = blen, 0 = _hlen, next} = read_body_length.(ptag, next)
{compressed_data_packet_data, <<>>} = read_body.(blen, next)
I need to decode a Compressed Data Packet now and inflate the data. The format of this packet is:
The body of this packet consists of:
- One octet that gives the algorithm used to compress the packet.
- Compressed data, which makes up the remainder of the packet.
9.3. Compression Algorithms
ID Algorithm
-- ---------
0 - Uncompressed
1 - ZIP [RFC1951]
2 - ZLIB [RFC1950]
3 - BZip2 [BZ2]
100 to 110 - Private/Experimental algorithm
# Elixir
<<compression_algo::8, deflated_data::binary>> = compressed_data_packet_data
# ZLIB [RFC1950]
2 = compression_algo
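max_chunks = 1024
z = :zlib.open()
:ok = :zlib.inflateInit(z)
# Queue the compressed input on the first call only; later calls pass [] to
# drain what is already queued instead of feeding the same bytes again.
{_input, inflated_iodata} =
  Enum.reduce_while(1..max_chunks, {deflated_data, []}, fn _, {input, acc} ->
    case :zlib.safeInflate(z, input) do
      {:continue, output} -> {:cont, {[], [acc, output]}}
      {:finished, output} -> {:halt, {[], [acc, output]}}
    end
  end)
:zlib.close(z)
inflated_data = IO.iodata_to_binary(inflated_iodata)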
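The code above handles only ZLIB (ID 2). As a rough sketch of how the algorithm ID from the table could select the decompression mode, the helper below dispatches on the first octet. The module and function names are hypothetical; the only library facts relied on are that `:zlib.inflateInit/2` accepts negative window bits for raw DEFLATE data (RFC1951) and positive window bits for zlib-wrapped data (RFC1950). BZip2 is not covered by Erlang's `:zlib`, so ID 3 is intentionally unhandled here.

```elixir
# Hypothetical helper; the names are illustrative, not the open_pgp API.
defmodule DecompressSketch do
  # 0 - Uncompressed: the body is already the inner packet data
  def decompress(0, data), do: data
  # 1 - ZIP [RFC1951]: raw DEFLATE, selected with negative window bits
  def decompress(1, data), do: inflate(data, -15)
  # 2 - ZLIB [RFC1950]: DEFLATE wrapped in a zlib header and checksum
  def decompress(2, data), do: inflate(data, 15)

  defp inflate(data, window_bits) do
    z = :zlib.open()

    try do
      :ok = :zlib.inflateInit(z, window_bits)
      # :zlib.inflate/2 is unbounded; untrusted input is better served by safeInflate/2
      z |> :zlib.inflate(data) |> IO.iodata_to_binary()
    after
      :zlib.close(z)
    end
  end
end
```

For example, `DecompressSketch.decompress(2, :zlib.compress("hello"))` and `DecompressSketch.decompress(1, :zlib.zip("hello"))` should both give back `"hello"`.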
Typically, this packet is found as the contents of an encrypted packet and contains a Literal Data Packet. Let’s see it:
# Elixir
{ptag, next} = read_packet_tag.(inflated_data)
{
:old,
# Packet Tag ID=11 - Literal Data Packet
11,
# Length-type = 0 - The packet has a one-octet length.
0
} = ptag
{37 = blen, 1 = _hlen, next} = read_body_length.(ptag, next)
{literal_data, <<>>} = read_body.(blen, next)
And finally, I’m about to decode the last packet — Literal Data Packet. RFC4880:
A Literal Data packet contains the body of a message; data that is not to be further interpreted.
The body of this packet consists of:
- A one-octet field that describes how the data is formatted.
- If it is a ‘b’ (0x62), then the Literal packet contains binary data.
- If it is a ‘t’ (0x74), then it contains text data, and thus may need line ends converted to local form, or other text-mode changes.
- The tag ‘u’ (0x75) means the same as ‘t’, but also indicates that implementation believes that the literal data contains UTF-8 text.
- File name as a string (one-octet length, followed by a file name). This may be a zero-length string. Commonly, if the source of the encrypted data is a file, this will be the name of the encrypted file.
- A four-octet number that indicates a date associated with the literal data. Commonly, the date might be the modification date of a file, or the time the packet was created, or a zero that indicates no specific time.
- The remainder of the packet is literal data.
# Elixir
<<format::8, file_name_length::8, next::binary>> = literal_data
0x62 = format
<<file_name::bytes-size(file_name_length), date::4*8, data::binary>> = next
{file_name, DateTime.from_unix!(date), data}
I’ve never been so excited to see the "Hello, World!!!\n" binary. What a journey it was!
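For a compact view of the Literal Data Packet layout quoted above, here is a small sketch that builds a packet body by hand and parses it back. The module name, the function names, and the sample file name are made up for this illustration and are not taken from the library.

```elixir
# Hypothetical demo module; the names are illustrative, not the open_pgp API.
defmodule LiteralSketch do
  # Body layout: format octet, one-octet file name length, file name,
  # four-octet date (Unix time), then the literal data itself.
  def encode(format, file_name, unix_time, data) when format in [?b, ?t, ?u] do
    <<format::8, byte_size(file_name)::8, file_name::binary, unix_time::32, data::binary>>
  end

  # The file name length read from the second octet sizes the next segment.
  def decode(<<format::8, len::8, file_name::binary-size(len), unix_time::32, data::binary>>) do
    %{format: format, file_name: file_name, date: DateTime.from_unix!(unix_time), data: data}
  end
end
```

Encoding `"Hello, World!!!\n"` with the format `?b` and decoding the result should return the same file name and data, with the date converted to a `DateTime`.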
Summary
For now, I’ll park it right here. Hopefully, this will give some understanding and an example of how to decode a PGP message. As I’ve said earlier, I had success decrypting the PGP encrypted file as well as loading the secret key file. The most complex parts for me were:
- Dealing with the Iterated and Salted String-to-Key Specifier in the Secret Key Packet
- Assembling the RSA private key for :crypto.private_decrypt/4 to decrypt the "Public Key Encrypted Session Key Packet"
- The mistake that I made decoding Multiprecision Integers: the length is given in bits, BUT they still take full bytes ((MPI.length + 7) / 8)
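The Multiprecision Integer pitfall from the last bullet is easy to show in a few lines. This is only a sketch with a hypothetical module name: the two-octet length is a bit count, and the value occupies that count rounded up to whole octets.

```elixir
# Hypothetical demo module; the names are illustrative, not the open_pgp API.
defmodule MpiSketch do
  # Decode one MPI: a two-octet bit count, then (bits + 7) / 8 octets of
  # big-endian value. Returns the value octets and the remaining input.
  def decode(<<bit_length::16, rest::binary>>) do
    byte_length = div(bit_length + 7, 8)
    <<value::binary-size(byte_length), next::binary>> = rest
    {value, next}
  end
end
```

RFC4880 gives `<<0x00, 0x09, 0x01, 0xFF>>` as the encoding of 511: the length says 9 bits, yet the value takes two full octets. Treating the 9 as a byte count, or dividing without rounding up, misaligns every field that follows.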
In the first version of the library, v0.5.0, I was able to deliver these features:
- Generic packet decoder: any valid OpenPGP message can be decoded
- Literal Data Packet
- Public Key Encrypted Session Key Packet
  - support RSA only
- Public Key Packet
  - support only V4 packets
  - Iterated and Salted String-to-Key (S2K) specifier (ID: 3)
  - S2K usage convention octet of 254 only
  - S2K hashing algo SHA1
  - AES128 symmetric encryption of secret key material
- Compressed Data Packet
  - support only ZLIB- and ZIP-style blocks
- Integrity Protected Data Packet
  - support Session Key algo 9 (AES with 256-bit key) in CFB mode
  - The Modification Detection Code system is not supported
- Radix64
Any feedback, comments and questions are welcome. Cheers!
Refs, Snippets, Misc
# GPG commands
~$ gpg --list-keys
~$ gpg --list-secret-keys
~$ gpg --export-secret-key --armor john.doe@example.com > ./private.pgp
~$ gpg --list-packets --verbose example.txt.pgp
~$ gpg --encrypt --recipient F89B64F782254B03624FCF5C052E8381B5C335DA /usr/share/dict/words
~$ gpg --batch --passphrase "passphrase" --quick-generate-key "John Doe (RSA2048) <john.doe@example.com>" rsa2048 default never
~$ gpg --edit-key F89B64F782254B03624FCF5C052E8381B5C335DA
# Handy tools
~$ hexdump -vx ./words.pgp
~$ xxd -b ./words.pgp
~$ xxd -g 1
- GitHub repo: https://github.com/DivvyPayHQ/open_pgp
- Hex package: https://hex.pm/packages/open_pgp
- Hex docs: https://hexdocs.pm/open_pgp/