Crypto? Never roll your own.
Author’s note: The purpose of this post is to provide an introduction to cryptography, ciphers, and encoding techniques commonly used in capture the flag (CTF) challenges. It’s the resource I would have wanted when I was approaching my first CTF cryptography challenges! I provide examples of ciphertext (or encoded text) to help build the intuition you need for cipher recognition! In my opinion, that’s the hardest part of solving CTF crypto challenges!
Table of Contents:
- Cryptography Concepts and Terms
- Types of Ciphers - Symmetric (Single Key)
- References / Sources
Cryptography Concepts and Terms⌗
I’ve found that Wikipedia has excellent articles on encoding and cryptographic systems. It’s a good place to look if you want more details on a specific encoding scheme or encryption algorithm.
- Encoding: to convert (something, such as a body of information) from one system of communication into another 1
- Cipher: an algorithm for performing encryption or decryption. 2
- Plaintext: The unencrypted or “original” message
- Ciphertext: The encrypted message (usually looks like gobbledegook)
- Frequency Analysis: A statistical method for cracking ciphers. Essentially, it assumes that the most frequent letter in ciphertext will correspond with the most frequent letter in the plaintext language, or that the most common three-letter word in the ciphertext corresponds to “the”. 3
- Key: a piece of information that specifies the transformation of plaintext into ciphertext, and vice versa for decryption algorithms. 4 Essentially, the key is an extra input to the cryptographic function that changes how it produces ciphertext, so that you need the key to get the plaintext back out of the decryption function.
- Symmetric Cipher: The same key is used to encrypt and decrypt the message. For example, ROT13.
- Asymmetric Cipher: Two distinct yet related keys (public and private) are used to encrypt and decrypt the message. For example, RSA.
- Polyalphabetic Substitution: A polyalphabetic substitution cipher uses multiple alphabets for substitutions, which makes the technique resistant to frequency analysis.
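The frequency analysis idea from the list above can be sketched in a few lines of Python. This is only an illustration: the function name `letter_frequencies` is made up for this post, and on short ciphertexts the counts are often too noisy to trust.

```python
from collections import Counter


def letter_frequencies(text: str) -> list[tuple[str, int]]:
    """Count how often each ASCII letter appears, most frequent first."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    return Counter(letters).most_common()


if __name__ == "__main__":
    # Any monoalphabetic ciphertext works here; this one is just a sample.
    ciphertext = "Dlxawp epie qzc l Nlpdlc Ntaspc."
    for letter, count in letter_frequencies(ciphertext)[:5]:
        print(letter, count)
```

The usual next step is to guess that the top ciphertext letters stand for common plaintext letters (E, T, A, and so on in English) and check whether the result looks like words.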
I like to think of encoding as a form of “translation”. Different computer systems operate with different forms of encoding, just as different people use different languages. And just like languages have specific alphabets, encodings have alphabets of their own.
Base 16, 32, and 64⌗
The first “family” of encodings that I’ve seen frequently in CTF challenges are the base 16, 32, and 64 encoding schemes. Here’s IETF RFC 4648 if you want all the nitty-gritty details (fair warning: it’s a long and fairly dry read). I would recommend checking out the tables that describe the alphabets used for each type of encoding - knowing which alphabets correspond to which encoding schemes will help you identify the type of encoding at a glance!
Also, this Wikipedia page lists some of the more obscure binary-to-text encoding types that are beyond the scope of this post. However, it’s good to know they exist: https://en.wikipedia.org/wiki/Binary-to-text_encoding
Base 16 (hexadecimal) encoding uses the hexadecimal number system (0123456789ABCDEF) to encode text. The base16 encoding of
Hey! This is an example of base16 encoding.
This is a tool you can use to encode and decode base16/hexadecimal: https://simplycalc.com/base16-encode.php
Some identifying characteristics of base16 encoding include the fact that it uses only hexadecimal characters and never needs padding (an equals sign at the end).
Base 32 is very similar to base16 encoding but it has a larger alphabet, and uses padding characters (equals signs). The base32 encoding of
Hey! This is an example of base32 encoding.
Some identifying characteristics of base32 encoding are the padding characters (equals signs) and an alphabet made up of upper-case letters and the digits 2 through 7.
This is a tool you can use to encode and decode base32: https://simplycalc.com/base32-encode.php
Base 64 is similar to base32, but it has an even larger alphabet! It also uses padding characters. The base64 encoding of
Hey! This is an example of base64 encoding.
This is a tool you can use to encode and decode base64: https://simplycalc.com/base64-encode.php
The identifying features of base64 encoding are the upper and lower case alphabet, use of numbers, the symbols + and /, and message padding (zero, one, or two equals signs at the end of the string).
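If you’d rather not paste strings into a website, Python’s standard `base64` module handles all three encodings. Here is a small sketch; the helper name `encode_all` is just something I made up for illustration.

```python
import base64


def encode_all(message: bytes) -> dict[str, str]:
    """Return the base16, base32, and base64 encodings of the same bytes."""
    return {
        "base16": base64.b16encode(message).decode("ascii"),
        "base32": base64.b32encode(message).decode("ascii"),
        "base64": base64.b64encode(message).decode("ascii"),
    }


if __name__ == "__main__":
    for name, encoded in encode_all(b"Hey!").items():
        print(f"{name}: {encoded}")

    # Decoding is the matching b16decode / b32decode / b64decode call.
    print(base64.b64decode("SGV5IQ==").decode("ascii"))
```

Notice how the same four bytes get shorter as the alphabet gets bigger, and how the padding only shows up in base32 and base64.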
URL Encoding (Percent-Encoding)⌗
URL Encoding is defined in IETF RFC 3986. Essentially, URL encoding is a standard used to encode specific data or characters in URLs.
The URL encoding of
Hey! This is an example of URL or Percent Encoding.
Here’s a tool for encoding and decoding URL or Percent Encoding: https://meyerweb.com/eric/tools/dencoder/
The identifying feature of URL encoding is percent signs, each followed by two hexadecimal digits, mixed in with readable plaintext. (Don’t confuse it with the URL-safe variant of base64 described in RFC 4648, which is a different thing.)
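Python’s standard library can do percent-encoding too, via `urllib.parse.quote` and `unquote`. A minimal sketch follows; the wrapper names `url_encode` and `url_decode` are my own. Note that different tools disagree about which characters to leave alone, so your output may not match another encoder character for character.

```python
from urllib.parse import quote, unquote


def url_encode(text: str) -> str:
    """Percent-encode everything except letters, digits, and _ . - ~"""
    return quote(text, safe="")


def url_decode(text: str) -> str:
    """Turn %XX sequences back into the characters they stand for."""
    return unquote(text)


if __name__ == "__main__":
    encoded = url_encode("Hey! 100% sure?")
    print(encoded)
    print(url_decode(encoded))
```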
The wonders of hex, decimal, octal and ASCII⌗
When you hear ASCII, you probably think of ASCII art… But it’s yet another form of encoding commonly encountered in CTF challenges! Essentially, it’s a mapping of octal, decimal, and hexadecimal numbers to corresponding characters: 5
This ASCII text can be represented using different number systems:
This is some ASCII text, and I like it very much.
01010100 01101000 01101001 01110011 00100000 01101001 01110011 00100000 01110011 01101111 01101101 01100101 00100000 01000001 01010011 01000011 01001001 01001001 00100000 01110100 01100101 01111000 01110100 00101100 00100000 01100001 01101110 01100100 00100000 01001001 00100000 01101100 01101001 01101011 01100101 00100000 01101001 01110100 00100000 01110110 01100101 01110010 01111001 00100000 01101101 01110101 01100011 01101000 00101110
124 150 151 163 040 151 163 040 163 157 155 145 040 101 123 103 111 111 040 164 145 170 164 054 040 141 156 144 040 111 040 154 151 153 145 040 151 164 040 166 145 162 171 040 155 165 143 150 056
84 104 105 115 32 105 115 32 115 111 109 101 32 65 83 67 73 73 32 116 101 120 116 44 32 97 110 100 32 73 32 108 105 107 101 32 105 116 32 118 101 114 121 32 109 117 99 104 46
54 68 69 73 20 69 73 20 73 6F 6D 65 20 41 53 43 49 49 20 74 65 78 74 2C 20 61 6E 64 20 49 20 6C 69 6B 65 20 69 74 20 76 65 72 79 20 6D 75 63 68 2E
Fortunately, you don’t have to use a lookup table. Tools can do all the hard work for you once you’ve identified the encoding type and the number system.
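As a rough sketch of what those tools do under the hood, here is how the conversion looks in Python. The function names `encode_ascii` and `decode_ascii` are made up for this example, and it assumes the numbers are separated by spaces like in the samples above.

```python
# Format specs: zero-padded binary, zero-padded octal, plain decimal, upper-case hex.
FORMATS = {2: "08b", 8: "03o", 10: "d", 16: "02X"}


def encode_ascii(text: str, base: int) -> str:
    """Write each ASCII character as a number in the given base."""
    return " ".join(format(byte, FORMATS[base]) for byte in text.encode("ascii"))


def decode_ascii(numbers: str, base: int) -> str:
    """Read space-separated numbers in the given base back into text."""
    return bytes(int(token, base) for token in numbers.split()).decode("ascii")


if __name__ == "__main__":
    for base in (2, 8, 10, 16):
        print(base, encode_ascii("This", base))
    print(decode_ascii("54 68 69 73", 16))
```

Telling the bases apart is mostly a matter of looking at the digits: only 0 and 1 means binary, nothing above 7 suggests octal, and letters A-F give away hexadecimal.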
Types of Ciphers - Symmetric (Single Key)⌗
There are two major categories of ciphers: symmetric (single key) and asymmetric (dual key). Asymmetric ciphers rely on a lot of math, so the focus of this section will be on symmetric ciphers. There are two subcategories within symmetric ciphers: substitution and transposition.
Substitution ciphers replace letters in the plaintext with other letters, numbers, symbols, etc.
Morse code is a substitution cipher originally designed for telegraphy. Its alphabet consists of dots and dashes, with slashes commonly used to separate words.
This is some plaintext
- .... .. ... / .. ... / ... --- -- . / .--. .-.. .- .. -. - . -..- -
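Here is a small Python sketch of Morse encoding and decoding for letters only. The names `to_morse` and `from_morse` are hypothetical, and the convention of a space between letters and " / " between words is assumed from the example above.

```python
MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".", "F": "..-.",
    "G": "--.", "H": "....", "I": "..", "J": ".---", "K": "-.-", "L": ".-..",
    "M": "--", "N": "-.", "O": "---", "P": ".--.", "Q": "--.-", "R": ".-.",
    "S": "...", "T": "-", "U": "..-", "V": "...-", "W": ".--", "X": "-..-",
    "Y": "-.--", "Z": "--..",
}
REVERSE = {code: letter for letter, code in MORSE.items()}


def to_morse(text: str) -> str:
    """Encode letters; anything that is not A-Z is silently dropped."""
    words = text.upper().split()
    return " / ".join(
        " ".join(MORSE[ch] for ch in word if ch in MORSE) for word in words
    )


def from_morse(code: str) -> str:
    """Decode a string that uses spaces between letters and ' / ' between words."""
    return " ".join(
        "".join(REVERSE[symbol] for symbol in word.split())
        for word in code.split(" / ")
    )


if __name__ == "__main__":
    encoded = to_morse("This is some plaintext")
    print(encoded)
    print(from_morse(encoded))
```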
This substitution is very straightforward: A=1, B=2, …, Z=26.
Nobody can crack this
14-15-2-15-4-25 3-1-14 3-18-1-3-11 20-8-9-19
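A quick Python sketch of the same letter-to-number substitution. The function names are my own, and the format (dashes between letters, spaces between words) is assumed from the example above.

```python
def a1z26_encode(text: str) -> str:
    """Replace each letter with its position in the alphabet (A=1 ... Z=26)."""
    encoded_words = []
    for word in text.lower().split():
        numbers = [str(ord(ch) - ord("a") + 1) for ch in word if "a" <= ch <= "z"]
        encoded_words.append("-".join(numbers))
    return " ".join(encoded_words)


def a1z26_decode(code: str) -> str:
    """Turn dash-separated numbers back into lower-case letters."""
    return " ".join(
        "".join(chr(int(number) + ord("a") - 1) for number in word.split("-"))
        for word in code.split()
    )


if __name__ == "__main__":
    encoded = a1z26_encode("Nobody can crack this")
    print(encoded)
    print(a1z26_decode(encoded))
```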
The Caesarian Shift cipher, or Caesar cipher, is a substitution method that involves rotating the alphabet by a key n and substituting the rotated letters for the plaintext letters. The best visualization of how this works is a Caesar Cipher Wheel.
If n=11 then our alphabets are:
ABCDEFGHIJKLMNOPQRSTUVWXYZ
LMNOPQRSTUVWXYZABCDEFGHIJK
So A=L, B=M, etc.
Sample text for a Caesar Cipher.
Dlxawp epie qzc l Nlpdlc Ntaspc.
The decryption key is 26-n, so for this cipher the decryption key would be 15.
ROT13 is just a Caesar cipher with a key of 13 (n=13). Since 26-13=13, the same rotation both encrypts and decrypts.
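The rotation is easy to sketch in Python with a translation table. The function name `caesar` is just what I’m calling it here, and this version assumes only the 26 English letters get shifted while everything else passes through unchanged.

```python
import string


def caesar(text: str, n: int) -> str:
    """Rotate every ASCII letter forward by n places, preserving case."""
    n %= 26
    lower = string.ascii_lowercase
    upper = string.ascii_uppercase
    table = str.maketrans(
        lower + upper,
        lower[n:] + lower[:n] + upper[n:] + upper[:n],
    )
    return text.translate(table)


if __name__ == "__main__":
    ciphertext = caesar("Sample text for a Caesar Cipher.", 11)
    print(ciphertext)
    print(caesar(ciphertext, 26 - 11))  # decrypt with the complementary key

    # With only 26 possible keys, brute force is a perfectly good attack.
    for key in range(1, 26):
        print(key, caesar(ciphertext, key))
```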
The Baconian cipher hides a message within a message. For example, two fonts (plain and bold) could be used in a sentence. The plain letters could be the “A” form and the bold letters could be the “B” form. To read the hidden message, ignore the actual letters in the text and focus on which form each letter takes:
This is an example of a sentence that actually has a secret message.
BAAB AB AB AAABBBB AA BA A BAAABBAA BAAA BAAAAABA BAA A BAABAA BAABB AA BAABABABAAABBBBAABAABAAABBAABAAABAAAAABABAAABAABAABAABB
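which decrypts to (reading each group of five with the 26-letter Baconian alphabet, A=AAAAA through Z=BBAAB)
SUPERSECRET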
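Here is a minimal Python sketch of decoding an A/B string like the one above. It assumes the 26-letter variant of the Baconian alphabet, where each group of five is read as a binary number (A=0, B=1) and mapped onto A-Z. Some tools use the older 24-letter variant (I/J and U/V share codes), so treat that choice, and the name `bacon_decode`, as my assumptions.

```python
def bacon_decode(ab_text: str) -> str:
    """Decode groups of five A/B symbols using the 26-letter Baconian alphabet."""
    symbols = [ch for ch in ab_text.upper() if ch in "AB"]
    usable = len(symbols) - len(symbols) % 5  # ignore a trailing partial group
    letters = []
    for start in range(0, usable, 5):
        group = symbols[start:start + 5]
        value = int("".join("1" if s == "B" else "0" for s in group), 2)
        # Values 26-31 have no letter in this alphabet.
        letters.append(chr(ord("A") + value) if value < 26 else "?")
    return "".join(letters)


if __name__ == "__main__":
    pattern = "BAABABABAAABBBBAABAABAAABBAABAAABAAAAABABAAABAABAABAABB"
    print(bacon_decode(pattern))
```

Spaces are ignored, so the word-shaped version of the pattern decodes the same way as the run-together version.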
Polyalphabetic Substitution Ciphers⌗
Polyalphabetic substitution ciphers utilize multiple alphabets when substituting letters, which makes them resistant to frequency analysis attacks. This class of ciphers uses keys to determine which alphabets are used when.
The Vigenere cipher is a keyed cipher that applies a different Caesar cipher rotation to each letter of the message, with the letters of a keyword choosing which rotation to use. This website has a pretty good explanation and visualization tool!
Using the keyword “rainbow”
Some sample plaintext.
Jour toiglm cmoeetmku.
Note: Different tools implement this cipher in slightly different ways, so you might not get all of the plaintext depending on the tool you use.
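To make the tool differences concrete, here is a Python sketch of one common variant: the keyword only advances on letters, case is preserved, and punctuation and spaces pass through untouched. That variant reproduces the “rainbow” example above, but the function name `vigenere` and the `decrypt` flag are my own choices, not a standard API.

```python
def vigenere(text: str, keyword: str, decrypt: bool = False) -> str:
    """Shift each letter by the matching keyword letter (a=0, b=1, ...)."""
    shifts = [ord(k) - ord("a") for k in keyword.lower()]
    result = []
    used = 0  # how many letters have consumed a keyword position so far
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            shift = shifts[used % len(shifts)]
            if decrypt:
                shift = -shift
            result.append(chr((ord(ch) - base + shift) % 26 + base))
            used += 1
        else:
            result.append(ch)  # non-letters do not advance the keyword
    return "".join(result)


if __name__ == "__main__":
    ciphertext = vigenere("Some sample plaintext.", "rainbow")
    print(ciphertext)
    print(vigenere(ciphertext, "rainbow", decrypt=True))
```

A tool that advances the keyword on spaces as well would produce different ciphertext from the same inputs, which is exactly the kind of mismatch the note above warns about.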
The Hill Cipher is another keyed substitution cipher, and it is based on linear algebra. (Strictly speaking it is polygraphic rather than polyalphabetic, because it encrypts blocks of letters at a time.) Its inner workings are very mathy, but the important part to understand is that the key is actually a matrix. In order to decrypt the Hill Cipher, there are three pieces of information you must know (or guess):
- The original alphabet used (A=1, B=2, etc.)
- The matrix size
- The matrix key values
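Those three pieces map directly onto code. Below is a minimal Python sketch for the 2x2 case only. It assumes the A=0, B=1, …, Z=25 numbering (many tools use this instead of A=1), pads odd-length messages with X, and uses a commonly cited example key; the function names are hypothetical.

```python
def hill_apply_2x2(text: str, key: list[list[int]]) -> str:
    """Multiply each pair of letters by a 2x2 key matrix, modulo 26."""
    numbers = [ord(ch) - ord("A") for ch in text.upper() if "A" <= ch <= "Z"]
    if len(numbers) % 2:
        numbers.append(ord("X") - ord("A"))  # pad to a whole block (assumption)
    out = []
    for i in range(0, len(numbers), 2):
        x, y = numbers[i], numbers[i + 1]
        out.append((key[0][0] * x + key[0][1] * y) % 26)
        out.append((key[1][0] * x + key[1][1] * y) % 26)
    return "".join(chr(n + ord("A")) for n in out)


def invert_2x2_mod26(key: list[list[int]]) -> list[list[int]]:
    """Return the inverse matrix mod 26 (the decryption key)."""
    (a, b), (c, d) = key
    det = (a * d - b * c) % 26
    # pow(x, -1, m) needs Python 3.8+ and raises ValueError if no inverse exists.
    det_inv = pow(det, -1, 26)
    return [
        [(d * det_inv) % 26, (-b * det_inv) % 26],
        [(-c * det_inv) % 26, (a * det_inv) % 26],
    ]


if __name__ == "__main__":
    key = [[3, 3], [2, 5]]
    ciphertext = hill_apply_2x2("HELP", key)
    print(ciphertext)
    print(hill_apply_2x2(ciphertext, invert_2x2_mod26(key)))
```

Decryption is the same matrix multiplication with the inverse key, which is why a key whose determinant shares a factor with 26 is unusable.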
Here are a couple good explanations of the Hill Cipher:
Transposition / Permutation⌗
Transposition or permutation ciphers manipulate and re-arrange the letters in the message instead of substituting different letters in their place. So, the original message is in front of you, but it’s just scrambled up!
The railfence cipher walks up and down “rails” to scramble letters. The key for this cipher is the number of rails. For example, we’ll replace spaces with asterisks and let our number of “rails” be 3 (n=3):
sample*text
would be written like
s . . . l . . . e . .
. a . p . e . t . x .
. . m . . . * . . . t
(The dots just mark empty positions.) Then, collapse each of the rows:
sle apetx m*t
And the completed ciphertext would be:
sleapetxm*t
Here’s a longer example:
The quick brown fox jumped over the lazy dogs
Tqkofjevtl sh uc rw o updoe h aydgeibnxm rezo
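The zig-zag walk can be sketched in Python by working out which rail each character lands on and then reading the rails top to bottom. The function names here are made up for illustration, and this version keeps spaces as ordinary characters, which matches the longer example above.

```python
def rail_pattern(length: int, rails: int) -> list[int]:
    """Rail index for each position: 0, 1, ..., rails-1, rails-2, ..., 1, 0, ..."""
    cycle = list(range(rails)) + list(range(rails - 2, 0, -1))
    return [cycle[i % len(cycle)] for i in range(length)]


def rail_fence_encrypt(text: str, rails: int) -> str:
    pattern = rail_pattern(len(text), rails)
    # A stable sort by rail keeps left-to-right order within each rail.
    order = sorted(range(len(text)), key=lambda i: pattern[i])
    return "".join(text[i] for i in order)


def rail_fence_decrypt(ciphertext: str, rails: int) -> str:
    pattern = rail_pattern(len(ciphertext), rails)
    order = sorted(range(len(ciphertext)), key=lambda i: pattern[i])
    plain = [""] * len(ciphertext)
    for ch, original_index in zip(ciphertext, order):
        plain[original_index] = ch
    return "".join(plain)


if __name__ == "__main__":
    print(rail_fence_encrypt("sample*text", 3))
    longer = rail_fence_encrypt("The quick brown fox jumped over the lazy dogs", 3)
    print(longer)
    print(rail_fence_decrypt(longer, 3))
```

If you don’t know the number of rails, trying every value from 2 up to the message length is cheap.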
Columnar transposition is kinda what it sounds like… You arrange your text in columns and then mix them around! The key for this cipher is a series of numbers that dictate the order of the columns, and you’ll need to know how many columns were used.
For example, the number of columns is 5, and our key is 23541:
Our friend good old sample text
can be arranged into 5 columns:
ourfr iendg oodol dsamp letex t
Applying the key:
23541 ourfr iendg oodol dsamp letex t
Re-arranging the columns:
12345 roufr giedn loood pdsma xleet t
Read down the columns and combine to get the final ciphertext:
rglpxoiodltueosefdomerndat
To decrypt, do the opposite! Or use a tool…
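Here is a Python sketch of the steps above, including the “do the opposite” part. It assumes the key is a string of single digits labelling the columns left to right (so it only handles up to 9 columns), that spaces are stripped and letters lower-cased first, and that the last row is left short rather than padded. The function names are my own.

```python
def columnar_encrypt(text: str, key: str) -> str:
    """Fill rows of len(key) letters, then read columns in order of their labels."""
    width = len(key)
    columns = [text[i::width] for i in range(width)]
    order = sorted(range(width), key=lambda i: key[i])
    return "".join(columns[i] for i in order)


def columnar_decrypt(ciphertext: str, key: str) -> str:
    """Cut the ciphertext back into columns, then read across the rows."""
    width = len(key)
    full_rows, extra = divmod(len(ciphertext), width)
    order = sorted(range(width), key=lambda i: key[i])
    columns = [""] * width
    position = 0
    for i in order:
        # The first `extra` columns are one letter longer (short last row).
        size = full_rows + (1 if i < extra else 0)
        columns[i] = ciphertext[position:position + size]
        position += size
    return "".join(
        columns[c][r]
        for r in range(full_rows + 1)
        for c in range(width)
        if r < len(columns[c])
    )


if __name__ == "__main__":
    plaintext = "Our friend good old sample text".replace(" ", "").lower()
    ciphertext = columnar_encrypt(plaintext, "23541")
    print(ciphertext)
    print(columnar_decrypt(ciphertext, "23541"))
```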
Skip / Nth Character Extraction⌗
The skip cipher involves skipping a certain number of letters before “reading” a letter and adding it to the ciphertext. Let’s say our key is 3, and our plaintext is:
SampleText
First, write your plaintext out as many times as the size of your key (key is three, write it three times):
SampleTextSampleTextSampleText
Then, extract every nth letter (n=3 in our example), starting with the first:
S p T t m e x a l e
And the ciphertext becomes:
SpTtmexale
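Writing the text out several times and taking every nth letter is the same as stepping through the original text in jumps of n and wrapping around at the end. A Python sketch of that idea follows; the function names are hypothetical. One assumption I’ve added: the key must share no common factor with the message length, otherwise some letters get read twice and others never, so this version rejects such keys.

```python
from math import gcd


def skip_encrypt(text: str, skip: int) -> str:
    """Read every skip-th letter, wrapping around, until all letters are used."""
    n = len(text)
    if gcd(skip, n) != 1:
        raise ValueError("skip must be coprime with the message length")
    return "".join(text[(i * skip) % n] for i in range(n))


def skip_decrypt(ciphertext: str, skip: int) -> str:
    """Put each ciphertext letter back at the position it was read from."""
    n = len(ciphertext)
    if gcd(skip, n) != 1:
        raise ValueError("skip must be coprime with the message length")
    plain = [""] * n
    for i, ch in enumerate(ciphertext):
        plain[(i * skip) % n] = ch
    return "".join(plain)


if __name__ == "__main__":
    ciphertext = skip_encrypt("SampleText", 3)
    print(ciphertext)
    print(skip_decrypt(ciphertext, 3))
```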
Encoding and Numeric Base Conversions:⌗
Text manipulation, processing, ciphers and encoding: https://gchq.github.io/CyberChef/
Cipher identification: https://github.com/nccgroup/featherduster
This random website that I found in high school⌗
Encryption/Decryption goldmine: http://rumkin.com/tools/cipher/
Another encryption/decryption goldmine: https://www.dcode.fr/tools-list#cryptography
Practical Cryptography has resources for learning to break classical ciphers (as opposed to just decrypting the message!) http://practicalcryptography.com/ciphers/
There are many, many more ciphers, encodings, and resources. This is just a place to start! Now go crack some codes!
References / Sources⌗
Frequency Analysis: https://www3.nd.edu/~busiforc/handouts/cryptography/cryptography%20hints.html