Crypto? Never roll your own.

Author’s note: The purpose of this post is to provide an introduction to cryptography, ciphers, and encoding techniques commonly used in capture the flag (CTF) challenges. It’s the resource I would have wanted when I was approaching my first CTF cryptography challenges! I provide examples of ciphertext (or encoded text) to help build the intuition needed for cipher recognition! In my opinion, that’s the hardest part of solving CTF crypto challenges!

Cryptography Concepts and Terms

I’ve found that Wikipedia has excellent articles on encoding and cryptographic systems. It’s a good place to look if you want more details on a specific encoding scheme or encryption algorithm.

  • Encoding: to convert (something, such as a body of information) from one system of communication into another 1
  • Cipher: an algorithm for performing encryption or decryption. 2
  • Plaintext: The unencrypted or “original” message
  • Ciphertext: The encrypted message (usually looks like gobbledegook)
  • Frequency Analysis: A statistical method for cracking ciphers. Essentially, it assumes that the most frequent letter in ciphertext will correspond with the most frequent letter in the plaintext language, or that the most common three-letter word in the ciphertext corresponds to “the”. 3
  • Key: a piece of information that specifies the transformation of plaintext into ciphertext, and vice versa for decryption algorithms. 4 Essentially, the key is part of the input into a cryptographic function that modifies the function’s operations while creating ciphertext in such a way that you have to have the key to get the plaintext from a decryption function.
  • Symmetric Cipher: The same key is used to encrypt and decrypt the message. For example, ROT13.
  • Asymmetric Cipher: Two distinct yet related keys (public and private) are used to encrypt and decrypt the message. For example, RSA.
  • Polyalphabetic Substitution: A polyalphabetic substitution cipher uses multiple alphabets for substitutions, which makes the technique resistant to frequency analysis.
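To make the frequency analysis idea a bit more concrete, here is a minimal sketch that counts letters in a piece of ciphertext. The function name letter_frequencies is my own (hypothetical), and it only counts single letters; it does not attempt to map them back to plaintext.

```python
from collections import Counter

def letter_frequencies(text):
    """Count how often each letter appears, ignoring case and non-letters."""
    letters = [ch.lower() for ch in text if ch.isalpha()]
    # most_common() returns (letter, count) pairs, highest count first.
    return Counter(letters).most_common()

# The letters at the top of the list are candidates for common
# plaintext letters such as "e" or "t" in English.
for letter, count in letter_frequencies("Dlxawp epie qzc l Nlpdlc Ntaspc.")[:5]:
    print(letter, count)
```

On short ciphertexts the counts are noisy, so treat the result as a hint rather than an answer.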

Encoding

I like to think of encoding as a form of “translation”. Different computer systems operate with different forms of encoding like different people use different languages. Just like languages have specific alphabets, encodings have alphabets of their own.

Base 16, 32, and 64

The first “family” of encodings that I’ve seen frequently in CTF challenges are the base 16, 32, and 64 encoding schemes. Here’s IETF RFC 4648 if you want all the nitty gritty juicy details and also a long and usually dry read. I would recommend checking out the tables that describe the alphabets used for each type of encoding - knowing which alphabets correspond to which encoding schemes will help you identify the type of encoding at a glance!

Also, this Wikipedia page lists some of the more obscure binary-to-text encoding types that are beyond the scope of this post. However, it’s good to know they exist: https://en.wikipedia.org/wiki/Binary-to-text_encoding

Base 16 (hexadecimal) encoding uses the hexadecimal number system (0123456789ABCDEF) to encode text. The base16 encoding of

Hey! This is an example of base16 encoding.

is:

48657921205468697320697320616E206578616D706C65206F662062617365313620656E636F64696E672E

This is a tool you can use to encode and decode base16/hexadecimal: https://simplycalc.com/base16-encode.php

Some identifying characteristics of base16 encoding include the fact that it uses only hexadecimal characters and never needs padding (an equals sign at the end).


Base 32 is very similar to base16 encoding but it has a larger alphabet, and uses padding characters (equals signs). The base32 encoding of

Hey! This is an example of base32 encoding.

is:

JBSXSIJAKRUGS4ZANFZSAYLOEBSXQYLNOBWGKIDPMYQGEYLTMUZTEIDFNZRW6ZDJNZTS4===

Some identifying characteristics of base32 encoding are the padding characters (equals signs) and an alphabet made up of upper-case letters and the digits 2 through 7.

This is a tool you can use to encode and decode base32: https://simplycalc.com/base32-encode.php


Base 64 is similar to base32, but it has an even larger alphabet! It also uses padding characters. The base64 encoding of

Hey! This is an example of base64 encoding.

is:

SGV5ISBUaGlzIGlzIGFuIGV4YW1wbGUgb2YgYmFzZTY0IGVuY29kaW5nLg==

This is a tool you can use to encode and decode base64: https://simplycalc.com/base64-encode.php

The identifying features of base64 encoding are the upper and lower case alphabet, use of numbers, and message padding (equals signs at the end of the string).
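If you would rather decode these locally than paste them into a website, Python's standard base64 module covers all three schemes. This is a small sketch using the sample strings from this section; the variable names are just my own labels.

```python
import base64

hex_text = "48657921205468697320697320616E206578616D706C65206F662062617365313620656E636F64696E672E"
b32_text = "JBSXSIJAKRUGS4ZANFZSAYLOEBSXQYLNOBWGKIDPMYQGEYLTMUZTEIDFNZRW6ZDJNZTS4==="
b64_text = "SGV5ISBUaGlzIGlzIGFuIGV4YW1wbGUgb2YgYmFzZTY0IGVuY29kaW5nLg=="

# Each decoder returns bytes, so call .decode() to get readable text.
print(base64.b16decode(hex_text).decode())
print(base64.b32decode(b32_text).decode())
print(base64.b64decode(b64_text).decode())

# Encoding goes the other way: bytes in, bytes out.
print(base64.b64encode(b"Hey! This is an example of base64 encoding.").decode())
```

Note that b16decode expects upper-case hex by default; pass casefold=True if the challenge gives you lower-case digits.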


URL Encoding (Percent-Encoding)

URL Encoding is defined in IETF RFC 3986. Essentially, URL encoding is a standard used to encode specific data or characters in URLs.

The URL encoding of

Hey! This is an example of URL or Percent Encoding.

is:

Hey!%20This%20is%20an%20example%20of%20URL%20or%20Percent%20Encoding.%0A

Here’s a tool for encoding and decoding URL or Percent Encoding: https://meyerweb.com/eric/tools/dencoder/

The identifying feature of URL encoding is the usage of percentage signs and some plaintext (although there is base64 and base32 URL encoding).
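Here is a short sketch of decoding the sample above with Python's urllib.parse. The safe="!" argument is my own choice so that the encoder leaves the exclamation mark alone like the sample does; by default quote would encode it as %21.

```python
from urllib.parse import quote, unquote

encoded = "Hey!%20This%20is%20an%20example%20of%20URL%20or%20Percent%20Encoding.%0A"

# unquote turns each %XX sequence back into the character with that hex value.
# The trailing %0A is an encoded newline, so repr() is used to make it visible.
decoded = unquote(encoded)
print(repr(decoded))

# quote goes the other way; characters listed in `safe` are left as-is.
print(quote("Hey! This is", safe="!"))
```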


The wonders of hex, decimal, octal and ASCII

When you hear ASCII, you probably think of ASCII art… But, it’s yet another form of encoding commonly encountered in CTF challenges! Essentially, it’s a mapping of octal, decimal, and hexadecimal numbers to corresponding characters: 5

Hello Friend

This ASCII text can be represented using different number systems:

This is some ASCII text, and I like it very much.

Binary:

01010100 01101000 01101001 01110011 00100000 01101001 01110011 00100000 01110011 01101111 01101101 01100101 00100000 01000001 01010011 01000011 01001001 01001001 00100000 01110100 01100101 01111000 01110100 00101100 00100000 01100001 01101110 01100100 00100000 01001001 00100000 01101100 01101001 01101011 01100101 00100000 01101001 01110100 00100000 01110110 01100101 01110010 01111001 00100000 01101101 01110101 01100011 01101000 00101110

Octal:

124 150 151 163 040 151 163 040 163 157 155 145 040 101 123 103 111 111 040 164 145 170 164 054 040 141 156 144 040 111 040 154 151 153 145 040 151 164 040 166 145 162 171 040 155 165 143 150 056

Decimal:

84 104 105 115 32 105 115 32 115 111 109 101 32 65 83 67 73 73 32 116 101 120 116 44 32 97 110 100 32 73 32 108 105 107 101 32 105 116 32 118 101 114 121 32 109 117 99 104 46

Hexadecimal:

54 68 69 73 20 69 73 20 73 6F 6D 65 20 41 53 43 49 49 20 74 65 78 74 2C 20 61 6E 64 20 49 20 6C 69 6B 65 20 69 74 20 76 65 72 79 20 6D 75 63 68 2E

Fortunately, you don’t have to use a lookup table. Once you’ve identified the encoding type and the number system, you can use tools to do all the hard work for you:

https://www.rapidtables.com/convert/number/ascii-hex-bin-dec-converter.html

https://onlineasciitools.com/convert-ascii-to-octal
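The conversion itself is only a couple of lines in Python. This sketch assumes the codes are separated by spaces, as in the examples above; decode_numbers is a hypothetical helper name.

```python
def decode_numbers(text, base):
    """Turn space-separated character codes in the given base back into text."""
    return "".join(chr(int(token, base)) for token in text.split())

# The first four characters of the sample text in each number system:
print(decode_numbers("01010100 01101000 01101001 01110011", 2))   # This
print(decode_numbers("124 150 151 163", 8))                        # This
print(decode_numbers("84 104 105 115", 10))                        # This
print(decode_numbers("54 68 69 73", 16))                           # This
```

If the output looks like garbage, try a different base; decimal and octal codes in particular are easy to confuse at a glance.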

Types of Ciphers - Symmetric (Single Key)

There are two major categories of ciphers: symmetric (single key) and asymmetric (dual key). Asymmetric ciphers rely on a lot of math, so the focus of this section will be on symmetric ciphers. There are two subcategories within symmetric ciphers: substitution and transposition.

Substitution

Substitution ciphers replace letters in the plaintext with other letters, numbers, symbols, etc.

Morse

Morse code is a substitution cipher originally designed for telegrams. Its alphabet consists of dots, dashes, and slashes.

This is some plaintext

becomes

- .... .. ... / .. ... / ... --- -- . / .--. .-.. .- .. -. - . -..- -

http://rumkin.com/tools/cipher/morse.php
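As a rough sketch, Morse is just a lookup table. The version below only handles the 26 letters and assumes the same layout as the example above: spaces between letters and " / " between words. The function names are my own.

```python
MORSE = {
    "A": ".-",   "B": "-...", "C": "-.-.", "D": "-..",  "E": ".",
    "F": "..-.", "G": "--.",  "H": "....", "I": "..",   "J": ".---",
    "K": "-.-",  "L": ".-..", "M": "--",   "N": "-.",   "O": "---",
    "P": ".--.", "Q": "--.-", "R": ".-.",  "S": "...",  "T": "-",
    "U": "..-",  "V": "...-", "W": ".--",  "X": "-..-", "Y": "-.--",
    "Z": "--..",
}

def to_morse(text):
    """Encode letters as Morse; characters without a table entry are skipped."""
    words = text.upper().split()
    return " / ".join(" ".join(MORSE[ch] for ch in word if ch in MORSE) for word in words)

def from_morse(code):
    """Decode Morse that uses ' / ' between words and spaces between letters."""
    reverse = {symbol: letter for letter, symbol in MORSE.items()}
    return " ".join(
        "".join(reverse[symbol] for symbol in word.split())
        for word in code.split(" / ")
    )

print(to_morse("This is some plaintext"))
print(from_morse("- .... .. ... / .. ... / ... --- -- . / .--. .-.. .- .. -. - . -..- -"))
```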


Letter Numbers

This substitution is very straightforward: A=1, B=2, Z=26…

Nobody can crack this

becomes

14-15-2-15-4-25 3-1-14 3-18-1-3-11 20-8-9-19

http://rumkin.com/tools/cipher/numbers.php
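A minimal sketch of the same mapping in Python, assuming dashes between letters and spaces between words like the example above (the helper names are hypothetical):

```python
def letters_to_numbers(text):
    """A=1 ... Z=26; letters joined by '-', words separated by spaces."""
    return " ".join(
        "-".join(str(ord(ch) - ord("A") + 1) for ch in word if "A" <= ch <= "Z")
        for word in text.upper().split()
    )

def numbers_to_letters(code):
    """Reverse the mapping: 1=A ... 26=Z."""
    return " ".join(
        "".join(chr(int(number) + ord("A") - 1) for number in word.split("-"))
        for word in code.split()
    )

print(letters_to_numbers("Nobody can crack this"))
print(numbers_to_letters("14-15-2-15-4-25 3-1-14 3-18-1-3-11 20-8-9-19"))
```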


Caesarian Shift

The Caesarian Shift cipher, or Caesar cipher, is a substitution method that involves rotating an alphabet by a key n and substituting the rotated letters for the plaintext letters. The best visualization of how this works is a Caesar Cipher Wheel.

If n=11 then our alphabets are:

ABCDEFGHIJKLMNOPQRSTUVWXYZ

LMNOPQRSTUVWXYZABCDEFGHIJK

So A=L, B=M, etc.

Sample text for a Caesar Cipher.

would become

Dlxawp epie qzc l Nlpdlc Ntaspc.

The decryption key is 26-n, so for this cipher the decryption key would be 15.

http://rumkin.com/tools/cipher/caesar.php
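Here is a small sketch of the shift in Python, using the n=11 example from above. The caesar function is my own illustration; it leaves spaces and punctuation untouched and preserves case.

```python
import string

def caesar(text, n):
    """Rotate each letter forward by n positions, preserving case and punctuation."""
    result = []
    for ch in text:
        if ch in string.ascii_uppercase:
            result.append(chr((ord(ch) - ord("A") + n) % 26 + ord("A")))
        elif ch in string.ascii_lowercase:
            result.append(chr((ord(ch) - ord("a") + n) % 26 + ord("a")))
        else:
            result.append(ch)
    return "".join(result)

ciphertext = caesar("Sample text for a Caesar Cipher.", 11)
print(ciphertext)              # Dlxawp epie qzc l Nlpdlc Ntaspc.
print(caesar(ciphertext, 15))  # Sample text for a Caesar Cipher.

# With an unknown key, there are only 26 shifts to try,
# so print them all and look for readable text.
for shift in range(26):
    print(shift, caesar(ciphertext, shift))
```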


ROT13

ROT13 is just a Caesar cipher with a key of 13. (Or n=13)

http://rumkin.com/tools/cipher/rot13.php
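Python happens to ship a ROT13 codec, so a quick check needs no custom code. Because 13 is half of 26, applying it twice returns the original text.

```python
import codecs

encoded = codecs.encode("Hello, World!", "rot_13")
print(encoded)                           # Uryyb, Jbeyq!
print(codecs.encode(encoded, "rot_13"))  # Hello, World!
```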


Baconian

The Baconian cipher hides a message within a message. For example, two fonts (plain and bold) could be used in a sentence. The plain letters could be the “A” form and the bold letters could be the “B” form. Ignoring the actual letters in the text and instead focusing on the different types of letters:

This is an example of a sentence that actually has a secret message.

BAAB AB AB AAABBBB AA BA A BAAABBAA BAAA BAAAAABA BAA A BAABAA BAABB AA
BAABABABAAABBBBAABAABAAABBAABAAABAAAAABABAAABAABAABAABB

which decrypts to

SUPERSECRET

http://rumkin.com/tools/cipher/baconian.php
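Once you have the A/B pattern, decoding is a matter of reading it five symbols at a time. The sketch below assumes the 26-letter variant of the Baconian alphabet (A=AAAAA, B=AAAAB, … Z=BBAAB), which is the one that matches the example above; some tools use a 24-letter variant where I/J and U/V share codes, so results can differ. bacon_decode is a hypothetical helper name.

```python
def bacon_decode(pattern):
    """Decode A/B groups of five, treating A as 0 and B as 1 (26-letter variant)."""
    bits = [ch for ch in pattern.upper() if ch in "AB"]
    letters = []
    # Ignore any leftover symbols that do not fill a complete group of five.
    for i in range(0, len(bits) - len(bits) % 5, 5):
        group = bits[i:i + 5]
        value = int("".join("1" if b == "B" else "0" for b in group), 2)
        if value < 26:  # values 26-31 do not map to a letter
            letters.append(chr(ord("A") + value))
    return "".join(letters)

print(bacon_decode("BAABABABAAABBBBAABAABAAABBAABAAABAAAAABABAAABAABAABAABB"))  # SUPERSECRET
```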


Polyalphabetic Substitution Ciphers

Polyalphabetic substitution ciphers utilize multiple alphabets when substituting letters, which makes them resistant to frequency analysis attacks. This class of ciphers uses keys to determine which alphabets are used when.


Vigenere

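The Vigenere cipher is a keyed cipher that essentially re-orders rotated alphabets from the Caesar cipher using a keyword: each letter of the keyword picks the shift for one letter of the plaintext, and the keyword repeats for as long as needed. This website has a pretty good explanation and visualization tool!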

Using the keyword “rainbow”

Some sample plaintext.

becomes

Jour toiglm cmoeetmku.

Note: Different tools implement this cipher in slightly different ways, so you might not get all of the plaintext depending on the tool you use.

http://rumkin.com/tools/cipher/vigenere.php
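Here is one way to sketch it in Python. This version makes a specific choice that matches the example above: the keyword only advances on letters, so spaces and punctuation pass through without using up a key letter. Other tools may advance the key on every character, which is one reason their output can differ.

```python
def vigenere(text, keyword, decrypt=False):
    """Shift each letter by the matching keyword letter (a=0, b=1, ...)."""
    key = [ord(k) - ord("a") for k in keyword.lower() if "a" <= k <= "z"]
    out = []
    position = 0  # index into the keyword, advanced only when a letter is processed
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            shift = key[position % len(key)]
            if decrypt:
                shift = -shift
            out.append(chr((ord(ch) - base + shift) % 26 + base))
            position += 1
        else:
            out.append(ch)
    return "".join(out)

print(vigenere("Some sample plaintext.", "rainbow"))                 # Jour toiglm cmoeetmku.
print(vigenere("Jour toiglm cmoeetmku.", "rainbow", decrypt=True))   # Some sample plaintext.
```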


Hill Cipher

The Hill Cipher is another polyalphabetic substitution cipher, and it is based on linear algebra. Its inner workings are very mathy, but the important part to understand is that the key is actually a matrix. In order to decrypt the Hill Cipher, there are three pieces of information you must know (or guess):

  • The original alphabet used (A=1, B=2, etc.)
  • The matrix size
  • The matrix key values

Here are a couple good explanations of the Hill Cipher:

https://www.dcode.fr/hill-cipher

https://www.geeksforgeeks.org/hill-cipher/
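To give a feel for the matrix math, here is a minimal 2x2 sketch. It makes several assumptions that a real challenge may not share: the alphabet is numbered A=0 through Z=25 (not A=1), the key is 2x2, and odd-length messages are padded with X. The key values [[3, 3], [2, 5]] are just a commonly used textbook example.

```python
def hill_encrypt(text, key):
    """Multiply each pair of letters by a 2x2 key matrix, mod 26 (A=0 ... Z=25)."""
    nums = [ord(ch) - ord("A") for ch in text.upper() if "A" <= ch <= "Z"]
    if len(nums) % 2:
        nums.append(ord("X") - ord("A"))  # pad odd-length messages (assumption)
    out = []
    for i in range(0, len(nums), 2):
        x, y = nums[i], nums[i + 1]
        out.append((key[0][0] * x + key[0][1] * y) % 26)
        out.append((key[1][0] * x + key[1][1] * y) % 26)
    return "".join(chr(n + ord("A")) for n in out)

def invert_key(key):
    """Invert a 2x2 key mod 26; raises ValueError if no inverse exists."""
    det = (key[0][0] * key[1][1] - key[0][1] * key[1][0]) % 26
    det_inv = pow(det, -1, 26)  # modular inverse, Python 3.8+
    return [
        [(key[1][1] * det_inv) % 26, (-key[0][1] * det_inv) % 26],
        [(-key[1][0] * det_inv) % 26, (key[0][0] * det_inv) % 26],
    ]

key = [[3, 3], [2, 5]]
ciphertext = hill_encrypt("HELP", key)
print(ciphertext)                                  # HIAT
print(hill_encrypt(ciphertext, invert_key(key)))   # HELP
```

Decryption is the same multiplication with the inverse matrix, which is why a key whose determinant shares a factor with 26 cannot be used.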


Transposition / Permutation

Transposition or permutation ciphers manipulate and re-arrange the letters in the message instead of substituting different letters in their place. So, the original message is in front of you, but it’s just scrambled up!

Railfence

The railfence cipher walks up and down “rails” to scramble letters. The key for this cipher is the number of rails. For example, we’ll replace spaces with asterisks and let our number of “rails” be 3 (n=3):

Sample text

would be written like

s       l       e
  a   p   e   t   x
    m       *       t

Then, collapse each of the rows:

sle
apetx
m*t

And the completed ciphertext would be:

sleapetxm*t

Here’s a longer example:

The quick brown fox jumped over the lazy dogs

becomes

Tqkofjevtl sh uc rw o updoe h aydgeibnxm rezo
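The zig-zag walk is easy to sketch in Python: keep one list per rail, bounce between the top and bottom rail, and join the rails at the end. rail_fence is my own illustrative name, and this version leaves spaces in place as in the longer example.

```python
def rail_fence(text, rails):
    """Write text in a zig-zag across `rails` rows, then read the rows in order."""
    rows = [[] for _ in range(rails)]
    row, step = 0, 1
    for ch in text:
        rows[row].append(ch)
        if rails > 1:
            # Reverse direction at the top and bottom rails.
            if row == 0:
                step = 1
            elif row == rails - 1:
                step = -1
            row += step
    return "".join("".join(r) for r in rows)

print(rail_fence("sample*text", 3))
print(rail_fence("The quick brown fox jumped over the lazy dogs", 3))
```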

Columnar Transposition

Columnar transposition is kinda what it sounds like… You arrange your text in columns and then mix them around! The key for this cipher is a series of numbers that dictate the order of the columns, and you’ll need to know how many columns were used.

For example, the number of columns is 5, and our key is 23541:

Our friend good old sample text

can be arranged into 5 columns:

ourfr
iendg
oodol
dsamp
letex
t

Applying the key:

23541
ourfr
iendg
oodol
dsamp
letex
t

Re-arranging the columns:

12345
roufr
giedn
loood
pdsma
xleet
 t

Read down the columns and combine to get the final ciphertext:

rglpxoiodltueosefdomerndat

To decrypt, do the opposite! Or use a tool…

http://rumkin.com/tools/cipher/coltrans.php
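The encryption steps above can be sketched in a few lines. This version assumes the key is a string of single digits such as "23541" (so at most nine columns), strips spaces, and lower-cases the text like the worked example; the names are my own.

```python
def columnar_encrypt(text, order_key):
    """Fill rows of len(order_key) letters, then read columns in key order."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    width = len(order_key)
    # Column i holds every width-th letter starting at position i.
    columns = ["".join(letters[i::width]) for i in range(width)]
    # The column labelled "1" is read first, then "2", and so on.
    read_order = sorted(range(width), key=lambda i: order_key[i])
    return "".join(columns[i] for i in read_order)

print(columnar_encrypt("Our friend good old sample text", "23541"))
```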


Skip / Nth Character Extraction

The skip cipher involves skipping a certain number of letters before “reading” a letter and adding it to the cipher text. Let’s say our key is 3, and our plaintext is:

SampleText

First, write your plaintext out as many times as the size of your key (key is three, write it three times):

SampleTextSampleTextSampleText

Then, extract every nth letter, starting with the first (n=3 in our example):

SampleTextSampleTextSampleText
S  p  T  t  m  e  x  a  l  e

And the ciphertext becomes:

SpTtmexale

https://www.dcode.fr/skip-cipher
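Instead of physically repeating the plaintext, you can get the same result with modular arithmetic. One caveat that this sketch enforces: the key and the message length must share no common factor, otherwise some letters would be picked twice and others never. skip_cipher is a hypothetical name.

```python
from math import gcd

def skip_cipher(text, skip):
    """Take every skip-th character, wrapping around, until all are used once."""
    n = len(text)
    if gcd(skip, n) != 1:
        raise ValueError("skip must share no common factor with the message length")
    return "".join(text[(i * skip) % n] for i in range(n))

print(skip_cipher("SampleText", 3))  # SpTtmexale
```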


Tools

Encoding and Numeric Base Conversions:

https://simplycalc.com/index.php

https://www.rapidtables.com/convert/number/ascii-hex-bin-dec-converter.html

CyberChef

Text manipulation, processing, ciphers and encoding: https://gchq.github.io/CyberChef/

FeatherDuster

Cipher identification https://github.com/nccgroup/featherduster

This random website that I found in high school

Encryption/Decryption goldmine: http://rumkin.com/tools/cipher/

dCode

Another encryption/decryption goldmine: https://www.dcode.fr/tools-list#cryptography

Practical Cryptography

Practical Cryptography has resources for learning to break classical ciphers (as opposed to just decrypting the message!) http://practicalcryptography.com/ciphers/

There are many, many more ciphers and encodings and resources. This is just a place to start! Now go crack some codes!

References / Sources


  1. Encoding Definition: https://www.merriam-webster.com/dictionary/encode

  2. Cipher Definition: https://en.wikipedia.org/wiki/Cipher

  3. Frequency Analysis: https://www3.nd.edu/~busiforc/handouts/cryptography/cryptography%20hints.html

  4. Cryptographic Key: https://en.wikipedia.org/wiki/Key_(cryptography)

  5. ASCII Table: http://www.asciitable.com/