* If you don’t take whitespace into account.
My friend challenged me to find the shortest solution to a certain Leetcode-style problem in Python. They were generous enough to let me use whitespace for free, so that the code stays readable. So that’s exactly what we’ll abuse to encode any Python program in
This post originally stated that
characters are always enough. Since then, commandz and xnor from the Code Golf Discord server have devised a better solution, reaching bytes. After a few minor modifications, it satisfies the requirements of this problem, so I publish it here too.
Bits
We can encode arbitrary data in a string using only whitespace. For example, we could encode 0 bits as spaces and 1 bits as tabs. Now you just have to decode this.
As you start implementing the decoder, it immediately becomes clear that this approach requires about 50 characters at minimum. You can use c % 2 for c in b"..." to extract individual bits, then you need to merge the bits by converting them with str and concatenating them with "".join(...), then you need to parse the bits with int.to_bytes(...), and finally call exec. We need to find another solution.
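For reference, the bit-based round trip described above might look like the following. This is a minimal, ungolfed sketch; the helper names `encode_bits` and `decode_bits` are made up for illustration and are not part of the original solution.

```python
def encode_bits(source: str) -> str:
    """Encode text as whitespace: 0 bits become spaces, 1 bits become tabs."""
    bits = "".join(f"{byte:08b}" for byte in source.encode())
    return bits.replace("0", " ").replace("1", "\t")


def decode_bits(ws: str) -> str:
    """Reverse encode_bits: space is 32 (even, bit 0), tab is 9 (odd, bit 1)."""
    bits = "".join(str(c % 2) for c in ws.encode())
    # Parse the bit string as one big integer, then split it back into bytes.
    return int(bits, 2).to_bytes(len(bits) // 8, "big").decode()


payload = encode_bits("print(1)")
restored = decode_bits(payload)
```

Even in this readable form, the decoder needs a generator, a join, an integer parse, and a `to_bytes` call before `exec` can run, which is why the golfed version cannot get very short.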
Characters
What if we didn’t go from characters to bits and then back? What if instead, we mapped each whitespace character to its own non-whitespace character and then evaluated that?
exec(
    "[whitespace...]"
    .replace(" ", "A")
    .replace("\t", "B")
    .replace("\v", "C")
    .replace("\f", "D")
    ...
)
Unicode has quite a lot of whitespace characters, so this should be possible, in theory. Unfortunately, this takes even more bytes in practice. Under 50 characters, we can fit just two replace calls:
exec("[whitespace...]".replace(" ","A").replace("\t","B"))
But we don’t have to use replace! The lesser-known str.translate method can perform multiple single-character replacements at once:
>>> "Hello, world!".translate({ord("H"): "h", ord("!"): "."})
'hello, world.'
The following fits in 50 characters:
exec("[whitespace...]".translate({9: "A", 11: "B", 12: "C", 28: "D"}))
Four characters aren’t much to work with, but here’s some good news: translate takes anything indexable with integers (code points). We can thus replace the dict with a string:
exec(
"[whitespace...]".translate(
" A BC DEFGH I J"
)
)
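To see why a plain string works as a translation table, here is a small sketch. The helper `build_table` is hypothetical, and the four code points are the ones from the dict example above; the filler character in unused slots is an arbitrary choice.

```python
def build_table(mapping: dict[int, str], filler: str = " ") -> str:
    """Build a string whose character at index `code point` is the replacement."""
    cells = [filler] * (max(mapping) + 1)
    for code_point, replacement in mapping.items():
        cells[code_point] = replacement
    return "".join(cells)


# Tab (9), vertical tab (11), form feed (12), and file separator (28).
table = build_table({9: "A", 11: "B", 12: "C", 28: "D"})

# str.translate looks up table[ord(ch)] for every character.
decoded = "\t\x0b\x0c\x1c".translate(table)

# Code points past the end of the table raise IndexError inside translate,
# which it treats as "leave this character unchanged".
untouched = "é".translate(table)
```

The string form drops the braces, colons, and quotes that a dict literal needs for every entry, which is where the savings come from.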
The characters ABCDEFGHIJ
are located at indices
exec("".translate("ABCDEFGHIJ"))
We can now encode any Python program that uses at most exc('%0)
. This reduces the code size to
A better way
But it turns out there’s another way to translate whitespace to non-whitespace.
This solution was found by readers of my blog – thanks!
When repr is applied to a Unicode string, it replaces non-printable code points, which include every whitespace character other than the ASCII space, with escape sequences such as \uXXXX. For example, U+2001 Em Quad is encoded as '\u2001'. All in all, Unicode whitespace gives us an unlimited supply of \, x, and the whole hexadecimal alphabet (plus two instances of ').
Say we wanted to extract the least significant digits of characters from U+2000 to U+2007. Here’s how to do this:
# Imagine these \uXXXX escapes are literal whitespace characters
>>> repr("\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007")[6::6]
'01234567'
To get \, x, and the rest of the hexadecimal alphabet, we need characters like U+000B and U+001F. We also need to align the strings exactly, so that one of the columns contains the whole alphabet:
         v
\: "     \t "
x: "    \x0b"
0: "\u2000  "
1: "\u2001  "
2: "\u2002  "
3: "\u2003  "
4: "\u2004  "
5: "\u2005  "
6: "\u2006  "
7: "\u2007  "
8: "\u2008  "
9: "\u2009  "
a: "\u200a  "
b: "  \x0b  "
c: "  \x0c  "
d: "  \x1d  "
e: "  \x1e  "
f: "  \x1f  "
         ^
This requires us to increase the step to
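The alignment can be checked mechanically. The sketch below assumes 8-character chunks, matching the `[6::8]` slice used in the final snippet of this post; the `rows` dict and its padding are a reconstruction for illustration, not necessarily the exact padding used originally.

```python
# Each value is a run of whitespace whose repr() is exactly 8 characters wide,
# with the wanted character in the sixth position.
rows = {
    "\\": " " * 5 + "\t" + " ",   # repr: 5 spaces, \t, 1 space
    "x": " " * 4 + "\x0b",        # repr: 4 spaces, \x0b
}
for i, digit in enumerate("0123456789a"):
    rows[digit] = chr(0x2000 + i) + "  "   # repr: \u200N, 2 spaces
for digit, ctrl in zip("bcdef", "\x0b\x0c\x1d\x1e\x1f"):
    rows[digit] = "  " + ctrl + "  "       # repr: 2 spaces, \xNN, 2 spaces

# Width of each chunk after escaping, without the surrounding quotes.
widths = {len(repr(chunk)[1:-1]) for chunk in rows.values()}

# Index 0 of the repr is the opening quote, so the first target sits at index 6.
alphabet = repr("".join(rows.values()))[6::8]
```

Every chunk consists only of characters for which `str.isspace()` is true, so the whole string still counts as whitespace.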
Now, if we have free access to \, x, and the hexadecimal alphabet, we can reduce any program to just exec
is free):
# print("Hello, world!")
exec('\x70\x72\x69\x6e\x74\x28\x22\x48\x65\x6c\x6c\x6f\x2c\x20\x77\x6f\x72\x6c\x64\x21\x22\x29')
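The escaped form can be generated mechanically. A minimal sketch, assuming ASCII-only source code; the function name `to_hex_escapes` is made up for this example.

```python
def to_hex_escapes(source: str) -> str:
    """Rewrite every character of ASCII source code as a \\xNN escape."""
    return "".join(f"\\x{byte:02x}" for byte in source.encode("ascii"))


escaped = to_hex_escapes('print("Hello, world!")')
wrapped = f"exec('{escaped}')"
```

Restricting the input to ASCII keeps one escape per character; non-ASCII source would need `\u` escapes instead, which this sketch does not handle.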
Now we can encode this using the previous trick, leaving ('') as-is, and run it:
exec(repr("[encoding of exec]([padding]'[user code]'[padding])")[6::8])
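Putting the pieces together, an encoder for this scheme could look roughly like the sketch below. It assumes ASCII-only user code and 8-character chunks; `chunk_for` and `encode_program` are hypothetical names, and the padding layout is one workable choice rather than the exact one used in the golfed solution.

```python
CONTROL = {"b": "\x0b", "c": "\x0c", "d": "\x1d", "e": "\x1e", "f": "\x1f"}


def chunk_for(ch: str) -> str:
    """Return a chunk whose repr() is 8 characters wide with `ch` at offset 5."""
    if ch == "\\":
        return " " * 5 + "\t" + " "
    if ch == "x":
        return " " * 4 + "\x0b"
    if ch in "0123456789a":
        return chr(0x2000 + int(ch, 16)) + "  "
    if ch in CONTROL:
        return "  " + CONTROL[ch] + "  "
    if ch in "(')":
        # Literal characters are kept as-is and padded with spaces.
        return " " * 5 + ch + "  "
    raise ValueError(f"cannot encode {ch!r}")


def encode_program(source: str) -> str:
    """Encode ASCII source as whitespace plus the literal characters ( ' )."""
    hexed = "".join(f"\\x{byte:02x}" for byte in source.encode("ascii"))
    # "exec" itself only needs e, x, and c, all of which are in the alphabet.
    inner = f"exec('{hexed}')"
    return "".join(chunk_for(ch) for ch in inner)


payload = encode_program("result = 6 * 7")
decoded = repr(payload)[6::8]

namespace = {}
exec(decoded, namespace)  # runs: exec('\x72\x65...')
```

Because the payload contains single quotes but no double quotes, `repr` delimits it with double quotes and leaves the single quotes unescaped, so the alignment is not disturbed.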
The end
So that’s how you print Lorem Ipsum in only
Hope you found this entertaining! If anyone knows how to bring this to