* If you don’t take whitespace into account.
My friend challenged me to find the shortest solution to a certain Leetcode-style problem in Python. They were generous enough to let me use whitespace for free, so that the code stays readable. So that’s exactly what we’ll abuse to encode any Python program in
BitsWe can encode arbitrary data in a string by only using whitespace. For example, we could encode 0
bits as spaces and 1
bits as tabs. Now you just have to decode this.
As you start implementing the decoder, it immediately becomes clear that this approach requires about 50 characters at minimum. You can use c % 2 for c in b"..."
to extract individual bits, then you need to merge bits by using str
and concatenating then with "".join(...)
, then you to parse the bits with int.to_bytes(...)
, and finally call exec
. We need to find another solution.
CharactersWhat if we didn’t go from characters to bits and then back? What if instead, we mapped each whitespace character to its own non-whitespace character and then evaluated that?
exec(
"[whitespace...]"
.replace(" ", "A")
.replace("\t", "B")
.replace("\v", "C")
.replace("\f", "D")
...
)
Unicode has quite a lot of whitespace characters, so this should be possible, in theory. Unfortunately, this takes even more bytes in practice. Under 50 characters, we can fit just two replace
calls:
exec("[whitespace...]".replace(" ","A").replace("\t","B"))
But we don’t have to use replace
! The less-known str.translate
method can perform multiple single-character replaces at once:
>>> "Hello, world!".translate({ord("H"): "h", ord("!"): "."})
'hello, world.'
The following fits in 50 characters:
exec("[whitespace...]".translate({9: "A", 11: "B", 12: "C", 28: "D"})
4 characters isn’t much to work with, but here’s some good news: translate
takes anything indexable with integers (code points). We can thus replace the dict with a string:
exec(
"[whitespace...]".translate(
" A BC DEFGH I J"
)
)
The characters ABCDEFGHIJ
are located at indices
exec("".translate("ABCDEFGHIJ"))
AlphabetWe can now encode any Python program that uses at most
This would be more than enough for JavaScript: JSFuck can transform any JS program to an equivalent JS program that only uses characters from the 6-character set []()!+
. Does anything like this exist for Python?
Actually, it does! PyFuck transforms any Python script to an equivalent script that uses only 8 characters: exc('%0)
. This means that we can reduce our
exec("[whitespace...]".translate(" e xc ('%0)"))
That’s
OptimizationThere’s just one problem: the output of PyFuck is exponential in the count of non-exc(0)
characters in the input code. So to encode realistic programs with just exc('%0)
, we need to pass code through a nested encoder before passing it to PyFuck. The optimized nested code looks like this:
exec(int("[bits of code]".replace("(","0").replace(")","1"),2).to_bytes([length of code]))
We store bits as (
and )
, so there’s only a fixed cost due to PyFuck (about
The endSo that’s how you print Lorem Ipsum in only
Hope you found this entertaining! If anyone knows how to bring this to