Why Does the Kernel Hate Long Shebangs?
Here is a simple Hello World program written in Python. Given execute permission, it should work as expected.
hello_world.py
#! /usr/bin/python3
print("Hello World!!!")
foo@bar:~$ chmod +x hello_world.py
foo@bar:~$ ./hello_world.py
Hello World!!!
And it does. But what if I modify hello_world.py to this?
#! /./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././usr/bin/python3
print("Hello World!!!")
Hmmm.., the shebang’s a bit longer. But if you scroll to the right, the specified path is same as that of the previous shebang /usr/bin/python3
. So, the shebang’s long, it shouldn’t matter much right ? Let’s give the required permission and try executing the file.
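Before running it, we can confirm in Python that the padding really is harmless. The helper long_path is hypothetical (it is not part of the original script): it prefixes /usr/bin/python3 with a given number of /. segments, and os.path.normpath collapses them again. normpath works purely on the string, which is enough here because . always means "the current directory".

```python
import os.path


def long_path(repeats: int, target: str = "/usr/bin/python3") -> str:
    """Hypothetical helper: prepend `repeats` no-op "/." segments to `target`."""
    return "/." * repeats + target


padded = long_path(120)
# Lexically normalizing the path removes every "." segment...
print(os.path.normpath(padded))  # /usr/bin/python3
# ...but a shebang line built from it is well over 256 bytes long.
print(len("#! " + padded))       # 259
```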
foo@bar:~$ chmod +x hello_world.py
foo@bar:~$ ./hello_world.py
zsh: ./shebangtest.py: bad interpreter: /././././././././././././././././: exec format error
It looks like the kernel has a problem when the shebang is too long. Why? And how long can a shebang be? Let’s find out.
Shebang
Before going further, let’s quickly refresh what a shebang is. A shebang is the first line of a script file, starting with the characters #!, that tells the operating system which interpreter to use to execute the script.
#! /usr/bin/python3
Here we are telling the OS to execute this script with the Python interpreter located at /usr/bin/python3.
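In practice, “using Python as the interpreter” means the kernel runs the interpreter and passes it the script’s path as an argument. The Python sketch below approximates the argument list Linux builds for a #! script. shebang_argv is a hypothetical helper, and it assumes the Linux convention of passing everything after the interpreter path as one single optional argument (other systems may split that text differently).

```python
def shebang_argv(first_line: str, script_path: str, args: list[str]) -> list[str]:
    """Approximate the argv Linux builds when executing a #! script.

    Hypothetical helper: everything after the interpreter path on the
    #! line is kept together as ONE optional argument.
    """
    if not first_line.startswith("#!"):
        raise ValueError("not a shebang line")
    # Split off the interpreter path; surrounding whitespace is dropped.
    parts = first_line[2:].split(None, 1)
    if not parts:
        raise ValueError("no interpreter given")
    argv = [parts[0]]
    if len(parts) == 2:
        argv.append(parts[1].rstrip())  # the optional argument, kept whole
    # The script path comes next, followed by the user's own arguments.
    return argv + [script_path] + list(args)


print(shebang_argv("#! /usr/bin/python3\n", "./hello_world.py", []))
# ['/usr/bin/python3', './hello_world.py']
```

With the shebang above, running ./hello_world.py therefore amounts to running /usr/bin/python3 ./hello_world.py.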
Why can’t we have long shebangs?
To understand the reason, let’s get a rough idea of what happens when we execute the script using ./hello_world.py.
When we execute a file, the Linux kernel creates a struct (struct linux_binprm) holding some data about the file, such as the file name and the arguments. One of the fields in this struct is buf, which stores the first 256 bytes of the file to be executed.
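A rough user-space model of how buf gets filled, in Python. read_binprm_buf and buf_of are hypothetical helpers, not kernel functions. The 256-byte size mirrors BINPRM_BUF_SIZE on current Linux, and the zero padding mirrors the trailing NUL bytes the kernel leaves in buf when the file is shorter than the buffer.

```python
import os
import tempfile

BINPRM_BUF_SIZE = 256  # Linux >= 5.1; earlier kernels used 128


def read_binprm_buf(path: str, size: int = BINPRM_BUF_SIZE) -> bytes:
    """Hypothetical helper: return the first `size` bytes of a file,
    NUL-padded when the file is shorter, like the kernel's buf."""
    with open(path, "rb") as f:
        data = f.read(size)
    return data.ljust(size, b"\0")


def buf_of(content: bytes) -> bytes:
    """Hypothetical helper: write `content` to a temporary file and
    return what buf would contain for it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "script.py")
        with open(path, "wb") as f:
            f.write(content)
        return read_binprm_buf(path)


short = buf_of(b"#! /usr/bin/python3\nprint('Hello World!!!')\n")
long_ = buf_of(b"#! " + b"/." * 200 + b"/usr/bin/python3\nprint('Hello World!!!')\n")
print(b"python3" in short)  # True: the whole shebang fits
print(b"python3" in long_)  # False: buf ends inside the run of "/."
```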
After creating this struct, the kernel iterates through the registered binary format handlers, each of which inspects buf to decide whether it supports this particular file. Scripts with shebangs are handled by the script format handler implemented in binfmt_script.c. It checks whether the file begins with #! and, if so, parses the interpreter path from that line.
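The dispatch step can be pictured with a toy model in Python. The handler functions and the HANDLERS list are hypothetical stand-ins, not kernel APIs: each one looks only at buf and either claims the file or lets the next handler try. Returning None here stands for execve() failing with ENOEXEC when no handler recognizes the file.

```python
from typing import Callable, Optional


def looks_like_elf(buf: bytes) -> bool:
    # Stand-in for the ELF handler: ELF files start with 0x7F 'E' 'L' 'F'.
    return buf[:4] == b"\x7fELF"


def looks_like_script(buf: bytes) -> bool:
    # Stand-in for binfmt_script: scripts start with "#!".
    return buf[:2] == b"#!"


# Hypothetical registry; the real kernel keeps a list of registered formats.
HANDLERS: list[tuple[str, Callable[[bytes], bool]]] = [
    ("elf", looks_like_elf),
    ("script", looks_like_script),
]


def pick_handler(buf: bytes) -> Optional[str]:
    """Return the name of the first handler that accepts `buf`, else None
    (where the real execve() would fail with ENOEXEC)."""
    for name, accepts in HANDLERS:
        if accepts(buf):
            return name
    return None


print(pick_handler(b"#! /usr/bin/python3\n"))  # script
print(pick_handler(b"\x7fELF\x02\x01\x01"))    # elf
print(pick_handler(b"Hello World!!!\n"))       # None
```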
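Remember that buf contains only the first 256 bytes of the file. With a shebang as long as the one in the second script, the full interpreter path does not fit in buf, so binfmt_script.c only sees a truncated prefix of it. Rather than run a possibly cut-off path, recent kernels reject the script, so execve() fails with ENOEXEC, which zsh reports as the “bad interpreter” error above, ending in “exec format error”. (Older kernels instead tried to run the truncated path, which typically fails as well.)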
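Here is a simplified Python model of that check. It is not the kernel source: parse_shebang and ExecFormatError are hypothetical names, and the rules paraphrase the behavior described above, assuming a 256-byte buffer. A shebang line is accepted if its newline lands inside buf; if there is no newline, the interpreter path is trusted only when a space, tab, or NUL byte follows it inside buf, since otherwise it may have been cut off.

```python
from typing import Optional

BINPRM_BUF_SIZE = 256  # Linux >= 5.1


class ExecFormatError(Exception):
    """Hypothetical stand-in for execve() failing with ENOEXEC."""


def parse_shebang(data: bytes, size: int = BINPRM_BUF_SIZE) -> tuple[str, Optional[str]]:
    """Return (interpreter, optional_arg) as seen through a `size`-byte buf."""
    # Only the first `size` bytes are visible; short files are NUL-padded.
    buf = data[:size].ljust(size, b"\0")
    if not buf.startswith(b"#!"):
        raise ExecFormatError("file does not start with #!")
    newline = buf.find(b"\n")
    if newline != -1:
        line = buf[2:newline]  # the whole line fits in buf
    else:
        rest = buf[2:].lstrip(b" \t")
        # Without a newline, the path must be followed by a space, tab or NUL
        # inside buf; otherwise it may have been truncated.
        if not any(ch in b" \t\0" for ch in rest):
            raise ExecFormatError("interpreter path may be truncated")
        line = buf[2:]
    # A NUL byte ends the line; surrounding spaces/tabs are dropped.
    line = line.split(b"\0", 1)[0].strip(b" \t")
    if not line:
        raise ExecFormatError("no interpreter given")
    for i, ch in enumerate(line):
        if ch in b" \t":  # the interpreter path ends at the first space/tab
            return line[:i].decode(), line[i:].strip(b" \t").decode()
    return line.decode(), None


print(parse_shebang(b"#! /usr/bin/python3\nprint('Hello World!!!')\n"))
# ('/usr/bin/python3', None)
```

Feeding it the long shebang from the second script (a line well past 256 bytes with no space after the path) raises ExecFormatError, matching the “exec format error” we saw.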
So far we have been talking about the Linux kernel, where the size of buf is defined by the constant BINPRM_BUF_SIZE (256 bytes since Linux 5.1; earlier kernels used 128 bytes). By trying out shebangs of different lengths, I found that the limit on macOS is 512 bytes.
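That trial-and-error measurement can be automated. Below is a Python sketch, not the author’s actual procedure: make_script, runs_ok and find_limit are hypothetical names. It writes probe scripts whose shebang line has an exact length, pads the interpreter path with /. segments as in the second script, runs each probe directly (no shell involved), and binary-searches for the longest line that still runs. It assumes the interpreter path contains no spaces and that once a length fails, every longer length fails too.

```python
import os
import stat
import subprocess
import sys
import tempfile


def make_script(line_length: int, interpreter: str) -> bytes:
    """Build a script whose first line (excluding the newline) is exactly
    `line_length` bytes, padding the absolute `interpreter` path with "/."."""
    interp = os.fsencode(interpreter)
    prefix = b"#!"
    padding = line_length - len(prefix) - len(interp)
    if padding < 0:
        raise ValueError("line_length is too short for this interpreter")
    if padding % 2:
        prefix += b" "  # a space after #! is allowed and evens out the padding
        padding -= 1
    line = prefix + b"/." * (padding // 2) + interp
    return line + b"\nprint('ok')\n"


def runs_ok(line_length: int, interpreter: str = sys.executable) -> bool:
    """Write a probe script to a temp dir, execute it directly, report success."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "probe.py")
        with open(path, "wb") as f:
            f.write(make_script(line_length, interpreter))
        os.chmod(path, stat.S_IRWXU)  # rwx for the owner only
        try:
            result = subprocess.run([path], capture_output=True)
        except OSError:  # e.g. ENOEXEC ("Exec format error") or ENOENT
            return False
        return result.returncode == 0 and result.stdout.strip() == b"ok"


def find_limit(interpreter: str = sys.executable, upper: int = 4096) -> int:
    """Binary-search the longest shebang line (in bytes, without the
    newline) that still executes."""
    lo = len(os.fsencode(interpreter)) + 2  # "#!" + interpreter, no padding
    if not runs_ok(lo, interpreter):
        raise RuntimeError("even the shortest shebang failed to run")
    hi = upper
    if runs_ok(hi, interpreter):
        raise RuntimeError(f"no limit found below {upper} bytes")
    while hi - lo > 1:  # invariant: lo runs, hi fails
        mid = (lo + hi) // 2
        if runs_ok(mid, interpreter):
            lo = mid
        else:
            hi = mid
    return lo
```

Calling find_limit() runs roughly a dozen probes. If the temporary directory is mounted noexec, every probe fails and the function raises instead. Because the trailing newline also has to fit inside buf, on a Linux kernel with a 256-byte buffer I would expect the reported line length to land just below 256.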
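The main reason for keeping buf this small is efficiency: it is a fixed-size array inside the struct, filled on every exec, and the first 256 bytes are almost always enough to pick the right binary format handler. An ELF binary, for instance, is recognized by its first four bytes alone (0x7F followed by "ELF").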
Conclusion
Long shebangs can cause problems because the kernel reads only a fixed-size chunk from the start of the file (256 bytes on current Linux, 512 on macOS) to determine its format. Keeping this buffer small lets the kernel identify files efficiently, without unnecessary overhead. When the shebang line does not fit in the buffer, the kernel cannot see the full interpreter path, and the script fails to run.