Effective URL: https://reverse.put.as/2021/12/17/knock-knock-whos-there/
KNOCK KNOCK! WHO'S THERE? - AN NSA VM
--------------------------------------------------------------------------------
Posted on December 17, 2021 - @reversing @malware

Back in 2017 (feels like ages ago) I decided to take a peek into the ShadowBrokers leaks and reverse some of the tools. I started on dewdrop simply because it had a macOS version. I made local presentations at 0xOpoSec and BSidesLisbon but those slides were never published for obvious reasons (aka live implants all over the Internet).

Significant time has passed and everyone went crazy last week with the beautiful NSO exploit VM published by Project Zero, so why not ride the wave and present a simple NSA BPF VM. It is still interesting work and you have to admire the great engineering that goes behind this code. It’s not every day that you can take a peek at code developed by a well funded state actor.

This post is only going to focus on the BPF part of the implant so you will have to fill in the blanks about everything else. So let’s start!

After we start reversing the dewdrop binary we reach a point where we understand that a libpcap sniffer is installed and that we are dealing with a port knocking backdoor with multiple targets such as Solaris, Linux, FreeBSD, HP-UX, JunOS, OS X. If it’s connected to the Internet there is a backdoor version. SIGINT all the things!

FX created the first (AFAIK!) port knocking backdoor, cd00r.c. Knock is another example. Port knocking removes the need for port listening backdoors and for hiding those listeners from traditional network tools (netstat and friends). It’s the year 2000, rootkits are all the rage! Holy crap, 21 years already?

Port knocking is essentially implemented as a custom sniffer that looks for a magic packet (or a group of magic packets) and does something when it matches, such as opening a port, making a callback to a remote host, etc.
Without the right sequence and/or data you can’t get in, and you can’t detect it with network scans. Quoting cd00r.c:

> The approach of cd00r.c is to provide remote access to the system without
> showing an open port all the time. This is done by using a sniffer on the
> specified interface to capture all kinds of packets. The sniffer is not
> running in promiscuous mode to prevent a kernel message in syslog and
> detection by programs like AntiSniff.

Libpcap has all the necessary features to implement this. We just need to install a listener and activate a BPF based filter since we don’t need to capture everything (it’s the 2000s, CPUs are slow, we don’t want to raise alarms). When the magic packet or sequence is seen, a callback is executed and we can do whatever we want.

The following is the observed backdoor process diagram:

The initial process just forks and exits as an anti-debugging measure (gdb used to have problems following child processes: software breakpoints crash the child because no debugger is attached to it). The first fork is the daemon and watchdog process responsible for managing the rest of the code. It then forks a child worker where the port knocking sniffer is installed.

The watchdog process is also very simple. It redirects all output to /dev/null, removes all signal handlers, forks a worker child and waits for it. Core files are disabled, file limits are increased, and a signal handler to kill the child is set, to avoid zombie processes if the watchdog is killed. Too much gore in Unix, uh?

Strings are XOR obfuscated. You can find a Unicorn Engine based deobfuscation utility here. At the time I was playing a lot with Unicorn so it was just faster to copy the shellcode into a quick Unicorn emulator. Recently I used a better approach for the Lamberts obfuscation, with the delambert IDA plugin. The same obfuscation algorithm is reused in other tools. Ooops, code reusage sin!
After we understand the process diagram it is clear that we want to debug the worker child. My trick at the time was to patch a sleep (or just use Trammell’s simple and powerful infinite loop trick, which takes fewer bytes and doesn’t need offset computations) into the code that the child is going to run, attach the debugger, restore (or emulate) the original bytes, rewind EIP and have fun. Or you can just skip the fork, and patch the code or set EIP to the code that would be executed by the child.

Not sure why I originally used a sleep patch instead of the infinite loop. Maybe because it would look nicer on slides or to show how to call imported symbols. Go figure!

I can’t remember how, but I understood that libpcap was used. All the original library strings are removed to avoid (easy) library identification. I think at the time I just compiled a libpcap version and manually identified the most interesting functions. Bindiff or Diaphora are the tools of the trade for this kind of task.

Usually the flow to install a libpcap based sniffer is:

* Locate the network interfaces to sniff on.
* Compile a filter.
* Install the filter.
* Start sniffing.
* Get matching packets into a callback.
* Process the matched packets.
* Do something (evil).

Just read the cd00r.c code and you will understand what’s going on in the backdoor code.

In this sample the clear text BPF filter is missing, and so is the call to pcap_compile that would compile it. Instead of being compiled on the fly, the (compiled) bytecode is embedded in the binary.
We can find the code that retrieves and installs the bytecode here:

    __text:0x00000100001882 E8 59 23 00 00    call    sub_100003BE0   ; retrieve the pre compiled bpf_program
    __text:0x00000100001887 4C 89 E7          mov     rdi, r12
    __text:0x0000010000188A 48 89 C6          mov     rsi, rax        ; rax = 0x000000010000A640
    __text:0x0000010000188A                                           ;
    __text:0x0000010000188A                                           ; struct bpf_program {
    __text:0x0000010000188A                                           ;     u_int bf_len;
    __text:0x0000010000188A                                           ;     struct bpf_insn *bf_insns;
    __text:0x0000010000188A                                           ; };
    __text:0x0000010000188A                                           ;
    __text:0x0000010000188A                                           ; struct bpf_insn {
    __text:0x0000010000188A                                           ;     u_short code;
    __text:0x0000010000188A                                           ;     u_char jt;
    __text:0x0000010000188A                                           ;     u_char jf;
    __text:0x0000010000188A                                           ;     bpf_u_int32 k;
    __text:0x0000010000188A                                           ; };
    __text:0x0000010000188D E8 9E 51 00 00    call    pcap_setfilter  ; int
    __text:0x0000010000188D                                           ; pcap_setfilter(pcap_t *p, struct bpf_program *fp)

The input to pcap_setfilter is a bpf program:

> pcap_setfilter() is used to specify a filter program. fp is a pointer to a
> bpf_program struct, usually the result of a call to pcap_compile(3PCAP).
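Because struct bpf_insn is a fixed 8-byte record, an embedded instruction array like this one can be decoded with a few lines of Python. This is only a minimal sketch and not the tooling used for this post: `parse_bpf_insns` and `BpfInsn` are hypothetical names, the layout assumes a little-endian x86_64 binary, and the sample bytes are the first two instructions of the embedded program (the last three zero bytes of the second instruction are assumed from the 32-bit width of `k`).

```python
import struct
from typing import List, NamedTuple


class BpfInsn(NamedTuple):
    """One classic BPF instruction, mirroring struct bpf_insn."""
    code: int  # u_short: opcode and addressing mode
    jt: int    # u_char: jump offset when the test is true
    jf: int    # u_char: jump offset when the test is false
    k: int     # 32-bit generic operand


# u_short, u_char, u_char, 32-bit int: 8 bytes, little-endian, no padding.
BPF_INSN = struct.Struct("<HBBI")


def parse_bpf_insns(raw: bytes) -> List[BpfInsn]:
    """Split a raw byte blob into bpf_insn records."""
    if len(raw) % BPF_INSN.size:
        raise ValueError("length is not a multiple of sizeof(struct bpf_insn)")
    return [BpfInsn(*fields) for fields in BPF_INSN.iter_unpack(raw)]


# First 16 bytes of the instruction array (two instructions).
SAMPLE = bytes.fromhex("30000000" "0e000000" "15000100" "45000000")

if __name__ == "__main__":
    for index, insn in enumerate(parse_bpf_insns(SAMPLE)):
        print(f"{index}: code=0x{insn.code:02x} jt={insn.jt} jf={insn.jf} k=0x{insn.k:x}")
```

Feeding it the whole array (bf_len * 8 bytes starting at bf_insns) would give the full instruction list in one go.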
If we take a look at the referenced data it clearly looks like a potential bpf_program: __data:0x0000010000A640 39 00 00 00 dword_10000A640 dd 39h ; DATA XREF: sub_100003BE0+62↑o __data:0x0000010000A644 00 db 0 __data:0x0000010000A645 00 db 0 __data:0x0000010000A646 00 db 0 __data:0x0000010000A647 00 db 0 __data:0x0000010000A648 60 A6 00 00 01 00 00 00 off_10000A648 dq offset port_knocking_bpf_program __data:0x0000010000A650 00 00 00 00 00 00 00 00+ align 20h __data:0x0000010000A660 30 port_knocking_bpf_program db 30h ; DATA XREF: __data:off_10000A648↑o __data:0x0000010000A661 00 db 0 __data:0x0000010000A662 00 db 0 __data:0x0000010000A663 00 db 0 __data:0x0000010000A664 0E db 0Eh __data:0x0000010000A665 00 db 0 __data:0x0000010000A666 00 db 0 __data:0x0000010000A667 00 db 0 __data:0x0000010000A668 15 db 15h __data:0x0000010000A669 00 db 0 __data:0x0000010000A66A 01 db 1 __data:0x0000010000A66B 00 db 0 __data:0x0000010000A66C 45 db 45h ; E We can define the following types in IDA to transform the data into an array: struct bpf_insn { u_short code; u_char jt; u_char jf; int32 k; }; struct bpf_program { u_int bf_len; struct bpf_insn *bf_insns; }; Applying the new types to the data we can see the expected bpf_program structure: __data:0x0000010000A640 stru_10000A640 bpf_program <39h, 0, offset port_knocking_bpf_program> __data:0x0000010000A640 ; DATA XREF: sub_100003BE0+62↑o __data:0x0000010000A640 ; sub_100003BE0+4↑r ... 
__data:0x0000010000A650 align 20h __data:0x0000010000A660 port_knocking_bpf_program bpf_insn < 30h, 0, 0, 0Eh> __data:0x0000010000A660 ; DATA XREF: __data:stru_10000A640↑o __data:0x0000010000A660 bpf_insn < 15h, 1, 0, 45h> __data:0x0000010000A660 bpf_insn < 6, 0, 0, 0> __data:0x0000010000A660 bpf_insn < 28h, 0, 0, 10h> __data:0x0000010000A660 bpf_insn < 4, 0, 0, 0Eh> __data:0x0000010000A660 bpf_insn < 35h, 1, 0, 0AAh> __data:0x0000010000A660 bpf_insn < 6, 0, 0, 0> __data:0x0000010000A660 bpf_insn < 2, 0, 0, 1> __data:0x0000010000A660 bpf_insn < 14h, 0, 0, 6> __data:0x0000010000A660 bpf_insn < 7, 0, 0, 0> __data:0x0000010000A660 bpf_insn < 48h, 0, 0, 0> __data:0x0000010000A660 bpf_insn < 44h, 0, 0, 0E6CFh> (...) The bpf_program header tells us the program has 57 instructions. We definitely want to disassemble and reverse it. Before the BPF program is retrieved, the network frame header size is computed: __text:0x00000100001877 mov rdi, r12 ; pcap_t * __text:0x0000010000187A call find_frame_header_size Quite a few cases are supported but the virtual machine executing the sample uses Ethernet so we will go through the DLT_EN10MB case. 
__text:0x00000100001550 find_frame_header_size proc near ; CODE XREF: sub_1000015F0+A8↓p __text:0x00000100001550 ; sub_100001720+15A↓p __text:0x00000100001550 ; __unwind { __text:0x00000100001550 push rbp __text:0x00000100001551 mov rbp, rsp __text:0x00000100001554 call pcap_datalink ; int __text:0x00000100001554 ; pcap_datalink(pcap_t *p) __text:0x00000100001554 ; { __text:0x00000100001554 ; return (p->linktype); __text:0x00000100001554 ; } __text:0x00000100001559 cmp eax, 6Bh ; 'k' ; DLT_FRELAY __text:0x0000010000155C jg short loc_100001574 __text:0x0000010000155E cmp eax, 1 ; DLT_EN10MB __text:0x00000100001561 jz short loc_100001585 __text:0x00000100001563 cmp eax, 8 ; DLT_SLIP __text:0x00000100001566 jz short loc_10000158C __text:0x00000100001568 cmp eax, 0Ch ; DLT_RAW __text:0x0000010000156B jz short loc_100001593 __text:0x0000010000156D __text:0x0000010000156D loc_10000156D: ; CODE XREF: find_frame_header_size+2C↓j __text:0x0000010000156D mov eax, 0FFFFh __text:0x00000100001572 pop rbp __text:0x00000100001573 retn __text:0x00000100001574 ; -------------------------------------------------------------------- __text:0x00000100001574 __text:0x00000100001574 loc_100001574: ; CODE XREF: find_frame_header_size+C↑j __text:0x00000100001574 cmp eax, 71h ; 'q' ; DLT_LINUX_SLL __text:0x00000100001577 jz short loc_10000158C __text:0x00000100001579 cmp eax, 6Ch ; 'l' ; DLT_LOOP __text:0x0000010000157C jnz short loc_10000156D __text:0x0000010000157E mov eax, 4 __text:0x00000100001583 pop rbp __text:0x00000100001584 retn __text:0x00000100001585 ; -------------------------------------------------------------------- __text:0x00000100001585 __text:0x00000100001585 loc_100001585: ; CODE XREF: find_frame_header_size+11↑j __text:0x00000100001585 mov eax, 0Eh ; /* Ethernet header length */ __text:0x0000010000158A pop rbp __text:0x0000010000158B retn __text:0x0000010000158C ; -------------------------------------------------------------------- __text:0x0000010000158C 
__text:0x0000010000158C loc_10000158C: ; CODE XREF: find_frame_header_size+16↑j __text:0x0000010000158C ; find_frame_header_size+27↑j __text:0x0000010000158C mov eax, 10h __text:0x00000100001591 pop rbp __text:0x00000100001592 retn __text:0x00000100001593 ; -------------------------------------------------------------------- __text:0x00000100001593 __text:0x00000100001593 loc_100001593: ; CODE XREF: find_frame_header_size+1B↑j __text:0x00000100001593 xor eax, eax __text:0x00000100001595 pop rbp __text:0x00000100001596 retn __text:0x00000100001596 ; } // starts at 100001550 __text:0x00000100001596 find_frame_header_size endp The frame header size is necessary because the bpf program is modified to support those different link types (old stuff, uh?). This is what the function that returns the bpf program does: __text:0x00000100003BE0 sub_100003BE0 proc near ; CODE XREF: sub_100001720+162↑p __text:0x00000100003BE0 ; __unwind { __text:0x00000100003BE0 push rbp __text:0x00000100003BE1 mov rbp, rsp __text:0x00000100003BE4 mov rax, cs:stru_10000A640.bf_insns __text:0x00000100003BEB mov [rax+4], edi __text:0x00000100003BEE lea eax, [rdi+2] __text:0x00000100003BF1 mov rcx, cs:stru_10000A640.bf_insns __text:0x00000100003BF8 mov [rcx+1Ch], eax __text:0x00000100003BFB mov rax, cs:stru_10000A640.bf_insns __text:0x00000100003C02 mov [rax+24h], edi __text:0x00000100003C05 lea eax, [rdi+9Ch] __text:0x00000100003C0B mov rcx, cs:stru_10000A640.bf_insns __text:0x00000100003C12 mov [rcx+2Ch], eax __text:0x00000100003C15 lea eax, [rdi+9] __text:0x00000100003C18 mov rcx, cs:stru_10000A640.bf_insns __text:0x00000100003C1F mov [rcx+0E4h], eax __text:0x00000100003C25 lea eax, [rdi+20h] __text:0x00000100003C28 mov rcx, cs:stru_10000A640.bf_insns __text:0x00000100003C2F mov [rcx+0F4h], eax __text:0x00000100003C35 mov rax, cs:stru_10000A640.bf_insns __text:0x00000100003C3C mov [rax+11Ch], edi __text:0x00000100003C42 lea rax, stru_10000A640 ; bpf_program size __text:0x00000100003C49 pop rbp 
    __text:0x00000100003C4A                 retn

Or with decompiler assistance (IDA subscriptions model, lol…):

    bpf_program *__fastcall sub_100003BE0(u_int32 frame_size)
    {
      stru_10000A640.bf_insns->k = frame_size;
      stru_10000A640.bf_insns[3].k = frame_size + 2;
      stru_10000A640.bf_insns[4].k = frame_size;
      stru_10000A640.bf_insns[5].k = frame_size + 156;
      stru_10000A640.bf_insns[28].k = frame_size + 9;
      stru_10000A640.bf_insns[30].k = frame_size + 32;
      stru_10000A640.bf_insns[35].k = frame_size;
      return &stru_10000A640;
    }

Recalling the struct bpf_insn definition:

    struct bpf_insn {
        u_short code;   // Instruction type and addressing mode
        u_char  jt;     // Jump if true
        u_char  jf;     // Jump if false
        int32   k;      // Generic field used for various purposes
    };

That function just adapts the bpf program to the different data link types by fixing offsets. The default program is defined for Ethernet, so for Ethernet the patched values are the same as the original ones, meaning that we can dump the program directly from the binary. The following table, comparing the original values with the computed ones, proves this:

    __data:0x0000010000A660 bpf_insn < 30h, 0, 0, 0Eh>    <-  0: 0xE       = 0xE
    __data:0x0000010000A660 bpf_insn < 28h, 0, 0, 10h>    <-  3: 0xE + 2   = 0x10
    __data:0x0000010000A660 bpf_insn <   4, 0, 0, 0Eh>    <-  4: 0xE       = 0xE
    __data:0x0000010000A660 bpf_insn < 35h, 1, 0, 0AAh>   <-  5: 0xE + 156 = 0xAA
    __data:0x0000010000A660 bpf_insn < 30h, 0, 0, 17h>    <- 28: 0xE + 9   = 0x17
    __data:0x0000010000A660 bpf_insn < 30h, 0, 0, 2Eh>    <- 30: 0xE + 32  = 0x2E
    __data:0x0000010000A660 bpf_insn < 48h, 0, 0, 0Eh>    <- 35: 0xE       = 0xE

To disassemble the bytecode we can use bpftools published by Cloudflare. It is based on the BPF tools available in the Linux kernel source tree. This repo compiles easily, while I think I had some issues compiling the kernel version. Just install the dependencies and compile (Linux only AFAIK). The bpf debugger and disassembler can be found at linux_tools/bpf_dbg.
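Going back to the fix-up logic for a moment, the two steps (link type to header size, then header size to patched k values) can be sketched in Python. This is just an illustration under stated assumptions: the function and table names are hypothetical, and the numbers are simply the ones read from find_frame_header_size and sub_100003BE0.

```python
# DLT value -> link layer header size, as returned by find_frame_header_size.
LINK_HEADER_SIZE = {
    1: 14,    # DLT_EN10MB
    8: 16,    # DLT_SLIP
    12: 0,    # DLT_RAW
    108: 4,   # DLT_LOOP
    113: 16,  # DLT_LINUX_SLL
}
UNSUPPORTED = 0xFFFF  # value returned for every other link type

# Instruction index -> constant added to the header size to get the new k.
K_FIXUPS = {0: 0, 3: 2, 4: 0, 5: 156, 28: 9, 30: 32, 35: 0}


def frame_header_size(dlt: int) -> int:
    """Header size for a pcap data link type, or 0xFFFF when unsupported."""
    return LINK_HEADER_SIZE.get(dlt, UNSUPPORTED)


def patched_k_values(frame_size: int) -> dict:
    """k operands written into the embedded program for a given header size."""
    return {index: frame_size + delta for index, delta in K_FIXUPS.items()}


if __name__ == "__main__":
    for dlt in sorted(LINK_HEADER_SIZE):
        size = frame_header_size(dlt)
        print(dlt, {i: hex(k) for i, k in patched_k_values(size).items()})
```

For DLT_EN10MB this reproduces the values already stored in the binary, which is why the Ethernet program can be dumped as-is.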
If you feel brave (and lucky) you can always try radare2 (eh eh eh eh).

To load the bytecode we need to convert it to this tool’s format. Example:

    load bpf 12,40 0 0 12,21 0 5 34525,48 0 0 20,21 6 0 17,21 0 6 44,48 0 0 54,21 3 4 17,21 0 3 2048,48 0 0 23,21 0 1 17,6 0 0 262144,6 0 0 0

The first field is the number of instructions, followed by the comma separated instructions, each one written as four space separated base 10 fields (code, jt, jf, k). I created bpf_dbg_output to convert the instructions array to the bpf_dbg input format.

We can finally disassemble the bpf payload:

    > load bpf 57,48 0 0 14,21 1 0 69,6 0 0 0,40 0 0 16,4 0 0 14,53 1 0 170,6 0 0 0,2 0 0 1,20 0 0 6,7 0 0 0,72 0 0 0,68 0 0 59087,2 0 0 15,72 0 0 0,84 0 0 59087,132 0 0 0,20 0 0 1,7 0 0 0,96 0 0 15,92 0 0 0,7 0 0 0,2 0 0 2,96 0 0 1,28 0 0 0,7 0 0 0,72 0 0 0,2 0 0 3,97 0 0 2,48 0 0 23,21 0 5 6,48 0 0 46,116 0 0 2,20 0 0 20,12 0 0 0,7 0 0 0,72 0 0 14,2 0 0 4,96 0 0 1,20 0 0 2,7 0 0 0,72 0 0 0,68 0 0 40298,2 0 0 15,72 0 0 0,84 0 0 40298,132 0 0 0,20 0 0 1,7 0 0 0,96 0 0 15,92 0 0 0,7 0 0 0,96 0 0 4,29 2 0 0,96 0 0 3,29 0 1 0,6 0 0 65535,6 0 0 0
    > disassemble
    l0:  ldb [14]
    l1:  jeq #0x45, l3, l2
    l2:  ret #0
    l3:  ldh [16]
    l4:  add #14
    l5:  jge #0xaa, l7, l6
    l6:  ret #0
    l7:  st M[1]
    l8:  sub #6
    l9:  tax
    l10: ldh [x+0]
    l11: or #0xe6cf
    l12: st M[15]
    l13: ldh [x+0]
    l14: and #0xe6cf
    l15: neg
    l16: sub #1
    l17: tax
    l18: ld M[15]
    l19: and x
    l20: tax
    l21: st M[2]
    l22: ld M[1]
    l23: sub x
    l24: tax
    l25: ldh [x+0]
    l26: st M[3]
    l27: ldx M[2]
    l28: ldb [23]
    l29: jeq #0x6, l30, l35
    l30: ldb [46]
    l31: rsh #2
    l32: sub #20
    l33: add x
    l34: tax
    l35: ldh [x+14]
    l36: st M[4]
    l37: ld M[1]
    l38: sub #2
    l39: tax
    l40: ldh [x+0]
    l41: or #0x9d6a
    l42: st M[15]
    l43: ldh [x+0]
    l44: and #0x9d6a
    l45: neg
    l46: sub #1
    l47: tax
    l48: ld M[15]
    l49: and x
    l50: tax
    l51: ld M[4]
    l52: jeq x, l55, l53
    l53: ld M[3]
    l54: jeq x, l55, l56
    l55: ret #0xffff
    l56: ret #0

To understand what is going on here we need the bpf instruction set:

    Instruction   Addressing mode   Description

    ld            1, 2, 3, 4, 10    Load word into A
    ldi           4                 Load word into A
    ldh           1, 2              Load half-word into A
    ldb           1, 2              Load byte into A
    ldx           3, 4, 5, 10       Load word into X
    ldxi          4                 Load word into X
    ldxb          5                 Load byte into X

    st            3                 Store A into M[]
    stx           3                 Store X into M[]

    jmp           6                 Jump to label
    ja            6                 Jump to label
    jeq           7, 8              Jump on A == k
    jneq          8                 Jump on A != k
    jne           8                 Jump on A != k
    jlt           8                 Jump on A < k
    jle           8                 Jump on A <= k
    jgt           7, 8              Jump on A > k
    jge           7, 8              Jump on A >= k
    jset          7, 8              Jump on A & k

    add           0, 4              A + <x>
    sub           0, 4              A - <x>
    mul           0, 4              A * <x>
    div           0, 4              A / <x>
    mod           0, 4              A % <x>
    neg                             !A
    and           0, 4              A & <x>
    or            0, 4              A | <x>
    xor           0, 4              A ^ <x>
    lsh           0, 4              A << <x>
    rsh           0, 4              A >> <x>

    tax                             Copy A into X
    txa                             Copy X into A

    ret           4, 9              Return

The instruction set consists of load, store, branch, alu, miscellaneous and return instructions. The documentation comes from the Linux kernel.

The following registers are available:

* A : 32 bit wide accumulator
* X : 32 bit wide X register
* M[] : 16 x 32 bit wide misc registers aka “scratch memory store”, addressable from 0 to 15

And finally the addressing modes:

    Addressing mode   Syntax        Description

     0                x/%x          Register X
     1                [k]           BHW at byte offset k in the packet
     2                [x + k]       BHW at the offset X + k in the packet
     3                M[k]          Word at offset k in M[]
     4                #k            Literal value stored in k
     5                4*([k]&0xf)   Lower nibble * 4 at byte offset k in the packet
     6                L             Jump label L
     7                #k,Lt,Lf      Jump to Lt if true, otherwise jump to Lf
     8                x/%x,Lt,Lf    Jump to Lt if true, otherwise jump to Lf
     9                #k,Lt         Jump to Lt if predicate is true
    10                x/%x,Lt       Jump to Lt if predicate is true
    11                a/%a          Accumulator A
    12                extension     BPF extension

Note the relevant size definitions used here:

* Half word: 2 bytes
* Word: 4 bytes

The following valid ICMP packet (produced by their port knocking tool) is used to reverse the bpf program:

    10:40:09.498560 IP (tos 0x0, ttl 64, id 1, offset 0, flags [none], proto ICMP (1), length 164)
        192.168.30.14 > 192.168.30.15: ICMP echo request, id 24374, seq 25107, length 144
        0x0000: 000c 29db 3ac4 000c 29b6 b3c6 0800 4500 ..).:...).....E.
        0x0010: 00a4 0001 0000 4001 bcea c0a8 1e0e c0a8 ......@.........
        0x0020: 1e0f 0800 9e57 5f36 6213 7e29 5000 1743 .....W_6b.~)P..C
        0x0030: 5b50 ab0d addf b955 1089 578f 849b ecdf [P.....U..W.....
        0x0040: 83fd 84c0 a779 7118 43ac ec65 1249 7e5f .....yq.C..e.I~_
        0x0050: e7ea 2b2a 6265 a1d3 912f 2dd8 b3ab e30a ..+*be.../-.....
        0x0060: 0d3b e0f4 e527 3955 9f44 46d7 0608 7703 .;...'9U.DF...w.
        0x0070: f134 5138 4845 68bd 7382 d1c4 1fcb adf5 .4Q8HEh.s.......
        0x0080: c2ae 87b4 ac48 b398 5f65 24d3 7090 6c04 .....H.._e$.p.l.
        0x0090: fc1a cb5b 99b5 ec76 a129 596e edb3 668b ...[...v.)Yn..f.
        0x00a0: 8848 9bce f31c 458e 07b5 52c7 e647 7e0f .H....E...R..G~.
        0x00b0: e343 .C

The reversed and commented version of the program (modified instructions are 0, 3, 4, 5, 28, 30, 35):

    > disassemble
    l0:  ldb [14]            <- load byte from offset 14 (aka skip frame header) into A. A = 0x45
    l1:  jeq #0x45, l3, l2   <- check if it's 0x45 - IP packet, header length = 5, version 4
    l2:  ret #0              <- exit if not an IPv4 packet
    l3:  ldh [16]            <- load IP packet total length into A: 0xA4 = 164 bytes. A = 0xA4
    l4:  add #14             <- add link layer header size to A. A = 0xA4 + 0xE = 0xB2 (178)
    l5:  jge #0xaa, l7, l6   <- full packet must be at least 170 bytes. A = 178 bytes
    l6:  ret #0              <- exit if packet too short
    l7:  st M[1]             <- store packet length into misc register 1. M[1] = 178
    l8:  sub #6              <- A = packet_length - 6 = offset 172. A = 172
    l9:  tax                 <- copy A to X register. X = 0xAC
    l10: ldh [x+0]           <- load half word value from packet offset 0xAC. A = 0xE647
    l11: or #0xe6cf          <- 0xE647 | 0xE6CF. A = 0xE6CF
    l12: st M[15]            <- store result at M[15]. M[15] = 0xE6CF
    l13: ldh [x+0]           <- load half word value from packet offset 0xAC. A = 0xE647
    l14: and #0xe6cf         <- 0xE647 & 0xE6CF. A = 0xE647
    l15: neg                 <- A = ~0xE647 + 1 = 0x19B9
    l16: sub #1              <- A = 0x19B8
    l17: tax                 <- copy A to X register. X = 0x19B8
    l18: ld M[15]            <- load from M[15]. A = 0xE6CF
    l19: and x               <- 0xE6CF & 0x19B8. A = 0x88 (payload size)
                             <- (0xE647 | 0xE6CF) & ((~(0xE647 & 0xE6CF) + 1) - 1) = 0x88
                             <- this is just 0xE647 ^ 0xE6CF
                             <- because there is no XOR operation in this VM
                             <- this just extracts the payload size, which is fixed at 136 bytes
    l20: tax                 <- X = 0x88 (payload data size)
    l21: st M[2]             <- M[2] = 0x88 (136)
    l22: ld M[1]             <- A = 178 (0xB2)
    l23: sub x               <- A = 0xB2 - 0x88 = 42 (0x2A)
    l24: tax                 <- X = 42 (size of all headers up to the payload). 178 - 136 = 42
    l25: ldh [x+0]           <- 42 is the ICMP data offset in the packet, so all this just computes
                             <- where the data starts (they call it the trigger)
                             <- loads the half word at packet offset 42 (0x2A). A = 0x7E29
    l26: st M[3]             <- M[3] = 0x7E29
    l27: ldx M[2]            <- X = 0x88
    l28: ldb [23]            <- loads the byte at packet offset 23. A = 1
                             <- IP header protocol field, ICMP in this case
    l29: jeq #0x6, l30, l35  <- check if it's TCP protocol
    l30: ldb [46]            <- it's TCP: byte 12 of the TCP header (data offset in the high nibble)
    l31: rsh #2              <- TCP header length in bytes (when the low nibble is zero)
    l32: sub #20             <- length of the TCP options
    l33: add x               <- X + TCP options length
    l34: tax                 <-
    l35: ldh [x+14]          <- load half word from offset 0x88 + 0xE = 150 (0x96). A = 0xEC76
                             <- not sure what this is
    l36: st M[4]             <- M[4] = 0xEC76
    l37: ld M[1]             <- A = 178 (0xB2) (total packet size)
    l38: sub #2              <- A = 176
    l39: tax                 <- X = 176
    l40: ldh [x+0]           <- A = 0xE343 (load half word from packet offset 0xB0).
                             <- It's the last half word from the packet.
    l41: or #0x9d6a          <- 0xE343 | 0x9D6A. A = 0xFF6B
    l42: st M[15]            <- M[15] = 0xFF6B
    l43: ldh [x+0]           <- A = 0xE343
    l44: and #0x9d6a         <- A = 0xE343 & 0x9D6A = 0x8142
    l45: neg                 <- A = ~0x8142 + 1 = 0x7EBE
    l46: sub #1              <- A = 0x7EBD
    l47: tax                 <- X = 0x7EBD
    l48: ld M[15]            <- A = 0xFF6B
    l49: and x               <- A = 0xFF6B & 0x7EBD = 0x7E29
    l50: tax                 <- X = 0x7E29
                             <- (0xE343 | 0x9D6A) & ((~(0xE343 & 0x9D6A) + 1) - 1) = 0x7E29
                             <- this is just 0xE343 ^ 0x9D6A = 0x7E29
                             <- this means that the last half word XOR 0x9D6A must be
                             <- equal to the trigger
    l51: ld M[4]             <- A = 0xEC76 (the value at packet offset 0x96)
    l52: jeq x, l55, l53     <- OK if equal
    l53: ld M[3]             <- A = 0x7E29 (the value at beginning of data payload)
    l54: jeq x, l55, l56     <- OK if equal
    l55: ret #0xffff         <- WE HAVE A WINNER!
    l56: ret #0              <- Bad luck, maybe next packet?

We can describe the format of the data payload:

    struct __attribute__((packed)) payload {
        uint16_t trigger;   // = checksum ^ 0x9D6A
        char data[128];
        uint16_t size;      // = sizeof(struct payload) ^ 0xE6CF
        uint16_t unknown;
        uint16_t checksum;
    }; // sizeof() = 136 bytes

Log from the tool that does the port knocking (different packet):

    TRIGGER DATA
    COMMAND = 0x01
    DESTINATION ADDRESS = 192.168.30.15
    TRANSPORT PROTOCOL = icmp (1)
    TIME STAMP = Mon Apr 17 10:00:00 2017 (1492437600)
    TIME SKEW = 43200
    ICMP TYPE, CODE = 8, 0
    CALLBACK ADDRESS = 192.168.30.14:55555
    SOURCE PORT = 55302
    START OF TRIGGER = 0xdd67

The port knocking tool is extremely flexible and can send all kinds of packets and payloads. It supports TCP, UDP, ICMP, and besides raw packets it can produce DNS, SMTP, and SIP application payloads. It can set different flags in TCP packets, for example sending a RST packet with the port knocking payload. It even has a PIX firewall bypass (SYN only packet). Pretty much port knocking on steroids.

Next are some of the different packets it can build.
SIP application packet: 17:36:08.576977 IP (tos 0x0, ttl 64, id 2, offset 0, flags [none], proto TCP (6), length 342) 192.168.30.14.64778 > 192.168.30.15.acmsoda: Flags [.], cksum 0x69fe (correct), seq 32774:33076, ack 47681, win 32767, length 302 0x0000: 4500 0156 0002 0000 4006 bc32 c0a8 1e0e E..V....@..2.... 0x0010: c0a8 1e0f fd0a 1b39 0000 8006 0000 ba41 .......9.......A 0x0020: 5010 7fff 69fe 0000 5245 4749 5354 4552 P...i...REGISTER 0x0030: 2073 6970 3a61 2053 4950 2f32 2e30 0d0a .sip:a.SIP/2.0.. 0x0040: 546f 3a20 3837 203c 7369 703a 3130 3240 To:.87.<sip:102@ 0x0050: 7878 2e6e 6574 3e0d 0a46 726f 6d3a 2032 xx.net>..From:.2 0x0060: 3435 203c 7369 703a 3130 3140 7878 2e6e 45.<sip:101@xx.n 0x0070: 6574 3e0d 0a43 616c 6c2d 4944 3a20 3233 et>..Call-ID:.23 0x0080: 3334 3340 7071 7879 760d 0a43 5365 713a 343@pqxyv..CSeq: 0x0090: 2032 3720 5245 4749 5354 4552 0d0a 436f .27.REGISTER..Co 0x00a0: 6e74 6163 743a 203c 7369 703a 3130 3140 ntact:.<sip:101@ 0x00b0: 7878 2e6e 6574 3e0d 0a43 6f6e 7465 6e74 xx.net>..Content 0x00c0: 2d4c 656e 6774 683a 2031 3330 0d0a 1050 -Length:.130...P 0x00d0: 2545 647d a2e2 f84d 3281 31bf dcd8 3669 %Ed}...M2.1...6i 0x00e0: 4d73 a214 cb7f ff29 d61a 7e80 f543 8c71 Ms.....)..~..C.q 0x00f0: a7c8 b138 4fe8 59a8 02aa cecd c0c3 a94c ...8O.Y........L 0x0100: 2102 1938 58e4 32d6 3c4e f05a 9d9d be74 !..8X.2.<N.Z...t 0x0110: 0dd1 5617 6eb6 7f50 424b bf94 52e5 bc52 ..V.n..PBK..R..R 0x0120: a769 4bfa 47c6 31d2 989a 93e5 972b 2ff3 .iK.G.1......+/. 0x0130: 0987 6eec 5eed 3013 29dd 2e0f 5dcd 1c53 ..n.^.0.)...]..S 0x0140: 3943 4712 54f7 5688 6791 3313 0c82 b47a 9CG.T.V.g.3....z 0x0150: e647 1076 8d3a .G.v.: The trigger is 0x8d3a ^ 0x9d6a = 0x1050 (offset 0xCE). DNS packet: 17:37:18.675460 IP (tos 0x0, ttl 64, id 2, offset 0, flags [none], proto TCP (6), length 233) 192.168.30.14.56331 > 192.168.30.15.acmsoda: Flags [.], cksum 0x40a1 (correct), seq 10342:10535, ack 48998, win 32767, length 193 0x0000: 4500 00e9 0002 0000 4006 bc9f c0a8 1e0e E.......@....... 
0x0010: c0a8 1e0f dc0b 1b39 0000 2866 0000 bf66 .......9..(f...f 0x0020: 5010 7fff 40a1 0000 af99 0180 0001 0001 P...@........... 0x0030: 0000 0000 0231 3502 3330 0331 3638 0331 .....15.30.168.1 0x0040: 3932 0769 6e2d 6164 6472 0461 7270 6100 92.in-addr.arpa. 0x0050: 0010 0001 c00c 0010 0001 0000 ffff 0088 ................ 0x0060: 8718 4938 38d6 654b 6539 8033 da58 cc73 ..I88.eKe9.3.X.s 0x0070: 3a57 25a3 7c08 5e9e 7734 a126 f954 3879 :W%.|.^.w4.&.T8y 0x0080: 64bf 8f7b 741e 2e33 8cbd b07b c750 ae14 d..{t..3...{.P.. 0x0090: 819b d7f3 1742 1c05 2198 570d d509 f1a8 .....B..!.W..... 0x00a0: c323 da36 41b2 ca11 b955 dd59 67e2 d495 .#.6A....U.Yg... 0x00b0: fff3 e7cf 3ce7 33a2 6bc1 f8a3 5d98 9983 ....<.3.k...]... 0x00c0: 583f b3b9 289e c3b9 70bf 45e9 69eb db32 X?..(...p.E.i..2 0x00d0: 1a42 e586 220a fd23 77ad acff 75a2 027d .B.."..#w...u..} 0x00e0: 1a6c 83e6 4718 6f85 23 .l..G.o.# The trigger is 0x8523 ^ 0x9d6a = 0x1849 (offset 0x61). SMTP HELO packet: 17:38:18.413178 IP (tos 0x0, ttl 64, id 2, offset 0, flags [none], proto TCP (6), length 181) 192.168.30.14.19215 > 192.168.30.15.acmsoda: Flags [.], cksum 0x790a (correct), seq 51209:51350, ack 1802, win 32767, length 141 0x0000: 4500 00b5 0002 0000 4006 bcd3 c0a8 1e0e E.......@....... 0x0010: c0a8 1e0f 4b0f 1b39 0000 c809 0000 070a ....K..9........ 0x0020: 5010 7fff 790a 0000 4845 4c4f 204f dc75 P...y...HELO.O.u 0x0030: eee6 2436 82ce 0567 a237 e408 c0c5 ea55 ..$6...g.7.....U 0x0040: 4425 df77 fb9f 1498 c2b5 3cd3 84cd 4178 D%.w......<...Ax 0x0050: e9f7 23e2 0f02 0ce7 7c36 8aa6 f4ea 43cd ..#.....|6....C. 0x0060: d1fe 2406 77b5 29a7 83c4 f497 5c06 75a5 ..$.w.).....\.u. 0x0070: 1528 39a0 d80c 412f f35e 067d d857 ee1c .(9...A/.^.}.W.. 0x0080: 7f3d b09b f209 b8ad fff1 1f1d 52a8 87ca .=..........R... 0x0090: 73dc 5c1a 4a69 4205 f5d9 6836 6f09 be5e s.\.JiB...h6o..^ 0x00a0: 6b87 7ad6 1138 de67 6c3b 33d8 98a4 35e6 k.z..8.gl;3...5. 0x00b0: 474f fad2 b6 GO... The trigger is 0xd2b6 ^ 0x9d6a = 0x4fdc (offset 0x2D). 
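The whole filter boils down to a handful of half word loads and two XORs, so it can be re-expressed in Python to check packets offline. What follows is a sketch of my reading of the disassembly, not code from the implant: `knock_matches`, `bpf_style_xor`, `load_half` and `build_demo_frame` are hypothetical names, the default link header size assumes Ethernet, and the demo frame is synthetic (zeroed data, made-up headers), not a real trigger.

```python
import struct
from typing import Optional

SIZE_KEY = 0xE6CF     # XOR key for the size field
TRIGGER_KEY = 0x9D6A  # XOR key for the trailing checksum


def bpf_style_xor(value: int, key: int) -> int:
    """XOR built from or/and/neg/sub, the sequence the filter uses."""
    inverted = (-(value & key) - 1) & 0xFFFFFFFF  # neg; sub #1 is a bitwise NOT
    return (value | key) & inverted


def load_half(frame: bytes, offset: int) -> Optional[int]:
    """Big-endian 16-bit load; None when out of bounds (BPF would return 0)."""
    if 0 <= offset <= len(frame) - 2:
        return struct.unpack_from(">H", frame, offset)[0]
    return None


def knock_matches(frame: bytes, link_hdr: int = 14) -> bool:
    """True when the frame would make the filter return 0xffff."""
    if len(frame) <= link_hdr + 32 or frame[link_hdr] != 0x45:
        return False
    total = load_half(frame, link_hdr + 2) + link_hdr  # IP length + link header
    if total < link_hdr + 156:
        return False
    size_field = load_half(frame, total - 6)
    if size_field is None:
        return False
    size = size_field ^ SIZE_KEY
    trigger = load_half(frame, total - size)            # first half word of payload
    if trigger is None:
        return False
    x = size
    if frame[link_hdr + 9] == 6:                        # TCP: add the options length
        x += (frame[link_hdr + 32] >> 2) - 20
    alternative = load_half(frame, x + link_hdr)        # the l35 load
    last = load_half(frame, total - 2)
    if alternative is None or last is None:
        return False
    return (last ^ TRIGGER_KEY) in (alternative, trigger)


def build_demo_frame(trigger: int, unknown: int = 0) -> bytes:
    """Synthetic Ethernet + IPv4 + ICMP frame carrying a 136-byte payload."""
    payload = struct.pack(">H", trigger) + bytes(128)
    payload += struct.pack(">HHH", 136 ^ SIZE_KEY, unknown, trigger ^ TRIGGER_KEY)
    ip = bytes([0x45, 0]) + struct.pack(">H", 20 + 8 + len(payload))
    ip += bytes(5) + bytes([1]) + bytes(10)             # id/frag/ttl, proto=ICMP, rest
    icmp = bytes([8, 0]) + bytes(6)
    return bytes(12) + b"\x08\x00" + ip + icmp + payload


if __name__ == "__main__":
    print(knock_matches(build_demo_frame(0x7E29)))
```

The tcpdump dumps of the TCP packets above start at the IP header, so they need 14 bytes of Ethernet header prepended (or `link_hdr=0`, the DLT_RAW case) before being passed to `knock_matches`.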
In the port knocking tool we can find references to CORDIALFLIMSY. This appears to be the codename for the packet format (includes trigger and payload). Because the tool is to be used only by the attacker, it has a lot of debug messages and its strings aren’t obfuscated. Guess they weren’t counting on losing all those tools.

The data contents are encrypted with RC5/6, and contain the callback address and port. In theory it would be possible to recover the callback hosts if we had network dumps of all the packets. The leaks contain traffic bouncer tools so these callback addresses should be just bouncer addresses.

Regarding the RC5/6 code, we can find the same constant 0x61C88647 identified by Kaspersky as specific to the Equation group:

    __int64 __fastcall RC56_keysetup(int *a1, _DWORD *a2)
    {
      __int64 v2; // rax
      int i; // ecx
      int v4; // ecx
      int v5; // edi
      __int64 result; // rax
      int v7; // er10
      unsigned int v8; // er11
      int v9; // er14
      int v10[7]; // [rsp+0h] [rbp-1Ch]

      v10[0] = *a1;
      v10[1] = a1[1];
      v10[2] = a1[2];
      *a2 = 0xB7E15163;
      v2 = 1LL;
      for ( i = 0x5618CB1C; ; i -= 0x61C88647 )
      {
        a2[v2] = i;
        if ( v2 == 0x31 )
          break;
        ++v2;
      }
      v4 = 0;
      v5 = 0x96;
      LODWORD(result) = 0;
      v7 = 0;
      v8 = 0;
      do
      {
        v7 = __ROL4__(a2[v8] + v4 + v7, 3);
        a2[v8] = v7;
        v9 = __ROL4__(v7 + v4 + v10[(unsigned int)result], (v7 + v4) & 0x1F);
        v10[(unsigned int)result] = v9;
        v8 = v8 - 50 * ((v8 + 1) / 0x32) + 1;
        result = (unsigned int)result - 3 * (((int)result + 1) / 3u) + 1;
        --v5;
        v4 = v9;
      }
      while ( v5 );
      return result;
    }

Stephen Checkoway wrote that Kaspersky’s analysis is wrong (the original URL has been gone since 2019 or something). The funny thing is that the macOS binary analysed here was most probably compiled with Clang 4.x (and Xcode 4.x) so his GCC theory might be wrong. Who knows?
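A quick way to see what the constant is doing: modulo 2^32, subtracting 0x61C88647 is the same as adding 0x9E3779B9, the textbook RC5/RC6 constant Q32, and the loop's starting value 0x5618CB1C is P32 + Q32. The Python sketch below only covers the table initialisation loop (not the key mixing that follows it); the function names are hypothetical and the 50-word table size is taken from the decompiled loop bound.

```python
MASK32 = 0xFFFFFFFF
P32 = 0xB7E15163       # first table word, as stored by the decompiled code
Q32 = 0x9E3779B9       # textbook RC5/RC6 constant
NEG_Q32 = 0x61C88647   # constant seen in the binary


def init_table_subtracting(words: int = 50) -> list:
    """Table initialisation as the decompiled loop does it (i -= 0x61C88647)."""
    table = [P32]
    value = 0x5618CB1C
    for _ in range(words - 1):
        table.append(value)
        value = (value - NEG_Q32) & MASK32
    return table


def init_table_textbook(words: int = 50) -> list:
    """Textbook initialisation: S[0] = P32, S[i] = S[i-1] + Q32."""
    table = [P32]
    for _ in range(words - 1):
        table.append((table[-1] + Q32) & MASK32)
    return table


if __name__ == "__main__":
    print(hex((-NEG_Q32) & MASK32))                           # 0x9e3779b9
    print(init_table_subtracting() == init_table_textbook())  # True
```

So both loops produce the same table; only the way the constant is encoded differs.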
:-)

    $ otool -l dewdrop__v__3_3_2_2_x86_64-darwin
    Load command 10
              cmd LC_LOAD_DYLIB
          cmdsize 56
             name /usr/lib/libSystem.B.dylib (offset 24)
       time stamp 2 Thu Jan 1 01:00:02 1970
          current version 159.1.0
    compatibility version 1.0.0

The 159.1.0 version of /usr/lib/libSystem.B.dylib can be found in the 10.7 SDK and at least Xcode 4.6.3. It’s a Lion 10.7.2 to 10.7.5 system library, and so it should be available in older Xcode versions. Xcode 4 was released between 2011 (4.0) and 2013 (4.6).

    $ otool -l /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.7.sdk/usr/lib/libSystem.B.dylib
    Load command 3
              cmd LC_ID_DYLIB
          cmdsize 56
             name /usr/lib/libSystem.B.dylib (offset 24)
       time stamp 1 Thu Jan 1 01:00:01 1970
          current version 159.1.0
    compatibility version 1.0.0

    $ otool -l /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.8.sdk/usr/lib/libSystem.B.dylib
    Load command 3
              cmd LC_ID_DYLIB
          cmdsize 56
             name /usr/lib/libSystem.B.dylib (offset 24)
       time stamp 1 Thu Jan 1 01:00:01 1970
          current version 169.3.0
    compatibility version 1.0.0

    $ uname -an
    Darwin lion.local 11.4.2 Darwin Kernel Version 11.4.2: Thu Aug 23 16:25:48 PDT 2012; root:xnu-1699.32.7~1/RELEASE_X86_64 x86_64
    $ otool -l /usr/lib/libSystem.B.dylib
    Load command 3
              cmd LC_ID_DYLIB
          cmdsize 56
             name /usr/lib/libSystem.B.dylib (offset 24)
       time stamp 1 Thu Jan 1 01:00:01 1970
          current version 159.1.0
    compatibility version 1.0.0

There are some other hints that this codebase is quite old. For example, the libpcap version is definitely between 0.9.6 and 0.9.8, all released in 2007. You can determine this via the struct pcap definition. Version 0.9.5 was released in 2006, and version 1.0.0 in 2008. Most probably someone around 2007 or 2008 forked one of the 0.9.6-0.9.8 versions and modified it to remove all the strings and the code that builds error messages.
I still need to verify the versions for the other operating systems to see if they match this information and use the same shared libpcap codebase.

CONCLUSIONS

The NSA toolset is pretty cool! I’m a big fan. You can definitely see that the architecture and code are carefully thought out and engineered. They have been doing this for a long time and definitely have more resources than most (nation state) attackers. This isn’t some random proof of concept code. Almost every operating system is a target so their catalog is impressive. The problems of SIGINT and wanting to collect all the things.

There is no heavily obfuscated code, there are no hardcore anti-debugging measures, and no packers and/or cryptors are used. They try to blend in and not be too unique. The leaked Lamberts (CIA) sample has similar ideas behind it. A common cyber philosophy? :-)

But they aren’t perfect (nobody is)! They suffer from the code reusage sin. For example, the same string obfuscation algorithm can be seen in other tools. The RC5/6 constant might be another example, if Kaspersky’s theory is right. Maybe this was true in the past, and the post ShadowBrokers future is better segmented. Hindsight is always 20/20.

It is very easy to detect the hacked machines once we have access to these tools and spend some time reverse engineering them. Four years ago I asked my friends at BinaryEdge to scan the Internet since this was their thing. Why reinvent the wheel if you know people who make great wheels? I asked them to send this type of packet and check for the answer. The result is that we managed to find a bunch of hacked machines all over the Internet. That’s the main reason why I never published this before and why this post is just focused on the BPF program. I am just a curious reverse engineer ;-).

I believe that the tools from the ShadowBrokers leaks are way more interesting than the exploits (although the leaked exploits sure created a ton more damage worldwide).
Reversing the tools allows you to understand a bit of their mindset, their engineering, and operations. And more importantly, to appreciate their work. Here we care about bits and bytes.

One last thing, and the real reason I wrote this blogpost. I just opened a new startup, and I am looking for Linux developers. If hunting pandas, bears, kitties, chemical elements, or whatever weird naming out there is your thing then I have an interesting challenge for you. Job description below; if you are in the US or Western Europe send me your CV (reverser at put.as). This is a fully remote position.

--------------------------------------------------------------------------------

Senior Linux C developer to develop advanced next generation protection technology for Linux operating systems and to protect the Internet infrastructure.

Required skills:

* 5+ years of software development and/or programming experience.
* Understanding of Linux internals.
* Experience with C/C++, GCC and LLVM toolchains.
* Debugging experience with GDB, LLDB.
* Good TCP/IP knowledge.

Nice to have skills:

* Experience with the Go language is a plus.
* Experience developing for embedded systems.
* Code auditing experience.
* Reverse engineering experience.
* CI/CD integration.

Bring:

* A can do mindset, you need to be a jack of all trades at this stage.
* You want to develop secure, high-performance, and best in class software.
* You are comfortable not being the smartest guy in the room.
* You are able to say “I don’t know” but you will be able to research and solve the problem.
* You learn from mistakes and don’t repeat them.

Responsibilities:

* Development of our sensor technology.
* Code performance optimization.
* Debugging and bug fixing.
* Auditing external code we use.

--------------------------------------------------------------------------------

Have fun,
fG!