improve throughput in copyloop() using bigger buffer
- refactored clientthread to put the handshake code into its own
function, since it used its own 1K stack buffer.
by returning from that function before calling copyloop, we have
that space available in the new stack frame.
- since getaddrinfo() was the main stack consumer in the entire
code, we can safely use at least half the available thread
stack size for the copyloop to achieve higher throughput.
in my testing with pyhttpd it turned out that 64k is the sweet
spot to have minimal syscall overhead, but 16k is very close,
and it allows us to keep the minimal memory usage profile.
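
a minimal sketch of the two points above, not the actual microsocks
code: the names handshake_sketch, relay_one_way, client_sketch and
COPY_BUF_SIZE are made up for illustration, and the relay copies in one
direction only, whereas a real proxy loop relays both ways. it assumes
plain blocking file descriptors. the handshake keeps its 1K scratch
buffer in its own frame, so that space is released before the 16k copy
buffer is placed on the stack.

```c
#include <errno.h>
#include <unistd.h>

/* assumed value: 16k, the compromise size named in this message */
#define COPY_BUF_SIZE (16 * 1024)

/* write all n bytes, retrying on short writes and EINTR; 0 on success */
int write_all(int fd, const char *buf, size_t n) {
    while (n > 0) {
        ssize_t w = write(fd, buf, n);
        if (w < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        buf += w;
        n -= (size_t)w;
    }
    return 0;
}

/* stand-in for the handshake: the 1K buffer lives only in this frame */
int handshake_sketch(int fd) {
    char buf[1024];
    ssize_t n = read(fd, buf, sizeof buf);
    return n > 0 ? 0 : -1;
}

/* copy from one fd to another until EOF; returns bytes copied or -1 */
long relay_one_way(int from, int to) {
    char buf[COPY_BUF_SIZE]; /* fewer read/write calls per byte moved */
    long total = 0;
    for (;;) {
        ssize_t n = read(from, buf, sizeof buf);
        if (n == 0) return total;
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        if (write_all(to, buf, (size_t)n) != 0) return -1;
        total += n;
    }
}

/* the handshake frame is gone by the time the copy buffer is allocated */
int client_sketch(int client_fd, int target_fd) {
    if (handshake_sketch(client_fd) != 0) return -1;
    return relay_one_way(client_fd, target_fd) < 0 ? -1 : 0;
}
```

changing COPY_BUF_SIZE is the only knob here; a larger value trades
per-thread stack usage for fewer syscalls.
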
this is in response to
https://github.com/rofl0r/microsocks/issues/58#issuecomment-2118389063
which links to a repo that tests the performance of different socks5
servers on gigabit links.
also closes #10