dtrace: Avoid excessive pcpu allocations
We were previously allocating MAXCPU structures for several purposes,
but this is generally unnecessary and is quite excessive, especially
after MAXCPU was bumped to 1024 on amd64 and arm64. We already are
careful to allocate only as many per-CPU tracing buffers as are needed;
extend this to other allocations.
For example, in a 2-vCPU VM, the size of a consumer state structure
drops from 64KB to 128B. The size of the per-consumer `dts_buffer` and
`dts_aggbuffer` arrays shrink similarly. Ditto for pre-allocations of
local and global D variable storage space.
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D47667