That's a good point. I tried it on a real i7 core as well. The profile looks similar. `memcpy()` is the bottleneck and the insertion rate is ~18,000 rows/sec.