Pass free variables as parameters when possible for tight loops

It turns out that passing free variables that are not of function type as parameters, as opposed to fetching them from the closure may be beneficial for tight loops. One example appears in the benchmark [mlkit-bench/benchmarks/mandelbrot.sml](https://github.com/melsman/mlkit-bench/blob/fd305041385525d4644e785c8acda8af217345bb/benchmarks/mandelbrot.sml#L25). Here the two variables `c_re` and `c_im` are free in the function `loop3`. Two optimisations will almost double the performance of the code:

1. passing the variables `c_re` and `c_im` as parameters (they will then be unboxed and passed in floating point registers.)
2. passing the constant `4.0` as a parameter to the function.

It is straightforward to verify the effect of the two optimisations; here is how the function `loop3` can be hand-optimised by the programmer:

```sml
fun loop3 (count, z_re : real, z_im : real, c_re, c_im, four) = 
  if (count < maxCount)
  then let val z_re_sq = z_re * z_re
           val z_im_sq = z_im * z_im
       in if (z_re_sq + z_im_sq) > four
          then count
          else let val z_re_im = z_re * z_im
               in loop3 (count+0w1,
                         z_re_sq - z_im_sq + c_re,
                         z_re_im + z_re_im + c_im,
                         c_re,
                         c_im,
                         four)
               end
       end
  else count
```  

Notice that this optimisation is not in conflict with the function specialisation optimisation that specializes recursive functions that, invariantly, take other functions as parameters.

Here is the resulting assembler code - labels are slightly shortened:

```asm
FLab.loop3:
	leaq -8(%rsp),%rsp
	movq $0x400,%r11
	cmpq %r11,%rax
	jae .LLab.k33
	movsd %xmm0,%xmm6
	mulsd %xmm0,%xmm6
	movsd %xmm1,%xmm10
	mulsd %xmm1,%xmm10
	movsd %xmm6,%xmm8
	addsd %xmm10,%xmm8
	ucomisd %xmm4,%xmm8
	ja .LLab.k33
	mulsd %xmm1,%xmm0
	addq $0x1,%rax
	subsd %xmm10,%xmm6
	addsd %xmm2,%xmm6
	addsd %xmm0,%xmm0
	addsd %xmm3,%xmm0
	movsd %xmm0,%xmm1
	movsd %xmm6,%xmm0
	leaq 8(%rsp),%rsp
	jmp FLab.loop3
	.p2align 0x4
.LLab.k33:
	movq %rax,%rdi
	leaq 8(%rsp),%rsp
	popq %r11
	jmp *%r11
```

Compared to the original version of mandelbrot, we have also changed the parameter `count` to be of type `word`, which turns out not to have a huge effect (only one `jo __overflow` instruction is saved, after the `addq` instruction).



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Pass free variables as parameters when possible for tight loops #111

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Pass free variables as parameters when possible for tight loops #111

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions