Given the following code, how many goroutines would you expect the Printf statement to report?

package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"time"
)

var counter int64

func Run(done chan struct{}) {
	go func() {
		if counter >= 1000 {
			done <- struct{}{}
			time.Sleep(500000)
		} else {
			atomic.AddInt64(&counter, 1)
			Run(done)
		}
	}()
}

func main() {
	done := make(chan struct{})
	Run(done)

	<-done
	fmt.Printf("Counter=%v Goroutines=%v\n", counter, runtime.NumGoroutine())
}

Options:

A) 1001
B) 1000
C) 2
D) 1
E) System Crash

Answer

The answer is C, 2 goroutines. Bonus points if you noticed that the way the counter is being read is not concurrency safe.

At first glance, you might assume that goroutines which spawn other goroutines have their lifetimes linked in some way, with each parent tied to the lifetimes of its children. That is an easy mistake to make given the apparently recursive nature of the code above. However, every goroutine is entirely independent of the others, and each one terminates as soon as it has no further instructions to execute.

In the example above, each goroutine spawns a child goroutine and then terminates almost immediately. The final goroutine sends a value to the main goroutine over the done channel and then sleeps, which keeps it alive while the main goroutine runs the Printf statement. That leaves exactly two goroutines at that point: the main goroutine and the sleeping one.
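
As an aside on that bonus point: counter is written with atomic.AddInt64 but read with plain loads, both in the goroutine's if statement and in the final Printf. A concurrency-safe version would read it through the atomic package as well; here is a minimal sketch of just the two affected reads, shown as fragments of the program above:

// In the spawned goroutine:
if atomic.LoadInt64(&counter) >= 1000 {

// In main:
fmt.Printf("Counter=%v Goroutines=%v\n", atomic.LoadInt64(&counter), runtime.NumGoroutine())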

Now what would happen if we modified the code to look like this?

package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"time"
)

var counter int64

func Run() {
	go func() {
		if counter >= 1000 {
			time.Sleep(500000)
		} else {
			atomic.AddInt64(&counter, 1)
			Run()
		}
	}()
}

func main() {
	Run()
	fmt.Printf("Counter=%v Goroutines=%v\n", counter, runtime.NumGoroutine())
}

Answer: the program terminates almost immediately, and the Printf statement prints whatever values counter and runtime.NumGoroutine() happen to hold at that instant, most likely a counter at or near 0 and only a couple of goroutines. The main goroutine terminates as soon as there are no instructions left for it to execute, which happens right after the Printf statement. Without the channel for synchronization, the main goroutine has no reason to wait for the completion of our ‘recursive’ goroutine spawning.
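
If you did want this version to wait for the whole chain to finish, you would need to reintroduce some form of synchronization. Below is a minimal sketch using sync.WaitGroup; threading the wg parameter through Run is an illustrative change, not something from the original code.

package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

var counter int64

func Run(wg *sync.WaitGroup) {
	wg.Add(1) // register the child before it is spawned
	go func() {
		defer wg.Done()
		if atomic.LoadInt64(&counter) >= 1000 {
			return
		}
		atomic.AddInt64(&counter, 1)
		Run(wg) // the child registers itself before this goroutine calls Done
	}()
}

func main() {
	var wg sync.WaitGroup
	Run(&wg)
	wg.Wait()
	fmt.Printf("Counter=%v Goroutines=%v\n", atomic.LoadInt64(&counter), runtime.NumGoroutine())
}

Because each goroutine registers its child with wg.Add before it calls wg.Done itself, the WaitGroup counter never reaches zero until the final goroutine finishes, so wg.Wait() in main reliably blocks until counter reaches 1000.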

Practical Uses

Since parent and child goroutines don’t have their lifecycles linked, you can leverage this behavior in powerful ways. For example, we can have an asynchronous job worker restart itself when a panic occurs!

package main

import "fmt"

// Worker runs a job in its own goroutine and delivers the job's error on result.
type Worker struct {
	job    func() error
	result chan error
}

func (w *Worker) Start() {
	go w.Run()
}

// Run executes the job and reports its result on the result channel. If the
// job panics, the deferred recover spawns a fresh goroutine to take over.
func (w *Worker) Run() {
	fmt.Printf("Running...\n")
	defer func() {
		if err := recover(); err != nil {
			fmt.Printf("Panic=%v\n", err)
			go w.Run() // the replacement goroutine is independent of this one
		}
	}()

	w.result <- w.job()
	fmt.Printf("Done!\n")
}

func main() {
	var hasFailed bool
	worker := &Worker{
		job: func() error {
			if !hasFailed {
				hasFailed = true
				panic(fmt.Errorf("Oh dear!"))
			}

			fmt.Printf("hard job complete!\n")
			return nil
		},
		result: make(chan error),
	}

	worker.Start()
	result := <-worker.result
	fmt.Printf("Result=%v\n", result)
}

In the code above, we create a worker whose job panics on the first attempt to run. The initial Run goroutine recovers from the panic and simply spawns a new goroutine to replace the failed one. The output will look something like this (the Done! line may not be printed, because the main goroutine can exit immediately after receiving the result, before the replacement goroutine reaches its final Printf):

Running...
Panic=Oh dear!
Running...
hard job complete!
Done!
Result=<nil>

If you have a worker that processes a variety of jobs, then having the worker respawn itself can be a simpler solution than having an orchestrator track worker terminations via channels and respawn the failed worker.
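
For comparison, that orchestrator approach might look something like the sketch below: the worker reports its own termination on a channel, and a supervising loop in main restarts it. All of the names here (job, deaths, start, and so on) are illustrative and not part of the original example.

package main

import "fmt"

// Illustrative sketch of the channel-based orchestrator alternative.
type job func() error

func main() {
	var attempts int
	work := job(func() error {
		attempts++
		if attempts == 1 {
			panic(fmt.Errorf("Oh dear!"))
		}
		fmt.Printf("hard job complete!\n")
		return nil
	})

	deaths := make(chan struct{}) // workers announce their own termination here
	results := make(chan error)

	start := func() {
		go func() {
			defer func() {
				if err := recover(); err != nil {
					fmt.Printf("Panic=%v\n", err)
					deaths <- struct{}{} // tell the orchestrator this worker died
				}
			}()
			results <- work()
		}()
	}

	start()
	for {
		select {
		case <-deaths:
			start() // the orchestrator respawns the failed worker
		case err := <-results:
			fmt.Printf("Result=%v\n", err)
			return
		}
	}
}

Even for a single worker, the orchestrator needs an extra channel and a select loop; the self-respawning worker collapses all of that into one deferred recover.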

TL;DR

All goroutines run and terminate independently of one another. You can leverage this independent behavior to do interesting and bizarre-looking things, such as ‘recursively’ spawning goroutines without growing the call stack. If you care about orchestrating the lifecycles of goroutines, then you’ll need to use a synchronization primitive such as a channel, mutex, or WaitGroup.

Credit goes to Vadim Uchitel for the goroutine worker restart idea that prompted this article.