singleflight-CSDN博客

singleflight 包是 Go 官方扩展库 golang.org/x/sync 中的一个工具，主要用于抑制重复的函数调用。其经典场景是防止缓存击穿：当多个并发请求同时需要同一个资源（例如同一个 key 的数据库查询）时，只让第一个请求真正执行，其余请求则等待并共享第一个请求的结果。

下面结合你给出的源码，从结构、核心流程、panic/goexit 处理等几个方面进行详细讲解。

1. 核心数据结构

`call` (在途或已完成的调用)

type call struct {
    wg    sync.WaitGroup
    val   any
    err   error
    dups  int
    chans []chan<- Result
}

wg：用于等待原始调用完成。重复调用者会 Wait 在该 WaitGroup 上，原始调用完成时 Done。
val / err：存放 fn 返回的结果和错误。
dups：记录有多少个重复调用共享了这次结果（包括自己不算，只有后来者才增加 dups）。
chans：仅用于 DoChan 方法，存放所有等待结果的 channel。

`Group` (调用组)

type Group struct {
    mu sync.Mutex
    m  map[string]*call
}

通过 key 来区分不同任务。相同 key 的调用会合并。
使用互斥锁保护 map，map 惰性初始化。

`Result`

type Result struct {
    Val    any
    Err    error
    Shared bool
}

Shared 表示该结果是否被多个调用者共享（即有重复调用发生）。

2. 具体方法详解

`Do` 方法

func (g *Group) Do(key string, fn func() (any, error)) (v any, err error, shared bool)

流程：

加锁，检查 map 是否初始化。
如果该 key 已有 call（说明有请求正在执行）：
- c.dups++ 记录重复次数。
- 解锁，然后 c.wg.Wait() 等待原始调用完成。
- 等待结束后检查错误类型：
  - 若是 *panicError（说明原始函数发生了 panic），直接 panic(e) 将 panic 传播给当前等待者，保持与原始调用一致的行为。
  - 若是 errGoexit（说明原始函数调用了 runtime.Goexit），则调用 runtime.Goexit() 退出当前 goroutine。
  - 否则正常返回 c.val, c.err, true（shared = true）。
如果没有 call：
- 新建 call，wg.Add(1)，放入 map，解锁。
- 调用 g.doCall(c, key, fn) 实际执行用户函数。
- 返回结果，并根据 c.dups > 0 决定 shared 值。

`DoChan` 方法

func (g *Group) DoChan(key string, fn func() (any, error)) <-chan Result

与 Do 类似，但不阻塞等待结果，而是返回一个容量为 1 的 channel。
流程：

创建 ch := make(chan Result, 1)。
加锁后，若发现已有 call，则将 ch 追加到 c.chans 切片中，dups++，然后直接返回 ch。
若无 call，新建 call，把 ch 作为 chans 的初始成员，启动一个新 goroutine 执行 g.doCall，然后返回 ch。

调用者可从 channel 中收结果，但当发生 panic 或 Goexit 时，channel 不会收到任何值，也不会被关闭。这是故意的设计，文档明确说明“返回的 channel 不会被关闭”。

3. `doCall` —— 调用执行核心

doCall 负责实际调用 fn，并处理返回、panic 和 runtime.Goexit 三种情况。

难点在于区分普通 panic 和 runtime.Goexit：
runtime.Goexit 会终止当前 goroutine，但会先执行所有 defer。在 defer 中调用 recover 会返回 nil，因此无法直接用 recover 的返回值来区分。标准库的做法是利用“双重 defer”配合标志位。

双重 defer 结构

func (g *Group) doCall(c *call, key string, fn func() (any, error)) {
    normalReturn := false
    recovered := false

    // 外层 defer：最终清理和结果分发
    defer func() {
        // 如果既不是正常返回，也没有 recover 到 panic，那就是 Goexit
        if !normalReturn && !recovered {
            c.err = errGoexit
        }

        g.mu.Lock()
        c.wg.Done()
        if g.m[key] == c {
            delete(g.m, key)    // 清理 map
        }
        g.mu.Unlock()

        // 根据 err 类型决定行为
        if e, ok := c.err.(*panicError); ok {
            // 原始函数发生了 panic
            if len(c.chans) > 0 {
                go panic(e)      // 在新 goroutine 中 panic，确保无法被恢复
                select {}        // 阻塞当前 goroutine，使其出现在 crash dump 中
            } else {
                panic(e)         // 直接 panic
            }
        } else if c.err == errGoexit {
            // 已是 Goexit 流程，无需再调用
        } else {
            // 正常情况，向所有 DoChan 的等待者发送结果
            for _, ch := range c.chans {
                ch <- Result{c.val, c.err, c.dups > 0}
            }
        }
    }()

    // 内层匿名函数：调用 fn 并 recover
    func() {
        defer func() {
            if !normalReturn {
                // 发生了 panic，recover 捕获
                if r := recover(); r != nil {
                    c.err = newPanicError(r)
                }
            }
        }()
        c.val, c.err = fn()
        normalReturn = true
    }()

    // 如果内层函数没有正常返回（normalReturn 仍为 false），
    // 且没有 recover（即 recover 返回 nil，意味着 runtime.Goexit），
    // 则 recovered 保持 false，外层 defer 能据此判定为 Goexit。
    if !normalReturn {
        recovered = true
    }
}

执行逻辑解析：

内层匿名函数 func() 执行 fn()：
- 若 fn 正常返回 → normalReturn = true。
- 若 fn 中发生 panic → 内层 defer 的 recover() 捕获到值，c.err = newPanicError(r)，normalReturn 仍为 false。此时 recover 返回非 nil，所以外层的 !recovered 仍成立？等一下：内层 defer 是在 if !normalReturn 里调用的 recover()，一旦 recover 到值，c.err 被设置。然后内层函数结束，此时控制流回到外层 if !normalReturn { recovered = true }。所以若发生 panic，recovered 被置为 true。
- 若 fn 调用了 runtime.Goexit → 该 goroutine 停止执行 fn 后面的代码，normalReturn 保持 false，然后执行所有 defer。内层 defer 中的 recover() 返回 nil（因 Goexit 不算 panic），因此 c.err 保持 nil。接着 recovered 也保持 false（因为 if !normalReturn { recovered = true } 这句还没执行 goroutine 就退出了？不对，runtime.Goexit 会执行所有 defer 然后退出，所以在外层 defer 执行前，控制流不会到达 if !normalReturn { recovered = true }。因此外层 defer 中看到 !normalReturn && !recovered 成立，将 c.err 设为 errGoexit。
外层 defer 保证无论何时都会跑，它在最后根据 c.err 的类型进行处理：
- panicError：重新把 panic 抛出去，但为了避免被调用者的 recover 捕获后无法继续，代码做了分支处理。有 chans 时（即有人用 DoChan 等待），启动新 goroutine 去 panic，自身则永远阻塞在 select {}，这样当前 goroutine 会留在所有 goroutine 转储中，便于排查。
- errGoexit：啥也不做，外层 defer 结束后，goroutine 将正式退出（Goexit 流程继续）。
- 正常：向所有 DoChan channel 发送 Result。

4. 辅助类型和错误处理

`panicError`

type panicError struct {
    value any
    stack []byte
}

将 panic 的值和发生时的调用栈包装在一起。
实现了 Error 接口，输出格式为 "panic value\n\n堆栈"。
Unwrap 方法允许通过 errors.Is/As 提取原始错误（如果 value 是 error 接口）。

`newPanicError`

调用 debug.Stack() 获取堆栈。
裁剪掉第一行（形如 goroutine 1 [running]:），因为这行信息在后续传播时可能已过时。
返回 *panicError。

`errGoexit`

一个哨兵错误，表示用户函数中调用了 runtime.Goexit。

5. `Forget` 方法

func (g *Group) Forget(key string)

简单地从 map 中删除指定 key。这样后续相同 key 的 Do 调用就会发起新的实际请求，而不是等待可能还在执行的旧请求。常用在异步场景下，想让某个 key 在未来重新触发加载。

6. 使用上的注意事项

结果共享的正确性：shared 只表示是否有多个 goroutine 获取到了结果，不代表结果本身可以不加锁并发读写。返回值 val 若为指针或引用类型，调用方需要自行同步。
DoChan 的 panic 泄漏：如代码注释所述，当 fn 发生 panic 且使用 DoChan 时，等待结果的 channel 将永远收不到数据且不关闭，这会导致 goroutine 泄漏。因此使用 DoChan 时，最好配合 select 和超时，或者确保 fn 不会 panic。
Goexit 的传播：通过将 errGoexit 传递给等待者，再由等待者调用 runtime.Goexit() 来维持原始语义，这是一个很精妙的设计。
双重 defer 技巧：这段代码是 Go 中区分 panic 和 runtime.Goexit 的经典范例，值得细读。

官方的 singleflight 库虽然核心的去重机制很优秀，但也存在一些设计上的局限和边界情况问题；而社区涌现的许多替代库正是为了解决这些痛点而生的，各自有不同的侧重点。

官方库(singleflight)的设计局限与问题

缺乏原生超时/取消机制：官方库的方法不支持传入 context.Context，这是一个关键限制。如果飞行中的请求阻塞（如DB卡死），所有等待者将无限期挂起，这比缓存击穿本身风险更高。通常需要使用方自己封装一层来解决。
interface{} 导致类型不安全：官方库使用 interface{} 传递和返回结果，这意味着放弃了编译时类型检查，必须在运行时进行类型断言，容易引入 panic。同时，这也会带来额外的内存分配和 GC 压力。
无法处理常态化的批量请求：如果你有常态化的批量查询需求（例如，一批 ID 获取用户信息），等待每个 ID 的独立请求合并效率很低。而官方库不支持在回调中传递多个 Key，一次性批量获取它们的结果。
与 Forget 方法相关的并发 Bug：当异步的 Forget 和 doCall 的清理逻辑并发执行时，Forget 删除 key 后，doCall 执行完毕会再次删除，这可能会误删其他协程刚刚设置的新 key。官方修复方案是引入一个 forgotten 布尔值标记，此修复已合入 master 分支。
关于 DoChan 方法中 go panic(e) 的冗余性：源码中处理 panic 的方式曾引发社区对代码冗余的讨论。维护者的意图是保留当前 goroutine 的现场以便调试，但其副作用是让 panic 永远无法被上层恢复，代价是程序直接崩溃。

社区替代库的核心特点

这些库大多都支持泛型，并且绝大多数也支持 context.Context。在此基础上，它们各自的突出特点和对比如下：

替代库	最突出的差异化特点	性能/设计亮点
github.com/samber/go-singleflightx	支持批量请求 (Batching)	支持在回调中传入多个 Key 批量获取结果；提供分片组 (Sharded Groups)，减少锁竞争；支持 Nullable Result。
github.com/oy3o/singleflight	极致零内存分配 (Zero Allocation)	通过 `sync.Pool` 等手段，在缓存未命中场景下也能实现 0 内存分配，对 GC 极度友好。基准测试显示延迟约为 140ns，优于官方的 190ns。
github.com/brunomvsouza/singleflight	追求100% API兼容	风格保守，力求与原版100%兼容。版本号跟随上游，可作为泛型版的直接替代品。
github.com/marwan-at-work/singleflight	原生 Context 支持	设计上引入了 `context.Context`，当 Context 被取消（如HTTP请求断开）时，等待者可以提前退出，避免无效等待。

通用的避坑指南与最佳实践

配置超时与熔断：必须为"飞行中"的函数调用设置超时，或引入熔断机制，防止底层服务故障导致所有请求堆积。
精确设计 Key：确保 Key 能唯一代表需要去重的业务操作。避免在 Key 中包含无关的动态参数（如时间戳），否则去重会失效。
保持 Group 为单例：singleflight.Group 实例必须在整个服务生命周期内全局唯一（通常定义为包级变量），才能保证去重效果。
警惕高频 Key 的性能开销：在高并发下，如果 Key 数量巨大且大部分都不可重复，sync.Mutex 的锁竞争反而会带来微小的性能损耗（约5-10μs），虽然不大，但在极端性能敏感场景值得注意。
正确使用指针类型：当 fn 返回的是指针（如 *User）时，多个 goroutine 会共享该指针。务必确保后续对该指针的读取操作是并发安全的，避免数据竞争。

下面是golang官方库代码

package singleflight // import "golang.org/x/sync/singleflight"

import (
	"bytes"
	"errors"
	"fmt"
	"runtime"
	"runtime/debug"
	"sync"
)

// errGoexit indicates the runtime.Goexit was called in
// the user given function.
var errGoexit = errors.New("runtime.Goexit was called")

// A panicError is an arbitrary value recovered from a panic
// with the stack trace during the execution of given function.
type panicError struct {
	value any
	stack []byte
}

// Error implements error interface.
func (p *panicError) Error() string {
	return fmt.Sprintf("%v\n\n%s", p.value, p.stack)
}

func (p *panicError) Unwrap() error {
	err, ok := p.value.(error)
	if !ok {
		return nil
	}

	return err
}

func newPanicError(v any) error {
	stack := debug.Stack()

	// The first line of the stack trace is of the form "goroutine N [status]:"
	// but by the time the panic reaches Do the goroutine may no longer exist
	// and its status will have changed. Trim out the misleading line.
	if line := bytes.IndexByte(stack[:], '\n'); line >= 0 {
		stack = stack[line+1:]
	}
	return &panicError{value: v, stack: stack}
}

// call is an in-flight or completed singleflight.Do call
type call struct {
	wg sync.WaitGroup

	// These fields are written once before the WaitGroup is done
	// and are only read after the WaitGroup is done.
	val any
	err error

	// These fields are read and written with the singleflight
	// mutex held before the WaitGroup is done, and are read but
	// not written after the WaitGroup is done.
	dups  int
	chans []chan<- Result
}

// Group represents a class of work and forms a namespace in
// which units of work can be executed with duplicate suppression.
type Group struct {
	mu sync.Mutex       // protects m
	m  map[string]*call // lazily initialized
}

// Result holds the results of Do, so they can be passed
// on a channel.
type Result struct {
	Val    any
	Err    error
	Shared bool
}

// Do executes and returns the results of the given function, making
// sure that only one execution is in-flight for a given key at a
// time. If a duplicate comes in, the duplicate caller waits for the
// original to complete and receives the same results.
// The return value shared indicates whether v was given to multiple callers.
func (g *Group) Do(key string, fn func() (any, error)) (v any, err error, shared bool) {
	g.mu.Lock()
	if g.m == nil {
		g.m = make(map[string]*call)
	}
	if c, ok := g.m[key]; ok {
		c.dups++
		g.mu.Unlock()
		c.wg.Wait()

		if e, ok := c.err.(*panicError); ok {
			panic(e)
		} else if c.err == errGoexit {
			runtime.Goexit()
		}
		return c.val, c.err, true
	}
	c := new(call)
	c.wg.Add(1)
	g.m[key] = c
	g.mu.Unlock()

	g.doCall(c, key, fn)
	return c.val, c.err, c.dups > 0
}

// DoChan is like Do but returns a channel that will receive the
// results when they are ready.
//
// The returned channel will not be closed.
func (g *Group) DoChan(key string, fn func() (any, error)) <-chan Result {
	ch := make(chan Result, 1)
	g.mu.Lock()
	if g.m == nil {
		g.m = make(map[string]*call)
	}
	if c, ok := g.m[key]; ok {
		c.dups++
		c.chans = append(c.chans, ch)
		g.mu.Unlock()
		return ch
	}
	c := &call{chans: []chan<- Result{ch}}
	c.wg.Add(1)
	g.m[key] = c
	g.mu.Unlock()

	go g.doCall(c, key, fn)

	return ch
}

// doCall handles the single call for a key.
func (g *Group) doCall(c *call, key string, fn func() (any, error)) {
	normalReturn := false
	recovered := false

	// use double-defer to distinguish panic from runtime.Goexit,
	// more details see https://golang.org/cl/134395
	defer func() {
		// the given function invoked runtime.Goexit
		if !normalReturn && !recovered {
			c.err = errGoexit
		}

		g.mu.Lock()
		defer g.mu.Unlock()
		c.wg.Done()
		if g.m[key] == c {
			delete(g.m, key)
		}

		if e, ok := c.err.(*panicError); ok {
			// In order to prevent the waiting channels from being blocked forever,
			// needs to ensure that this panic cannot be recovered.
			if len(c.chans) > 0 {
				go panic(e)
				select {} // Keep this goroutine around so that it will appear in the crash dump.
			} else {
				panic(e)
			}
		} else if c.err == errGoexit {
			// Already in the process of goexit, no need to call again
		} else {
			// Normal return
			for _, ch := range c.chans {
				ch <- Result{c.val, c.err, c.dups > 0}
			}
		}
	}()

	func() {
		defer func() {
			if !normalReturn {
				// Ideally, we would wait to take a stack trace until we've determined
				// whether this is a panic or a runtime.Goexit.
				//
				// Unfortunately, the only way we can distinguish the two is to see
				// whether the recover stopped the goroutine from terminating, and by
				// the time we know that, the part of the stack trace relevant to the
				// panic has been discarded.
				if r := recover(); r != nil {
					c.err = newPanicError(r)
				}
			}
		}()

		c.val, c.err = fn()
		normalReturn = true
	}()

	if !normalReturn {
		recovered = true
	}
}

// Forget tells the singleflight to forget about a key.  Future calls
// to Do for this key will call the function rather than waiting for
// an earlier call to complete.
func (g *Group) Forget(key string) {
	g.mu.Lock()
	delete(g.m, key)
	g.mu.Unlock()
}