test: add benchmarks for glob cache performance by Napolitain · Pull Request #2881 · go-task/task

Napolitain · 2026-06-12T06:03:55Z

Add Issue #2853 benchmarks comparing checksum, timestamp, and uncached tasks across many-small and few-large sparse source sets.

In my opinion, the many-small sources scenario shows too many allocations, which points to inefficiencies in that path.

  scenario / mode          time      throughput     logical size, file count
  many small / checksum    419 ms    0.24 MB/s      0.095 MiB, 20000 files
  many small / timestamp   144 ms    0.69 MB/s      0.095 MiB, 20000 files
  many small / none        0.71 ms

  few large / checksum     60.4 ms   8883 MB/s      512 MiB, 4 files
  few large / timestamp    0.23 ms   2330118 MB/s   512 MiB, 4 files
  few large / none         0.56 ms

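The MB/s figure is not an I/O rate (the timestamp method does not read file contents). It is the logical size of the sources divided by the elapsed time, meant only as a throughput comparison.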
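As a rough illustration of how such a throughput figure can be derived, here is a minimal sketch that divides the logical byte count by the elapsed time. The helper name `throughputMBps` is hypothetical and not part of the PR; it assumes decimal megabytes (1 MB = 1e6 bytes), which appears consistent with the numbers in the table, and takes the 4 x 128 MiB file sizes from the commit message.

```go
package main

import "fmt"

// throughputMBps converts a logical byte count and an elapsed time in seconds
// into decimal megabytes per second (1 MB = 1e6 bytes).
// Hypothetical helper for illustration; not taken from the PR.
func throughputMBps(logicalBytes int64, seconds float64) float64 {
	if seconds <= 0 {
		return 0 // avoid dividing by zero for degenerate timings
	}
	return float64(logicalBytes) / 1e6 / seconds
}

func main() {
	// 20,000 files x 5 bytes, checksum run of 419 ms.
	fmt.Printf("many small / checksum: %.2f MB/s\n", throughputMBps(20000*5, 0.419))
	// 4 files x 128 MiB, checksum run of 60.4 ms.
	fmt.Printf("few large / checksum: %.0f MB/s\n", throughputMBps(4*128<<20, 0.0604))
}
```

Because the timestamp method never reads file contents, the same formula gives it a very large "throughput" on the few-large case; the value only says how much logical data was covered per unit of time.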

I have read and followed the Contribution Guide

Add Issue go-task#2853 benchmarks comparing checksum, timestamp, and uncached tasks across many-small and few-large sparse YAML source sets.

Baseline on Intel i7-14700K:

  go test -run '^$' -bench 'BenchmarkIssue2853.*SparseYAMLFiles' -benchtime=3x -count=3 ./

Many small sparse YAML files (20,000 x 5 bytes): checksum 440-451 ms/op, timestamp 140-148 ms/op, none 1.1-1.3 ms/op.

Few large sparse YAML files (4 x 128 MiB): checksum 60-61 ms/op, timestamp 213-239 us/op, none 1.1-1.3 ms/op.

Sparse files avoid bulk data writes while preserving logical file size for checksum/timestamp comparisons.
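One common way to get a large logical file size without writing the data is to create the file and extend it with `Truncate`. The sketch below assumes that approach; the PR's actual helper may differ, and the names `createSparseFile` and `demoSparseSize` are hypothetical. Whether the file ends up sparse on disk depends on the filesystem.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// createSparseFile creates path and extends it to size bytes without writing
// any data. On filesystems that support holes, no data blocks are allocated.
func createSparseFile(path string, size int64) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	if err := f.Truncate(size); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

// demoSparseSize creates one sparse file in a temporary directory and returns
// the logical size reported by os.Stat.
func demoSparseSize(size int64) (int64, error) {
	dir, err := os.MkdirTemp("", "sparse-demo")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "large.yaml")
	if err := createSparseFile(path, size); err != nil {
		return 0, err
	}
	info, err := os.Stat(path)
	if err != nil {
		return 0, err
	}
	return info.Size(), nil
}

func main() {
	n, err := demoSparseSize(128 << 20) // 128 MiB logical size
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("logical size in bytes:", n)
}
```

A checksum pass still has to read the full logical size (the holes read back as zeros), while a timestamp pass only looks at metadata, which is why the two diverge so strongly on the few-large case.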

Napolitain · 2026-06-12T06:09:43Z

I suggest this benchmark for tracking speed for many small, and few large globs.

trulede · 2026-06-12T07:57:09Z

Could you also add an OS-native benchmark using mtime, as a reference point?

Having the test profile code might also be useful. This function in particular:
https://github.com/go-task/task/blob/main/internal/fingerprint/sources_timestamp.go

Edit: Another point of reference (in addition to mtime) would be to generate a Makefile and run that over the files too.

trulede · 2026-06-12T08:17:49Z

@@ -0,0 +1,155 @@
+package task_test


Need a build tag here.

//go:build fsbench
// +build fsbench

addressed in 295fea2

trulede · 2026-06-12T09:07:32Z

@Napolitain If you want to try your luck and improve the performance, I "Asked AI" to make the code more efficient, and then again to see if the duplicate calls to os.Stat() could be improved. There is not much code there, so profiling or trial and error should find some improvement.

https://github.com/go-task/task/blob/main/internal/fingerprint/sources_timestamp.go

Strategy: globbing improved

package fingerprint

import (
	"os"
	"path/filepath"
	"time"

	"github.com/go-task/task/v3/taskfile/ast"
)

// TimestampChecker checks whether any source changed relative to the generated files,
// using file modification timestamps.
type TimestampChecker struct {
	tempDir string
	dry     bool
}

func NewTimestampChecker(tempDir string, dry bool) *TimestampChecker {
	return &TimestampChecker{
		tempDir: tempDir,
		dry:     dry,
	}
}

// IsUpToDate implements the Checker interface
func (checker *TimestampChecker) IsUpToDate(t *ast.Task) (bool, error) {
	if len(t.Sources) == 0 {
		return false, nil
	}

	sources, err := Globs(t.Dir, t.Sources)
	if err != nil {
		return false, nil
	}

	// 1. Evaluate general glob lists immediately to avoid duplicate disk scans
	generates, err := Globs(t.Dir, t.Generates)
	if err != nil {
		return false, nil
	}

	// 2. Optimized Early Exit: If patterns exist but found no files, task must run
	if len(t.Generates) > 0 {
		hasPositivePattern := false
		for _, g := range t.Generates {
			if !g.Negate {
				hasPositivePattern = true
				break
			}
		}
		if hasPositivePattern && len(generates) == 0 {
			return false, nil
		}
	}

	timestampFile := checker.timestampFilePath(t)

	// 3. Check timestamp file existence
	_, err = os.Stat(timestampFile)
	if err == nil {
		generates = append(generates, timestampFile)
	} else {
		// Create the timestamp file for the next execution when it does not exist.
		if !checker.dry {
			if err := os.MkdirAll(filepath.Dir(timestampFile), 0o755); err != nil {
				return false, err
			}
			f, err := os.Create(timestampFile)
			if err != nil {
				return false, err
			}
			f.Close()
		}
	}

	taskTime := time.Now()

	// 4. FIX: Get the MINIMUM (oldest) time of the generates, not the max.
	// If any source is newer than our OLDEST output, the build is stale.
	generateMinTime, err := getMinTime(generates...)
	if err != nil || generateMinTime.IsZero() {
		return false, nil
	}

	// 5. Check if any source files are newer than our oldest generated file (Lazy execution)
	shouldUpdate, err := anyFileNewerThan(sources, generateMinTime)
	if err != nil {
		return false, nil
	}

	// Modify the metadata of the file to the current time.
	if !checker.dry {
		if err := os.Chtimes(timestampFile, taskTime, taskTime); err != nil {
			return false, err
		}
	}

	return !shouldUpdate, nil
}

func (checker *TimestampChecker) Kind() string {
	return "timestamp"
}

// Value implements the Checker Interface
func (checker *TimestampChecker) Value(t *ast.Task) (any, error) {
	sources, err := Globs(t.Dir, t.Sources)
	if err != nil {
		return time.Now(), err
	}

	sourcesMaxTime, err := getMaxTime(sources...)
	if err != nil {
		return time.Now(), err
	}

	if sourcesMaxTime.IsZero() {
		return time.Unix(0, 0), nil
	}

	return sourcesMaxTime, nil
}

// Added to track the oldest artifact constraint
func getMinTime(files ...string) (time.Time, error) {
	var minT time.Time
	for i, f := range files {
		info, err := os.Stat(f)
		if err != nil {
			return time.Time{}, err
		}
		modTime := info.ModTime()
		if i == 0 || modTime.Before(minT) {
			minT = modTime
		}
	}
	return minT, nil
}

func getMaxTime(files ...string) (time.Time, error) {
	var maxT time.Time
	for i, f := range files {
		info, err := os.Stat(f)
		if err != nil {
			return time.Time{}, err
		}
		modTime := info.ModTime()
		if i == 0 || modTime.After(maxT) {
			maxT = modTime
		}
	}
	return maxT, nil
}

// If the modification time of any of the files is newer than the given time, returns true.
// This function is lazy, as it stops when it finds a file newer than the given time.
func anyFileNewerThan(files []string, givenTime time.Time) (bool, error) {
	for _, f := range files {
		info, err := os.Stat(f)
		if err != nil {
			return false, err
		}
		if info.ModTime().After(givenTime) {
			return true, nil
		}
	}
	return false, nil
}

// OnError implements the Checker interface
func (*TimestampChecker) OnError(t *ast.Task) error {
	return nil
}

func (checker *TimestampChecker) timestampFilePath(t *ast.Task) string {
	return filepath.Join(checker.tempDir, "timestamp", normalizeFilename(t.Task))
}
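The choice of baseline in step 4 matters when generated files have different ages. The following is a minimal sketch that uses plain Unix-second integers instead of real files; the function names are hypothetical and not part of go-task. It shows a source modified between two outputs being reported as stale against the oldest output but not against the newest.

```go
package main

import "fmt"

// staleAgainstOldest reports whether source is newer than the OLDEST output.
// With no outputs it reports true, since nothing has been generated yet.
func staleAgainstOldest(source int64, outputs []int64) bool {
	if len(outputs) == 0 {
		return true
	}
	oldest := outputs[0]
	for _, t := range outputs[1:] {
		if t < oldest {
			oldest = t
		}
	}
	return source > oldest
}

// staleAgainstNewest reports whether source is newer than the NEWEST output.
func staleAgainstNewest(source int64, outputs []int64) bool {
	if len(outputs) == 0 {
		return true
	}
	newest := outputs[0]
	for _, t := range outputs[1:] {
		if t > newest {
			newest = t
		}
	}
	return source > newest
}

func main() {
	outputs := []int64{50, 200} // one old output, one fresh output
	source := int64(100)        // modified between the two outputs

	fmt.Println("stale vs oldest output:", staleAgainstOldest(source, outputs))
	fmt.Println("stale vs newest output:", staleAgainstNewest(source, outputs))
}
```

Comparing against the oldest output is the more conservative rule: it triggers a re-run whenever at least one generated file predates a source change.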

Strategy: os.Stat calls improved

package fingerprint

import (
	"os"
	"path/filepath"
	"time"

	"github.com/go-task/task/v3/taskfile/ast"
)

// TimestampChecker checks whether any source changed relative to the generated files,
// using file modification timestamps.
type TimestampChecker struct {
	tempDir string
	dry     bool
}

func NewTimestampChecker(tempDir string, dry bool) *TimestampChecker {
	return &TimestampChecker{
		tempDir: tempDir,
		dry:     dry,
	}
}

// IsUpToDate implements the Checker interface
func (checker *TimestampChecker) IsUpToDate(t *ast.Task) (bool, error) {
	if len(t.Sources) == 0 {
		return false, nil
	}

	sources, err := Globs(t.Dir, t.Sources)
	if err != nil {
		return false, nil
	}

	generates, err := Globs(t.Dir, t.Generates)
	if err != nil {
		return false, nil
	}

	if len(t.Generates) > 0 {
		hasPositivePattern := false
		for _, g := range t.Generates {
			if !g.Negate {
				hasPositivePattern = true
				break
			}
		}
		if hasPositivePattern && len(generates) == 0 {
			return false, nil
		}
	}

	timestampFile := checker.timestampFilePath(t)

	_, err = os.Stat(timestampFile)
	if err == nil {
		generates = append(generates, timestampFile)
	} else if !checker.dry {
		if err := os.MkdirAll(filepath.Dir(timestampFile), 0o755); err != nil {
			return false, err
		}
		f, err := os.Create(timestampFile)
		if err != nil {
			return false, err
		}
		f.Close()
	}

	taskTime := time.Now()

	// 1. Establish the absolute baseline boundary (the oldest generated asset)
	var minGenerateTime time.Time
	for i, g := range generates {
		info, err := os.Stat(g)
		if err != nil {
			return false, nil // Missing output asset forces a re-run
		}
		modTime := info.ModTime()
		if i == 0 || modTime.Before(minGenerateTime) {
			minGenerateTime = modTime
		}
	}

	// 2. Interleaved lazy verification check on sources
	// We run os.Stat sequentially and exit the instant a file is found to be stale.
	for _, s := range sources {
		info, err := os.Stat(s)
		if err != nil {
			return false, nil // Missing source file means target cannot be evaluated cleanly
		}
		// If ANY source file is newer than our oldest output asset, it's stale.
		if info.ModTime().After(minGenerateTime) {
			return false, nil
		}
	}

	if !checker.dry {
		if err := os.Chtimes(timestampFile, taskTime, taskTime); err != nil {
			return false, err
		}
	}

	return true, nil
}

func (checker *TimestampChecker) Kind() string {
	return "timestamp"
}

// Value implements the Checker Interface
func (checker *TimestampChecker) Value(t *ast.Task) (any, error) {
	sources, err := Globs(t.Dir, t.Sources)
	if err != nil {
		return time.Now(), err
	}

	var maxT time.Time
	for i, f := range sources {
		info, err := os.Stat(f)
		if err != nil {
			return time.Now(), err
		}
		if i == 0 || info.ModTime().After(maxT) {
			maxT = info.ModTime()
		}
	}

	if maxT.IsZero() {
		return time.Unix(0, 0), nil
	}
	return maxT, nil
}

func (*TimestampChecker) OnError(t *ast.Task) error {
	return nil
}

func (checker *TimestampChecker) timestampFilePath(t *ast.Task) string {
	return filepath.Join(checker.tempDir, "timestamp", normalizeFilename(t.Task))
}

Add an OS-native mtime reference point for the Issue go-task#2853 filesystem benchmarks. The reference walks the same sparse YAML source tree with filepath.WalkDir, stats YAML files through DirEntry.Info, and compares mtimes against a generated output file. The benchmark is available under the fsbench build tag alongside the Task checksum, timestamp, and uncached cases.
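The reference described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the commit: the names `anyYAMLNewerThan` and `demo` are hypothetical, only the `.yaml` extension is matched, and the comparison uses a reference time rather than the mtime of a generated output file.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"time"
)

// anyYAMLNewerThan walks root, stats every *.yaml file through DirEntry.Info,
// and reports whether any of them was modified after ref, together with the
// number of YAML files visited. It scans the whole tree (no early exit).
func anyYAMLNewerThan(root string, ref time.Time) (bool, int, error) {
	newer, count := false, 0
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() || filepath.Ext(path) != ".yaml" {
			return nil
		}
		info, err := d.Info()
		if err != nil {
			return err
		}
		count++
		if info.ModTime().After(ref) {
			newer = true
		}
		return nil
	})
	return newer, count, err
}

// demo builds a tiny source tree with one YAML file newer than the reference.
func demo() (bool, int, error) {
	root, err := os.MkdirTemp("", "mtime-demo")
	if err != nil {
		return false, 0, err
	}
	defer os.RemoveAll(root)

	ref := time.Now()
	old := ref.Add(-time.Hour)
	for _, name := range []string{"a.yaml", "b.yaml", "notes.txt"} {
		p := filepath.Join(root, name)
		if err := os.WriteFile(p, []byte("a: 1\n"), 0o644); err != nil {
			return false, 0, err
		}
		if err := os.Chtimes(p, old, old); err != nil {
			return false, 0, err
		}
	}
	// Make one source newer than the reference time.
	fresh := ref.Add(time.Hour)
	if err := os.Chtimes(filepath.Join(root, "b.yaml"), fresh, fresh); err != nil {
		return false, 0, err
	}
	return anyYAMLNewerThan(root, ref)
}

func main() {
	newer, n, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("yaml files: %d, any newer than reference: %v\n", n, newer)
}
```

A walk like this skips glob matching entirely, so it gives a lower bound on what a metadata-only scan of the same tree costs.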

Napolitain · 2026-06-13T03:07:50Z

Could you also add an OS-native benchmark using mtime, as a reference point?

Having the test profile code might also be useful. This function in particular: https://github.com/go-task/task/blob/main/internal/fingerprint/sources_timestamp.go

Edit: Another point of reference (in addition to mtime) would be to generate a Makefile and run that over the files too.

addressed in ec19102 if I understood that part correctly.

trulede reviewed Jun 12, 2026

andreynering linked an issue Jun 12, 2026 that may be closed by this pull request

Cache is very slow #2853

Open

Napolitain added 2 commits June 12, 2026 20:01

test: gate filesystem benchmarks behind fsbench tag

295fea2

Napolitain wants to merge 3 commits into go-task:main from Napolitain:issue-2853-glob-benchmarks
