
test: add benchmarks for glob cache performance #2881

Open
Napolitain wants to merge 3 commits into go-task:main from Napolitain:issue-2853-glob-benchmarks

Napolitain commented Jun 12, 2026


Adds benchmarks for Issue #2853 comparing checksum, timestamp, and uncached tasks across two sparse source sets: many small files and a few large files.

My impression is that there are too many allocations, and that the many-small-sources scenario has inefficiencies worth investigating.

  scenario / method        time/op    rate            data set
  many small / checksum    419 ms     0.24 MB/s       0.095 MiB, 20000 files
  many small / timestamp   144 ms     0.69 MB/s       0.095 MiB, 20000 files
  many small / none        0.71 ms

  few large / checksum     60.4 ms    8883 MB/s       512 MiB, 4 files
  few large / timestamp    0.23 ms    2330118 MB/s    512 MiB, 4 files
  few large / none         0.56 ms

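The MB/s column is not an I/O rate (the timestamp method does not read file contents). It is the logical data-set size divided by the time per operation, so it is only meaningful as a relative throughput comparison.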

Add Issue go-task#2853 benchmarks comparing checksum, timestamp, and uncached tasks across many-small and few-large sparse YAML source sets.

Baseline on an Intel i7-14700K, measured with go test -run '^$' -bench 'BenchmarkIssue2853.*SparseYAMLFiles' -benchtime=3x -count=3 ./

Many small sparse YAML files (20,000 x 5 bytes): checksum 440-451 ms/op, timestamp 140-148 ms/op, none 1.1-1.3 ms/op.

Few large sparse YAML files (4 x 128 MiB): checksum 60-61 ms/op, timestamp 213-239 µs/op, none 1.1-1.3 ms/op.

Sparse files avoid bulk data writes while preserving logical file size for checksum/timestamp comparisons.
Napolitain (Author)

I suggest keeping this benchmark to track performance for both the many-small and few-large glob cases.

trulede (Contributor) commented Jun 12, 2026

Could you also add an OS-native benchmark using mtime, as a reference point?

It might also be useful for the test to profile the code, in particular this function:
https://github.com/go-task/task/blob/main/internal/fingerprint/sources_timestamp.go

Edit: Another point of reference (in addition to mtime) would be to generate a Makefile and run it over the same files.
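For the Makefile reference point, here is a sketch of a generator. `makefileFor` is hypothetical, and it assumes paths contain no spaces or other characters that make treats specially. A single target depends on every source file. Because make remakes a target when any prerequisite is newer than it, this exercises the same mtime comparison.

```go
package main

import "strings"

// makefileFor renders a Makefile with one rule: target depends on every
// source, and the recipe just touches the target. Paths are assumed to be
// free of spaces and make metacharacters. Hypothetical helper.
func makefileFor(target string, sources []string) string {
	var b strings.Builder
	b.WriteString(target)
	b.WriteString(":")
	for _, s := range sources {
		b.WriteString(" ")
		b.WriteString(s)
	}
	// Recipe lines must begin with a tab character.
	b.WriteString("\n\ttouch $@\n")
	return b.String()
}
```

With GNU make, `make -q out.stamp` only checks the target. It exits 0 when the target is up to date and 1 when it would need remaking, which keeps the recipe itself out of the measurement.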

@@ -0,0 +1,155 @@
package task_test

Contributor

Need a build tag here.

//go:build fsbench
// +build fsbench

Author

addressed in 295fea2

trulede (Contributor) commented Jun 12, 2026

@Napolitain If you want to try your luck at improving the performance: I "Asked AI" to make the code more efficient, and then asked again whether the duplicate os.Stat() calls could be reduced. There is not much code there, so profiling or trial and error should turn up some improvement.

https://github.com/go-task/task/blob/main/internal/fingerprint/sources_timestamp.go

Strategy: globbing improved
package fingerprint

import (
	"os"
	"path/filepath"
	"time"

	"github.com/go-task/task/v3/taskfile/ast"
)

// TimestampChecker checks whether any source changed relative to the generated files,
// using file modification timestamps.
type TimestampChecker struct {
	tempDir string
	dry     bool
}

func NewTimestampChecker(tempDir string, dry bool) *TimestampChecker {
	return &TimestampChecker{
		tempDir: tempDir,
		dry:     dry,
	}
}

// IsUpToDate implements the Checker interface
func (checker *TimestampChecker) IsUpToDate(t *ast.Task) (bool, error) {
	if len(t.Sources) == 0 {
		return false, nil
	}

	sources, err := Globs(t.Dir, t.Sources)
	if err != nil {
		return false, nil
	}

	// 1. Evaluate general glob lists immediately to avoid duplicate disk scans
	generates, err := Globs(t.Dir, t.Generates)
	if err != nil {
		return false, nil
	}

	// 2. Optimized Early Exit: If patterns exist but found no files, task must run
	if len(t.Generates) > 0 {
		hasPositivePattern := false
		for _, g := range t.Generates {
			if !g.Negate {
				hasPositivePattern = true
				break
			}
		}
		if hasPositivePattern && len(generates) == 0 {
			return false, nil
		}
	}

	timestampFile := checker.timestampFilePath(t)

	// 3. Check timestamp file existence
	_, err = os.Stat(timestampFile)
	if err == nil {
		generates = append(generates, timestampFile)
	} else {
		// Create the timestamp file for the next execution when it does not exist.
		if !checker.dry {
			if err := os.MkdirAll(filepath.Dir(timestampFile), 0o755); err != nil {
				return false, err
			}
			f, err := os.Create(timestampFile)
			if err != nil {
				return false, err
			}
			f.Close()
		}
	}

	taskTime := time.Now()

	// 4. FIX: Get the MINIMUM (oldest) time of the generates, not the max.
	// If any source is newer than our OLDEST output, the build is stale.
	generateMinTime, err := getMinTime(generates...)
	if err != nil || generateMinTime.IsZero() {
		return false, nil
	}

	// 5. Check if any source files are newer than our oldest generated file (Lazy execution)
	shouldUpdate, err := anyFileNewerThan(sources, generateMinTime)
	if err != nil {
		return false, nil
	}

	// Modify the metadata of the file to the current time.
	if !checker.dry {
		if err := os.Chtimes(timestampFile, taskTime, taskTime); err != nil {
			return false, err
		}
	}

	return !shouldUpdate, nil
}

func (checker *TimestampChecker) Kind() string {
	return "timestamp"
}

// Value implements the Checker Interface
func (checker *TimestampChecker) Value(t *ast.Task) (any, error) {
	sources, err := Globs(t.Dir, t.Sources)
	if err != nil {
		return time.Now(), err
	}

	sourcesMaxTime, err := getMaxTime(sources...)
	if err != nil {
		return time.Now(), err
	}

	if sourcesMaxTime.IsZero() {
		return time.Unix(0, 0), nil
	}

	return sourcesMaxTime, nil
}

// Added to track the oldest artifact constraint
func getMinTime(files ...string) (time.Time, error) {
	var minT time.Time
	for i, f := range files {
		info, err := os.Stat(f)
		if err != nil {
			return time.Time{}, err
		}
		modTime := info.ModTime()
		if i == 0 || modTime.Before(minT) {
			minT = modTime
		}
	}
	return minT, nil
}

func getMaxTime(files ...string) (time.Time, error) {
	var maxT time.Time
	for i, f := range files {
		info, err := os.Stat(f)
		if err != nil {
			return time.Time{}, err
		}
		modTime := info.ModTime()
		if i == 0 || modTime.After(maxT) {
			maxT = modTime
		}
	}
	return maxT, nil
}

// If the modification time of any of the files is newer than the given time, returns true.
// This function is lazy, as it stops when it finds a file newer than the given time.
func anyFileNewerThan(files []string, givenTime time.Time) (bool, error) {
	for _, f := range files {
		info, err := os.Stat(f)
		if err != nil {
			return false, err
		}
		if info.ModTime().After(givenTime) {
			return true, nil
		}
	}
	return false, nil
}

// OnError implements the Checker interface
func (*TimestampChecker) OnError(t *ast.Task) error {
	return nil
}

func (checker *TimestampChecker) timestampFilePath(t *ast.Task) string {
	return filepath.Join(checker.tempDir, "timestamp", normalizeFilename(t.Task))
}
Strategy: os.Stat calls improved
package fingerprint

import (
	"os"
	"path/filepath"
	"time"

	"github.com/go-task/task/v3/taskfile/ast"
)

// TimestampChecker checks whether any source changed relative to the generated files,
// using file modification timestamps.
type TimestampChecker struct {
	tempDir string
	dry     bool
}

func NewTimestampChecker(tempDir string, dry bool) *TimestampChecker {
	return &TimestampChecker{
		tempDir: tempDir,
		dry:     dry,
	}
}

// IsUpToDate implements the Checker interface
func (checker *TimestampChecker) IsUpToDate(t *ast.Task) (bool, error) {
	if len(t.Sources) == 0 {
		return false, nil
	}

	sources, err := Globs(t.Dir, t.Sources)
	if err != nil {
		return false, nil
	}

	generates, err := Globs(t.Dir, t.Generates)
	if err != nil {
		return false, nil
	}

	if len(t.Generates) > 0 {
		hasPositivePattern := false
		for _, g := range t.Generates {
			if !g.Negate {
				hasPositivePattern = true
				break
			}
		}
		if hasPositivePattern && len(generates) == 0 {
			return false, nil
		}
	}

	timestampFile := checker.timestampFilePath(t)

	_, err = os.Stat(timestampFile)
	if err == nil {
		generates = append(generates, timestampFile)
	} else if !checker.dry {
		if err := os.MkdirAll(filepath.Dir(timestampFile), 0o755); err != nil {
			return false, err
		}
		f, err := os.Create(timestampFile)
		if err != nil {
			return false, err
		}
		f.Close()
	}

	taskTime := time.Now()

	// 1. Establish the absolute baseline boundary (the oldest generated asset)
	var minGenerateTime time.Time
	for i, g := range generates {
		info, err := os.Stat(g)
		if err != nil {
			return false, nil // Missing output asset forces a re-run
		}
		modTime := info.ModTime()
		if i == 0 || modTime.Before(minGenerateTime) {
			minGenerateTime = modTime
		}
	}

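	// No baseline (no generates and no timestamp file yet): treat as stale,
	// matching the IsZero check in the globbing-improved version.
	if minGenerateTime.IsZero() {
		return false, nil
	}

	// 2. Interleaved lazy verification check on sources.
	// Stat sources sequentially and stop at the first one found to be stale.
	upToDate := true
	for _, s := range sources {
		info, err := os.Stat(s)
		if err != nil {
			return false, nil // Missing source file means target cannot be evaluated cleanly
		}
		// If ANY source file is newer than our oldest output asset, it's stale.
		if info.ModTime().After(minGenerateTime) {
			upToDate = false
			break
		}
	}

	// Touch the timestamp file on the stale path too. Returning before this
	// point would leave the timestamp file older than the edited source, so
	// the task would never be reported as up to date again.
	if !checker.dry {
		if err := os.Chtimes(timestampFile, taskTime, taskTime); err != nil {
			return false, err
		}
	}

	return upToDate, nil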
}

func (checker *TimestampChecker) Kind() string {
	return "timestamp"
}

// Value implements the Checker Interface
func (checker *TimestampChecker) Value(t *ast.Task) (any, error) {
	sources, err := Globs(t.Dir, t.Sources)
	if err != nil {
		return time.Now(), err
	}

	var maxT time.Time
	for i, f := range sources {
		info, err := os.Stat(f)
		if err != nil {
			return time.Now(), err
		}
		if i == 0 || info.ModTime().After(maxT) {
			maxT = info.ModTime()
		}
	}

	if maxT.IsZero() {
		return time.Unix(0, 0), nil
	}
	return maxT, nil
}

func (*TimestampChecker) OnError(t *ast.Task) error {
	return nil
}

func (checker *TimestampChecker) timestampFilePath(t *ast.Task) string {
	return filepath.Join(checker.tempDir, "timestamp", normalizeFilename(t.Task))
}

andreynering linked an issue Jun 12, 2026 that may be closed by this pull request
Add an OS-native mtime reference point for the Issue go-task#2853 filesystem benchmarks. The reference walks the same sparse YAML source tree with filepath.WalkDir, stats YAML files through DirEntry.Info, and compares mtimes against a generated output file.

The benchmark is available under the fsbench build tag alongside the Task checksum, timestamp, and uncached cases.
Napolitain (Author)


Regarding the OS-native mtime benchmark: addressed in ec19102, if I understood that part correctly.



Development

Successfully merging this pull request may close these issues.

Cache is very slow

2 participants