thsolver.tracker

Epoch-level metric accumulation and logging utilities.

class AverageTracker

Tracks and logs averaged scalar tensors across iterations and epochs.

get_time()

Returns the current synchronized wall-clock time.
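A minimal sketch of what a synchronized timestamp can look like, assuming that "synchronized" means waiting for pending CUDA work to finish before reading the clock. The helper name synchronized_time is hypothetical, and the torch import is made optional only so the snippet also runs without PyTorch; it is not necessarily how get_time() is implemented.

```python
import time

try:
    import torch  # optional here; only used for the CUDA synchronization
except ImportError:
    torch = None


def synchronized_time() -> float:
    """Return wall-clock time after pending GPU work has finished (assumed behavior)."""
    if torch is not None and torch.cuda.is_available():
        # CUDA kernels launch asynchronously, so wait for them before timing.
        torch.cuda.synchronize()
    return time.time()


start = synchronized_time()
end = synchronized_time()
```

Without the synchronization step, a timestamp taken right after launching GPU work can understate how long that work actually takes.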

update(value: Dict[str, Tensor])

Updates the tracker with a dictionary that maps metric names to scalar tensors. Call this at the end of each iteration.

record_time(num_iters: int = 1)

Roughly records the elapsed time per iteration.

Parameters:

num_iters (int) – The number of iterations represented by the update.
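One way to picture per-iteration timing is sketched below. It assumes that each call measures the time elapsed since the previous call and attributes it to num_iters iterations. The IterTimer class, its clock argument, and time_per_iter() are hypothetical names introduced for illustration, not part of thsolver.

```python
import time
from typing import Callable


class IterTimer:
    """Hypothetical stand-in that accumulates elapsed time and iteration counts."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter) -> None:
        self._clock = clock
        self._last = clock()      # timestamp of the previous measurement
        self.total_time = 0.0     # seconds accumulated so far
        self.total_iters = 0      # iterations those seconds cover

    def record_time(self, num_iters: int = 1) -> None:
        now = self._clock()
        self.total_time += now - self._last
        self.total_iters += num_iters
        self._last = now

    def time_per_iter(self) -> float:
        return self.total_time / max(self.total_iters, 1)


# A fake clock makes the arithmetic easy to follow: readings at 0 s, 2 s, and 6 s.
ticks = iter([0.0, 2.0, 6.0])
timer = IterTimer(clock=lambda: next(ticks))
timer.record_time()             # 2 s for 1 iteration
timer.record_time(num_iters=4)  # 4 s for 4 iterations
```

Passing num_iters greater than 1 lets a single call stand for several iterations, for example when timing is recorded only every few steps.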

average()

Returns the averaged values accumulated in the tracker.
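The accumulate-then-average pattern behind update() and average() can be sketched as follows. This is an assumption about the general approach, not the actual implementation: RunningAverage is a hypothetical stand-in that uses plain floats, whereas AverageTracker works with tensors.

```python
from collections import defaultdict
from typing import Dict


class RunningAverage:
    """Hypothetical stand-in: sums values per key and divides by the update count."""

    def __init__(self) -> None:
        self.sums: Dict[str, float] = defaultdict(float)
        self.num_updates = 0

    def update(self, value: Dict[str, float]) -> None:
        # Called once per iteration with the metrics of that iteration.
        for key, val in value.items():
            self.sums[key] += float(val)
        self.num_updates += 1

    def average(self) -> Dict[str, float]:
        n = max(self.num_updates, 1)  # avoid division by zero before any update
        return {key: total / n for key, total in self.sums.items()}


meter = RunningAverage()
meter.update({"loss": 1.0, "accuracy": 0.5})
meter.update({"loss": 3.0, "accuracy": 1.0})
print(meter.average())  # {'loss': 2.0, 'accuracy': 0.75}
```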

average_all_gather()

Averages the tracked tensors across all GPUs using all_gather. Call this at the end of each epoch.
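The reduction that follows a gather can be illustrated without a multi-GPU setup. The sketch below simulates it in plain Python: per_rank stands for the list of per-process averages that a call such as torch.distributed.all_gather would collect in a real distributed run. The function name average_gathered and the sample values are hypothetical.

```python
from typing import Dict, List


def average_gathered(per_rank: List[Dict[str, float]]) -> Dict[str, float]:
    """Average dictionaries gathered from every rank; all ranks share the same keys."""
    world_size = len(per_rank)
    keys = per_rank[0].keys()
    return {key: sum(rank[key] for rank in per_rank) / world_size for key in keys}


# Simulated result of gathering the epoch average from four processes.
gathered = [{"loss": 0.4}, {"loss": 0.6}, {"loss": 0.8}, {"loss": 1.0}]
global_average = average_gathered(gathered)
```

Because every process receives the same gathered list, every process ends up with the same global average, which keeps logged values consistent across ranks.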

log(epoch: int, summary_writer: SummaryWriter | None = None, log_file: str | None = None, msg_tag: str = '->', notes: str = '', print_time: bool = True, print_memory: bool = False)

Logs the averaged values to the console, TensorBoard, and a log file.

Parameters:
  • epoch (int) – The current epoch index.

  • summary_writer (SummaryWriter or None) – The TensorBoard writer.

  • log_file (str or None) – The CSV-like log file path.

  • msg_tag (str) – The prefix printed before the log line.

  • notes (str) – Extra notes appended to the log message.

  • print_time (bool) – If True, prints the timestamp and elapsed time.

  • print_memory (bool) – If True, prints the reserved CUDA memory.
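A rough sketch of how the three logging targets can be served from one set of averaged values. The function log_averages and the exact message layout are assumptions made for illustration, and the time and memory options are left out. The only real API relied on is SummaryWriter.add_scalar(tag, value, global_step), which is called only when a writer is supplied.

```python
import os
import tempfile
from typing import Dict, Optional


def log_averages(averages: Dict[str, float], epoch: int, summary_writer=None,
                 log_file: Optional[str] = None, msg_tag: str = "->",
                 notes: str = "") -> str:
    """Hypothetical helper: write averages to TensorBoard, a file, and the console."""
    if summary_writer is not None:
        for key, value in averages.items():
            summary_writer.add_scalar(key, value, epoch)

    # Assumed message layout: tag, epoch, one "name: value" field per metric, notes.
    fields = [f"Epoch: {epoch}"] + [f"{k}: {v:.3f}" for k, v in averages.items()]
    if notes:
        fields.append(notes)
    message = f"{msg_tag} " + ", ".join(fields)

    if log_file is not None:
        with open(log_file, "a") as fid:  # append so earlier epochs are kept
            fid.write(message + "\n")
    print(message)
    return message


log_path = os.path.join(tempfile.mkdtemp(), "log.csv")
line = log_averages({"loss": 0.5, "accuracy": 0.875}, epoch=3,
                    log_file=log_path, notes="lr: 0.01")
# prints "-> Epoch: 3, loss: 0.500, accuracy: 0.875, lr: 0.01"
```

Opening the file in append mode means one line is added per call, so a run produces one row per epoch.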