mirror of
https://github.com/tracel-ai/burn.git
synced 2026-05-31 19:49:48 +09:00
* Tracing; WIP
* WIP towards all_reduce tuning.
* Add tracing support for collective operations and refactor all_reduce implementation
* wip
* Refactor all_reduce by removing monoid_broadcast and migrating shared tensor map utilities to a centralized module. Add tracing instrumentation for collective operations. Include minor performance tweaks and fixes across broadcast, reduce, and related modules.
* Refactor collective operations by introducing `PeerDeviceMap` and `get_peer_devices`. Simplify device handling in broadcast and all_reduce functions. Migrate shared logic to `tensor_map.rs` for improved reuse and clarity.
* Refactor all_reduce_sum_centralized return type to use `CollectiveTensorMap`. Remove unused `HashMap` dependency for cleaner imports.
* Update `tracing` dependency configurations and align features across crates
  - Add `attributes` feature to `tracing` for enhanced functionality in multiple crates.
  - Standardize inclusion of `tracing/std` in applicable feature sets.
  - Fix minor typo in `reduce_timing` example documentation.
* Update `tracing-core` and `tracing-subscriber` configurations to enforce default features across crates
  - Add and standardize `default` features for `tracing-core` and `tracing-subscriber`.
  - Extend `tokio` with `tracing` feature in `burn-communication`.
  - Align dependency feature sets for consistent tracing behavior.
* Add `tracing` and `tracing-core` dependencies with standardized feature configurations
  - Include `tracing` and `tracing-core` in `Cargo.toml` with appropriate features across crates.
  - Add `tracing/default` and `tracing-core/default` to feature sets in `burn`.
  - Remove redundant `default` feature usage for `tracing-core` in `burn-train` and `burn-import`.

  This is still broken for the no-std target:
  ```terminaloutput
  $ cargo build --color always --no-default-features --target thumbv6m-none-eabi -p burn
  ```
* this doesnt work
* Remove portable-atomic dependencies and make tracing optional
  Eliminates all uses of portable-atomic and portable-atomic-util from the workspace, including conditional dependencies for non-atomic pointer targets. Updates tracing and tracing-core dependencies to be optional in all affected crates, and adjusts feature flags to use dep:tracing and conditional tracing features. Also updates attribute macros in burn-autodiff to only use tracing instrumentation when the std feature is enabled.
* Add AllReduceOp and AllReduceResult imports to server.rs
  Imported AllReduceOp and AllReduceResult in local/server.rs to support additional collective operations. This prepares the server module for handling all-reduce functionality.
* Reformat imports in server.rs for readability
  Adjusted the formatting of the import statements in server.rs to improve readability and maintain consistency with Rust style guidelines. No functional changes were made.
* Refactor collective operations to generalize `WebSocket` to `Protocol` and improve shared logic reuse.
* Fix typo in error message: "missmatch" → "mismatch" in collective operations
* Refactor: Replace `send_err_to_all` with `fail` for improved error handling consistency in collective operations.
* Refactor `Op` to simplify `peer_devices` acquisition logic and reuse shared operations
* Refactor: Inline await expressions in `all_reduce` and `broadcast` operations; simplify `peer_devices` logic by removing redundant implementation.
* Refactor: Simplify `reduce`, `all_reduce`, and `broadcast` operations by replacing `into_iter` with `iter`, removing unused imports, and streamlining collection mechanisms
* Refactor: Move `effective_root` and `peers` methods from `BroadcastOp` to `BroadcastOpCall` to simplify struct design and enhance cohesion
* Refactor: Simplify `broadcast` operation by reordering logic, inlining expressions, and enhancing readability
* Refactor: Use `expect` for clearer error handling in `reduce` operation, inline variable for global strategy
* Refactor: Rename `reduce_timing` to `dop_timer`, modularize components into `workers`, `parsers`, and `event_utils`, and simplify event instrumentation logic.
* Refactor: Remove unused imports, streamline tracing setup, and simplify `WorkerHandle` interface by removing unused methods and variables
* Refactor: Relocate `WorkRequest` enum to `workers.rs` and remove unused import in `run` function
* Refactor: Remove redundant default values in `Args` struct definition
* Refactor: Simplify tensor reduction and broadcasting logic, streamline `peer_devices` handling, and remove unused imports
* readme/otel
* Simplify imports and std propagation.
* fmt
* rebase/fix
dop_timer
This binary times distributed collective operations, both local and global.
This binary uses the gRPC OTEL exporter to send traces to an OTEL Collector on port 4317.
Example
- Set up an OTEL Collector
There are many ways to do this; one of the simplest is to use the jaegertracing/all-in-one:latest docker image:
$ docker run -e OTEL_TRACES_SAMPLER=always_off -e COLLECTOR_OTLP_ENABLED=true -p 16686:16686 -p 4317-4318:4317-4318 -p 14250:14250 -p 14268:14268 -p 14269:14269 jaegertracing/all-in-one:latest
Then navigate to localhost:16686 to view traces.
- Run the binary with the `otel` tracing mode selected:
$ cargo run -p dop_timer --features cuda -- --tracing otel
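If no traces show up, one quick thing to rule out is that nothing is listening on the collector's gRPC port. The following is a minimal, std-only sketch of such a check. It is not part of `dop_timer`; the helper name `collector_reachable` is hypothetical, and the address `127.0.0.1:4317` assumes the port mapping from the docker command above. A successful TCP connection only shows that the port is open, not that an OTLP endpoint is behind it.

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Hypothetical helper: returns true if something accepts a TCP
/// connection at `addr` within `timeout`.
fn collector_reachable(addr: SocketAddr, timeout: Duration) -> bool {
    TcpStream::connect_timeout(&addr, timeout).is_ok()
}

fn main() {
    // Assumed address: the collector's gRPC port as mapped by the docker command.
    let addr: SocketAddr = "127.0.0.1:4317".parse().expect("valid socket address");
    if collector_reachable(addr, Duration::from_millis(500)) {
        println!("{addr} is accepting connections");
    } else {
        println!("nothing is listening on {addr}; start the collector first");
    }
}
```

The check uses a short timeout so that it fails fast when the collector is not running, rather than waiting on the operating system's default connect timeout.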