* Tracing; WIP
* WIP towards all_reduce tuning.
* Add tracing support for collective operations and refactor all_reduce implementation
* wip
* Refactor all_reduce by removing monoid_broadcast and migrating shared tensor map utilities to a centralized module. Add tracing instrumentation for collective operations. Include minor performance tweaks and fixes across broadcast, reduce, and related modules.
* Refactor collective operations by introducing `PeerDeviceMap` and `get_peer_devices`. Simplify device handling in broadcast and all_reduce functions. Migrate shared logic to `tensor_map.rs` for improved reuse and clarity.
* Refactor all_reduce_sum_centralized return type to use `CollectiveTensorMap`. Remove unused `HashMap` dependency for cleaner imports.
* Update `tracing` dependency configurations and align features across crates
- Add `attributes` feature to `tracing` for enhanced functionality in multiple crates.
- Standardize inclusion of `tracing/std` in applicable feature sets.
- Fix minor typo in `reduce_timing` example documentation.
* Update `tracing-core` and `tracing-subscriber` configurations to enforce default features across crates
- Add and standardize `default` features for `tracing-core` and `tracing-subscriber`.
- Extend `tokio` with `tracing` feature in `burn-communication`.
- Align dependency feature sets for consistent tracing behavior.
* Add `tracing` and `tracing-core` dependencies with standardized feature configurations
- Include `tracing` and `tracing-core` in `Cargo.toml` with appropriate features across crates.
- Add `tracing/default` and `tracing-core/default` to feature sets in `burn`.
- Remove redundant `default` feature usage for `tracing-core` in `burn-train` and `burn-import`.
This is still broken for the no-std target:
```terminaloutput
$ cargo build --color always --no-default-features --target thumbv6m-none-eabi -p burn
```
* This doesn't work
* Remove portable-atomic dependencies and make tracing optional
Eliminates all uses of portable-atomic and portable-atomic-util from the workspace, including conditional dependencies for non-atomic pointer targets. Updates tracing and tracing-core dependencies to be optional in all affected crates, and adjusts feature flags to use dep:tracing and conditional tracing features. Also updates attribute macros in burn-autodiff to only use tracing instrumentation when the std feature is enabled.
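A minimal sketch of the `std`-gated instrumentation pattern described above. The function `sum_gradients` and the `skip_all` argument are illustrative assumptions, not code from burn-autodiff. With the `std` feature off, `cfg_attr` drops the attribute, so the optional `tracing` dependency is never referenced:

```rust
// Hypothetical example function; not taken from burn-autodiff.
// When the `std` feature is disabled, `cfg_attr` removes the attribute
// entirely, so this compiles without the `tracing` crate being present.
// When `std` is enabled (and pulls in `dep:tracing`), the function is
// wrapped in a tracing span.
#[cfg_attr(feature = "std", tracing::instrument(skip_all))]
fn sum_gradients(values: &[f32]) -> f32 {
    values.iter().sum()
}

fn main() {
    let total = sum_gradients(&[1.0, 2.0, 3.0]);
    println!("total = {total}");
}
```

This assumes the crate's `std` feature is what enables the optional `tracing` dependency, as the feature-flag changes in this commit suggest.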
* Add AllReduceOp and AllReduceResult imports to server.rs
Imported AllReduceOp and AllReduceResult in local/server.rs to support additional collective operations. This prepares the server module for handling all-reduce functionality.
* Reformat imports in server.rs for readability
Adjusted the formatting of the import statements in server.rs to improve readability and maintain consistency with Rust style guidelines. No functional changes were made.
* Refactor collective operations to generalize `WebSocket` to `Protocol` and improve shared logic reuse.
* Fix typo in error message: "missmatch" → "mismatch" in collective operations
* Refactor: Replace `send_err_to_all` with `fail` for improved error handling consistency in collective operations.
* Refactor `Op` to simplify `peer_devices` acquisition logic and reuse shared operations
* Refactor: Inline await expressions in `all_reduce` and `broadcast` operations; simplify `peer_devices` logic by removing redundant implementation.
* Refactor: Simplify `reduce`, `all_reduce`, and `broadcast` operations by replacing `into_iter` with `iter`, removing unused imports, and streamlining collection mechanisms
* Refactor: Move `effective_root` and `peers` methods from `BroadcastOp` to `BroadcastOpCall` to simplify struct design and enhance cohesion
* Refactor: Simplify `broadcast` operation by reordering logic, inlining expressions, and enhancing readability
* Refactor: Use `expect` for clearer error handling in `reduce` operation, inline variable for global strategy
* Refactor: Rename `reduce_timing` to `dop_timer`, modularize components into `workers`, `parsers`, and `event_utils`, and simplify event instrumentation logic.
* Refactor: Remove unused imports, streamline tracing setup, and simplify `WorkerHandle` interface by removing unused methods and variables
* Refactor: Relocate `WorkRequest` enum to `workers.rs` and remove unused import in `run` function
* Refactor: Remove redundant default values in `Args` struct definition
* Refactor: Simplify tensor reduction and broadcasting logic, streamline `peer_devices` handling, and remove unused imports
* readme/otel
* Simplify imports and std propagation.
* fmt
* rebase/fix