Added new doc site

2024-04-24 16:44:52 -05:00
parent 1424c40384
commit 5b8e922a70
20 changed files with 298 additions and 42 deletions
--- a/docs/02
+++ b/docs/02
@@ -1,19 +0,0 @@
-# GraphTensors
-
-We're working with pretty complicated graphs to build our computation on, but we don't want to manually place all the nodes ourselves! So how can we build these static graphs in a nice, familiar way? GraphTensors!
-
-Essentially GraphTensors are pointers to a specific node on the graph, as well as some metadata about the output of that node, such as its shape. We can make a new GraphTensor by doing:
-```rust
-let mut cx = Graph::new(); // We need a graph to build!
-let a: GraphTensor<R1<3>> = cx.tensor(); // Here we create a new node on the graph and get a GraphTensor back, pointing to it.
-```
-Notice the type of `a`: `GraphTensor<R1<3>>`. So what's that generic all about? It's the shape! We make tensor shapes part of the type, so they're tracked at compile time! In this case, the shape is rank 1, with 3 elements, or in other words, a vector of 3 dimensions. (Side note: `R1<N>` is a typedef of `(Const<N>,)`) It should be impossible to accidentally get a runtime shape mismatch.
-
-Now we can use the `a` as you would in a library like PyTorch, performing linear algebra:
-```rust
-let b = a.exp().sqrt();
-let c = b + a;
-```
-Looks familiar!
-
-[Let's take a look at how GraphTensors are used to build whole neural networks.](https://github.com/jafioti/luminal/blob/main/docs/03%20Modules.md)
--- a/Serialization.md
+++ b/Serialization.md
@@ -1 +0,0 @@
-Coming Soon
--- a/Compilers.md
+++ b/Compilers.md
@@ -1 +0,0 @@
-Coming Soon
--- a/docs/README.md
+++ b/docs/README.md
@@ -0,0 +1,4 @@
+```
+npm i -g mintlify
+mintlify dev
+```
--- a/docs/blog/4-24-2024.mdx
+++ b/docs/blog/4-24-2024.mdx
@@ -0,0 +1,48 @@
+---
+title: 'Luminal: Efficient ML in Rust through graph compilation'
+description: 'A new approach to ML'
+---
+![](https://raw.githubusercontent.com/jafioti/luminal/main/dag.jpeg)
+
+**Luminal is a deep learning library that uses composable compilers to achieve high performance.**
+
+Current ML libraries tend to be large and complex because they try to map high-level operations directly onto low-level handwritten kernels, and focus on eager execution. Libraries like PyTorch contain hundreds of thousands of lines of code, making it nearly impossible for a single programmer to understand it all, let alone do a large refactor.
+
+But does it need to be so complex? ML models tend to be static dataflow graphs made up of a few simple operators. This allows us to have a dirt simple core only supporting a few primitive operations, and use them to build up complex neural networks. We can then write compilers that modify the graph after we build it, to swap more efficient ops back in depending on which backend we're running on.
+
+Luminal takes this approach to the extreme, supporting only 12 primitive operations (primops):
+- **Unary** - Log2, Exp2, Sin, Sqrt, Recip
+- **Binary** - Add, Mul, Mod, LessThan
+- **Other** - SumReduce, MaxReduce, Contiguous
+
+Every complex operation boils down to these primitive operations, so when you do `a - b` for instance, `add(a, mul(b, -1))` gets written to the graph. Or when you do `a.matmul(b)`, what actually gets put on the graph is `sum_reduce(mul(reshape(a), reshape(b)))`.
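To make the lowering concrete, here is a minimal sketch on a toy expression tree. `Expr`, `sub`, and `eval` are hypothetical names chosen for illustration and are not Luminal's API; the sketch only assumes what the paragraph above states, namely that `a - b` is recorded as `add(a, mul(b, -1))`.

```rust
// Toy model of lowering a high-level op onto a tiny primitive set.
// All names here are illustrative assumptions, not Luminal's real types.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Const(f32),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// Subtraction is not a primitive: it is recorded as add(a, mul(b, -1)).
fn sub(a: Expr, b: Expr) -> Expr {
    Expr::Add(
        Box::new(a),
        Box::new(Expr::Mul(Box::new(b), Box::new(Expr::Const(-1.0)))),
    )
}

/// Evaluate the tree using only the primitive ops.
fn eval(e: &Expr) -> f32 {
    match e {
        Expr::Const(v) => *v,
        Expr::Add(a, b) => eval(a) + eval(b),
        Expr::Mul(a, b) => eval(a) * eval(b),
    }
}

fn main() {
    let graph = sub(Expr::Const(5.0), Expr::Const(3.0));
    // The recorded graph contains only Add, Mul, and Const nodes.
    println!("{:?}", graph);
    println!("{}", eval(&graph));
}
```

The caller writes a subtraction, but the structure that gets stored never needs a dedicated subtract node.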
+
+Once the graph is built, iterative compiler passes can modify it to replace primops with more efficient ops, depending on the device it's running on. On Nvidia cards, for instance, efficient CUDA kernels are written on the fly to replace these ops, and specialized cuBLAS kernels are swapped in for supported operations.
+
+This approach leads to a simple library, and performance is only limited by the creativity of the compiler programmer, not the model programmer.
+
+Luminal has a number of other neat features. Check out the repo [here](https://github.com/jafioti/luminal).
+
+## Welcome
+
+There are two ways to build API documentation: [OpenAPI](https://mintlify.com/docs/api-playground/openapi/setup) and [MDX components](https://mintlify.com/docs/api-playground/mdx/configuration). For the starter kit, we are using the following OpenAPI specification.
+
+<Card
+  title="Plant Store Endpoints"
+  icon="leaf"
+  href="https://github.com/mintlify/starter/blob/main/api-reference/openapi.json"
+>
+  View the OpenAPI specification file
+</Card>
+
+## Authentication
+
+All API endpoints are authenticated using Bearer tokens and picked up from the specification file.
+
+```json
+"security": [
+  {
+    "bearerAuth": []
+  }
+]
+```
--- a/docs/blog/endpoint/create.mdx
+++ b/docs/blog/endpoint/create.mdx
@@ -0,0 +1,4 @@
+---
+title: 'Create Plant'
+openapi: 'POST /plants'
+---
--- a/docs/blog/endpoint/delete.mdx
+++ b/docs/blog/endpoint/delete.mdx
@@ -0,0 +1,4 @@
+---
+title: 'Delete Plant'
+openapi: 'DELETE /plants/{id}'
+---
--- a/docs/blog/endpoint/get.mdx
+++ b/docs/blog/endpoint/get.mdx
@@ -0,0 +1,4 @@
+---
+title: 'Get Plants'
+openapi: 'GET /plants'
+---
--- a/docs/developers/introduction.mdx
+++ b/docs/developers/introduction.mdx
@@ -1,10 +1,22 @@
-# Contributing to luminal
-![image](https://raw.githubusercontent.com/jafioti/luminal/main/resources/dag.jpeg)
+---
+title: Developing Luminal
+description: 'Building the future of ML.'
+icon: 'hand-wave'
+---

-Please take a look at the [issues](https://github.com/jafioti/luminal/issues) and [roadmap](https://github.com/users/jafioti/projects/1) to see what's targeted for upcoming releases. Contributions for those features are preferred and will be reviewed and merged very rapidly. Other contributions are welcome, but please note luminal is and always will be a fairly minimal library.
+<img
+  className="block dark:hidden rounded-xl"
+  src="/images/abstract_light.jpg"
+  alt="Hero Light"
+/>
+<img
+  className="hidden dark:block rounded-xl"
+  src="/images/abstract.jpg"
+  alt="Hero Dark"
+/>

-The core design of luminal is heavily predicated on extensibility. Compilers alow for immense complexity to be removed from the core library and added with third party compilers. For instance, datatypes and devices are typically first class primitives. In luminal, they're compilers and the core has no idea about them. This is the general trend we'll stick to: core remains brutally simple, and everything that can be externalized to a compiler will be.
+Please take a look at the [issues](https://github.com/jafioti/luminal/issues) and [roadmap](https://github.com/users/jafioti/projects/1) to see what's targeted for upcoming releases. Contributions for those features are preferred and will be reviewed and merged very rapidly. Other contributions are welcome, but please note Luminal is and always will be a fairly minimal library.

-We will be adding training support soon, and as you guessed, it will entirely reside in a compiler. Just define the model's graph, run the output through an optimizer, and then run the `AutogradCompiler` before any other compilers. Boom, we got training, and the core of the library has no idea! (aside from some quality of life apis)
+The core design of Luminal is heavily predicated on extensibility. Compilers allow for immense complexity to be removed from the core library and added with third-party compilers. For instance, datatypes and devices are typically first-class primitives. In Luminal, they're compilers and the core has no idea about them. This is the general trend we'll stick to: core remains brutally simple, and everything that can be externalized to a compiler will be.

-PRs that remove complexity are always welcome, but note that line count often is a bad proxy for complexity. Ideally the entire luminal core should be a few thousand lines of code, but anything remotely resembling code golf is not allowed.
+PRs that remove complexity are always welcome, but note that line count is often a bad proxy for complexity. Ideally the entire Luminal core should be a few thousand lines of code, but anything remotely resembling code golf is not allowed.
--- a/docs/docs/compilers.mdx
+++ b/docs/docs/compilers.mdx
@@ -1,17 +1,22 @@
-# Compilers
+---
+title: Compilers
+description: 'Core transformations of the computation graph.'
+icon: 'microchip'
+---

 So now we have our graph all set up. We did our forward passes through the model, so now what? Do we run it?

 We could! But it wouldn't be very fast. Right now your graph is full of **primops**, which are the simplest set of primitive operations in luminal. One of the key tenets of luminal is a small primop set, which makes it easy to add new backends and write compilers for. But another consequence of a small primop set is that even simple operations usually end up creating quite a few operations, and even small neural networks can end up with hundreds or thousands of primops, which are slow to run directly. So it's time to compile the graph!

-Compilers are structs that implement the `Compiler` trait, which simply specifies a single function:
+We use a loose definition of a compiler. Compilers are structs that implement the `Compiler` trait, which simply specifies a single function:
 ```rust
 pub trait Compiler {
+    type Output = ();
    /// Run a compilation pass
-    fn compile<T: ToIdsMut>(&self, graph: &mut Graph, remap: T);
+    fn compile<T: ToIdsMut>(&self, graph: &mut Graph, remap: T) -> Self::Output;
 }
 ```
-So all a compiler does is take a mutable reference to the graph, something called remap (beyond the scope of this introduction), and does something to the graph. That something is compilation, usually in the form of finding patterns of nodes and replacing them with other nodes. For instance, there's no Subtract operation in the primops, so subtractions are implemented as `add(a, mul(b, -1))`. We can have a compiler that looks for that pattern of nodes and directly replaces it with a `Subtract` operation. We'll look at how to do this in the [Writing Compilers](https://github.com/jafioti/luminal/blob/main/docs/06%20Writing%20Compilers.md) section.
+So all a compiler does is take a mutable reference to the graph and something called remap (beyond the scope of this introduction), and do something to the graph. That something is compilation, usually in the form of finding patterns of nodes and replacing them with other nodes. For instance, there's no Subtract operation in the primops, so subtractions are implemented as `add(a, mul(b, -1))`. We can have a compiler that looks for that pattern of nodes and directly replaces it with a `Subtract` operation. We'll look at how to do this in the [Writing Compilers](/developers/compilers) section.
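As a rough illustration of that pattern replacement, the sketch below runs a rewrite pass over a toy expression tree rather than Luminal's real graph. `Op` and `fuse_subtraction` are hypothetical names, and the real `Compiler` trait operates on a `Graph` with a `remap` argument that is not modeled here.

```rust
// Toy pattern-replacement pass: add(a, mul(b, -1)) becomes sub(a, b).
// `Op` and `fuse_subtraction` are illustrative assumptions only.
#[derive(Debug, Clone, PartialEq)]
enum Op {
    Input(&'static str),
    Const(f32),
    Add(Box<Op>, Box<Op>),
    Mul(Box<Op>, Box<Op>),
    Sub(Box<Op>, Box<Op>), // fused op, only produced by the pass
}

/// Rewrite the tree bottom-up, fusing every add(a, mul(b, -1)) into sub(a, b).
fn fuse_subtraction(op: Op) -> Op {
    match op {
        Op::Add(a, b) => {
            let a = fuse_subtraction(*a);
            let b = fuse_subtraction(*b);
            match b {
                // The pattern we are looking for: right operand is mul(x, -1).
                Op::Mul(x, y) if *y == Op::Const(-1.0) => Op::Sub(Box::new(a), x),
                other => Op::Add(Box::new(a), Box::new(other)),
            }
        }
        Op::Mul(a, b) => Op::Mul(
            Box::new(fuse_subtraction(*a)),
            Box::new(fuse_subtraction(*b)),
        ),
        Op::Sub(a, b) => Op::Sub(
            Box::new(fuse_subtraction(*a)),
            Box::new(fuse_subtraction(*b)),
        ),
        leaf => leaf,
    }
}

fn main() {
    let graph = Op::Add(
        Box::new(Op::Input("a")),
        Box::new(Op::Mul(Box::new(Op::Input("b")), Box::new(Op::Const(-1.0)))),
    );
    println!("{:?}", fuse_subtraction(graph));
}
```

A real pass would match on graph nodes and edges instead of an owned tree, but the shape of the logic (find a pattern, swap in a fused op) is the same.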

 All you need to know for now is that we can use this compiler on the graph by doing:
 ```rust
@@ -19,9 +24,7 @@ cx.compile(SubtractionCompiler::default());
 ```
 Now the graph will have the old mul + add pattern removed and Subtract ops placed in. There are plenty of different compilers for different purposes. Some of the popular ones:
 - GenericCompiler - A handful of hardware-agnostic optimizations like [CSE](https://en.wikipedia.org/wiki/Common_subexpression_elimination) to be run before any hardware-specific compilers.
-- CudaCompiler<T> - The full stack of cuda compilers to convert a graph to a cuda-specialized graph with T as the datatype (either f32 or f16). Imported from luminal_cuda
-- MetalCompiler<T> - Same as CudaCompiler. Imported from luminal_metal
+- CudaCompiler\<T\> - The full stack of CUDA compilers to convert a graph to a CUDA-specialized graph with T as the datatype (either f32 or f16). Imported from luminal_cuda.
+- MetalCompiler\<T\> - Same as CudaCompiler. Imported from luminal_metal.

-Compilers are entirely seperate from luminal, so they can be fully implemented by third party crates. For instance, everything specific to Cuda is contained in luminal_cuda.
-
-[Now let's look into how to load weights from a file.](https://github.com/jafioti/luminal/blob/main/docs/05%20Serialization.md)
+Compilers are entirely separate from luminal, so they can be fully implemented by third-party crates. For instance, everything specific to CUDA is contained in luminal_cuda.
--- a/docs/docs/graphtensor.mdx
+++ b/docs/docs/graphtensor.mdx
@@ -1,5 +1,10 @@
-# Luminal Introduction
+---
+title: GraphTensor API
+description: 'The high-level interface for writing ML code, checked at compile time.'
+icon: 'webhook'
+---

+## Familiarizing ourselves
 Let's get up to speed with how to use luminal, and how it works internally.

 First we'll take a look at what the simplest program will look like:
@@ -35,4 +40,22 @@ Then we set the data for these tensors. But if `GraphTensor` doesn't hold data,

 Alright, that was a lot but now we've touched on all the main aspects of running a model in luminal.

-[Let's take a look at each piece in more depth.](https://github.com/jafioti/luminal/blob/main/docs/02%20GraphTensor%20API.md)
+## GraphTensors
+
+We're working with pretty complicated graphs to build our computation on, but we don't want to manually place all the nodes ourselves! So how can we build these static graphs in a nice, familiar way? GraphTensors!
+
+Essentially GraphTensors are pointers to a specific node on the graph, as well as some metadata about the output of that node, such as its shape. We can make a new GraphTensor by doing:
+```rust
+let mut cx = Graph::new(); // We need a graph to build!
+let a: GraphTensor<R1<3>> = cx.tensor(); // Here we create a new node on the graph and get a GraphTensor back, pointing to it.
+```
+Notice the type of `a`: `GraphTensor<R1<3>>`. So what's that generic all about? It's the shape! We make tensor shapes part of the type, so they're tracked at compile time! In this case, the shape is rank 1, with 3 elements, or in other words, a vector of 3 dimensions. (Side note: `R1<N>` is a typedef of `(Const<N>,)`.) It should be impossible to accidentally get a runtime shape mismatch.
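The idea of carrying the shape in the type can be sketched with plain const generics. `Vector<N>` below is a hypothetical stand-in, not `GraphTensor`, and it computes eagerly instead of recording graph nodes; it only shows how the compiler rejects a shape mismatch.

```rust
use std::ops::Add;

// A rank-1 value whose length is part of its type (illustrative only).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Vector<const N: usize>([f32; N]);

// Addition is only defined between vectors with the same length N.
impl<const N: usize> Add for Vector<N> {
    type Output = Vector<N>;

    fn add(self, rhs: Self) -> Self::Output {
        let mut out = self.0;
        for (o, r) in out.iter_mut().zip(rhs.0.iter()) {
            *o += *r;
        }
        Vector(out)
    }
}

fn main() {
    let a = Vector([1.0, 2.0, 3.0]);
    let b = Vector([10.0, 20.0, 30.0]);
    println!("{:?}", a + b);

    // Uncommenting the next two lines fails to compile, because
    // `Vector<3>` and `Vector<2>` are different types:
    // let c = Vector([1.0, 2.0]);
    // let bad = a + c;
}
```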
+
+Now we can use `a` as you would use a tensor in a library like PyTorch, performing linear algebra:
+```rust
+let b = a.exp().sqrt();
+let c = b + a;
+```
+We just placed some ops on the graph! It doesn't look like it because you don't need to think about the graph while writing ML code.
+
+Next we'll see how GraphTensors are used to build whole neural networks.
--- a/docs/docs/introduction.mdx
+++ b/docs/docs/introduction.mdx
@@ -0,0 +1,67 @@
+---
+title: Introduction
+description: 'Welcome to a new way to do ML.'
+icon: 'hand-wave'
+---
+
+<img
+  className="block dark:hidden rounded-xl"
+  src="/images/abstract_light.jpg"
+  alt="Hero Light"
+/>
+<img
+  className="hidden dark:block rounded-xl"
+  src="/images/abstract.jpg"
+  alt="Hero Dark"
+/>
+
+Luminal is a new machine learning framework focused on **speed**, **simplicity** and **composability**. We take a new approach to ML by focusing on static graphs and leaning heavily on compilers.
+
+## Contents
+
+Navigate around the Luminal docs.
+
+<CardGroup cols={2}>
+  <Card
+    title="Quickstart"
+    icon="bolt"
+    href="/docs/quickstart"
+  >
+    Get ML models up and running in a flash.
+  </Card>
+  <Card
+    title="Why Luminal"
+    icon="lightbulb"
+    href="/docs/why"
+  >
+    Dive into why Luminal was created and the design philosophy behind it.
+  </Card>
+  <Card
+    title="GraphTensor API"
+    icon="webhook"
+    href="/docs/graphtensor"
+  >
+    High-level interface for building models.
+  </Card>
+  <Card
+    title="Modules"
+    icon="shapes"
+    href="/docs/modules"
+  >
+    Composable building blocks of complex neural networks.
+  </Card>
+  <Card
+    title="Compilers"
+    icon="microchip"
+    href="/docs/compilers"
+  >
+    Core transformations of the computation graph.
+  </Card>
+  <Card
+    title="Developers"
+    icon="code"
+    href="/docs/developers"
+  >
+    Resources for contributors and future development.
+  </Card>
+</CardGroup>
--- a/docs/docs/modules.mdx
+++ b/docs/docs/modules.mdx
@@ -1,4 +1,9 @@
-# NN Modules
+---
+title: Modules
+description: 'Composable building blocks of complex neural networks.'
+icon: 'shapes'
+---
+
 Like any good DL library, we organize our networks into `Module`s. Here is the module trait:
 ```rust
 /// A module with a forward pass
@@ -26,6 +31,4 @@ impl<const A: usize, const B: usize> Module<GraphTensor<R1<A>>> for Linear<A, B>
 ```
 Here we see a single weight matrix as the internal state, of size AxB. We've written a single forward function that takes a single input vector of shape (A,) and matmuls it by our weight matrix to get an output of shape (B,).

-Now all of these ops are recorded on the graph, to be compiled and ran later on.
-
-[So how does this compilation work? Let's find out!](https://github.com/jafioti/luminal/blob/main/docs/04%20Compilers.md)
+Now all of these ops are recorded on the graph, to be compiled and run later on.
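For readers who want to see the shape bookkeeping in isolation, here is a standalone sketch of a linear layer using plain arrays. The `Module` trait below is an assumed shape (one associated `Output` type and a `forward` method), and unlike Luminal this version computes the result immediately instead of recording ops on a graph.

```rust
/// A module with a forward pass (assumed trait shape, for illustration).
trait Module<Input> {
    type Output;
    fn forward(&self, input: Input) -> Self::Output;
}

/// A linear layer holding an A x B weight matrix.
struct Linear<const A: usize, const B: usize> {
    weight: [[f32; B]; A],
}

impl<const A: usize, const B: usize> Module<[f32; A]> for Linear<A, B> {
    type Output = [f32; B];

    /// Vector-matrix product: input of shape (A,) times weight (A, B) gives (B,).
    fn forward(&self, input: [f32; A]) -> [f32; B] {
        let mut out = [0.0; B];
        for (x, row) in input.iter().zip(self.weight.iter()) {
            for (o, w) in out.iter_mut().zip(row.iter()) {
                *o += x * w;
            }
        }
        out
    }
}

fn main() {
    let layer = Linear::<2, 3> {
        weight: [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    };
    println!("{:?}", layer.forward([1.0, 1.0]));
}
```

Passing an input of the wrong length, such as `[1.0, 1.0, 1.0]`, is a compile error because no `Module<[f32; 3]>` impl exists for `Linear<2, 3>`.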
--- a/docs/docs/quickstart.mdx
+++ b/docs/docs/quickstart.mdx
@@ -0,0 +1,39 @@
+---
+title: 'Quickstart'
+description: 'Start running ML models in minutes.'
+icon: 'bolt'
+---
+
+## Clone the repo
+
+Clone the codebase locally by running the following:
+```bash
+git clone https://github.com/jafioti/luminal
+cd luminal
+```
+
+## Hello World
+
+Simple examples demonstrate how a library works without diving in too deep. Run your first Luminal code like so:
+```bash
+cd ./examples
+cargo run --release
+```
+Great! You've run your first Luminal model!
+
+## Run Llama 3
+
+Run the following to start generating text with Llama 3 8B:
+```bash
+cd ./examples/llama
+# Download the model
+bash ./setup/setup.sh
+# Run the model
+cargo run --release --features metal    # macOS (Recommended)
+cargo run --release --features cuda     # Nvidia
+cargo run --release                     # CPU
+```
+
+<Warning>
+  Luminal currently isn't well optimized for CPU usage, so running large models like Llama 3 on CPU isn't recommended.
+</Warning>
--- a/docs/docs/why.mdx
+++ b/docs/docs/why.mdx
@@ -0,0 +1,66 @@
+---
+title: 'Why Luminal'
+description: 'ML is a crowded landscape. What makes Luminal different?'
+icon: 'lightbulb'
+---
+
+## The ML ecosystem is fragmented
+
+In recent years, ML has seen a flourishing of interest, especially after apps like ChatGPT gained huge traction. With this interest has come many fantastic open source projects and libraries lowering the barrier to entry.
+
+But despite all the effort, it still feels hard to take an existing model and deploy it to a new environment without jumping through hoops.
+
+#### Deployment
+ML deployments usually come in one of two flavors: extensions to training libraries, and specialized deployment libraries.
+
+PyTorch and JAX exemplify the current mainstream of training libraries. While there exist great deployment systems for these, typically they involve either trying to ship a standalone Python interpreter, or exporting the model to another library.
+
+ONNX-based runtimes represent the standard in dedicated deployment libraries. Once you get the model into a supported format, like ONNX, deployment to your chosen environment is fairly easy.
+
+#### Devices x Datatypes x Operations
+On top of this, frameworks are usually only able to support a handful of devices, since implementing a device involves implementing every operation the framework supports. Throw in datatypes and the amount of code needed grows multiplicatively: every device needs every operation for every datatype.
+
+When faced with all of this, it's no wonder ML developers usually just opt for the cloud, an environment they can have full control over.
+
+## A better way
+
+Luminal was born out of this frustration, and a desire to deploy to user devices with the same peace of mind Rust developers are used to. It turns out most of these problems were already solved in the early days of computing.
+
+Why don't developers today hand-write assembly code? Why does code written on one machine work on all others? Do developers need to think about the differences between the x86 and ARM ISAs? Of course not.
+
+Let's learn the same lesson in ML. If you want to know how something is achieved in Luminal, there's a good chance the answer is the same: **compilers**.
+
+## It's compilers all the way down
+
+How simple *could* an ML library get? Surely after you made a linear algebra library you'd need to deal with datatypes, devices, backprop, and all the usual list of ML concerns, right? What if you could throw all those things away and just worry about doing the minimum to support arbitrary neural networks?
+
+It turns out, it can get extremely simple. The core of Luminal is a few thousand lines of code and only 12 operations, which allows anyone to understand the whole thing in an afternoon.
+
+But wouldn't that make your library so limited it's useless? **No!** Not if you can use compilers to add functionality back, in a composable, isolated way.
+
+Let's see what we can do.
+
+#### Devices
+Since devices aren't handled by the core library, what if we had a compiler take each op present in the network and swap it out with equivalent operations on other devices, like CUDA GPUs? Or TPUs? Or quantum photonic retro-encabulators?
+
+If you only have 12 ops, it's extremely straightforward. We can also have the compilers insert copy-to-device and copy-from-device ops so our data is moved correctly without us thinking about it.
+
+So compilers get us support for other devices.
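A device compiler of this kind can be sketched over a flat list of ops. Everything below is a simplifying assumption made for illustration: the `Op` variants, the `gpu_compiler` function, and the choice to model a graph as a linear sequence with a single copy in and a single copy out.

```rust
// Toy device pass: swap each CPU op for a GPU equivalent and add copies.
// The variants and function name are hypothetical, not Luminal's API.
#[derive(Debug, Clone, PartialEq)]
enum Op {
    Cpu(&'static str),
    Gpu(&'static str),
    CopyToDevice,
    CopyFromDevice,
}

/// Replace every CPU op with its GPU counterpart and wrap the program
/// in copy-to-device / copy-from-device ops.
fn gpu_compiler(ops: &[Op]) -> Vec<Op> {
    let mut out = Vec::with_capacity(ops.len() + 2);
    out.push(Op::CopyToDevice);
    for op in ops {
        out.push(match op {
            Op::Cpu(name) => Op::Gpu(*name),
            other => other.clone(),
        });
    }
    out.push(Op::CopyFromDevice);
    out
}

fn main() {
    let program = [Op::Cpu("exp2"), Op::Cpu("add")];
    println!("{:?}", gpu_compiler(&program));
}
```

Because the op set is small, the `match` that maps each primitive to its device version stays short.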
+
+#### Datatypes
+We want more than just fp32. If you tilt your head and squint, other datatypes are the same as other devices. It's just another separate set of ops that processes your tensors slightly differently. So we can have a compiler insert the ops that support our desired datatype, and insert conversion to and from fp32 ops.
+
+So we get datatypes back as well, through compilers.
+
+#### Training
+Whether or not a library will support training is one of the first decisions a developer makes when starting out. So surely, if the core of Luminal doesn't support training, there's no way it'll be added in externally, right?
+
+Nope! Compilers to the rescue again. With a limited op set, we can easily handle all possible cases of operations and derive the local gradients to get a full backward graph, and then connect it to the existing forward graph.
+
+Boom! We now have access to gradients! With a few more convenience functions, we can use those gradients to update the model's weights. Training has arrived!
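The "local gradients per op" idea can be shown on a toy expression tree with two ops. This is a simplified sketch: `Expr`, `eval`, `backward`, and `gradient` are hypothetical names, and the code computes gradient values directly by walking the tree, whereas the approach described above builds a backward graph and connects it to the forward graph.

```rust
// Toy reverse-mode differentiation over a two-op set (illustrative only).
#[derive(Debug, Clone)]
enum Expr {
    Var(usize), // index into the input slice
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

fn eval(e: &Expr, inputs: &[f32]) -> f32 {
    match e {
        Expr::Var(i) => inputs[*i],
        Expr::Add(a, b) => eval(a, inputs) + eval(b, inputs),
        Expr::Mul(a, b) => eval(a, inputs) * eval(b, inputs),
    }
}

/// Push the upstream gradient through `e`, accumulating into `grads`.
/// Each op only needs its own local rule:
///   d(a + b) = da + db,   d(a * b) = b * da + a * db
fn backward(e: &Expr, inputs: &[f32], upstream: f32, grads: &mut [f32]) {
    match e {
        Expr::Var(i) => grads[*i] += upstream,
        Expr::Add(a, b) => {
            backward(a, inputs, upstream, grads);
            backward(b, inputs, upstream, grads);
        }
        Expr::Mul(a, b) => {
            let (va, vb) = (eval(a, inputs), eval(b, inputs));
            backward(a, inputs, upstream * vb, grads);
            backward(b, inputs, upstream * va, grads);
        }
    }
}

/// Gradient of the expression with respect to every input.
fn gradient(e: &Expr, inputs: &[f32]) -> Vec<f32> {
    let mut grads = vec![0.0; inputs.len()];
    backward(e, inputs, 1.0, &mut grads);
    grads
}

fn main() {
    // f(x, y) = x * y + x, so df/dx = y + 1 and df/dy = x.
    let f = Expr::Add(
        Box::new(Expr::Mul(Box::new(Expr::Var(0)), Box::new(Expr::Var(1)))),
        Box::new(Expr::Var(0)),
    );
    let inputs = [2.0f32, 3.0];
    println!("value = {}", eval(&f, &inputs));
    println!("grads = {:?}", gradient(&f, &inputs));
}
```

Adding a new op to this sketch means adding one arm to `eval` and one local rule to `backward`, which is why a small op set keeps the problem tractable.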
+
+## In conclusion
+
+By now you should be seeing a trend. Everything we've removed from the core library we can add back in with external compilers. But now all that functionality is external to the core, hackable, and isolated. You can use the Autograd compiler with the CudaFp16 compiler (or any other device / datatype compiler) and be confident it will Just Work™.
+
+In the coming months you can expect to see advanced features like full 3D-parallel training, low-bit quantizations, and RL coming to Luminal, by way of external crates. Which means if you want to add something big, you probably can do it by writing your own compiler!
--- a/docs/favicon.png
+++ b/docs/favicon.png
--- a/docs/images/abstract.jpg
+++ b/docs/images/abstract.jpg
--- a/docs/images/abstract_light.jpg
+++ b/docs/images/abstract_light.jpg
--- a/docs/logo/luminal_logo.png
+++ b/docs/logo/luminal_logo.png
--- a/docs/logo/luminal_logo_light.png
+++ b/docs/logo/luminal_logo_light.png