Initial readme - matmul.c - Matrix multiplication helper library

commit d946dcba01a220dcc6a7597075f6f10640eaaf5d
parent 660ab77993e2f82e88213aba539679c2411c4e09
Author: finwo <finwo@pm.me>
Date:   Thu, 16 Apr 2026 02:01:01 +0200

Initial readme

Diffstat:
M README.md  | 134 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

1 file changed, 134 insertions(+), 0 deletions(-)
diff --git a/README.md b/README.md
@@ -0,0 +1,134 @@
+# Matmul - High-Performance Matrix Multiplication Library
+
+A lightweight, type-safe matrix multiplication library with compile-time dispatch, scaling support, and infrastructure for SIMD acceleration.
+
+## Installation
+
+This library is installable through the [dep](https://github.com/finwo/dep) package manager. If using [dep-repository](https://github.com/finwo/dep-repository), installable through `dep add finwo/matmul`, or simply by adding the following line to your .dep file:
+
+```
+finwo/matmul https://git.finwo.net/lib/matmul.c/archives/heads/main.tar.gz
+```
+
+Alternatively, you can include the [matmul.c](src/matmul.c) and [matmul.h](src/matmul.h) files in your project directly.
+
+## Features
+
+- **64 Type Combinations**: Supports all combinations of f32/f64/i8/u8 for input matrices A, B and output C
+- **Compile-Time Dispatch**: Uses C11 `_Generic` for zero-overhead type detection
+- **Scale Parameter**: Divide output before writing (e.g., scale=64 for quantization emulation)
+- **Future SIMD Ready**: Infrastructure in place for AVX2/AVX512/VNNI acceleration
+- **No Dependencies**: Pure C11 implementation with no external libraries
+- **Well Tested**: 64 unit tests covering all type combinations
+
+## Quick Start
+
+```c
+#include "finwo/matmul.h"
+#include <stdio.h>
+
+int main() {
+    // 2x3 matrix A
+    float A[6] = {1, 2, 3, 4, 5, 6};
+    // 3x2 matrix B
+    float B[6] = {1, 0, 0, 1, 0, 0};
+    // 2x2 result matrix C
+    float C[4];
+
+    // Multiply A(2x3) * B(3x2) = C(2x2)
+    matmul(2, 3, 2, A, B, C, 0.0);  // scale=0: no scaling
+
+    // C should be [1, 2, 4, 5]
+    printf("C[0] = %f, C[1] = %f, C[2] = %f, C[3] = %f\n",
+           C[0], C[1], C[2], C[3]);
+
+    return 0;
+}
+```
+
+Compile with: `cc -o example example.c -lm`
+
+## API Reference
+
+### Generic Macro (Recommended)
+```c
+matmul(m, n, p, A, B, C, scale);
+```
+- Automatically selects the correct function based on types of A, B, and C
+- `scale`: Divide each output element by this value before writing (0 or 1 = no scaling)
+
+### Direct Function Calls
+Each of the 64 type combinations is available directly:
+```c
+// Floating point
+matmul_f32_f32_f32(m, n, p, A, B, C, scale);
+matmul_f32_f32_f64(m, n, p, A, B, C, scale);
+matmul_f32_f64_f32(m, n, p, A, B, C, scale);
+// ... etc for all 64 combinations
+
+// Integer types
+matmul_i8_i8_i8(m, n, p, A, B, C, scale);
+matmul_u8_u8_u8(m, n, p, A, B, C, scale);
+matmul_i8_u8_i8(m, n, p, A, B, C, scale);
+// ... etc
+```
+
+### Type Naming Conventions
+| Shorthand | C Type     | Description           |
+|-----------|------------|-----------------------|
+| `f32`     | `float`    | 32-bit floating point |
+| `f64`     | `double`   | 64-bit floating point |
+| `i8`      | `int8_t`   | Signed 8-bit integer  |
+| `u8`      | `uint8_t`  | Unsigned 8-bit integer|
+
+Function names follow pattern: `matmul_{A_type}_{B_type}_{C_type}`
+
+### Scale Parameter
+The `scale` parameter enables quantization-aware multiplication:
+- `scale = 0` or `scale = 1`: No scaling (write raw result)
+- `scale = 64`: Divide output by 64 before writing
+- Useful for emulating: full-scale A input with quantized B input representing -2..1.984375 instead of -128..127
+
+Implementation:
+```c
+if (scale != 0 && scale != 1) {
+    result = result / scale;
+}
+```
+
+## Building
+
+```bash
+# Compile library and tests
+make
+
+# Run tests
+./test_matmul
+
+# Clean build artifacts
+make clean
+```
+
+Requires a C11 compiler (gcc, clang, MSVC).
+
+## Testing
+
+The library includes 64 unit tests covering all type combinations:
+- Run: `./test_matmul`
+- Tests verify correctness against reference implementations
+- Output shows PASS/FAIL status for each type combination
+
+## Future Work
+
+SIMD acceleration infrastructure is already in place:
+- Auto-dispatch functions (`_matmul_*`) replace function pointers on first call
+- Ready for AVX2, AVX512, AVX512-VNNI, and AVX-VNNI implementations
+- When implemented, dispatch will select best available CPU features at runtime
+- Fallback chain: AVX512-VNNI → AVX512 → AVX2 → Scalar
+
+## License
+
+Licensed under custom terms (Copyright 2026 finwo); see LICENSE.md for full details.
+
+---
+*Built with C11. Zero runtime overhead for type dispatch.*

	matmul.c Matrix multiplication helper library
	git clone git://git.finwo.net/lib/matmul.c
	Log \| Files \| Refs \| README \| LICENSE