[FEAT] Adding vecdot implementation#86
Conversation
|
This PR also adds some helpers and a new ufunc registering helper function, which will be use for register |
ngoldbaum
left a comment
There was a problem hiding this comment.
Overall looks good, a few comments below.
| #ifndef DISABLE_QUADBLAS | ||
|
|
||
| PyType_Slot slots[] = { | ||
| {NPY_METH_resolve_descriptors, (void *)&quad_matmul_resolve_descriptors}, |
There was a problem hiding this comment.
Claude points out that quad_matmul_resolve_desctriptors includes an error message that references matmul. It needs to be generalized to use the ufunc name rather than hardcode it:
numpy-quaddtype/src/csrc/umath/matmul.cpp
Lines 38 to 39 in fc52921
Also because resolve_descriptors errors out for non-SLEEF backends, longdouble will never actually be able to call the naive loops. This is a pre-existing issue but it's copied here.
| if (descr->backend != BACKEND_SLEEF) { | ||
| PyErr_SetString(PyExc_NotImplementedError, | ||
| "QBLAS-accelerated vecdot only supports SLEEF backend."); | ||
| return -1; |
There was a problem hiding this comment.
this check is unnecessary because resolve_descriptors would have already triggered the same error.
| Sleef_quad a_val, b_val; | ||
| memcpy(&a_val, x + k * x_n_stride, sizeof(Sleef_quad)); | ||
| memcpy(&b_val, y + k * y_n_stride, sizeof(Sleef_quad)); | ||
| sum = Sleef_fmaq1_u05(a_val, b_val, sum); |
There was a problem hiding this comment.
unaligned matmul does a different thing and calls into qblas_dot:
numpy-quaddtype/src/csrc/umath/matmul.cpp
Lines 234 to 248 in fc52921
Why the difference? Claude seems to think the matmul approach has better numerical accuracy.
| x = create_quad_array([1, 2]) | ||
| y = create_quad_array([1, 2, 3]) | ||
| with pytest.raises(ValueError): | ||
| np.vecdot(x, y) |
There was a problem hiding this comment.
might be worth adding a test that does vecdot on an empty array, e.g. np.vecdot(create_quad_array([]), create_quad_array([])).
|
Thanks @ngoldbaum , I was thinking to re-patch this and #88 after #95 |
Implementing NumPy's vecdot for quaddtype
AI declaration:
Claude is used to extend
test_dot.pyfor adding test relatedvecdotand possible edge cases