CHAI 1.0 Design
- Implementations for various memory models are tightly coupled, complicating maintenance and making changes difficult
- Some memory models cannot be used in the same build (e.g. copy hiding and single memory space models cannot be used together)
- Build time configuration of CHAI is complicated and varies significantly by platform
- Modularize implementations for different memory models so that maintenance, understanding, and change are easier
- Allow all supported memory models to be used in the same build
- Simplify build time configuration of CHAI (should be able to remove variables like CHAI_DISABLE_RM, CHAI_ENABLE_UM, CHAI_ENABLE_PINNED, CHAI_THIN_GPU_ALLOCATE)
- Maintain or improve performance. In particular, avoid inheritance in device and other performance critical code
- Leave the design space open for shared_ptr-like semantics and container semantics
- Leave the design space open for user customization
Avoid inheritance in device and other performance critical code
- ManagedArray (or whatever classes are copied onto the device) should not inherit from a base class (with the possible exceptions of an empty base class for tagging, or CRTP). Device compilers often struggle to implement virtual functions correctly and optimize them.
- ManagedArray could contain a pointer to a base class or a type erased class that involves inheritance. That member would only be used in host code (e.g. to update coherence in the host copy constructor).
Maintain/improve performance
- Limit inheritance as mentioned above.
- Use camp::Resource instead of ExecutionSpace so that synchronization could be limited to streams instead of the whole device.
- ManagedArray::operator[] should be inlined, non-virtual, and a single array access
- Reduce the size of ManagedArray (currently has 8 members, and I think it could be reduced to 3). This would help as ManagedArray is designed to be passed by copy everywhere.
- Continue to optimize in the case where ManagedArray is read only.
API on device
- ManagedArray::size(), ManagedArray::operator[], and the copy constructor.
Discrete and unified memory implementations should be separate, but have the same interface.
- Option 1: Static polymorphism (i.e. CRTP)
- Option 2: Template parameter to ManagedArray (ManagedArray<ElementType, ArrayManagerType>)
- Option 3: Runtime polymorphism (ArrayManager is a base class, UnifiedArrayManager and DiscreteArrayManager would inherit from it)
- Option 4: Type erasure (ArrayManager is a type-erased container for any type that provides the necessary interface)
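To make the trade-off between Option 2 and Option 3 concrete, here is a minimal sketch. All names in it (SketchUnifiedPolicy, SketchDiscretePolicy, Option2Array, Option3ManagerBase, Option3Array) are hypothetical and exist only for illustration; none of them are part of CHAI.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-ins for two memory management strategies.
struct SketchUnifiedPolicy {};
struct SketchDiscretePolicy {};

// Option 2: the manager type is a template parameter of the array.
template <typename ElementType, typename ArrayManagerType>
struct Option2Array {
  ElementType* m_data;
  std::size_t m_size;
  ArrayManagerType* m_manager;
};

// Option 3: the array holds a pointer to an abstract base class.
template <typename ElementType>
struct Option3ManagerBase {
  virtual ~Option3ManagerBase() = default;
  virtual ElementType* data() = 0;
};

template <typename ElementType>
struct Option3Array {
  ElementType* m_data;
  std::size_t m_size;
  Option3ManagerBase<ElementType>* m_manager;
};

// With Option 2, each memory model yields a distinct array type, so the
// choice of manager leaks into every function signature that takes an array.
constexpr bool kOption2TypesDiffer =
    !std::is_same<Option2Array<int, SketchUnifiedPolicy>,
                  Option2Array<int, SketchDiscretePolicy>>::value;

// With Option 3, there is one array type per element type, and the array
// itself is no larger than in Option 2 because it only stores a pointer.
constexpr bool kSameFootprint =
    sizeof(Option2Array<int, SketchUnifiedPolicy>) == sizeof(Option3Array<int>);
```

In both variants the array object itself has no base class and no virtual functions, so either one is compatible with the goal of avoiding inheritance in the type that is copied to the device.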
Instead of containing multiple memory management implementations, some of which are chosen at configuration time and others at run time, ManagedArray will delegate the memory management to its own instance of an ArrayManager. ManagedArray will contain an active pointer, a size, and a handle to an ArrayManager. This will decrease the size of ManagedArray, which is advantageous because it is designed to be passed around by copy.
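The delegation described above could look roughly like the following sketch. It is host-only and uses hypothetical names (SketchManager, SketchVectorManager, SketchArray); a std::vector stands in for real memory management, and coherence handling is left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical abstract manager; only the parts needed for this sketch.
template <typename T>
class SketchManager {
public:
  virtual ~SketchManager() = default;
  virtual T* data() = 0;
  virtual std::size_t size() const = 0;
  virtual void resize(std::size_t newSize) = 0;
};

// Hypothetical concrete manager backed by a std::vector (host memory only).
template <typename T>
class SketchVectorManager : public SketchManager<T> {
public:
  T* data() override { return m_storage.data(); }
  std::size_t size() const override { return m_storage.size(); }
  void resize(std::size_t newSize) override { m_storage.resize(newSize); }
private:
  std::vector<T> m_storage;
};

// The array itself: an active pointer, a size, and a handle to a manager.
template <typename T>
class SketchArray {
public:
  explicit SketchArray(SketchManager<T>* manager)
    : m_data(manager->data()), m_size(manager->size()), m_manager(manager) {}

  // Memory management is delegated; the cached pointer and size are refreshed.
  void resize(std::size_t newSize) {
    m_manager->resize(newSize);
    m_data = m_manager->data();
    m_size = m_manager->size();
  }

  std::size_t size() const { return m_size; }

  // A single array access with no virtual call.
  T& operator[](std::size_t i) const { return m_data[i]; }

private:
  T* m_data;
  std::size_t m_size;
  SketchManager<T>* m_manager;
};

inline int sketchArrayDemo() {
  SketchVectorManager<int> manager;
  SketchArray<int> a(&manager);
  a.resize(4);
  a[2] = 7;
  SketchArray<int> b = a;  // copying only copies the three members
  return b[2];
}
```

Only resize() goes through the manager in this sketch; element access reads the cached pointer directly, which is what keeps operator[] cheap.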
In this design, ArrayManager is an abstract base class rather than a concrete type. It takes on the role of PointerRecord and parts of the role of the old ArrayManager class (basically everything except managing the execution context). If users desire customization, they can derive from this class.
UnifiedArrayManager replaces the "thin" implementation of ManagedArray. It is designed to be used in the case where there is a unified memory space (the same address can be used across execution contexts) but synchronization is still required for coherence.
DiscreteArrayManager replaces the "main" implementation of ManagedArray. It is designed to be used in the case where there are discrete memory spaces, and copying between them is required for coherence.
The old ExecutionSpace enum grew in a confusing way to include both execution spaces and memory types. Also, the term "space" has become a little limiting. For performance reasons, it is important to be able to not only distinguish whether we are executing on a GPU, but which GPU stream is being used. That would allow CHAI to synchronize at the stream level rather than the whole device, which is a very expensive operation. ExecutionContext would work with camp::Resource objects, which would provide that more fine-grained control. It could also be made to work with the current ExecutionSpace enum in the short-term, choosing a default camp resource under the hood.
ExecutionContextManager takes on some of the role of the old ArrayManager, namely managing the current execution context. It also keeps track of which execution contexts are unsynchronized. Concrete ArrayManager implementations report synchronization that has occurred, or synchronization that is needed, to ExecutionContextManager, which then performs the synchronization only if it is needed. This is much more performant than having every ArrayManager instance call cudaDeviceSynchronize(), for example (e.g. if five ManagedArrays are copied into a RAJA sequential loop, we only want to synchronize the device once, not five times).
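The "synchronize once, not five times" behavior can be sketched as follows. This is an assumption about how the bookkeeping might work, not the actual design: SketchContext and SketchContextManager are hypothetical names, and a counter stands in for the real synchronization call.

```cpp
#include <cassert>
#include <set>

// Hypothetical execution contexts for this sketch.
enum class SketchContext { None, Host, Device };

class SketchContextManager {
public:
  // Called when work has been launched in a context and not yet waited on.
  void markUnsynchronized(SketchContext context) {
    m_unsynchronized.insert(context);
  }

  // Called by each array manager that needs results from a context.
  // The synchronization only happens if the context is still pending.
  void synchronizeIfNeeded(SketchContext context) {
    auto it = m_unsynchronized.find(context);
    if (it != m_unsynchronized.end()) {
      ++m_syncCount;  // placeholder for a real device or stream synchronization
      m_unsynchronized.erase(it);
    }
  }

  int syncCount() const { return m_syncCount; }

private:
  std::set<SketchContext> m_unsynchronized;
  int m_syncCount = 0;
};

// Five arrays request coherence on the host after one device kernel.
inline int fiveArraysOneSync() {
  SketchContextManager manager;
  manager.markUnsynchronized(SketchContext::Device);
  for (int i = 0; i < 5; ++i) {
    manager.synchronizeIfNeeded(SketchContext::Device);
  }
  return manager.syncCount();
}
```

The first request clears the pending state, so the remaining four requests return without synchronizing.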
This class provides a convenient way of setting the execution context on construction, and restoring the previous execution context on destruction. It is a similar idea to std::lock_guard and would be used the same way, though it does not provide thread safety. Another type of guard could potentially be added to provide thread safety if needed. This class would not be used by the RAJA plugin, but could be helpful in application code when raw data needs to be extracted from a ManagedArray (such as passing a raw pointer to a function).
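A minimal sketch of such a guard is shown below. The names DemoContext, DemoContextState, and DemoContextGuard are hypothetical, and a function-local static stands in for whatever object actually owns the current execution context.

```cpp
#include <cassert>

// Hypothetical execution contexts for this sketch.
enum class DemoContext { None, Host, Device };

// Hypothetical holder of the current execution context.
struct DemoContextState {
  static DemoContext& current() {
    static DemoContext context = DemoContext::None;
    return context;
  }
};

// Sets the execution context on construction and restores the previous
// one on destruction. Like std::lock_guard, it is not copyable.
class DemoContextGuard {
public:
  explicit DemoContextGuard(DemoContext next)
    : m_previous(DemoContextState::current()) {
    DemoContextState::current() = next;
  }

  ~DemoContextGuard() { DemoContextState::current() = m_previous; }

  DemoContextGuard(const DemoContextGuard&) = delete;
  DemoContextGuard& operator=(const DemoContextGuard&) = delete;

private:
  DemoContext m_previous;
};

inline bool guardRestoresContext() {
  bool ok = (DemoContextState::current() == DemoContext::None);
  {
    DemoContextGuard guard(DemoContext::Host);
    // Inside this scope, raw data could be extracted for host use.
    ok = ok && (DemoContextState::current() == DemoContext::Host);
  }
  // The previous context is restored when the guard goes out of scope.
  return ok && (DemoContextState::current() == DemoContext::None);
}
```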
template <typename ElementType>
class ManagedArray {
  public:
    ManagedArray();
    // ArrayManager is a class template, so the element type must be given.
    explicit ManagedArray(ArrayManager<ElementType>* manager);
    CHAI_HOST_DEVICE ManagedArray(const ManagedArray& other);
    // TODO: Add a constructor from a non-const ElementType
    void resize(size_t newSize);
    void free();
    void update();
    void cupdate();
    CHAI_HOST_DEVICE size_t size() const;
    CHAI_HOST_DEVICE ElementType* data() const;
    CHAI_HOST_DEVICE const ElementType* cdata() const;
    CHAI_HOST_DEVICE ElementType& operator[](size_t i) const;
    ElementType get(size_t i) const;
    void set(size_t i, const ElementType& value);
  private:
    ElementType* m_data;
    size_t m_size;
    ArrayManager<ElementType>* m_manager;
};
/*!
 * \class ArrayManager
 *
 * \brief Controls the coherence of an array.
 */
template <typename ElementType>
class ArrayManager {
  public:
    /*!
     * \brief Virtual destructor.
     */
    virtual ~ArrayManager() = default;
    /*!
     * \brief Creates a clone of this ArrayManager.
     *
     * \return A new ArrayManager object that is a clone of this instance.
     */
    virtual ArrayManager* clone() const = 0;
    /*!
     * \brief Resizes the array to the specified new size.
     *
     * \param newSize The new size to resize the array to.
     */
    virtual void resize(std::size_t newSize) = 0;
    /*!
     * \brief Returns the size of the contained array.
     *
     * \return The size of the contained array.
     */
    virtual std::size_t size() const = 0;
    /*!
     * \brief Updates the data to be coherent in the current execution context.
     *
     * \return A pointer to the array, coherent in the current execution context.
     */
    virtual ElementType* data() = 0;
    /*!
     * \brief Returns the value at index i.
     *
     * Note: Use this function sparingly as it may be slow.
     *
     * \param i The index of the element to get.
     * \return The value at index i.
     */
    virtual ElementType get(std::size_t i) const = 0;
    /*!
     * \brief Sets the value at index i to the specified value.
     *
     * Note: Use this function sparingly as it may be slow.
     *
     * \param i The index of the element to set.
     * \param value The value to set at index i.
     */
    virtual void set(std::size_t i, ElementType value) = 0;
};
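As an illustration of user customization, a derived manager might look like the sketch below. To keep the example self-contained, it repeats a condensed copy of the ArrayManager interface above. HostVectorArrayManager is a hypothetical name, and it only manages host memory through a std::vector, so it ignores execution contexts entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Condensed copy of the interface above, without the documentation comments.
template <typename ElementType>
class ArrayManager {
  public:
    virtual ~ArrayManager() = default;
    virtual ArrayManager* clone() const = 0;
    virtual void resize(std::size_t newSize) = 0;
    virtual std::size_t size() const = 0;
    virtual ElementType* data() = 0;
    virtual ElementType get(std::size_t i) const = 0;
    virtual void set(std::size_t i, ElementType value) = 0;
};

// Hypothetical user-defined manager that keeps everything in host memory.
template <typename ElementType>
class HostVectorArrayManager : public ArrayManager<ElementType> {
  public:
    // Covariant return type: a deep copy of the storage.
    HostVectorArrayManager* clone() const override {
      return new HostVectorArrayManager(*this);
    }
    void resize(std::size_t newSize) override { m_storage.resize(newSize); }
    std::size_t size() const override { return m_storage.size(); }
    ElementType* data() override { return m_storage.data(); }
    ElementType get(std::size_t i) const override { return m_storage[i]; }
    void set(std::size_t i, ElementType value) override { m_storage[i] = value; }
  private:
    std::vector<ElementType> m_storage;
};

// Checks that a clone owns its own storage.
inline bool cloneIsIndependent() {
  HostVectorArrayManager<double> original;
  original.resize(3);
  original.set(1, 2.5);
  std::unique_ptr<ArrayManager<double>> copy(original.clone());
  copy->set(1, 9.0);
  return original.get(1) == 2.5 && copy->get(1) == 9.0 && copy->size() == 3;
}
```

Whether clone() should perform a deep copy, as it does here, or share the underlying allocation is a design decision this sketch does not settle.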
template <typename ElementType>
class UnifiedArrayManager {
  public:
    UnifiedArrayManager();
    explicit UnifiedArrayManager(const umpire::Allocator& allocator);
    UnifiedArrayManager(size_t size, const umpire::Allocator& allocator);
    explicit UnifiedArrayManager(int umpireAllocatorID);
    UnifiedArrayManager(size_t size, int umpireAllocatorID);
    UnifiedArrayManager(const UnifiedArrayManager& other);
    UnifiedArrayManager(UnifiedArrayManager&& other);
    ~UnifiedArrayManager();
    UnifiedArrayManager& operator=(const UnifiedArrayManager& other);
    UnifiedArrayManager& operator=(UnifiedArrayManager&& other);
    void resize(size_t newSize);
    void free();
    size_t size() const;
    ElementType* data();
    const ElementType* data() const;
    ElementType get(size_t i) const;
    void set(size_t i, const ElementType& value);
  private:
    ElementType* m_data;
    size_t m_size;
    ExecutionContext m_modified;
    umpire::Allocator m_allocator;
};