Change log
This change log lists every release along with its new features and breaking changes.
Unreleased
New features:
Adds support for filtering Grain-internal stack frames from user-thrown errors.
Breaking changes:
Deprecations:
Deprecates grain.python.experimental.MultiprocessPrefetchIterDataset; use the graduated version instead: grain.IterDataset.mp_prefetch.
Bug fixes:
Grain 0.2.15 (November 25, 2025)
New features:
Adds public DatasetIterator.close API as an explicit blocking alternative to a cleanup during GC.
DatasetIterator.start_prefetch now propagates to the first asynchronous parent iterator instead of raising NotImplemented. This API is useful for hiding first batch processing behind model checkpoint recovery.
Introduces grain.experimental.multithread_prefetch as an alternative to multiprocessing prefetch in free-threading Python.
Adds experimental support for static {Map|Iter}Dataset element specification inference.
Adds support for changing IterDataset.mix components and weights after a checkpoint.
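The idea behind an explicit prefetch start and a blocking close can be illustrated without Grain. The following is a minimal pure-Python sketch, not Grain's implementation: the class PrefetchingIterator, its buffer_size parameter, and its internals are hypothetical names chosen for illustration only.

```python
import queue
import threading


class PrefetchingIterator:
    """Hypothetical iterator that fills a buffer from a background thread."""

    _DONE = object()  # sentinel marking the end of the source

    def __init__(self, source, buffer_size=2):
        self._source = iter(source)
        self._buffer = queue.Queue(maxsize=buffer_size)
        self._stop = threading.Event()
        self._thread = None
        self._finished = False

    def start_prefetch(self):
        # Idempotent: the producer thread is started at most once.
        if self._thread is None:
            self._thread = threading.Thread(target=self._produce, daemon=True)
            self._thread.start()

    def _put(self, item):
        # Retry with a timeout so a stop request is noticed even when the
        # buffer is full and nobody is consuming.
        while not self._stop.is_set():
            try:
                self._buffer.put(item, timeout=0.05)
                return True
            except queue.Full:
                continue
        return False

    def _produce(self):
        for item in self._source:
            if not self._put(item):
                return
        self._put(self._DONE)

    def __iter__(self):
        return self

    def __next__(self):
        if self._finished:
            raise StopIteration
        self.start_prefetch()
        item = self._buffer.get()
        if item is self._DONE:
            self._finished = True
            raise StopIteration
        return item

    def close(self):
        # Explicit, blocking cleanup: returns only after the producer exits,
        # instead of relying on garbage collection to release resources.
        self._finished = True
        self._stop.set()
        if self._thread is not None:
            self._thread.join()


it = PrefetchingIterator(range(5))
it.start_prefetch()  # warm the buffer while other work (e.g. a restore) runs
first = next(it)
it.close()           # the background thread is guaranteed to be gone here
```

Calling start_prefetch early lets the first elements be produced while unrelated work is in progress, and close gives a deterministic point at which the background thread has stopped.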
Grain 0.2.14 (October 30, 2025)
New features:
Adds Python 3.14 build.
Replaces dm-tree dependency with a pure Python implementation for PyTree manipulation. Note that it is only used if jax is not installed; if jax can be imported, jax.tree_util is used instead.
Breaking changes:
Removes grain[testing] PyPI build. It is an implementation detail and should not be publicly visible.
Upgrades Linux wheels to manylinux_2_28.
Deprecations:
Deprecates Python 3.10 support.
Deprecates grain.python.experimental.visualize_dataset. Use visualization mode instead.
Grain 0.2.13 (October 15, 2025)
New features:
Adds reseed_each_epoch option to MapDataset.repeat that allows replaying the first epoch exactly if set to False (True by default).
Introduces grain.experimental.RebatchIterDataset for efficient rebatch.
Migrates data loader to use dataset API under the hood.
Improves first-fit packing speed by up to 12x.
Adds best-fit packing implementation which reduces padding in benchmarks by over 27% compared to first-fit.
Adds max_sequences_per_bin to packing transformations to limit the number of sequences packed into a single bin.
Introduces grain.experimental.RepeatIterDataset.
Adds custom batching function support to grain.DataLoader.
Adds grain.experimental.FlatMapTransform support to grain.DataLoader.
Introduces grain.experimental.CacheIterDataset for caching parent dataset.
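The difference between first-fit and best-fit packing, and the effect of a per-bin sequence limit, can be seen in a small standalone sketch. This is a textbook illustration under stated assumptions, not Grain's packing code: the functions pack and padding, the strategy argument, and the example lengths are hypothetical, and only the parameter name max_sequences_per_bin is taken from the entry above. It does not reproduce the benchmark figures quoted in the entries.

```python
def pack(lengths, bin_size, strategy="first_fit", max_sequences_per_bin=None):
    """Assign sequence lengths to bins and return the bins as lists of lengths."""
    if strategy not in ("first_fit", "best_fit"):
        raise ValueError(f"unknown strategy: {strategy}")
    bins = []  # sequences placed in each bin
    free = []  # remaining capacity of each bin
    for length in lengths:
        if length > bin_size:
            raise ValueError(f"sequence of length {length} exceeds bin size")
        # Bins that still have room and have not hit the sequence limit.
        candidates = [
            i for i, space in enumerate(free)
            if space >= length
            and (max_sequences_per_bin is None
                 or len(bins[i]) < max_sequences_per_bin)
        ]
        if not candidates:
            bins.append([length])
            free.append(bin_size - length)
            continue
        if strategy == "first_fit":
            target = candidates[0]  # earliest bin with enough room
        else:
            target = min(candidates, key=lambda i: free[i])  # tightest fit
        bins[target].append(length)
        free[target] -= length
    return bins


def padding(bins, bin_size):
    """Total number of unused slots across all bins."""
    return sum(bin_size - sum(b) for b in bins)


lengths = [4, 7, 3, 6]
first_fit = pack(lengths, bin_size=10, strategy="first_fit")
best_fit = pack(lengths, bin_size=10, strategy="best_fit")
limited = pack(lengths, bin_size=10, max_sequences_per_bin=1)
```

On this input, first-fit places the 3 next to the 4 and then needs a third bin for the 6, while best-fit places the 3 in the tighter bin next to the 7 and fits everything into two full bins.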
Breaking changes:
SliceMapDataset now uses the full index relative to the parent dataset instead of index % len(self).
Deprecations:
Graduates grain.experimental.apply_transformations to grain.{MapDataset|IterDataset}.apply. The experimental API will soon be deprecated.
Bug fixes:
Fixes memory leak on ThreadPrefetchDatasetIterator deletion.
Grain 0.2.12 (August 21, 2025)
New features:
Adds Windows build.
Allows passing read_kwargs to ParquetIterDataset for configuring parquet file reading.
ThreadPrefetchDatasetIterator now supports non-Grain iterators that support checkpointing.
Introduces API for device prefetch, grain.experimental.device_put(), for easy CPU and device prefetching.
Introduces API for autotuning: given the user-provided RAM restrictions and a specific IterDataset, finds the number of processes for mp_prefetch and the buffer size for PrefetchDatasetIterator.
Allows passing reader_options to ArrayRecordDataSource for configuring array record file reading.
Introduces grain.experimental.batch_and_pad for padding a partial batch to avoid dropping batch remainder data.
Grain interleave optimization: allows creating more threads to keep starting iterators and prefetching elements in parallel.
Allows alternative slicing of the data for MultiprocessPrefetchIterDataset. The new slicing lets each worker process read unique file shards, which improves performance.
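Padding a partial batch instead of dropping it can be sketched in a few lines of plain Python. This is only an illustration of the general technique: the function batch_with_padding, its pad_value parameter, and the returned validity mask are assumptions made for this sketch and do not describe the signature or return type of grain.experimental.batch_and_pad.

```python
def batch_with_padding(elements, batch_size, pad_value=0):
    """Yield (batch, mask) pairs; the last batch is padded rather than dropped.

    mask[i] is True for real elements and False for padding (an assumption of
    this sketch, so that consumers can ignore padded positions).
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch = []
    for element in elements:
        batch.append(element)
        if len(batch) == batch_size:
            yield batch, [True] * batch_size
            batch = []
    if batch:
        # Remainder: fill up to batch_size so every batch has the same shape.
        num_real = len(batch)
        num_pad = batch_size - num_real
        yield batch + [pad_value] * num_pad, [True] * num_real + [False] * num_pad


batches = list(batch_with_padding(range(5), batch_size=2))
```

With five elements and a batch size of two, the fifth element ends up in a third batch alongside one padding value, rather than being discarded.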
Breaking changes:
Upgrades array_record and protobuf.
Deprecations:
Bug fixes:
Grain 0.2.11 (July 2, 2025)
New features:
Automatic publishing of releases to PyPI via GitHub Actions.
Nightly builds.
Introduced changelog.