API Reference

BlobIO
- class azstoragetorch.io.BlobIO(blob_url: str, mode: Literal['rb', 'wb'], *, credential: AzureSasCredential | TokenCredential | None | Literal[False] = None, **_internal_only_kwargs)

Bases: `IOBase`

File-like object for reading and writing blobs in Azure Blob Storage.

Use this class for PyTorch checkpointing by passing it directly to `torch.save()` or `torch.load()`. The sample below shows how to use `BlobIO` with `torch.save()`:

```python
import torch
import torchvision.models  # Install separately: ``pip install torchvision``

from azstoragetorch.io import BlobIO

# Update URL with your own Azure Storage account and container name
CONTAINER_URL = (
    "https://<my-storage-account-name>.blob.core.windows.net/<my-container-name>"
)

# Model to save. Replace with your own model.
model = torchvision.models.resnet18(weights="DEFAULT")

# Save trained model to Azure Blob Storage. This saves the model weights
# to a blob named "model_weights.pth" in the container specified by CONTAINER_URL.
with BlobIO(f"{CONTAINER_URL}/model_weights.pth", "wb") as f:
    torch.save(model.state_dict(), f)
```
The sample below shows how to use `BlobIO` with `torch.load()`:

```python
import torch
import torchvision.models  # Install separately: ``pip install torchvision``

from azstoragetorch.io import BlobIO

# Update URL with your own Azure Storage account and container name
CONTAINER_URL = (
    "https://<my-storage-account-name>.blob.core.windows.net/<my-container-name>"
)

# Model to load weights for. Replace with your own model.
model = torchvision.models.resnet18()

# Load trained model from Azure Blob Storage. This loads the model weights
# from the blob named "model_weights.pth" in the container specified by CONTAINER_URL.
with BlobIO(f"{CONTAINER_URL}/model_weights.pth", "rb") as f:
    model.load_state_dict(torch.load(f))
```
- Parameters:
  - blob_url (str) -- The full endpoint URL to the blob. The URL respects SAS tokens, snapshots, and version IDs in its query string.
  - mode (Literal['rb', 'wb']) -- The mode in which to open the blob. Supported modes are:
    - `rb` - Opens blob for reading
    - `wb` - Opens blob for writing
  - credential (AzureSasCredential | TokenCredential | None | Literal[False]) -- The credential to use for authentication. If not specified, `azure.identity.DefaultAzureCredential` will be used. When set to `False`, anonymous requests will be made. If the `blob_url` contains a SAS token, this parameter is ignored.
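For illustration, a minimal sketch of passing a credential explicitly; the blob URL is a placeholder, and this assumes the `azure-identity` package is installed. Passing `DefaultAzureCredential` explicitly matches the default behavior; pass an `AzureSasCredential` instead when authenticating with a standalone SAS token:

```python
from azure.identity import DefaultAzureCredential

from azstoragetorch.io import BlobIO

# Placeholder URL; replace with your own blob.
BLOB_URL = "https://<my-storage-account-name>.blob.core.windows.net/<my-container-name>/<blob-name>"

# Explicit credential instead of relying on the default.
with BlobIO(BLOB_URL, "rb", credential=DefaultAzureCredential()) as f:
    first_line = f.readline()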
- close() → None

Close the file-like object.

In write mode, this will `flush()` and commit the blob.

- Raises:
  FatalBlobIOWriteError -- if a fatal error occurs when writing to the blob. If raised, no data written or uploaded using this `BlobIO` instance will be committed to the blob. It is recommended to create a new `BlobIO` instance and retry all writes.
- property closed: bool

Whether the file-like object is closed.

Is `True` if the file-like object is closed, `False` otherwise.
- flush() → None

Flush all written data to the blob.

When called, any unstaged data will be uploaded, and the method will block until all uploads complete. In read mode, this method has no effect.

- Raises:
  FatalBlobIOWriteError -- if a fatal error occurs when writing to the blob. If raised, no data written or uploaded using this `BlobIO` instance will be committed to the blob. It is recommended to create a new `BlobIO` instance and retry all writes.
- readable() → bool

Return whether the file-like object is readable.

- Returns:
  `True` if opened in read mode, `False` otherwise.

- Return type:
  bool
- readline(size: int | None = -1, /) → bytes

Read and return a line from the file-like object.

The line terminator is always `b'\n'`.
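Because `BlobIO` derives from `IOBase`, it also supports line iteration, which calls `readline()` under the hood. A minimal sketch, assuming a newline-delimited text blob at a placeholder URL:

```python
from azstoragetorch.io import BlobIO

# Placeholder URL to a newline-delimited text blob.
BLOB_URL = "https://<my-storage-account-name>.blob.core.windows.net/<my-container-name>/data.jsonl"

with BlobIO(BLOB_URL, "rb") as f:
    for line in f:  # line iteration comes from IOBase and calls readline()
        print(line)
```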
- seek(offset: int, whence: int = 0, /) → int

Change the file-like object's position to the given byte offset.

- Parameters:
  - offset (int) -- The offset to seek to.
  - whence (int) -- The reference point for the offset. Accepted values are:
    - `os.SEEK_SET` - The start of the file-like object (the default)
    - `os.SEEK_CUR` - The current position in the file-like object
    - `os.SEEK_END` - The end of the file-like object

- Returns:
  The new absolute position in the file-like object.

- Return type:
  int
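For example, a sketch that uses `seek()` with `os.SEEK_END` to read only the tail of a blob. The URL and byte count are illustrative, and it assumes the object supports `read()`, as the `torch.load()` sample above also relies on:

```python
import os

from azstoragetorch.io import BlobIO

# Placeholder URL; replace with your own blob.
BLOB_URL = "https://<my-storage-account-name>.blob.core.windows.net/<my-container-name>/<blob-name>"

with BlobIO(BLOB_URL, "rb") as f:
    size = f.seek(0, os.SEEK_END)  # new absolute position equals the blob size
    f.seek(max(size - 16, 0))      # back up 16 bytes (os.SEEK_SET is the default)
    tail = f.read()                # read the final bytes of the blob
```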
- seekable() → bool

Return whether the file-like object supports random access.

- Returns:
  `True` if the object can seek, `False` otherwise. Seeking is only supported in read mode.

- Return type:
  bool
- tell() → int

Return the current position in the file-like object.

- Returns:
  The current position in the file-like object.

- Return type:
  int
- writable() → bool

Return whether the file-like object is writable.

- Returns:
  `True` if opened in write mode, `False` otherwise.

- Return type:
  bool
- write(b: bytes | bytearray | memoryview, /) → int

Write a bytes-like object to the blob.

Data written may not be immediately uploaded. Instead, data may be uploaded via threads after `write()` has returned, or may be uploaded as part of subsequent calls to `BlobIO` methods. This means that if `write()` returns without an error, the data has not necessarily been uploaded to the blob yet. Calls to `flush()` or `close()` will upload all pending data, block until all data is uploaded, and propagate any errors.

- Parameters:
  b (bytes | bytearray | memoryview) -- The bytes-like object to write to the blob.

- Returns:
  The number of bytes written.

- Raises:
  FatalBlobIOWriteError -- if a fatal error occurs when writing to the blob. If raised, no data written or uploaded using this `BlobIO` instance will be committed to the blob. It is recommended to create a new `BlobIO` instance and retry all writes.

- Return type:
  int
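The snippet below sketches the retry pattern the Raises notes recommend: use a fresh `BlobIO` per attempt and let `close()` (via the context manager) surface any upload failure. The URL and payload are placeholders:

```python
from azstoragetorch.exceptions import FatalBlobIOWriteError
from azstoragetorch.io import BlobIO

# Placeholder URL; replace with your own blob.
BLOB_URL = "https://<my-storage-account-name>.blob.core.windows.net/<my-container-name>/<blob-name>"

def upload(payload: bytes) -> None:
    # A failed BlobIO cannot be reused, so each attempt gets a new instance.
    with BlobIO(BLOB_URL, "wb") as f:
        f.write(payload)
        # Exiting the context manager calls close(), which uploads all
        # pending data and raises FatalBlobIOWriteError on fatal failure.

try:
    upload(b"<data>")
except FatalBlobIOWriteError:
    upload(b"<data>")  # retry all writes with a new BlobIO instance
```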
Datasets
- class azstoragetorch.datasets.BlobDataset(blobs: Iterable[Blob], transform: Callable[[Blob], _TransformOutputType_co] | None = None)

Bases: `Dataset[_TransformOutputType_co]`

Map-style dataset for blobs in Azure Blob Storage.

Data samples returned from the dataset map directly one-to-one to blobs in Azure Blob Storage. Use `from_blob_urls()` or `from_container_url()` to create an instance of this dataset. For example:

```python
from azstoragetorch.datasets import BlobDataset

dataset = BlobDataset.from_container_url(
    "https://<storage-account-name>.blob.core.windows.net/<container-name>"
)
print(dataset[0])  # Print first blob in the dataset
```

Instantiating the dataset class directly using `__init__()` is not supported.

Usage with PyTorch DataLoader

The dataset can be provided directly to a PyTorch `DataLoader`:

```python
import torch.utils.data

loader = torch.utils.data.DataLoader(dataset)
```
Dataset output

The default output format of the dataset is a dictionary with the keys:

- `url`: The full endpoint URL of the blob.
- `data`: The content of the blob as `bytes`.

For example:

```python
{
    "url": "https://<account-name>.blob.core.windows.net/<container-name>/<blob-name>",
    "data": b"<blob-content>"
}
```
To override the output format, provide a `transform` callable to either `from_blob_urls()` or `from_container_url()` when creating the dataset.

- classmethod from_blob_urls(blob_urls: str | Iterable[str], *, credential: AzureSasCredential | TokenCredential | None | Literal[False] = None, transform: Callable[[Blob], _TransformOutputType_co] | None = None) → Self
Instantiate dataset from provided blob URLs.
Sample usage:
```python
container_url = "https://<storage-account-name>.blob.core.windows.net/<container-name>"

dataset = BlobDataset.from_blob_urls([
    f"{container_url}/<blob-name-1>",
    f"{container_url}/<blob-name-2>",
    f"{container_url}/<blob-name-3>",
])
```
- Parameters:
  - blob_urls (str | Iterable[str]) -- The full endpoint URLs to the blobs to be used for the dataset. Can be a single URL or an iterable of URLs. URLs respect SAS tokens, snapshots, and version IDs in their query strings.
  - credential (AzureSasCredential | TokenCredential | None | Literal[False]) -- The credential to use for authentication. If not specified, `azure.identity.DefaultAzureCredential` will be used. When set to `False`, anonymous requests will be made. If a URL contains a SAS token, this parameter is ignored for that URL.
  - transform (Callable[[Blob], _TransformOutputType_co] | None) -- A callable that accepts a `Blob` object representing a blob in the dataset and returns a transformed output to be used as output from the dataset. See the `Blob` class for more information on writing a `transform` callable to override the default dataset output format.
- Returns:
Dataset formed from the provided blob URLs.
- Return type:
  Self
- classmethod from_container_url(container_url: str, *, prefix: str | None = None, credential: AzureSasCredential | TokenCredential | None | Literal[False] = None, transform: Callable[[Blob], _TransformOutputType_co] | None = None) → Self
Instantiate dataset by listing blobs from provided container URL.
Sample usage:
```python
dataset = BlobDataset.from_container_url(
    "https://<storage-account-name>.blob.core.windows.net/<container-name>",
)
```
- Parameters:
  - container_url (str) -- The full endpoint URL to the container to be used for the dataset. The URL respects SAS tokens in its query string.
  - prefix (str | None) -- The prefix to filter blobs by. Only blobs whose names begin with `prefix` will be included in the dataset. If not specified, all blobs in the container will be included in the dataset.
  - credential (AzureSasCredential | TokenCredential | None | Literal[False]) -- The credential to use for authentication. If not specified, `azure.identity.DefaultAzureCredential` will be used. When set to `False`, anonymous requests will be made. If a URL contains a SAS token, this parameter is ignored for that URL.
  - transform (Callable[[Blob], _TransformOutputType_co] | None) -- A callable that accepts a `Blob` object representing a blob in the dataset and returns a transformed output to be used as output from the dataset. See the `Blob` class for more information on writing a `transform` callable to override the default dataset output format.
- Returns:
Dataset formed from the blobs in the provided container URL.
- Return type:
  Self
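For instance, a sketch that combines `from_container_url()` with the `prefix` parameter to build a dataset from only part of a container; the prefix value is illustrative:

```python
from azstoragetorch.datasets import BlobDataset

# Only blobs whose names begin with "train/" are included in the dataset.
dataset = BlobDataset.from_container_url(
    "https://<storage-account-name>.blob.core.windows.net/<container-name>",
    prefix="train/",
)
```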
- class azstoragetorch.datasets.IterableBlobDataset(blobs: Iterable[Blob], transform: Callable[[Blob], _TransformOutputType_co] | None = None)

Bases: `IterableDataset[_TransformOutputType_co]`

Iterable-style dataset for blobs in Azure Blob Storage.

Data samples returned from the dataset map directly one-to-one to blobs in Azure Blob Storage. Use `from_blob_urls()` or `from_container_url()` to create an instance of this dataset. For example:

```python
from azstoragetorch.datasets import IterableBlobDataset

dataset = IterableBlobDataset.from_container_url(
    "https://<storage-account-name>.blob.core.windows.net/<container-name>"
)
print(next(iter(dataset)))  # Print first blob in the dataset
```

Instantiating the dataset class directly using `__init__()` is not supported.

Usage with PyTorch DataLoader

The dataset can be provided directly to a PyTorch `DataLoader`:

```python
import torch.utils.data

loader = torch.utils.data.DataLoader(dataset)
```
When setting `num_workers` for the `DataLoader`, the dataset automatically shards the data samples returned across workers to avoid the `DataLoader` returning duplicate data samples from its workers; see the sketch below.
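A minimal sketch of enabling workers, reusing the `dataset` from the sample above; the `num_workers` value is illustrative:

```python
import torch.utils.data

# Each worker iterates a disjoint shard of the blobs, so the loader does
# not yield duplicate samples across its workers.
loader = torch.utils.data.DataLoader(dataset, num_workers=2)
```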
Dataset output

The default output format of the dataset is a dictionary with the keys:

- `url`: The full endpoint URL of the blob.
- `data`: The content of the blob as `bytes`.

For example:

```python
{
    "url": "https://<account-name>.blob.core.windows.net/<container-name>/<blob-name>",
    "data": b"<blob-content>"
}
```
To override the output format, provide a `transform` callable to either `from_blob_urls()` or `from_container_url()` when creating the dataset.

- classmethod from_blob_urls(blob_urls: str | Iterable[str], *, credential: AzureSasCredential | TokenCredential | None | Literal[False] = None, transform: Callable[[Blob], _TransformOutputType_co] | None = None) → Self
Instantiate dataset from provided blob URLs.
Sample usage:
```python
container_url = "https://<storage-account-name>.blob.core.windows.net/<container-name>"

dataset = IterableBlobDataset.from_blob_urls([
    f"{container_url}/<blob-name-1>",
    f"{container_url}/<blob-name-2>",
    f"{container_url}/<blob-name-3>",
])
```
- Parameters:
  - blob_urls (str | Iterable[str]) -- The full endpoint URLs to the blobs to be used for the dataset. Can be a single URL or an iterable of URLs. URLs respect SAS tokens, snapshots, and version IDs in their query strings.
  - credential (AzureSasCredential | TokenCredential | None | Literal[False]) -- The credential to use for authentication. If not specified, `azure.identity.DefaultAzureCredential` will be used. When set to `False`, anonymous requests will be made. If a URL contains a SAS token, this parameter is ignored for that URL.
  - transform (Callable[[Blob], _TransformOutputType_co] | None) -- A callable that accepts a `Blob` object representing a blob in the dataset and returns a transformed output to be used as output from the dataset. See the `Blob` class for more information on writing a `transform` callable to override the default dataset output format.
- Returns:
Dataset formed from the provided blob URLs.
- Return type:
  Self
- classmethod from_container_url(container_url: str, *, prefix: str | None = None, credential: AzureSasCredential | TokenCredential | None | Literal[False] = None, transform: Callable[[Blob], _TransformOutputType_co] | None = None) → Self
Instantiate dataset by listing blobs from provided container URL.
Sample usage:
```python
dataset = IterableBlobDataset.from_container_url(
    "https://<storage-account-name>.blob.core.windows.net/<container-name>",
)
```
- Parameters:
  - container_url (str) -- The full endpoint URL to the container to be used for the dataset. The URL respects SAS tokens in its query string.
  - prefix (str | None) -- The prefix to filter blobs by. Only blobs whose names begin with `prefix` will be included in the dataset. If not specified, all blobs in the container will be included in the dataset.
  - credential (AzureSasCredential | TokenCredential | None | Literal[False]) -- The credential to use for authentication. If not specified, `azure.identity.DefaultAzureCredential` will be used. When set to `False`, anonymous requests will be made. If a URL contains a SAS token, this parameter is ignored for that URL.
  - transform (Callable[[Blob], _TransformOutputType_co] | None) -- A callable that accepts a `Blob` object representing a blob in the dataset and returns a transformed output to be used as output from the dataset. See the `Blob` class for more information on writing a `transform` callable to override the default dataset output format.
- Returns:
Dataset formed from the blobs in the provided container URL.
- Return type:
  Self
- class azstoragetorch.datasets.Blob(blob_client: AzStorageTorchBlobClient)

Object representing a single blob in a dataset.

Datasets instantiate `Blob` objects and pass them directly to a dataset's `transform` callable. Within the `transform` callable, use the class's properties and methods to access a blob's properties and content. For example:

```python
from azstoragetorch.datasets import Blob, BlobDataset

def to_bytes(blob: Blob) -> bytes:
    with blob.reader() as f:
        return f.read()

dataset = BlobDataset.from_blob_urls(
    "https://<storage-account-name>.blob.core.windows.net/<container-name>/<blob-name>",
    transform=to_bytes,
)
print(type(dataset[0]))  # Type should be: <class 'bytes'>
```

Instantiating the class directly using `__init__()` is not supported.
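As a further sketch, a `transform` that returns the blob's URL alongside its content length; this assumes, per the default dataset output format, that `Blob` exposes a `url` property:

```python
from azstoragetorch.datasets import Blob, IterableBlobDataset

def to_url_and_size(blob: Blob) -> tuple[str, int]:
    # blob.url is assumed to mirror the "url" key of the default output.
    with blob.reader() as f:
        return blob.url, len(f.read())

dataset = IterableBlobDataset.from_container_url(
    "https://<storage-account-name>.blob.core.windows.net/<container-name>",
    transform=to_url_and_size,
)
```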
Exceptions

- exception azstoragetorch.exceptions.AZStorageTorchError

Bases: `Exception`

Base class for exceptions raised by `azstoragetorch`.
- exception azstoragetorch.exceptions.ClientRequestIdMismatchError(request_client_id: str, response_client_id: str, service_request_id: str)

Bases: `AZStorageTorchError`

Raised when a client request ID in a response does not match the ID in its originating request.

If you receive this error while using both an `azstoragetorch` dataset and a PyTorch `DataLoader`, it may be because the dataset is being accessed in both the main process and a `DataLoader` worker process, which can cause unintentional sharing of resources. To fix this error, consider not accessing the dataset's samples in the main process, or not using workers with the `DataLoader`.
- exception azstoragetorch.exceptions.FatalBlobIOWriteError(underlying_exception: BaseException)

Bases: `AZStorageTorchError`

Raised when a fatal error occurs during `BlobIO` write operations.

When this exception is raised, it indicates that no more writing can be performed on the `BlobIO` object and no blocks staged as part of this `BlobIO` will be committed. It is recommended to create a new `BlobIO` object and retry all writes.