R/datasets.R
create_tabular_dataset_from_parquet_files.Rd
Create an unregistered, in-memory Dataset from parquet files.
```r
create_tabular_dataset_from_parquet_files(
  path,
  validate = TRUE,
  include_path = FALSE,
  set_column_types = NULL,
  partition_format = NULL
)
```
Argument | Description |
---|---|
path | A data path in a registered datastore or a local path. |
validate | Logical; whether to check that data can be loaded from the returned dataset. Defaults to TRUE. Validation requires that the data source be accessible from the current compute. |
include_path | Logical; whether to include a column containing the path of the file from which each record was read. This is useful when reading multiple files, either to know which file a particular record came from or to keep information encoded in the file path. Defaults to FALSE. |
set_column_types | A named list that sets column data types, where each name is a column name and each value is the data type to apply to that column. Defaults to NULL. |
partition_format | The partition format of the path. String columns are created from the format '{x}' and datetime columns from the format '{x:yyyy/MM/dd/HH/mm/ss}', where 'yyyy', 'MM', 'dd', 'HH', 'mm' and 'ss' extract the year, month, day, hour, minute and second for the datetime type. The format should start at the position of the first partition key and continue to the end of the file path. For example, given the file path '../USA/2019/01/01/data.csv' where the data is partitioned by country and time, the format '/{Country}/{PartitionDate:yyyy/MM/dd}/data.csv' creates a string column 'Country' (value 'USA') and a datetime column 'PartitionDate' (value 2019-01-01). |
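The partition_format rules can be made concrete with a small base-R sketch. parse_partition_path() is a hypothetical helper, not an azuremlsdk function, and the SDK's own parser may behave differently; the sketch assumes the '{name}' and '{name:format}' placeholder syntax described above, supports only the six datetime tokens listed, and interprets datetimes as UTC. It turns the format into a regular expression with one capture group per placeholder, then converts datetime captures with as.POSIXct().

```r
# parse_partition_path() is a HYPOTHETICAL helper for illustration only; it is
# not part of azuremlsdk and only approximates the documented partition_format
# semantics. Datetime values are assumed to be UTC.
parse_partition_path <- function(path, partition_format) {
  placeholder <- "\\{[^}]+\\}"
  m <- gregexpr(placeholder, partition_format)
  tokens <- regmatches(partition_format, m)[[1]]                  # e.g. "{Country}"
  literals <- regmatches(partition_format, m, invert = TRUE)[[1]] # text around tokens
  if (length(tokens) == 0) stop("partition_format contains no {placeholders}")

  inner <- substr(tokens, 2, nchar(tokens) - 1)  # strip the braces
  col_names <- sub(":.*$", "", inner)            # column name before ':'
  date_fmts <- ifelse(grepl(":", inner, fixed = TRUE),
                      sub("^[^:]*:", "", inner), NA_character_)

  # One capture group per placeholder: a single path segment for string
  # columns, digit runs shaped like the datetime format otherwise.
  groups <- vapply(date_fmts, function(f) {
    if (is.na(f)) return("([^/]+)")
    f <- gsub("yyyy", "\\\\d{4}", f)
    f <- gsub("MM|dd|HH|mm|ss", "\\\\d{2}", f)
    paste0("(", f, ")")
  }, character(1), USE.NAMES = FALSE)

  # Literal text is quoted with \Q...\E (PCRE), and the pattern is anchored
  # to the end of the path because the format must run to the end of it.
  pattern <- paste0("\\Q", literals[1], "\\E")
  for (i in seq_along(tokens)) {
    pattern <- paste0(pattern, groups[i], "\\Q", literals[i + 1], "\\E")
  }
  pattern <- paste0(pattern, "$")

  hit <- regmatches(path, regexec(pattern, path, perl = TRUE))[[1]]
  if (length(hit) == 0) stop("path does not match partition_format")
  values <- hit[-1]

  # Map the documented tokens to strptime() conversion specifications.
  to_strptime <- c(yyyy = "%Y", MM = "%m", dd = "%d",
                   HH = "%H", mm = "%M", ss = "%S")
  out <- vector("list", length(tokens))
  names(out) <- col_names
  for (i in seq_along(tokens)) {
    if (is.na(date_fmts[i])) {
      out[[i]] <- values[i]                      # string column
    } else {
      r_fmt <- date_fmts[i]
      for (tok in names(to_strptime)) {
        r_fmt <- gsub(tok, to_strptime[[tok]], r_fmt, fixed = TRUE)
      }
      out[[i]] <- as.POSIXct(values[i], format = r_fmt, tz = "UTC")
    }
  }
  out
}

# Usage with the example from the argument table.
res <- parse_partition_path("../USA/2019/01/01/data.csv",
                            "/{Country}/{PartitionDate:yyyy/MM/dd}/data.csv")
res$Country                                 # "USA"
format(res$PartitionDate, "%Y-%m-%d")       # "2019-01-01"
```

Anchoring the pattern at the end of the path reflects the requirement that the format run from the first partition key to the end of the file path; any leading directories (here '..') are simply skipped by the regular-expression search.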
Returns the Tabular Dataset object.
See also: data_path