Dataset#

Dataset([data_vars, coords, attrs])

A multi-dimensional, in-memory array database.

Attributes#

Dataset.dims

Mapping from dimension names to lengths.

Dataset.sizes

Mapping from dimension names to lengths.

Dataset.dtypes

Mapping from data variable names to dtypes.

Dataset.data_vars

Dictionary of DataArray objects corresponding to data variables.

Dataset.coords

Mapping of DataArray objects corresponding to coordinate variables.

Dataset.attrs

Dictionary of global attributes on this dataset.

Dataset.encoding

Dictionary of global encoding attributes on this dataset.

Dataset.indexes

Mapping of pandas.Index objects used for label-based indexing.

Dataset.xindexes

Mapping of Index objects used for label-based indexing.

Dataset.chunks

Mapping from dimension names to block lengths for this dataset's data.

Dataset.chunksizes

Mapping from dimension names to block lengths for this dataset's data.

Dataset.nbytes

Total bytes consumed by the data arrays of all variables in this dataset.
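
These attributes can be inspected on a small in-memory dataset. A minimal sketch; the variable name `temp`, the coordinates, and the `title` attribute are illustrative, not part of the API:

```python
import numpy as np
import xarray as xr

# A tiny dataset: one data variable over two dimensions, with attributes.
ds = xr.Dataset(
    {"temp": (("x", "y"), np.arange(6.0).reshape(2, 3))},
    coords={"x": [10, 20], "y": ["a", "b", "c"]},
    attrs={"title": "demo"},
)

print(dict(ds.sizes))      # {'x': 2, 'y': 3}
print(list(ds.data_vars))  # ['temp']
print(ds.attrs["title"])   # demo
```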

Dictionary interface#

Datasets implement the mapping interface with keys given by variable names and values given by DataArray objects.

Dataset.__getitem__(key)

Access variables or coordinates of this dataset as a DataArray, a subset of variables, or an indexed dataset.

Dataset.__setitem__(key, value)

Add an array to this dataset.

Dataset.__delitem__(key)

Remove a variable from this dataset.

Dataset.update(other)

Update this dataset's variables with those from another dataset.

Dataset.get(k[, d])

Dataset.items()

Dataset.keys()

Dataset.values()
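
The mapping interface can be used to get, set, and delete variables by name. A minimal sketch; the names `temp` and `double` are illustrative:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"temp": (("x", "y"), np.arange(6.0).reshape(2, 3))})

da = ds["temp"]                 # access a variable as a DataArray
ds["double"] = ds["temp"] * 2   # add a derived variable
assert "double" in ds
del ds["double"]                # remove it again
print(list(ds.keys()))          # iteration yields data variable names
```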

Dataset contents#

Dataset.copy([deep, data])

Returns a copy of this dataset.

Dataset.assign([variables])

Assign new data variables to a Dataset, returning a new object with all the original variables in addition to the new ones.

Dataset.assign_coords([coords])

Assign new coordinates to this object.

Dataset.assign_attrs(*args, **kwargs)

Assign new attrs to this object.

Dataset.pipe(func, *args, **kwargs)

Apply func(self, *args, **kwargs).

Dataset.merge(other[, overwrite_vars, ...])

Merge the arrays of two datasets into a single dataset.

Dataset.rename([name_dict])

Returns a new object with renamed variables, coordinates and dimensions.

Dataset.rename_vars([name_dict])

Returns a new object with renamed variables, including coordinates.

Dataset.rename_dims([dims_dict])

Returns a new object with renamed dimensions only.

Dataset.swap_dims([dims_dict])

Returns a new object with swapped dimensions.

Dataset.expand_dims([dim, axis, ...])

Return a new object with an additional axis (or axes) inserted at the corresponding position in the array shape.

Dataset.drop_vars(names, *[, errors])

Drop variables from this dataset.

Dataset.drop_indexes(coord_names, *[, errors])

Drop the indexes assigned to the given coordinates.

Dataset.drop_duplicates(dim, *[, keep])

Returns a new Dataset with duplicate dimension values removed.

Dataset.drop_dims(drop_dims, *[, errors])

Drop dimensions and associated variables from this dataset.

Dataset.drop_encoding()

Return a new Dataset without encoding on the dataset or any of its variables/coords.

Dataset.drop_attrs(*[, deep])

Removes all attributes from the Dataset and its variables.

Dataset.set_coords(names)

Given names of one or more variables, set them as coordinates.

Dataset.reset_coords([names, drop])

Given names of coordinates, reset them to become variables.

Dataset.convert_calendar(calendar[, dim, ...])

Convert the Dataset to another calendar.

Dataset.interp_calendar(target[, dim])

Interpolates the Dataset to another calendar based on decimal year measure.

Dataset.get_index(key)

Get an index for a dimension, with fall-back to a default RangeIndex.
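
Several of these methods chain naturally, since each returns a new Dataset. A minimal sketch; the names `squared`, `temperature`, and the `units` attribute are illustrative:

```python
import xarray as xr

ds = xr.Dataset({"temp": (("x",), [1.0, 2.0])}, coords={"x": [10, 20]})

ds2 = (
    ds.assign(squared=ds["temp"] ** 2)   # add a new data variable
    .rename({"temp": "temperature"})     # rename an existing one
    .assign_attrs(units="K")             # attach metadata
)
ds3 = ds2.drop_vars("squared")
print(sorted(ds3.data_vars))  # ['temperature']
```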

Comparisons#

Dataset.equals(other)

Two Datasets are equal if they have matching variables and coordinates, all of which are equal.

Dataset.identical(other)

Like equals, but also checks all dataset attributes and the attributes on all variables and coordinates.

Dataset.broadcast_equals(other)

Two Datasets are broadcast equal if they are equal after broadcasting all variables against each other.
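
The distinction between equals and identical is that only the latter compares attributes. A minimal sketch:

```python
import xarray as xr

ds = xr.Dataset({"v": ("x", [1.0, 2.0])}, attrs={"title": "demo"})
other = ds.copy(deep=True)

print(ds.identical(other))  # True -- same data, coords, and attrs
other.attrs["title"] = "changed"
print(ds.equals(other))     # True -- equals ignores attributes
print(ds.identical(other))  # False -- identical also compares attrs
```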

Indexing#

Dataset.loc

Attribute for location-based indexing.

Dataset.isel([indexers, drop, missing_dims])

Returns a new dataset with each array indexed along the specified dimension(s).

Dataset.sel([indexers, method, tolerance, drop])

Returns a new dataset with each array indexed by tick labels along the specified dimension(s).

Dataset.drop_sel([labels, errors])

Drop index labels from this dataset.

Dataset.drop_isel([indexers])

Drop index positions from this Dataset.

Dataset.head([indexers])

Returns a new dataset with the first n values of each array for the specified dimension(s).

Dataset.tail([indexers])

Returns a new dataset with the last n values of each array for the specified dimension(s).

Dataset.thin([indexers])

Returns a new dataset with each array indexed along every n-th value for the specified dimension(s).

Dataset.squeeze([dim, drop, axis])

Return a new object with squeezed data.

Dataset.interp([coords, method, ...])

Interpolate a Dataset onto new coordinates.

Dataset.interp_like(other[, method, ...])

Interpolate this object onto the coordinates of another object.

Dataset.reindex([indexers, method, ...])

Conform this object onto a new set of indexes, filling in missing values with fill_value.

Dataset.reindex_like(other[, method, ...])

Conform this object onto the indexes of another object, for indexes which the objects share.

Dataset.set_index([indexes, append])

Set Dataset (multi-)indexes using one or more existing coordinates or variables.

Dataset.reset_index(dims_or_levels, *[, drop])

Reset the specified index(es) or multi-index level(s).

Dataset.set_xindex(coord_names[, index_cls])

Set a new, Xarray-compatible index from one or more existing coordinate(s).

Dataset.reorder_levels([dim_order])

Rearrange index levels using input order.

Dataset.query([queries, parser, engine, ...])

Return a new dataset with each array indexed along the specified dimension(s), where the indexers are given as strings containing Python expressions to be evaluated against the data variables in the dataset.
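
Positional and label-based indexing are handled by isel and sel respectively. A minimal sketch; the variable and coordinate values are illustrative:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"temp": (("x", "y"), np.arange(6.0).reshape(2, 3))},
    coords={"x": [10, 20], "y": ["a", "b", "c"]},
)

by_position = ds.isel(x=0)      # integer (positional) indexing
by_label = ds.sel(x=20, y="c")  # label-based indexing
print(float(by_label["temp"]))  # 5.0
```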

Missing value handling#

Dataset.isnull([keep_attrs])

Test each value in the array for whether it is a missing value.

Dataset.notnull([keep_attrs])

Test each value in the array for whether it is not a missing value.

Dataset.combine_first(other)

Combine two Datasets, default to data_vars of self.

Dataset.count([dim, keep_attrs])

Reduce this Dataset's data by applying count along some dimension(s).

Dataset.dropna(dim, *[, how, thresh, subset])

Returns a new dataset with dropped labels for missing values along the provided dimension.

Dataset.fillna(value)

Fill missing values in this object.

Dataset.ffill(dim[, limit])

Fill NaN values by propagating values forward.

Dataset.bfill(dim[, limit])

Fill NaN values by propagating values backward.

Dataset.interpolate_na([dim, method, limit, ...])

Fill in NaNs by interpolating according to different methods.

Dataset.where(cond[, other, drop])

Filter elements from this object according to a condition.

Dataset.isin(test_elements)

Tests each value in the array for whether it is in test elements.
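
Filling and dropping missing values can be sketched as follows; the variable name `v` is illustrative:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"v": ("t", [1.0, np.nan, 3.0])})

filled = ds.fillna(0.0)             # replace NaN with a constant
dropped = ds.dropna("t")            # drop labels with missing values
print(filled["v"].values.tolist())  # [1.0, 0.0, 3.0]
print(dropped.sizes["t"])           # 2
```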

Computation#

Dataset.map(func[, keep_attrs, args])

Apply a function to each data variable in this dataset.

Dataset.reduce(func[, dim, keep_attrs, ...])

Reduce this dataset by applying func along some dimension(s).

Dataset.groupby([group, squeeze, ...])

Returns a DatasetGroupBy object for performing grouped operations.

Dataset.groupby_bins(group, bins[, right, ...])

Returns a DatasetGroupBy object for performing grouped operations.

Dataset.rolling([dim, min_periods, center])

Rolling window object for Datasets.

Dataset.rolling_exp([window, window_type])

Exponentially-weighted moving window.

Dataset.cumulative(dim[, min_periods])

Accumulating object for Datasets.

Dataset.weighted(weights)

Weighted Dataset operations.

Dataset.coarsen([dim, boundary, side, ...])

Coarsen object for Datasets.

Dataset.resample([indexer, skipna, closed, ...])

Returns a Resample object for performing resampling operations.

Dataset.diff(dim[, n, label])

Calculate the n-th order discrete difference along given axis.

Dataset.quantile(q[, dim, method, ...])

Compute the qth quantile of the data along the specified dimension.

Dataset.differentiate(coord[, edge_order, ...])

Differentiate with the second order accurate central differences.

Dataset.integrate(coord[, datetime_unit])

Integrate along the given coordinate using the trapezoidal rule.

Dataset.map_blocks(func[, args, kwargs, ...])

Apply a function to each block of this Dataset.

Dataset.polyfit(dim, deg[, skipna, rcond, ...])

Least squares polynomial fit.

Dataset.curvefit(coords, func[, ...])

Curve fitting optimization for arbitrary functions.

Dataset.eval(statement, *[, parser])

Calculate an expression supplied as a string in the context of the dataset.
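
map applies a function to every data variable, while reduce collapses dimensions with a reduction function. A minimal sketch:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"temp": (("x", "y"), np.arange(6.0).reshape(2, 3))})

doubled = ds.map(lambda a: a * 2)  # apply per data variable
total = ds.reduce(np.sum)          # reduce over all dimensions
diffed = ds.diff("y")              # first-order difference along y
print(float(total["temp"]))        # 15.0
```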

Aggregation#

Dataset.all([dim, keep_attrs])

Reduce this Dataset's data by applying all along some dimension(s).

Dataset.any([dim, keep_attrs])

Reduce this Dataset's data by applying any along some dimension(s).

Dataset.argmax([dim])

Indices of the maxima of the member variables.

Dataset.argmin([dim])

Indices of the minima of the member variables.

Dataset.count([dim, keep_attrs])

Reduce this Dataset's data by applying count along some dimension(s).

Dataset.idxmax([dim, skipna, fill_value, ...])

Return the coordinate label of the maximum value along a dimension.

Dataset.idxmin([dim, skipna, fill_value, ...])

Return the coordinate label of the minimum value along a dimension.

Dataset.max([dim, skipna, keep_attrs])

Reduce this Dataset's data by applying max along some dimension(s).

Dataset.min([dim, skipna, keep_attrs])

Reduce this Dataset's data by applying min along some dimension(s).

Dataset.mean([dim, skipna, keep_attrs])

Reduce this Dataset's data by applying mean along some dimension(s).

Dataset.median([dim, skipna, keep_attrs])

Reduce this Dataset's data by applying median along some dimension(s).

Dataset.prod([dim, skipna, min_count, ...])

Reduce this Dataset's data by applying prod along some dimension(s).

Dataset.sum([dim, skipna, min_count, keep_attrs])

Reduce this Dataset's data by applying sum along some dimension(s).

Dataset.std([dim, skipna, ddof, keep_attrs])

Reduce this Dataset's data by applying std along some dimension(s).

Dataset.var([dim, skipna, ddof, keep_attrs])

Reduce this Dataset's data by applying var along some dimension(s).

Dataset.cumsum([dim, skipna, keep_attrs])

Reduce this Dataset's data by applying cumsum along some dimension(s).

Dataset.cumprod([dim, skipna, keep_attrs])

Reduce this Dataset's data by applying cumprod along some dimension(s).
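
Each aggregation reduces over the given dimension(s), or over all dimensions when none is given. A minimal sketch:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"temp": (("x", "y"), np.arange(6.0).reshape(2, 3))})

print(ds.mean(dim="y")["temp"].values.tolist())  # [1.0, 4.0]
print(ds.max(dim="x")["temp"].values.tolist())   # [3.0, 4.0, 5.0]
print(float(ds.sum()["temp"]))                   # 15.0 (all dims)
```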

ndarray methods#

Dataset.argsort([axis, kind, order])

Returns the indices that would sort this array.

Dataset.astype(dtype, *[, order, casting, ...])

Copy of the xarray object, with data cast to a specified type.

Dataset.clip([min, max, keep_attrs])

Return an array whose values are limited to [min, max].

Dataset.conj()

Complex-conjugate all elements.

Dataset.conjugate(*args, **kwargs)

Return the complex conjugate, element-wise (alias for conj()).

Dataset.imag

The imaginary part of each data variable.

Dataset.round(*args, **kwargs)

Dataset.real

The real part of each data variable.

Dataset.rank(dim, *[, pct, keep_attrs])

Ranks the data.
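
These methods mirror their numpy counterparts, applied per data variable. A minimal sketch:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"temp": (("x",), [0.5, 2.5, 4.5])})

as_int = ds.astype("int64")             # cast data to a new dtype
clipped = ds.clip(min=1.0, max=4.0)     # limit values to [1.0, 4.0]
print(clipped["temp"].values.tolist())  # [1.0, 2.5, 4.0]
```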

Reshaping and reorganizing#

Dataset.transpose(*dim[, missing_dims])

Return a new Dataset object with all array dimensions transposed.

Dataset.stack([dim, create_index, index_cls])

Stack any number of existing dimensions into a single new dimension.

Dataset.unstack([dim, fill_value, sparse])

Unstack existing dimensions corresponding to MultiIndexes into multiple new dimensions.

Dataset.to_stacked_array(new_dim, sample_dims)

Combine variables of differing dimensionality into a DataArray without broadcasting.

Dataset.shift([shifts, fill_value])

Shift this dataset by an offset along one or more dimensions.

Dataset.roll([shifts, roll_coords])

Roll this dataset by an offset along one or more dimensions.

Dataset.pad([pad_width, mode, stat_length, ...])

Pad this dataset along one or more dimensions.

Dataset.sortby(variables[, ascending])

Sort object by labels or values (along an axis).

Dataset.broadcast_like(other[, exclude])

Broadcast this DataArray against another Dataset or DataArray.
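
Transposing and stacking can be sketched as follows; the dimension name `point` is illustrative:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"temp": (("x", "y"), np.arange(6.0).reshape(2, 3))},
    coords={"x": [10, 20], "y": ["a", "b", "c"]},
)

transposed = ds.transpose("y", "x")
stacked = ds.stack(point=("x", "y"))  # combine x and y into one dimension
print(transposed["temp"].dims)        # ('y', 'x')
print(stacked.sizes["point"])         # 6
```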