Databricks Feature Store 0.15.1
Python API
Databricks FeatureStoreClient
class databricks.feature_store.client.FeatureStoreClient(feature_store_uri: Optional[str] = None, model_registry_uri: Optional[str] = None)

Bases: object

Client for interacting with the Databricks Feature Store.

Note
Using Databricks FeatureStoreClient for feature tables in Unity Catalog requires version >= 0.13.5.

create_table(name: str, primary_keys: Union[str, List[str]], df: Optional[pyspark.sql.dataframe.DataFrame] = None, *, timestamp_keys: Union[str, List[str], None] = None, partition_columns: Union[str, List[str], None] = None, schema: Optional[pyspark.sql.types.StructType] = None, description: Optional[str] = None, tags: Optional[Dict[str, str]] = None, **kwargs) → databricks.feature_store.entities.feature_table.FeatureTable

Create and return a feature table with the given name and primary keys.

The returned feature table has the given name and primary keys. It uses the provided schema or the inferred schema of the provided df. If df is provided, this data will be saved in a Delta table. Supported data types for features are: IntegerType, LongType, FloatType, DoubleType, StringType, BooleanType, DateType, TimestampType, ShortType, ArrayType, MapType, BinaryType, and DecimalType.
Parameters:
name – A feature table name. For a workspace-local feature table, the format is <database_name>.<table_name>, for example dev.user_features. For a feature table in Unity Catalog, the format is <catalog_name>.<schema_name>.<table_name>, for example ml.dev.user_features.
primary_keys – The feature tableʼs primary keys. If multiple columns are required, specify a list of column names, for example
['customer_id', 'region'] .
df – Data to insert into this feature table. The schema of df will be used as the feature table schema.
timestamp_keys
Columns containing the event time associated with each feature value. Timestamp keys should be part of the primary keys. Combined, the timestamp keys and the other primary keys of the feature table uniquely identify the feature value for an entity at a point in time.
Note
Experimental: This argument may change or be removed in a future release without warning.
partition_columns
Columns used to partition the feature table. If a list is provided, column ordering in the list will be used for partitioning.
Note
When choosing partition columns for your feature table, use columns that do not have high cardinality. Ideally, you should expect the data in each partition to be at least 1 GB. The most commonly used partition column is a date.
Additional info: Choosing the right partition columns for Delta tables
schema – Feature table schema. Either schema or df must be provided.
description – Description of the feature table.
tags
Tags to associate with the feature table.
Note
Available in version >= 0.4.1.
Other Parameters:
path (Optional[str]) – Path in a supported filesystem. Defaults to the database location.
Note
The path argument is not supported for tables in Unity Catalog.
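For example, a minimal sketch of creating a feature table from a DataFrame (the table name, column names, and the customer_df DataFrame are illustrative, not part of this API reference):

from databricks.feature_store import FeatureStoreClient

fs = FeatureStoreClient()

# Assumes a Spark DataFrame `customer_df` with a 'customer_id' column,
# a 'dt' date column, and one or more feature columns.
customer_features_table = fs.create_table(
    name='dev.user_features',
    primary_keys='customer_id',
    df=customer_df,
    partition_columns='dt',
    description='Customer-level features',
)

Passing schema instead of df creates the table with the same metadata but no data.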
register_table(*, delta_table: str, primary_keys: Union[str, List[str]], timestamp_keys: Union[str, List[str], None] = None, description: Optional[str] = None, tags: Optional[Dict[str, str]] = None) → databricks.feature_store.entities.feature_table.FeatureTable

Register an existing Delta table as a feature table with the given primary keys.
This API is not required if the table is already in Unity Catalog and has primary keys.
The registered feature table has the same name as the Delta table.
Note
Available in version >= 0.3.8.
Parameters:
delta_table – A Delta table name. The table must exist in the metastore. For a workspace-local table, the format is <database_name>.<table_name>, for example dev.user_features. For a table in Unity Catalog, the format is <catalog_name>.<schema_name>.<table_name>, for example ml.dev.user_features.
primary_keys – The Delta tableʼs primary keys. If multiple columns are required, specify a list of column names, for example ['customer_id', 'region'].
timestamp_keys – Columns containing the event time associated with each feature value. Timestamp keys should be part of the primary keys. Combined, the timestamp keys and the other primary keys of the feature table uniquely identify the feature value for an entity at a point in time.
description – Description of the feature table.
tags
Tags to associate with the feature table.
Note
Available in version >= 0.4.1.
Returns:
A FeatureTable object.
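Continuing the sketch above, registering an existing Delta table as a time series feature table might look like the following (the Delta table and column names are illustrative):

# Assumes an existing Delta table dev.user_features with a
# 'customer_id' primary key column and a 'dt' timestamp column.
fs.register_table(
    delta_table='dev.user_features',
    primary_keys='customer_id',
    timestamp_keys='dt',
    description='Customer-level features',
)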
get_table(name: str) → databricks.feature_store.entities.feature_table.FeatureTable

Get a feature tableʼs metadata.
Parameters: name – A feature table name. For a workspace-local feature table, the format is <database_name>.<table_name>, for example dev.user_features. For a feature table in Unity Catalog, the format is <catalog_name>.<schema_name>.<table_name>, for example ml.dev.user_features.

drop_table(name: str) → None

Note
Experimental: This function may change or be removed in a future release without warning.
Delete the specified feature table. This API also drops the underlying Delta table.
Note
Available in version >= 0.4.1.
Parameters: name – The feature table name. For a workspace-local feature table, the format is <database_name>.<table_name>, for example dev.user_features. For a feature table in Unity Catalog, the format is <catalog_name>.<schema_name>.<table_name>, for example ml.dev.user_features.
Note
Deleting a feature table can lead to unexpected failures in upstream producers and downstream consumers (models, endpoints, and scheduled jobs). You must delete any existing published online stores separately.

read_table(name: str, **kwargs) → pyspark.sql.dataframe.DataFrame

Read the contents of a feature table.
Parameters: name – A feature table name of the form <database_name>.<table_name>, for example dev.user_features.
Returns:
The feature table contents; an exception is raised if the feature table does not exist.

write_table(name: str, df: pyspark.sql.dataframe.DataFrame, mode: str = 'merge', checkpoint_location: Optional[str] = None, trigger: Dict[str, Any] = {'processingTime': '5 seconds'}) → Optional[pyspark.sql.streaming.StreamingQuery]

Write to a feature table.
If the input DataFrame is streaming, a write stream is created.
Parameters:
name – A feature table name. Raises an exception if this feature table does not exist. For a workspace-local feature table, the format is <database_name>.<table_name>, for example dev.user_features. For a feature table in Unity Catalog, the format is <catalog_name>.<schema_name>.<table_name>, for example ml.dev.user_features.
df – Spark DataFrame with feature data. Raises an exception if the schema does not match that of the feature table.
mode
Two supported write modes:
"overwrite" updates the whole table.
"merge" will upsert the rows in df into the feature table. If df contains columns not present in the feature table, these columns
will be added as new features.
checkpoint_location – Sets the Structured Streaming checkpointLocation option. By setting a checkpoint_location, Spark Structured Streaming will store progress information and intermediate state, enabling recovery after failures. This parameter is only supported when the argument df is a streaming DataFrame.
trigger – If df.isStreaming, trigger defines the timing of stream data processing; the dictionary will be unpacked and passed to DataStreamWriter.trigger as arguments. For example, trigger={'once': True} will result in a call to DataStreamWriter.trigger(once=True).
Returns:
If df.isStreaming , returns a PySpark StreamingQuery . None otherwise.
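For example, a minimal sketch of a batch upsert followed by a read (the DataFrame updated_customer_df is assumed to match the feature table schema):

# Upsert new or updated feature rows into the table created above.
fs.write_table(
    name='dev.user_features',
    df=updated_customer_df,   # assumed batch DataFrame with a matching schema
    mode='merge',
)

# Read the table back as a DataFrame.
features_df = fs.read_table('dev.user_features')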
add_data_sources(*, feature_table_name: str, source_names: Union[str, List[str]], source_type: str = 'custom') → None

Note
Experimental: This function may change or be removed in a future release without warning.
Add data sources to the feature table.
Note
Adding data sources is NOT supported for feature tables in Unity Catalog.
Parameters:
feature_table_name – The feature table name.
source_names – Data source names. For multiple sources, specify a list. If a data source name already exists, it is ignored.
source_type
One of the following:
"table": a table in the format <database_name>.<table_name>, stored in the metastore (for example, Hive).
"path": a path, for example in the Databricks File System (DBFS).
"custom": a manually added data source that is neither a table nor a path.

delete_data_sources(*, feature_table_name: str, source_names: Union[str, List[str]]) → None

Note
Experimental: This function may change or be removed in a future release without warning.
Delete data sources from the feature table.
Note
Data sources of all types (table, path, custom) that match the source names will be deleted. Deleting data sources is NOT supported for feature tables in Unity Catalog.
Parameters:
feature_table_name – The feature table name.
source_names – Data source names. For multiple sources, specify a list. If a data source name does not exist, it is ignored.
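For example, a hypothetical sketch recording the upstream tables a feature table was computed from (the source table names are illustrative):

fs.add_data_sources(
    feature_table_name='dev.user_features',
    source_names=['raw.customer_events', 'raw.purchases'],
    source_type='table',
)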
publish_table(name: str, online_store: databricks.feature_store.online_store_spec.online_store_spec.OnlineStoreSpec, *, filter_condition: Optional[str] = None, mode: str = 'merge', streaming: bool = False, checkpoint_location: Optional[str] = None, trigger: Dict[str, Any] = {'processingTime': '5 minutes'}, features: Union[str, List[str], None] = None) → Optional[pyspark.sql.streaming.StreamingQuery]

Publish a feature table to an online store.
Parameters:
name – Name of the feature table.
online_store – Specification of the online store.
filter_condition – A SQL expression using feature table columns that filters feature rows prior to publishing to the online store. For
example, "dt > '2020-09-10'" . This is analogous to running df.filter or a WHERE condition in SQL on a feature table prior to
publishing.
mode
Specifies the behavior when data already exists in this feature table in the online store. If "overwrite" mode is used, existing data is replaced by the new data. If "merge" mode is used, the new data will be merged in, under these conditions:
If a key exists in the online table but not the offline table, the row in the online table is unmodified.
If a key exists in the offline table but not the online table, the offline table row is inserted into the online table.
If a key exists in both the offline and the online tables, the online table row will be updated.
streaming – If True , streams data to the online store.
checkpoint_location – Sets the Structured Streaming checkpointLocation option. By setting a checkpoint_location, Spark Structured Streaming will store progress information and intermediate state, enabling recovery after failures. This parameter is only supported when streaming=True.
trigger – If streaming=True , trigger defines the timing of stream data processing. The dictionary will be unpacked and passed to
DataStreamWriter.trigger as arguments. For example, trigger={'once': True} will result in a call to
DataStreamWriter.trigger(once=True) .
features
Specifies the feature column(s) to be published to the online store. The selected features must be a superset of existing online store
features. Primary key columns and timestamp key columns will always be published.
Note
This parameter is only supported when mode="merge" . When features is not set, the whole feature table will be published.
Returns:
If streaming=True , returns a PySpark StreamingQuery , None otherwise.
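For example, a minimal sketch of publishing to an Amazon RDS MySQL online store (the hostname and secret prefix are illustrative placeholders):

from databricks.feature_store.online_store_spec import AmazonRdsMySqlSpec

online_store = AmazonRdsMySqlSpec(
    hostname='feature-store-db.example.com',   # assumed RDS endpoint
    port=3306,
    write_secret_prefix='datascience/staging',
)

fs.publish_table(
    name='dev.user_features',
    online_store=online_store,
    mode='merge',
)

Setting streaming=True together with a checkpoint_location would instead publish continuously from a streaming feature table.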
drop_online_table(name: str, online_store: databricks.feature_store.online_store_spec.online_store_spec.OnlineStoreSpec) → None

Drop a table in an online store.
This API first attempts to make a call to the online store provider to drop the table. If successful, it then deletes the online store from the feature catalog.
Parameters:
name – Name of the feature table associated with the online store table to drop.
online_store – Specification of the online store.
Note
Available in version >= 0.12.0
Note
Deleting an online published table can lead to unexpected failures in downstream dependencies. Ensure that the online table being dropped is no longer used for Model Serving feature lookup or any other use cases.

create_training_set(df: pyspark.sql.dataframe.DataFrame, feature_lookups: List[Union[databricks.feature_store.entities.feature_lookup.FeatureLookup, databricks.feature_store.entities.feature_function.FeatureFunction]], label: Union[str, List[str], None], exclude_columns: Optional[List[str]] = None, **kwargs) → databricks.feature_store.training_set.TrainingSet

Create a TrainingSet.
Parameters:
df – The DataFrame used to join features into.
feature_lookups
List of features to use in the TrainingSet . FeatureLookups are joined into the DataFrame , and FeatureFunctions are
computed on-demand.
Note
FeatureFunction is available in version >= 0.14.1
label – Names of the column(s) in the DataFrame that contain training set labels. To create a training set without a label field, i.e. for an unsupervised training set, specify label = None.
exclude_columns – Names of the columns to drop from the TrainingSet DataFrame .
Returns:
A TrainingSet object.
log_model(model: Any, artifact_path: str, *, flavor: module, training_set: Optional[databricks.feature_store.training_set.TrainingSet] = None, registered_model_name: Optional[str] = None, await_registration_for: int = 300, infer_input_example: bool = False, **kwargs)

Log an MLflow model packaged with feature lookup information.
Note
The DataFrame returned by TrainingSet.load_df() must be used to train the model. If it has been modified (for example, by data normalization or adding a column), these modifications will not be applied at inference time, leading to training-serving skew.
Parameters:
model – Model to be saved. This model must be capable of being saved by flavor.save_model. See the MLflow Model API.
artifact_path – Run-relative artifact path.
flavor – MLflow module to use to log the model. flavor should have type ModuleType. The module must have a method save_model and must support the python_function flavor. For example, mlflow.sklearn, mlflow.xgboost, and similar.
training_set – The TrainingSet used to train this model.
registered_model_name
Note
Experimental: This argument may change or be removed in a future release without warning.
If given, create a model version under registered_model_name, also creating a registered model if one with the given name does not exist.
await_registration_for – Number of seconds to wait for the model version to finish being created and reach READY status. By default, the function waits for five minutes. Specify 0 or None to skip waiting.
infer_input_example
Note
Experimental: This argument may change or be removed in a future release without warning.
Automatically log an input example along with the model, using the supplied training data. Defaults to False.
Returns:
None

score_batch(model_uri: str, df: pyspark.sql.dataframe.DataFrame, result_type: str = 'double') → pyspark.sql.dataframe.DataFrame

Evaluate the model on the provided DataFrame.
Additional features required for model evaluation will be automatically retrieved from Feature Store.
The model must have been logged with FeatureStoreClient.log_model(), which packages the model with feature metadata. Unless present in df, these features will be looked up from Feature Store and joined with df prior to scoring the model.
If a feature is included in df, the provided feature values will be used rather than those stored in Feature Store.
For example, if a model is trained on two features account_creation_date and num_lifetime_purchases, as in:
feature_lookups = [
    FeatureLookup(
        table_name = 'trust_and_safety.customer_features',
        feature_name = 'account_creation_date',
        lookup_key = 'customer_id',
    ),
    FeatureLookup(
        table_name = 'trust_and_safety.customer_features',
        feature_name = 'num_lifetime_purchases',
        lookup_key = 'customer_id'
    ),
]

with mlflow.start_run():
    training_set = fs.create_training_set(
        df,
        feature_lookups = feature_lookups,
        label = 'is_banned',
        exclude_columns = ['customer_id']
    )
    ...
    fs.log_model(
        model,
        "model",
        flavor=mlflow.sklearn,
        training_set=training_set,
        registered_model_name="example_model"
    )
Then at inference time, the caller of FeatureStoreClient.score_batch() must pass a DataFrame that includes customer_id, the lookup_key specified in the FeatureLookups of the training_set. If the DataFrame contains a column account_creation_date, the values of this column will be used in lieu of those in Feature Store. As in:
# batch_df has columns ['customer_id', 'account_creation_date']
predictions = fs.score_batch(
    'models:/example_model/1',
    batch_df
)
Parameters:
model_uri
The location, in URI format, of the MLflow model logged using FeatureStoreClient.log_model() . One of:
runs:/<mlflow_run_id>/run-relative/path/to/model
models:/<model_name>/<model_version>
models:/<model_name>/<stage>
For more information about URI schemes, see Referencing Artifacts.
df
The DataFrame to score the model on. Feature Store features will be joined with df prior to scoring the model. df must:
1. Contain columns for lookup keys required to join feature data from Feature Store, as specified in the feature_spec.yaml
artifact.
2. Contain columns for all source keys required to score the model, as specified in the feature_spec.yaml artifact.
3. Not contain a column prediction , which is reserved for the modelʼs predictions. df may contain additional columns.
Streaming DataFrames are not supported.
result_type – The return type of the model. See mlflow.pyfunc.spark_udf() result_type.
Returns:
A DataFrame containing:
1. All columns of df .
2. All feature values retrieved from Feature Store.
3. A column prediction containing the output of the model.
set_feature_table_tag(*, table_name: str, key: str, value: str) → None

Create or update a tag associated with the feature table. If the tag with the corresponding key already exists, its value will be overwritten with the new value.
Note
Available in version >= 0.4.1.
Parameters:
table_name – The feature table name.
key – The tag key.
value – The tag value.

delete_feature_table_tag(*, table_name: str, key: str) → None

Delete the tag associated with the feature table. Deleting a non-existent tag will emit a warning.
Note
Available in version >= 0.4.1.
Parameters:
table_name – The feature table name.
key – The tag key to delete.

Feature Lookup

class databricks.feature_store.entities.feature_lookup.FeatureLookup(table_name: str, lookup_key: Union[str, List[str]], *, feature_names: Union[str, List[str], None] = None, rename_outputs: Optional[Dict[str, str]] = None, timestamp_lookup_key: Union[str, List[str], None] = None, lookback_window: Optional[datetime.timedelta] = None, **kwargs)

Bases: databricks.feature_store.entities._feature_store_object._FeatureStoreObject

Value class used to specify a feature to use in a TrainingSet.
Parameters:
table_name – Feature table name.
lookup_key – Key to use when joining this feature table with the DataFrame passed to
FeatureStoreClient.create_training_set() . The lookup_key must be the columns in the DataFrame passed to
FeatureStoreClient.create_training_set() . The type and order of lookup_key columns in that DataFrame must match the
primary key of the feature table referenced in this FeatureLookup .
feature_names – A single feature name, a list of feature names, or None to lookup all features (excluding primary keys) in the feature table
at the time that the training set is created. If your model requires primary keys as features, you can declare them as independent
FeatureLookups.
rename_outputs – If provided, renames features in the TrainingSet returned by FeatureStoreClient.create_training_set.
timestamp_lookup_key
Key to use when performing point-in-time lookup on this feature table with the DataFrame passed to
FeatureStoreClient.create_training_set() . The timestamp_lookup_key must be the columns in the DataFrame passed to
FeatureStoreClient.create_training_set() . The type of timestamp_lookup_key columns in that DataFrame must match the
type of the timestamp key of the feature table referenced in this FeatureLookup .
Note
Experimental: This argument may change or be removed in a future release without warning.
lookback_window
The lookback window to use when performing point-in-time lookup on the feature table with the dataframe passed to
FeatureStoreClient.create_training_set() . Feature Store will retrieve the latest feature value prior to the timestamp specified in
the dataframeʼs timestamp_lookup_key and within the lookback_window , or null if no such feature value exists. When set to 0, only
exact matches from the feature table are returned.
Note
Available in version >= 0.13.0
feature_name – Feature name. Deprecated as of version 0.3.4. Use feature_names .
output_name – If provided, rename this feature in the output of FeatureStoreClient.create_training_set . Deprecated as of
version 0.3.4 . Use rename_outputs .
__init__(table_name: str, lookup_key: Union[str, List[str]], *, feature_names: Union[str, List[str], None] = None, rename_outputs: Optional[Dict[str, str]] = None, timestamp_lookup_key: Union[str, List[str], None] = None, lookback_window: Optional[datetime.timedelta] = None, **kwargs)

Initialize a FeatureLookup object. See class documentation.

table_name
The table name to use in this FeatureLookup.

lookup_key
The lookup key(s) to use in this FeatureLookup.

feature_name
The feature name to use in this FeatureLookup. Deprecated as of version 0.3.4. Use feature_names.

output_name
The output name to use in this FeatureLookup. Deprecated as of version 0.3.4. Use rename_outputs.

lookback_window
A lookback window applied only for point-in-time lookups.
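For example, a minimal sketch of a point-in-time lookup against a time series feature table (the table, key, and feature names are illustrative):

from datetime import timedelta
from databricks.feature_store import FeatureLookup

lookup = FeatureLookup(
    table_name='dev.user_features',
    feature_names=['num_lifetime_purchases'],
    lookup_key='customer_id',
    timestamp_lookup_key='event_ts',      # timestamp column in the training DataFrame
    lookback_window=timedelta(days=7),    # only match feature values at most 7 days old
    rename_outputs={'num_lifetime_purchases': 'purchases'},
)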
Feature Function

class databricks.feature_store.entities.feature_function.FeatureFunction(*, udf_name: str, input_bindings: Optional[Dict[str, str]] = None, output_name: Optional[str] = None)

Bases: databricks.feature_store.entities._feature_store_object._FeatureStoreObject

Value class used to specify a Python user-defined function (UDF) in Unity Catalog to use in a TrainingSet.
Note
FeatureFunction is available in version >= 0.14.1
Parameters:
udf_name – The Python UDF name.
input_bindings – Mapping of UDF inputs to features in the TrainingSet.
output_name – Output feature name of this FeatureFunction. If empty, defaults to the fully qualified udf_name when evaluated.

__init__(*, udf_name: str, input_bindings: Optional[Dict[str, str]] = None, output_name: Optional[str] = None)

Initialize a FeatureFunction object. See class documentation.

udf_name
The name of the Python UDF called by this FeatureFunction.

input_bindings
The input to use for each argument of the Python UDF.
For example:
{"x": "feature1", "y": "input1"}

output_name
The output name to use for the results of this FeatureFunction. If empty, defaults to the fully qualified udf_name when evaluated.
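For example, a minimal sketch that mixes a FeatureLookup with an on-demand FeatureFunction when building a training set; the UDF ml.dev.account_age_days and its column bindings are hypothetical, and fs, df, and lookup continue the sketches above:

from databricks.feature_store.entities.feature_function import FeatureFunction

# Assumes a Python UDF ml.dev.account_age_days(created_at, as_of_ts) exists in Unity Catalog.
account_age = FeatureFunction(
    udf_name='ml.dev.account_age_days',
    input_bindings={'created_at': 'account_creation_date', 'as_of_ts': 'event_ts'},
    output_name='account_age_days',
)

training_set = fs.create_training_set(
    df,
    feature_lookups=[lookup, account_age],   # FeatureLookups and FeatureFunctions can be mixed
    label='is_banned',
)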
Training Set

class databricks.feature_store.training_set.TrainingSet(feature_spec: databricks.feature_store.entities.feature_spec.FeatureSpec, df: pyspark.sql.dataframe.DataFrame, labels: List[str], feature_table_metadata_map: Dict[str, databricks.feature_store.entities.feature_table.FeatureTable], feature_table_data_map: Dict[str, pyspark.sql.dataframe.DataFrame], uc_function_infos: Dict[str, databricks.feature_store.information_schema_spark_client.FunctionInfo])

Bases: object

Class that defines TrainingSet objects.
Note
The TrainingSet constructor should not be called directly. Instead, call FeatureStoreClient.create_training_set.

load_df() → pyspark.sql.dataframe.DataFrame

Load a DataFrame.
Return a DataFrame for training.
The returned DataFrame has columns specified in the feature_spec and labels parameters provided in FeatureStoreClient.create_training_set.
Returns:
A DataFrame for training

Feature Table

Classes

class databricks.feature_store.entities.feature_table.FeatureTable(name, table_id, description, primary_keys, partition_columns, features, creation_timestamp=None, online_stores=None, notebook_producers=None, job_producers=None, table_data_sources=None, path_data_sources=None, custom_data_sources=None, timestamp_keys=None, tags=None)

Value class describing one feature table.
This will typically not be instantiated directly; instead, FeatureStoreClient.create_table will create FeatureTable objects.
Online Store Spec
class databricks.feature_store.online_store_spec.AmazonRdsMySqlSpec(hostname: str, port: int, user: Optional[str] = None, password: Optional[str] = None, database_name: Optional[str] = None, table_name: Optional[str] = None, driver_name: Optional[str] = None, read_secret_prefix: Optional[str] = None, write_secret_prefix: Optional[str] = None)

Bases: databricks.feature_store.online_store_spec.online_store_spec.OnlineStoreSpec

Class that defines and creates AmazonRdsMySqlSpec objects.
This OnlineStoreSpec implementation is intended for publishing features to Amazon RDS MySQL and Aurora (MySQL-compatible edition).
See OnlineStoreSpec documentation for more usage information, including parameter descriptions.
Parameters:
hostname – Hostname to access online store.
port – Port number to access online store.
user – Username that has access to the online store. Deprecated as of version 0.6.0. Use write_secret_prefix instead.
password – Password to access the online store. Deprecated as of version 0.6.0. Use write_secret_prefix instead.
database_name – Database name.
table_name – Table name.
driver_name – Name of custom JDBC driver to access the online store.
read_secret_prefix – Prefix for read secret.
write_secret_prefix – Prefix for write secret.

hostname
Hostname to access the online store.

port
Port number to access the online store.

database_name
Database name.

cloud
Define the cloud property for the data store.

store_type
Define the data store type property.

auth_type()
Publish Auth type.

class databricks.feature_store.online_store_spec.AzureMySqlSpec(hostname: str, port: int, user: Optional[str] = None, password: Optional[str] = None, database_name: Optional[str] = None, table_name: Optional[str] = None, driver_name: Optional[str] = None, read_secret_prefix: Optional[str] = None, write_secret_prefix: Optional[str] = None)

Bases: databricks.feature_store.online_store_spec.online_store_spec.OnlineStoreSpec

Define the AzureMySqlSpec class.
This OnlineStoreSpec implementation is intended for publishing features to Azure Database for MySQL.
See OnlineStoreSpec documentation for more usage information, including parameter descriptions.
Parameters:
hostname – Hostname to access online store.
port – Port number to access online store.
user – Username that has access to the online store. Deprecated as of version 0.6.0. Use write_secret_prefix instead.
password – Password to access the online store. Deprecated as of version 0.6.0. Use write_secret_prefix instead.
database_name – Database name.
table_name – Table name.
driver_name – Name of custom JDBC driver to access the online store.
read_secret_prefix – Prefix for read secret.
write_secret_prefix – Prefix for write secret.

hostname
Hostname to access the online store.

port
Port number to access the online store.

database_name
Database name.

cloud
Define the cloud the feature store runs in.

store_type
Define the data store type.

auth_type()
Publish Auth type.

class databricks.feature_store.online_store_spec.AzureSqlServerSpec(hostname: str, port: int, user: Optional[str] = None, password: Optional[str] = None, database_name: Optional[str] = None, table_name: Optional[str] = None, driver_name: Optional[str] = None, read_secret_prefix: Optional[str] = None, write_secret_prefix: Optional[str] = None)

Bases: databricks.feature_store.online_store_spec.online_store_spec.OnlineStoreSpec

This OnlineStoreSpec implementation is intended for publishing features to Azure SQL Database (SQL Server).
The spec supports SQL Server 2019 and newer.
See OnlineStoreSpec documentation for more usage information, including parameter descriptions.
Parameters:
hostname – Hostname to access online store.
port – Port number to access online store.
user – Username that has access to the online store. Deprecated as of version 0.6.0. Use write_secret_prefix instead.
password – Password to access the online store. Deprecated as of version 0.6.0. Use write_secret_prefix instead.
database_name – Database name.
table_name – Table name.
driver_name – Name of custom JDBC driver to access the online store.
read_secret_prefix – Prefix for read secret.
write_secret_prefix – Prefix for write secret.

hostname
Hostname to access the online store.

port
Port number to access the online store.

database_name
Database name.

cloud
Define the cloud the feature store runs in.

store_type
Define the data store type.

auth_type()
Publish Auth type.
class databricks.feature_store.online_store_spec.AmazonDynamoDBSpec(*, region: Optional[str], access_key_id: Optional[str] = None, secret_access_key: Optional[str] = None, session_token: Optional[str] = None, table_name: Optional[str] = None, read_secret_prefix: Optional[str] = None, write_secret_prefix: Optional[str] = None, ttl: Optional[datetime.timedelta] = None, endpoint_url: Optional[str] = None)

Bases: databricks.feature_store.online_store_spec.online_store_spec.OnlineStoreSpec

This OnlineStoreSpec implementation is intended for publishing features to Amazon DynamoDB.
If table_name is not provided, FeatureStoreClient.publish_table will use the offline storeʼs database and table name, combined, as the online table name.
To use a different table name in the online store, provide a value for the table_name argument.
The expected read or write secrets for DynamoDB for a given {prefix} string are ${prefix}-access-key-id, ${prefix}-secret-access-key, and ${prefix}-session-token.
If none of access_key_id, secret_access_key, and write_secret_prefix are passed, the instance profile attached to the cluster will be used to write to DynamoDB.
Note
AmazonDynamoDBSpec is available in version >= 0.3.8.
Instance profile based writes are available in version >= 0.4.1.
Parameters:
region – Region to access online store.
access_key_id – Access key ID that has access to the online store. Deprecated as of version 0.6.0. Use write_secret_prefix instead.
secret_access_key – Secret access key to access the online store. Deprecated as of version 0.6.0. Use write_secret_prefix instead.
session_token – Session token to access the online store. Deprecated as of version 0.6.0. Use write_secret_prefix instead.
table_name – Table name.
read_secret_prefix – Prefix for read secret.
write_secret_prefix – Prefix for write secret.
ttl – The time to live for data published to the online store. This attribute is only applicable when publishing time series feature tables. If the time to live is specified for a time series table, FeatureStoreClient.publish_table() will publish a window of data instead of the latest snapshot.

access_key_id
Warning
databricks.feature_store.online_store_spec.amazon_dynamodb_online_store_spec.AmazonDynamoDBSpec.access_key_id is deprecated since v0.6.0. This method will be removed in a future release. Use write_secret_prefix instead.
Access key ID that has access to the online store. Property will be empty if write_secret_prefix or the instance profile attached to the cluster are intended to be used.

secret_access_key
Warning
databricks.feature_store.online_store_spec.amazon_dynamodb_online_store_spec.AmazonDynamoDBSpec.secret_access_key is deprecated since v0.6.0. This method will be removed in a future release. Use write_secret_prefix instead.
Secret access key to access the online store. Property will be empty if write_secret_prefix or the instance profile attached to the cluster are intended to be used.

session_token
Warning
databricks.feature_store.online_store_spec.amazon_dynamodb_online_store_spec.AmazonDynamoDBSpec.session_token is deprecated since v0.6.0. This method will be removed in a future release. Use write_secret_prefix instead.
Session token to access the online store. Property will be empty if write_secret_prefix or the instance profile attached to the cluster are intended to be used.

endpoint_url
Endpoint URL of the DynamoDB online store, mainly used for testing with LocalStack.

cloud
Define the cloud property for the data store.

store_type
Define the data store type.

region
Region to access the online store.

ttl
Time to live attribute for the online store.

auth_type()
Publish Auth type.
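For example, a minimal sketch of publishing a time series feature table to DynamoDB with a time-to-live window (the region and secret prefix are illustrative, and fs continues the earlier sketches):

from datetime import timedelta
from databricks.feature_store.online_store_spec import AmazonDynamoDBSpec

dynamodb_store = AmazonDynamoDBSpec(
    region='us-west-2',
    write_secret_prefix='datascience/staging',
    ttl=timedelta(days=2),   # publish a 2-day window instead of only the latest snapshot
)

fs.publish_table(name='dev.user_features', online_store=dynamodb_store)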
class databricks.feature_store.online_store_spec.AzureCosmosDBSpec(*, account_uri: str, database_name: Optional[str] = None, container_name: Optional[str] = None, read_secret_prefix: Optional[str] = None, write_secret_prefix: str)

Bases: databricks.feature_store.online_store_spec.online_store_spec.OnlineStoreSpec

This OnlineStoreSpec implementation is intended for publishing features to Azure Cosmos DB.
If database_name and container_name are not provided, FeatureStoreClient.publish_table will use the offline storeʼs database and table name as the Cosmos DB database and container name.
The expected read or write secret for Cosmos DB for a given {prefix} string is ${prefix}-authorization-key.
The authorization key can be either the Cosmos DB account primary or secondary key.
Note
Available in version >= 0.5.0.
Parameters:
account_uri – URI of the Cosmos DB account.
database_name – Database name.
container_name – Container name.
read_secret_prefix – Prefix for read secret.
write_secret_prefix – Prefix for write secret.

account_uri
Account URI of the online store.

database_name
Database name.

container_name
Container name.

cloud
Define the cloud property for the data store.

store_type
Define the data store type.

auth_type()
Publish Auth type.
class databricks.feature_store.online_store_spec.OnlineStoreSpec(_type, hostname: Optional[str] = None, port: Optional[int] = None, user: Optional[str] = None, password: Optional[str] = None, database_name: Optional[str] = None, table_name: Optional[str] = None, driver_name: Optional[str] = None, read_secret_prefix: Optional[str] = None, write_secret_prefix: Optional[str] = None, _internal_properties: Optional[Dict[str, str]] = None)

Bases: abc.ABC

Parent class for all types of OnlineStoreSpec objects.
Abstract base class for classes that specify the online store to publish to.
If database_name and table_name are not provided, FeatureStoreClient.publish_table will use the offline storeʼs database and table names.
To use a different database and table name in the online store, provide values for both the database_name and table_name arguments.
The JDBC driver can be customized with the optional driver_name argument. Otherwise, a default is used.
Strings in the primary key should not exceed 100 characters.
The online database should already exist.
Note
It is strongly suggested (but not required) to provide read-only database credentials via the read_secret_prefix in order to grant the least amount of database access privileges to the served model. When providing a read_secret_prefix, the secrets must exist in the scope name using the expected format, otherwise publish_table will return an error.
Parameters:
hostname – Hostname to access online store. The database hostname cannot be changed. Subsequent publish calls to the same online
store must provide the same hostname.
port – Port number to access online store. The database port cannot be changed. Subsequent publish calls to the same online store must
provide the same port.
user – Username that has write access to the online store. Deprecated as of version 0.6.0. Use write_secret_prefix instead.
password – Password to access the online store. Deprecated as of version 0.6.0. Use write_secret_prefix instead.
database_name – Database name.
table_name – Table name.
driver_name – Name of custom JDBC driver to access the online store.
read_secret_prefix
The secret scope name and secret key name prefix where read-only online store credentials are stored. These credentials will be used
during online feature serving to connect to the online store from the served model. The format of this parameter should be
${scope-name}/${prefix} , which is the name of the secret scope, followed by a / , followed by the secret key name prefix. The scope
passed in must contain the following keys and corresponding values:
${prefix}-user where ${prefix} is the value passed into this function. For example if this function is called with
datascience/staging , the datascience secret scope should contain the secret named staging-user, which points to a secret
value with the database username for the online store.
${prefix}-password where ${prefix} is the value passed into this function. For example if this function is called with
datascience/staging , the datascience secret scope should contain the secret named staging-password, which points to a
secret value with the database password for the online store.
Once the read_secret_prefix is set for an online store, it cannot be changed.
write_secret_prefix
The secret scope name and secret key name prefix where read-write online store credentials are stored. These credentials will be used to
connect to the online store to publish features. If user and password are passed, this field must be None , or an exception will be raised.
The format of this parameter should be ${scope-name}/${prefix} , which is the name of the secret scope, followed by a / , followed
by the secret key name prefix. The scope passed in must contain the following keys and corresponding values:
${prefix}-user where ${prefix} is the value passed into this function. For example if this function is called with
datascience/staging , the datascience secret scope should contain the secret named staging-user, which points to a secret
value with the database username for the online store.
${prefix}-password where ${prefix} is the value passed into this function. For example if this function is called with
datascience/staging , the datascience secret scope should contain the secret named staging-password, which points to a
secret value with the database password for the online store.
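As a sketch of these conventions, a spec that reads with the datascience/staging prefix relies on secrets named staging-user and staging-password in the datascience scope; the connection details and the separate write prefix below are illustrative:

from databricks.feature_store.online_store_spec import AzureSqlServerSpec

# Assumes the 'datascience' secret scope contains 'staging-user' and 'staging-password'
# (read-only credentials) as well as 'staging-admin-user' and 'staging-admin-password'
# (read/write credentials), following the ${scope-name}/${prefix} format described above.
online_store = AzureSqlServerSpec(
    hostname='feature-store.database.windows.net',
    port=1433,
    database_name='feature_store',
    table_name='user_features_online',
    read_secret_prefix='datascience/staging',
    write_secret_prefix='datascience/staging-admin',
)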
type
Type of the online store.

table_name
Table name.

user
Warning
databricks.feature_store.online_store_spec.online_store_spec.OnlineStoreSpec.user is deprecated since v0.6.0. This method will be removed in a future release. Use write_secret_prefix instead.
Username that has access to the online store.
Property will be empty if write_secret_prefix argument was used.

password
Warning
databricks.feature_store.online_store_spec.online_store_spec.OnlineStoreSpec.password is deprecated since v0.6.0. This method will be removed in a future release. Use write_secret_prefix instead.
Password to access the online store.
Property will be empty if write_secret_prefix argument was used.

driver
Name of the custom JDBC driver to access the online store.

read_secret_prefix
Prefix for read access to online store.
Name of the secret scope and prefix that contains the username and password to access the online store with read-only credentials. See the read_secret_prefix parameter description for details.

write_secret_prefix
Secret prefix that contains online store login info.
Name of the secret scope and prefix that contains the username and password to access the online store with read/write credentials. See the write_secret_prefix parameter description for details.

cloud
Cloud provider where this online store is located.

store_type
Store type.

auth_type()
Publish Auth type.