lamindb.core.Schema

class lamindb.core.Schema(features: Iterable[Record], dtype: str | None = None, name: str | None = None)

Bases: Record, CanCurate, TracksRun

Feature sets (dataset schemas).

Stores references to dataset schemas: these are the sets of columns in a dataset that correspond to Feature, Gene, Protein or other entities.

Why does LaminDB model feature sets, not just features?
  1. Performance: Imagine you measure the same panel of 20k transcripts in 1M samples. By modeling the panel as a feature set, you can link all your artifacts against one feature set and only need to store 1M instead of 1M x 20k = 20B links.

  2. Interpretation: Model protein panels, gene panels, etc.

  3. Data integration: Feature sets provide the information that determines whether two datasets can be meaningfully concatenated.

These reasons do not hold for label sets. Hence, LaminDB does not model label sets.

Parameters:
  • featuresIterable[Record] An iterable of Feature records to hash, e.g., [Feature(...), Feature(...)]. Is turned into a set upon instantiation. If you’d like to pass values, use from_values() or from_df().

  • dtypestr | None = None The simple type. Defaults to None for sets of Feature records. Otherwise defaults to "num" (e.g., for sets of Gene).

  • namestr | None = None A name.

Note

A feature set can be identified by the hash its feature uids. It’s stored in the .hash field.

A slot provides a string key to access feature sets. It’s typically the accessor within the registered data object, here pd.DataFrame.columns.

See also

from_values()

Create from values.

from_df()

Create from dataframe columns.

Examples

Create a feature set / schema from df with types:

>>> df = pd.DataFrame({"feat1": [1, 2], "feat2": [3.1, 4.2], "feat3": ["cond1", "cond2"]})
>>> feature_set = ln.FeatureSet.from_df(df)

Create a feature set / schema from features:

>>> features = [ln.Feature(name=feat, dtype="float").save() for feat in ["feat1", "feat2"]]
>>> feature_set = ln.FeatureSet(features)

Create a feature set / schema from feature values:

>>> import bionty as bt
>>> feature_set = ln.FeatureSet.from_values(adata.var["ensemble_id"], Gene.ensembl_gene_id, organism="mouse").save()

Link a feature set to an artifact:

>>> artifact.features.add_feature_set(feature_set, slot="var")

Attributes

feature_sets

Accessor to the related objects manager on the forward and reverse sides of a many-to-many relation.

In the example:

class Pizza(Model):
    toppings = ManyToManyField(Topping, related_name='pizzas')

Pizza.toppings and Topping.pizzas are ManyToManyDescriptor instances.

Most of the implementation is delegated to a dynamically defined manager class built by create_forward_many_to_many_manager() defined below.

property members: QuerySet

A queryset for the individual records of the set.

property registry: str

Simple fields

uid: str

A universal id (hash of the set of feature values).

name: str | None

A name.

n

Number of features in the set.

dtype: str | None

Data type, e.g., “num”, “float”, “int”. Is None for Feature.

For Feature, types are expected to be heterogeneous and defined on a per-feature level.

itype: str | None

A registry that stores feature identifiers used in this schema, e.g., 'Feature' or 'bionty.Gene'.

Depending on the registry, .members stores, e.g., Feature or bionty.Gene records.

Changed in version 1.0.0: Was called itype before.

is_type: bool

Distinguish types from instances of the type.

otype: str | None

Default Python object type, e.g., DataFrame, AnnData.

hash: str | None

A hash of the set of feature identifiers.

For a composite schema, the hash of hashes.

minimal_set: bool

Whether the schema contains a minimal set of linked features (default True).

If False, no features are linked to this schema.

If True, features are linked and considered as a minimally required set in validation.

ordered_set: bool

Whether the linked features are ordered (default False).

maximal_set: bool

If False, additional features are allowed (default False).

If True, the the minimal set is a maximal set and no additional features are allowed.

slot: str | None

The slot in which the schema is stored in the composite schema.

created_at: datetime

Time of creation of record.

Relational fields

created_by: User

Creator of record.

run: Run | None

Last run that created or updated the record.

space: Space

The space in which the record lives.

type: Feature | None

Type of feature set (e.g., ‘ExpressionPanel’, ‘ProteinPanel’, ‘Multimodal’, ‘Metadata’, ‘Embedding’).

Allows to group feature sets by type, e.g., all meassurements evaluating gene expression vs. protein expression vs. multi modal.

composite: Schema | None

The composite schema that contains this schema as a component.

The composite schema composes multiple simpler schemas into one object.

For example, an AnnData composes multiple schemas: var[DataFrameT], obs[DataFrame], obsm[Array], uns[dict], etc.

validated_by: Schema | None

The schema that validated this schema during curation.

When performing validation, the schema that enforced validation is often less concrete than what is validated.

For instance, the set of measured features might be a superset of the minimally required set of features.

Often, the curating schema does not specficy any concrete features at all

params: Param

The params contained in the schema.

features: Feature

The features contained in the schema.

records: Feature

Records of this type.

components

Accessor to the related objects manager on the reverse side of a many-to-one relation.

In the example:

class Child(Model):
    parent = ForeignKey(Parent, related_name='children')

Parent.children is a ReverseManyToOneDescriptor instance.

Most of the implementation is delegated to a dynamically defined manager class built by create_forward_many_to_many_manager() defined below.

validated_schemas

Accessor to the related objects manager on the reverse side of a many-to-one relation.

In the example:

class Child(Model):
    parent = ForeignKey(Parent, related_name='children')

Parent.children is a ReverseManyToOneDescriptor instance.

Most of the implementation is delegated to a dynamically defined manager class built by create_forward_many_to_many_manager() defined below.

artifacts: Artifact

The artifacts that observe this schema.

projects

Accessor to the related objects manager on the forward and reverse sides of a many-to-many relation.

In the example:

class Pizza(Model):
    toppings = ManyToManyField(Topping, related_name='pizzas')

Pizza.toppings and Topping.pizzas are ManyToManyDescriptor instances.

Most of the implementation is delegated to a dynamically defined manager class built by create_forward_many_to_many_manager() defined below.

Class methods

classmethod df(include=None, features=False, limit=100)

Convert to pd.DataFrame.

By default, shows all direct fields, except updated_at.

Use arguments include or feature to include other data.

Parameters:
  • include (str | list[str] | None, default: None) – Related fields to include as columns. Takes strings of form "ulabels__name", "cell_types__name", etc. or a list of such strings.

  • features (bool | list[str], default: False) – If True, map all features of the Feature registry onto the resulting DataFrame. Only available for Artifact.

  • limit (int, default: 100) – Maximum number of rows to display from a Pandas DataFrame. Defaults to 100 to reduce database load.

Return type:

DataFrame

Examples

Include the name of the creator in the DataFrame:

>>> ln.ULabel.df(include="created_by__name"])

Include display of features for Artifact:

>>> df = ln.Artifact.df(features=True)
>>> ln.view(df)  # visualize with type annotations

Only include select features:

>>> df = ln.Artifact.df(features=["cell_type_by_expert", "cell_type_by_model"])
classmethod filter(*queries, **expressions)

Query records.

Parameters:
  • queries – One or multiple Q objects.

  • expressions – Fields and values passed as Django query expressions.

Return type:

QuerySet

Returns:

A QuerySet.

See also

Examples

>>> ln.ULabel(name="my label").save()
>>> ln.ULabel.filter(name__startswith="my").df()
classmethod from_df(df, field=FieldAttr(Feature.name), name=None, mute=False, organism=None, source=None)

Create feature set for validated features.

Return type:

Schema | None

classmethod from_values(values, field=FieldAttr(Feature.name), type=None, name=None, mute=False, organism=None, source=None, raise_validation_error=True)

Create feature set for validated features.

Parameters:
  • values (list[str] | Series | array) – A list of values, like feature names or ids.

  • field (DeferredAttribute, default: FieldAttr(Feature.name)) – The field of a reference registry to map values.

  • type (str | None, default: None) – The simple type. Defaults to None if reference registry is Feature, defaults to "float" otherwise.

  • name (str | None, default: None) – A name.

  • organism (Record | str | None, default: None) – An organism to resolve gene mapping.

  • source (Record | None, default: None) – A public ontology to resolve feature identifier mapping.

  • raise_validation_error (bool, default: True) – Whether to raise a validation error if some values are not valid.

Raises:

ValidationError – If some values are not valid.

Return type:

Schema

Examples

>>> features = [ln.Feature(name=feat, dtype="str").save() for feat in ["feat11", "feat21"]]
>>> schema = ln.Schema.from_values(features)
>>> genes = ["ENSG00000139618", "ENSG00000198786"]
>>> schema = ln.Schema.from_values(features, bt.Gene.ensembl_gene_id, "float")
classmethod get(idlike=None, **expressions)

Get a single record.

Parameters:
  • idlike (int | str | None, default: None) – Either a uid stub, uid or an integer id.

  • expressions – Fields and values passed as Django query expressions.

Return type:

Record

Returns:

A record.

Raises:

lamindb.core.exceptions.DoesNotExist – In case no matching record is found.

See also

Examples

>>> ulabel = ln.ULabel.get("FvtpPJLJ")
>>> ulabel = ln.ULabel.get(name="my-label")
classmethod inspect(values, field=None, *, mute=False, organism=None, source=None)

Inspect if values are mappable to a field.

Being mappable means that an exact match exists.

Parameters:
  • values (list[str] | Series | array) – Values that will be checked against the field.

  • field (str | DeferredAttribute | None, default: None) – The field of values. Examples are 'ontology_id' to map against the source ID or 'name' to map against the ontologies field names.

  • mute (bool, default: False) – Whether to mute logging.

  • organism (str | Record | None, default: None) – An Organism name or record.

  • source (Record | None, default: None) – A bionty.Source record that specifies the version to inspect against.

Return type:

InspectResult

See also

validate()

Examples

>>> import bionty as bt
>>> bt.settings.organism = "human"
>>> ln.save(bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol"))
>>> gene_symbols = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
>>> result = bt.Gene.inspect(gene_symbols, field=bt.Gene.symbol)
>>> result.validated
['A1CF', 'A1BG']
>>> result.non_validated
['FANCD1', 'FANCD20']
classmethod lookup(field=None, return_field=None)

Return an auto-complete object for a field.

Parameters:
  • field (str | DeferredAttribute | None, default: None) – The field to look up the values for. Defaults to first string field.

  • return_field (str | DeferredAttribute | None, default: None) – The field to return. If None, returns the whole record.

Return type:

NamedTuple

Returns:

A NamedTuple of lookup information of the field values with a dictionary converter.

See also

search()

Examples

>>> import bionty as bt
>>> bt.settings.organism = "human"
>>> bt.Gene.from_source(symbol="ADGB-DT").save()
>>> lookup = bt.Gene.lookup()
>>> lookup.adgb_dt
>>> lookup_dict = lookup.dict()
>>> lookup_dict['ADGB-DT']
>>> lookup_by_ensembl_id = bt.Gene.lookup(field="ensembl_gene_id")
>>> genes.ensg00000002745
>>> lookup_return_symbols = bt.Gene.lookup(field="ensembl_gene_id", return_field="symbol")
classmethod search(string, *, field=None, limit=20, case_sensitive=False)

Search.

Parameters:
  • string (str) – The input string to match against the field ontology values.

  • field (str | DeferredAttribute | None, default: None) – The field or fields to search. Search all string fields by default.

  • limit (int | None, default: 20) – Maximum amount of top results to return.

  • case_sensitive (bool, default: False) – Whether the match is case sensitive.

Return type:

QuerySet

Returns:

A sorted DataFrame of search results with a score in column score. If return_queryset is True. QuerySet.

See also

filter() lookup()

Examples

>>> ulabels = ln.ULabel.from_values(["ULabel1", "ULabel2", "ULabel3"], field="name")
>>> ln.save(ulabels)
>>> ln.ULabel.search("ULabel2")
classmethod standardize(values, field=None, *, return_field=None, return_mapper=False, case_sensitive=False, mute=False, public_aware=True, keep='first', synonyms_field='synonyms', organism=None, source=None)

Maps input synonyms to standardized names.

Parameters:
  • values (list[str] | Series | array) – Identifiers that will be standardized.

  • field (str | DeferredAttribute | None, default: None) – The field representing the standardized names.

  • return_field (str, default: None) – The field to return. Defaults to field.

  • return_mapper (bool, default: False) – If True, returns {input_value: standardized_name}.

  • case_sensitive (bool, default: False) – Whether the mapping is case sensitive.

  • mute (bool, default: False) – Whether to mute logging.

  • public_aware (bool, default: True) – Whether to standardize from Bionty reference. Defaults to True for Bionty registries.

  • keep (Literal['first', 'last', False], default: 'first') –

    When a synonym maps to multiple names, determines which duplicates to mark as pd.DataFrame.duplicated:
    • "first": returns the first mapped standardized name

    • "last": returns the last mapped standardized name

    • False: returns all mapped standardized name.

    When keep is False, the returned list of standardized names will contain nested lists in case of duplicates.

    When a field is converted into return_field, keep marks which matches to keep when multiple return_field values map to the same field value.

  • synonyms_field (str, default: 'synonyms') – A field containing the concatenated synonyms.

  • organism (str | Record | None, default: None) – An Organism name or record.

  • source (Record | None, default: None) – A bionty.Source record that specifies the version to validate against.

Return type:

list[str] | dict[str, str]

Returns:

If return_mapper is False – a list of standardized names. Otherwise, a dictionary of mapped values with mappable synonyms as keys and standardized names as values.

See also

add_synonym()

Add synonyms.

remove_synonym()

Remove synonyms.

Examples

>>> import bionty as bt
>>> bt.settings.organism = "human"
>>> ln.save(bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol"))
>>> gene_synonyms = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
>>> standardized_names = bt.Gene.standardize(gene_synonyms)
>>> standardized_names
['A1CF', 'A1BG', 'BRCA2', 'FANCD20']
classmethod using(instance)

Use a non-default LaminDB instance.

Parameters:

instance (str | None) – An instance identifier of form “account_handle/instance_name”.

Return type:

QuerySet

Examples

>>> ln.ULabel.using("account_handle/instance_name").search("ULabel7", field="name")
            uid    score
name
ULabel7  g7Hk9b2v  100.0
ULabel5  t4Jm6s0q   75.0
ULabel6  r2Xw8p1z   75.0
classmethod validate(values, field=None, *, mute=False, organism=None, source=None)

Validate values against existing values of a string field.

Note this is strict validation, only asserts exact matches.

Parameters:
  • values (list[str] | Series | array) – Values that will be validated against the field.

  • field (str | DeferredAttribute | None, default: None) – The field of values. Examples are 'ontology_id' to map against the source ID or 'name' to map against the ontologies field names.

  • mute (bool, default: False) – Whether to mute logging.

  • organism (str | Record | None, default: None) – An Organism name or record.

  • source (Record | None, default: None) – A bionty.Source record that specifies the version to validate against.

Return type:

ndarray

Returns:

A vector of booleans indicating if an element is validated.

See also

inspect()

Examples

>>> import bionty as bt
>>> bt.settings.organism = "human"
>>> ln.save(bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol"))
>>> gene_symbols = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
>>> bt.Gene.validate(gene_symbols, field=bt.Gene.symbol)
array([ True,  True, False, False])

Methods

add_synonym(synonym, force=False, save=None)

Add synonyms to a record.

Parameters:
  • synonym (str | list[str] | Series | array) – The synonyms to add to the record.

  • force (bool, default: False) – Whether to add synonyms even if they are already synonyms of other records.

  • save (bool | None, default: None) – Whether to save the record to the database.

See also

remove_synonym()

Remove synonyms.

Examples

>>> import bionty as bt
>>> bt.CellType.from_source(name="T cell").save()
>>> lookup = bt.CellType.lookup()
>>> record = lookup.t_cell
>>> record.synonyms
'T-cell|T lymphocyte|T-lymphocyte'
>>> record.add_synonym("T cells")
>>> record.synonyms
'T cells|T-cell|T-lymphocyte|T lymphocyte'
async adelete(using=None, keep_parents=False)
async arefresh_from_db(using=None, fields=None, from_queryset=None)
async asave(*args, force_insert=False, force_update=False, using=None, update_fields=None)
clean()

Hook for doing any extra model-wide validation after clean() has been called on every field by self.clean_fields. Any ValidationError raised by this method will not be associated with a particular field; it will have a special-case association with the field defined by NON_FIELD_ERRORS.

clean_fields(exclude=None)

Clean all fields and raise a ValidationError containing a dict of all validation errors if any occur.

date_error_message(lookup_type, field_name, unique_for)
delete()

Delete.

Return type:

None

get_constraints()
get_deferred_fields()

Return a set containing names of deferred fields for this instance.

prepare_database_save(field)
refresh_from_db(using=None, fields=None, from_queryset=None)

Reload field values from the database.

By default, the reloading happens from the database this instance was loaded from, or by the read router if this instance wasn’t loaded from any database. The using parameter will override the default.

Fields can be used to specify which fields to reload. The fields should be an iterable of field attnames. If fields is None, then all non-deferred fields are reloaded.

When accessing deferred fields of an instance, the deferred loading of the field will call this method.

remove_synonym(synonym)

Remove synonyms from a record.

Parameters:

synonym (str | list[str] | Series | array) – The synonym values to remove.

See also

add_synonym()

Add synonyms

Examples

>>> import bionty as bt
>>> bt.CellType.from_source(name="T cell").save()
>>> lookup = bt.CellType.lookup()
>>> record = lookup.t_cell
>>> record.synonyms
'T-cell|T lymphocyte|T-lymphocyte'
>>> record.remove_synonym("T-cell")
'T lymphocyte|T-lymphocyte'
save(*args, **kwargs)

Save.

Return type:

Schema

save_base(raw=False, force_insert=False, force_update=False, using=None, update_fields=None)

Handle the parts of saving which should be done only once per save, yet need to be done in raw saves, too. This includes some sanity checks and signal sending.

The ‘raw’ argument is telling save_base not to save any parent models and not to do any changes to the values before save. This is used by fixture loading.

serializable_value(field_name)

Return the value of the field name for this instance. If the field is a foreign key, return the id value instead of the object. If there’s no Field object with this name on the model, return the model attribute’s value.

Used to serialize a field’s value (in the serializer, or form output, for example). Normally, you would just access the attribute directly and not use this method.

set_abbr(value)

Set value for abbr field and add to synonyms.

Parameters:

value (str) – A value for an abbreviation.

See also

add_synonym()

Examples

>>> import bionty as bt
>>> bt.ExperimentalFactor.from_source(name="single-cell RNA sequencing").save()
>>> scrna = bt.ExperimentalFactor.get(name="single-cell RNA sequencing")
>>> scrna.abbr
None
>>> scrna.synonyms
'single-cell RNA-seq|single-cell transcriptome sequencing|scRNA-seq|single cell RNA sequencing'
>>> scrna.set_abbr("scRNA")
>>> scrna.abbr
'scRNA'
>>> scrna.synonyms
'scRNA|single-cell RNA-seq|single cell RNA sequencing|single-cell transcriptome sequencing|scRNA-seq'
>>> scrna.save()
unique_error_message(model_class, unique_check)
validate_constraints(exclude=None)
validate_unique(exclude=None)

Check unique constraints on the model and raise ValidationError if any failed.