Transfer data
This guide shows how to transfer data from a source database instance into the current default database instance.
# !pip install 'lamindb[jupyter,aws,bionty]'
!lamin init --storage ./test-transfer --modules bionty
! using anonymous user (to identify, call: lamin login)
→ initialized lamindb: anonymous/test-transfer
import lamindb as ln
ln.track("ITeOtm7bhtdq0000")
→ connected lamindb: anonymous/test-transfer
→ created Transform('ITeOtm7bhtdq0000'), started new Run('8pzmfVcs...') at 2025-01-17 14:20:21 UTC
→ notebook imports: lamindb==1.0rc1
Query all artifacts in the laminlabs/lamindata instance and filter them to their latest versions.
# query all latest artifact versions
artifacts = ln.Artifact.using("laminlabs/lamindata").filter(is_latest=True)
# convert the QuerySet to a DataFrame and show its first 5 rows
artifacts.df().head()
! source modules has additional modules: {'wetlab'}
consider mounting these registry modules to transfer all metadata
id | uid | key | description | suffix | kind | otype | size | hash | n_files | n_observations | _hash_type | _key_is_virtual | _overwrite_versions | space_id | storage_id | schema_id | version | is_latest | run_id | created_at | created_by_id | _aux | _branch_code |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1282 | WQtsc0CQZKB9GEst0000 | None | Example R cars dataset | .parquet | dataset | DataFrame | 2402 | eIk8NXNiwMoGmhhjrMILbg | NaN | NaN | md5 | True | False | 1 | 2 | None | None | True | 460.0 | 2025-01-15 14:22:51.192955+00:00 | 30 | None | 1 |
607 | sRapK07mMtToihzFeTaf | None | View Papalexi21 in Vitessce | .vitessce.json | None | None | 1527 | jfAtjNNzdvetUaEo5zhf0Q | NaN | NaN | md5 | True | False | 1 | 2 | None | None | True | 141.0 | 2024-04-30 12:51:16.348884+00:00 | 2 | None | 1 |
726 | HXJ4DDAw8012jVKwoxgd | None | View Kuppe2022 in Vitessce | .vitessce.json | None | None | 5258 | JsVK8X8EGRsyTEMnD3Z-6g | NaN | NaN | md5 | True | False | 1 | 2 | None | None | True | 198.0 | 2024-06-26 10:35:31.697669+00:00 | 2 | None | 1 |
895 | nbX7Pk0SAPHNlsQD0000 | devdata/params_2024-09-30_11-44-22.json | None | .json | None | None | 38084 | s6viX7LZ6KsjWcXigAn0eg | NaN | NaN | md5 | True | False | 1 | 2 | None | None | True | NaN | 2024-10-02 15:25:49.609268+00:00 | 9 | None | 1 |
815 | XmeH4JgiJFha7Nl90000 | schmidt22_perturbseq/schmidt22_perturbseq.h5ad | schmidt22 perturbseq counts | .h5ad | None | AnnData | 20659936 | MwfMo7FUjrdk5mzTHx9RMw | NaN | NaN | md5-n | False | False | 1 | 2 | None | None | True | 377.0 | 2024-06-18 09:26:45.885472+00:00 | 2 | None | 1 |
You can now further subset or search the QuerySet. Here we filter for artifacts whose description contains “Tabula Sapiens”.
artifact = artifacts.filter(description__contains="Tabula Sapiens").first()
artifact.describe()
! source modules has additional modules: {'wetlab'}
consider mounting these registry modules to transfer all metadata
Artifact .h5ad
├── General
│   ├── .uid = 'dPraor9rU1EofcFb6Wph'
│   ├── .key = 'tabula_sapiens_lung.h5ad'
│   ├── .size = 3899435772
│   ├── .hash = '8mB1KK2wd51F6HQdvqipcQ'
│   ├── .path = s3://lamindata/tabula_sapiens_lung.h5ad
│   ├── .created_by = Koncopd (Sergei Rybakov)
│   ├── .created_at = 2023-07-14 19:00:30
│   └── .transform = 'Ingest Tabula Sapiens Lung'
└── Labels
    └── .tissues bionty.Tissue lung
        .cell_types bionty.CellType myofibroblast cell, B cell, capillary ae…
        .experimental_factors bionty.ExperimentalFactor anoxya, stroke
        .ulabels ULabel TSP1, TSP2, TSP14
By saving the artifact record that’s currently attached to the source database instance, you transfer it to the default database instance.
artifact.save()
/home/runner/work/lamindb/lamindb/lamindb/_record.py:642: FutureWarning: `name` will be removed soon, please pass 'Transfer from `laminlabs/lamindata`' to `description` instead
transform = Transform(
→ mapped records: Tissue(uid='7Tt4iEKc'), CellType(uid='5tiBvp96'), CellType(uid='7Crr32HI'), CellType(uid='6dzoXJ3Y'), CellType(uid='01NqvhnI'), CellType(uid='5NceZTYm'), CellType(uid='4PSMdO3I'), CellType(uid='3JO0EdVd'), CellType(uid='6rfrjhvo'), CellType(uid='37mWPv6o'), CellType(uid='5Z76sCep'), CellType(uid='2OWUH6Z1'), CellType(uid='5TU8SFt5'), CellType(uid='ryEtgi1y'), CellType(uid='1lMgAPE8'), CellType(uid='7m6Ruz32'), CellType(uid='42qbvc90'), CellType(uid='puGNwNrs'), CellType(uid='1T8bGe2I'), CellType(uid='6IC9NGJE'), CellType(uid='6ujMwy7s'), CellType(uid='3eecYgWR'), CellType(uid='zQ4dyjEs'), CellType(uid='7mNqzyFE'), CellType(uid='5A9EFjNB'), CellType(uid='3lsrLTv6'), CellType(uid='1HYtHpIc'), CellType(uid='6UmKFrzn'), CellType(uid='7eZArDpo'), CellType(uid='2KCFdGIk'), CellType(uid='1V5wVqK5'), CellType(uid='5i19XYug'), CellType(uid='2nPA0h4F'), CellType(uid='5Xi2OLvZ'), CellType(uid='3kaL3W1c'), ExperimentalFactor(uid='5YDCOg0V'), ExperimentalFactor(uid='7R1OhRJ7')
→ transferred records: Artifact(uid='dPraor9rU1EofcFb6Wph'), Storage(uid='D9BilDV2'), CellType(uid='4mZaXZQg'), CellType(uid='5rVn0X39'), CellType(uid='EWy46Sey'), CellType(uid='4yqLzwwm'), ULabel(uid='vfLXaHgD'), ULabel(uid='gk6w8qC5'), ULabel(uid='tZCTk48f')
Artifact(uid='dPraor9rU1EofcFb6Wph', is_latest=True, key='tabula_sapiens_lung.h5ad', description='Part of Tabula Sapiens, a benchmark, first-draft human cell atlas.', suffix='.h5ad', size=3899435772, hash='8mB1KK2wd51F6HQdvqipcQ', space_id=1, storage_id=2, run_id=2, created_by_id=1, created_at=2023-07-14 19:00:30 UTC)
How do I know if a record is saved in the default database instance or not?
Every record has an attribute ._state.db which can take the following values:
None: the record has not yet been saved to any database
"default": the record is saved on the default database instance
"account/name": the record is saved on a non-default database instance referenced by account/name (e.g., laminlabs/lamindata)
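The three possible values of ._state.db can be turned into a small lookup. The following is a minimal sketch, not part of lamindb: `db_location` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins that only mimic the `_state.db` attribute so the example runs without a database connection.

```python
from types import SimpleNamespace


def db_location(record):
    """Describe where a record lives, based on its `_state.db` value.

    Hypothetical helper for illustration; it only reads `record._state.db`.
    """
    db = record._state.db
    if db is None:
        return "not yet saved to any database"
    if db == "default":
        return "saved on the default database instance"
    # any other value is treated as an "account/name" instance slug
    return f"saved on the non-default database instance {db}"


# Stand-ins that mimic only the `_state.db` attribute (not real lamindb records)
unsaved = SimpleNamespace(_state=SimpleNamespace(db=None))
local = SimpleNamespace(_state=SimpleNamespace(db="default"))
remote = SimpleNamespace(_state=SimpleNamespace(db="laminlabs/lamindata"))

for record in (unsaved, local, remote):
    print(db_location(record))
```

With a real record, the same check would presumably read `artifact._state.db` directly.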
The artifact record and all other feature & label records have been transferred to the current database.
artifact.describe()
Artifact .h5ad
├── General
│   ├── .uid = 'dPraor9rU1EofcFb6Wph'
│   ├── .key = 'tabula_sapiens_lung.h5ad'
│   ├── .size = 3899435772
│   ├── .hash = '8mB1KK2wd51F6HQdvqipcQ'
│   ├── .path = s3://lamindata/tabula_sapiens_lung.h5ad
│   ├── .created_by = anonymous
│   ├── .created_at = 2023-07-14 19:00:30
│   └── .transform = 'Transfer from `laminlabs/lamindata`'
└── Labels
    └── .tissues bionty.Tissue lung
        .cell_types bionty.CellType pulmonary alveolar type 1 cell, adventit…
        .experimental_factors bionty.ExperimentalFactor anoxya, stroke
        .ulabels ULabel TSP1, TSP2, TSP14
The data itself remains in the original storage location, which has been added to the current instance’s storage locations as a read-only location.
ln.Storage.df()
id | uid | root | description | type | region | instance_uid | space_id | run_id | created_at | created_by_id | _aux | _branch_code |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | JnmScCp9QIoG | /home/runner/work/lamindb/lamindb/docs/test-tr... | None | local | None | 1FHu5eE0uxm4 | 1 | NaN | 2025-01-17 14:20:13.937000+00:00 | 1 | None | 1 |
2 | D9BilDV2 | s3://lamindata | None | s3 | us-east-1 | 4XIuR0tvaiXM | 1 | 2.0 | 2023-04-22 05:50:06.537267+00:00 | 1 | None | 1 |
See the state of the database.
ln.view()
****************
* module: core *
****************
Artifact
id | uid | key | description | suffix | kind | otype | size | hash | n_files | n_observations | _hash_type | _key_is_virtual | _overwrite_versions | space_id | storage_id | schema_id | version | is_latest | run_id | created_at | created_by_id | _aux | _branch_code |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | dPraor9rU1EofcFb6Wph | tabula_sapiens_lung.h5ad | Part of Tabula Sapiens, a benchmark, first-dra... | .h5ad | None | None | 3899435772 | 8mB1KK2wd51F6HQdvqipcQ | None | None | sha1-fl | False | False | 1 | 2 | None | None | True | 2 | 2023-07-14 19:00:30.621330+00:00 | 1 | None | 1 |
Run
id | uid | name | started_at | finished_at | reference | reference_type | _is_consecutive | _status_code | space_id | transform_id | report_id | _logfile_id | environment_id | initiated_by_run_id | created_at | created_by_id | _aux | _branch_code |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 8pzmfVcsFRu5pDg8g5zd | None | 2025-01-17 14:20:21.124334+00:00 | None | None | None | None | 0 | 1 | 1 | None | None | None | NaN | 2025-01-17 14:20:21.124386+00:00 | 1 | None | 1 |
2 | 0NmOCMX8i2P1JpVPvVPT | None | 2025-01-17 14:20:25.505404+00:00 | None | None | None | None | 0 | 1 | 2 | None | None | None | 1.0 | 2025-01-17 14:20:25.505453+00:00 | 1 | None | 1 |
Storage
id | uid | root | description | type | region | instance_uid | space_id | run_id | created_at | created_by_id | _aux | _branch_code |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | JnmScCp9QIoG | /home/runner/work/lamindb/lamindb/docs/test-tr... | None | local | None | 1FHu5eE0uxm4 | 1 | NaN | 2025-01-17 14:20:13.937000+00:00 | 1 | None | 1 |
2 | D9BilDV2 | s3://lamindata | None | s3 | us-east-1 | 4XIuR0tvaiXM | 1 | 2.0 | 2023-04-22 05:50:06.537267+00:00 | 1 | None | 1 |
Transform
id | uid | key | description | type | source_code | hash | reference | reference_type | space_id | _template_id | version | is_latest | created_at | created_by_id | _aux | _branch_code |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | 4XIuR0tvaiXM0000 | transfers/4XIuR0tvaiXM | Transfer from `laminlabs/lamindata` | function | None | None | None | None | 1 | None | None | True | 2025-01-17 14:20:25.495824+00:00 | 1 | None | 1 |
1 | ITeOtm7bhtdq0000 | transfer.ipynb | Transfer data | notebook | None | None | None | None | 1 | None | None | True | 2025-01-17 14:20:21.116237+00:00 | 1 | None | 1 |
ULabel
id | uid | name | is_type | description | reference | reference_type | space_id | type_id | run_id | created_at | created_by_id | _aux | _branch_code |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | tZCTk48f | TSP14 | None | None | None | None | 1 | None | 2 | 2023-07-14 21:27:44.312320+00:00 | 1 | None | 1 |
2 | gk6w8qC5 | TSP2 | None | None | None | None | 1 | None | 2 | 2023-07-14 21:27:44.312301+00:00 | 1 | None | 1 |
1 | vfLXaHgD | TSP1 | None | None | None | None | 1 | None | 2 | 2023-07-14 21:27:44.312230+00:00 | 1 | None | 1 |
******************
* module: bionty *
******************
CellType
id | uid | name | ontology_id | abbr | synonyms | description | space_id | source_id | run_id | created_at | created_by_id | _aux | _branch_code |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
106 | 4qrbhCCl | respiratory ciliated cell | CL:4030034 | None | ciliated cell of the respiratory tract | A Ciliated Cell Of The Respiratory System. Cil... | 1 | 32 | 1 | 2025-01-17 14:20:27.392000+00:00 | 1 | None | 1 |
107 | 3hXuCKYH | perivascular cell | CL:4033054 | None | None | A Cell That Is Adjacent To A Vessel. A Perivas... | 1 | 32 | 1 | 2025-01-17 14:20:27.392000+00:00 | 1 | None | 1 |
35 | 4bKGljt0 | cell | CL:0000000 | None | None | A Material Entity Of Anatomical Origin (Part O... | 1 | 32 | 1 | 2025-01-17 14:20:27.391000+00:00 | 1 | None | 1 |
36 | jxDD8ajD | stem cell | CL:0000034 | None | animal stem cell | A Relatively Undifferentiated Cell That Retain... | 1 | 32 | 1 | 2025-01-17 14:20:27.391000+00:00 | 1 | None | 1 |
37 | 58RHatFb | single fate stem cell | CL:0000035 | None | unipotent stem cell|unipotential stem cell | A Stem Cell That Self-Renews As Well As Give R... | 1 | 32 | 1 | 2025-01-17 14:20:27.391000+00:00 | 1 | None | 1 |
38 | 5xUfhXf0 | epithelial fate stem cell | CL:0000036 | None | epithelial stem cell | None | 1 | 32 | 1 | 2025-01-17 14:20:27.391000+00:00 | 1 | None | 1 |
39 | 5vpYcwx1 | ciliated cell | CL:0000064 | None | None | A Cell That Has A Filiform Extrusion Of The Ce... | 1 | 32 | 1 | 2025-01-17 14:20:27.391000+00:00 | 1 | None | 1 |
ExperimentalFactor
id | uid | name | ontology_id | abbr | synonyms | description | molecule | instrument | measurement | space_id | source_id | run_id | created_at | created_by_id | _aux | _branch_code |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | 20Nq3k7b | disease | EFO:0000408 | None | disease or disorder|diseases|medical condition... | A Disease Is A Disposition To Undergo Patholog... | None | None | None | 1 | 67 | 1 | 2025-01-17 14:20:28.415000+00:00 | 1 | None | 1 |
4 | 6ISbvepx | nervous system disease | EFO:0000618 | None | nervous system disorder|neurologic disease|neu... | A Non-Neoplastic Or Neoplastic Disorder That A... | None | None | None | 1 | 67 | 1 | 2025-01-17 14:20:28.415000+00:00 | 1 | None | 1 |
5 | 2xDSpjH7 | cerebrovascular disorder | EFO:0003763 | None | Vascular Disorder, Intracranial|Cerebrovascula... | A Disorder Resulting From Inadequate Blood Flo... | None | None | None | 1 | 67 | 1 | 2025-01-17 14:20:28.415000+00:00 | 1 | None | 1 |
6 | 68LLeA7O | brain disease | EFO:0005774 | None | disorder of brain|disease or disorder of brain... | A Disease Affecting The Brain Or Part Of The B... | None | None | None | 1 | 67 | 1 | 2025-01-17 14:20:28.415000+00:00 | 1 | None | 1 |
7 | 2lctIHmn | central nervous system disease | EFO:0009386 | None | central nervous system disorder|central nervou... | A Disease Involving The Central Nervous System. | None | None | None | 1 | 67 | 1 | 2025-01-17 14:20:28.415000+00:00 | 1 | None | 1 |
8 | 1was9kRO | hypoxia | EFO:0009444 | None | None | A Decrease In The Amount Of Oxygen In The Body... | None | None | None | 1 | 67 | 1 | 2025-01-17 14:20:28.415000+00:00 | 1 | None | 1 |
1 | 5YDCOg0V | anoxya | EFO:0009445 | None | None | Absence Or Reduction Of Oxygen In Body Tissue.... | None | None | None | 1 | 67 | 1 | 2025-01-17 14:20:27.971000+00:00 | 1 | None | 1 |
Source
id | uid | entity | organism | name | in_db | currently_used | description | url | md5 | source_website | space_id | dataframe_artifact_id | version | run_id | created_at | created_by_id | _aux | _branch_code |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
67 | 2a1H | bionty.ExperimentalFactor | all | efo | False | True | The Experimental Factor Ontology | http://www.ebi.ac.uk/efo/releases/v3.70.0/efo.owl | https://bioportal.bioontology.org/ontologies/EFO | 1 | None | 3.70.0 | None | 2025-01-17 14:20:14.175000+00:00 | 1 | None | 1 | |
53 | 5Xov | bionty.Disease | all | mondo | False | False | Mondo Disease Ontology | http://purl.obolibrary.org/obo/mondo/releases/... | 78914fa236773c5ea6605f7570df6245 | https://mondo.monarchinitiative.org | 1 | None | 2024-02-06 | None | 2025-01-17 14:20:14.175000+00:00 | 1 | None | 1 |
54 | 69ln | bionty.Disease | all | mondo | False | False | Mondo Disease Ontology | http://purl.obolibrary.org/obo/mondo/releases/... | 73787d81b885cfa1a255ee293e38303d | https://mondo.monarchinitiative.org | 1 | None | 2024-01-03 | None | 2025-01-17 14:20:14.175000+00:00 | 1 | None | 1 |
55 | 4ss2 | bionty.Disease | all | mondo | False | False | Mondo Disease Ontology | http://purl.obolibrary.org/obo/mondo/releases/... | 7f33767422042eec29f08b501fc851db | https://mondo.monarchinitiative.org | 1 | None | 2023-08-02 | None | 2025-01-17 14:20:14.175000+00:00 | 1 | None | 1 |
56 | Hgw0 | bionty.Disease | all | mondo | False | False | Mondo Disease Ontology | http://purl.obolibrary.org/obo/mondo/releases/... | 700c43dd9ba51aecc7a8edfc3bc2dab1 | https://mondo.monarchinitiative.org | 1 | None | 2023-04-04 | None | 2025-01-17 14:20:14.175000+00:00 | 1 | None | 1 |
57 | UUZU | bionty.Disease | all | mondo | False | False | Mondo Disease Ontology | http://purl.obolibrary.org/obo/mondo/releases/... | 2b7d479d4bd02a94eab47d1c9e64c5db | https://mondo.monarchinitiative.org | 1 | None | 2023-02-06 | None | 2025-01-17 14:20:14.175000+00:00 | 1 | None | 1 |
58 | 7DH1 | bionty.Disease | all | mondo | False | False | Mondo Disease Ontology | http://purl.obolibrary.org/obo/mondo/releases/... | 04b808d05c2c2e81430b20a0e87552bb | https://mondo.monarchinitiative.org | 1 | None | 2022-10-11 | None | 2025-01-17 14:20:14.175000+00:00 | 1 | None | 1 |
Tissue
id | uid | name | ontology_id | abbr | synonyms | description | space_id | source_id | run_id | created_at | created_by_id | _aux | _branch_code |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | 5SGM2iq3 | anatomical structure | UBERON:0000061 | None | biological structure|connected biological stru... | Material Anatomical Entity That Is A Single Co... | 1 | 41 | 1 | 2025-01-17 14:20:26.660000+00:00 | 1 | None | 1 |
3 | 7HJIkVT2 | organ | UBERON:0000062 | None | None | Anatomical Structure That Performs A Specific ... | 1 | 41 | 1 | 2025-01-17 14:20:26.660000+00:00 | 1 | None | 1 |
4 | IfUZr3lP | respiration organ | UBERON:0000171 | None | apparatus respiratorius organ|respiratory syst... | Organ That Functions In Gaseous Exchange Betwe... | 1 | 41 | 1 | 2025-01-17 14:20:26.660000+00:00 | 1 | None | 1 |
5 | N039zety | material anatomical entity | UBERON:0000465 | None | None | Anatomical Entity That Has Mass. | 1 | 41 | 1 | 2025-01-17 14:20:26.660000+00:00 | 1 | None | 1 |
6 | 6Kfy2VpY | anatomical system | UBERON:0000467 | None | connected anatomical system|organ system|body ... | Multicellular, Connected Anatomical Structure ... | 1 | 41 | 1 | 2025-01-17 14:20:26.660000+00:00 | 1 | None | 1 |
7 | 1boGOeV9 | multicellular organism | UBERON:0000468 | None | multi-cellular organism | Anatomical Structure That Is An Individual Mem... | 1 | 41 | 1 | 2025-01-17 14:20:26.660000+00:00 | 1 | None | 1 |
8 | 2Rmc7VPG | organism subdivision | UBERON:0000475 | None | anatomic region | Anatomical Structure Which Is A Subdivision Of... | 1 | 41 | 1 | 2025-01-17 14:20:26.660000+00:00 | 1 | None | 1 |
View lineage:
artifact.view_lineage()
The transferred dataset is linked to a special type of transform that stores the slug and uid of the source instance:
artifact.transform.description
'Transfer from `laminlabs/lamindata`'
The transform key has the form f"transfers/{source_instance.uid}":
artifact.transform.key
'transfers/4XIuR0tvaiXM'
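Because the key follows a fixed pattern, the source instance uid can be recovered from it with plain string handling. The sketch below is illustrative only: `transfer_key` and `source_uid_from_key` are hypothetical helpers, not lamindb functions, and the only assumption is the "transfers/" prefix shown above.

```python
from typing import Optional

TRANSFER_PREFIX = "transfers/"


def transfer_key(source_instance_uid: str) -> str:
    """Build a key of the form transfers/{source_instance.uid}."""
    return f"{TRANSFER_PREFIX}{source_instance_uid}"


def source_uid_from_key(key: str) -> Optional[str]:
    """Return the source instance uid, or None if the key is not a transfer key."""
    if not key.startswith(TRANSFER_PREFIX):
        return None
    return key[len(TRANSFER_PREFIX):]


# Values taken from the output shown in this guide
print(transfer_key("4XIuR0tvaiXM"))
print(source_uid_from_key("transfers/4XIuR0tvaiXM"))
# A regular notebook key is not a transfer key
print(source_uid_from_key("transfer.ipynb"))
```

In the Storage table shown earlier, the same uid appears as the instance_uid of the s3://lamindata storage location.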
The current notebook run is linked as the initiated_by_run of the “transfer run”:
artifact.run.initiated_by_run.transform
Transform(uid='ITeOtm7bhtdq0000', is_latest=True, key='transfer.ipynb', description='Transfer data', type='notebook', space_id=1, created_by_id=1, created_at=2025-01-17 14:20:21 UTC)
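The link from the transfer run back to the notebook run can be followed programmatically. The following is a rough sketch under stated assumptions: `initiating_chain` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins carrying only the `transform.description` and `initiated_by_run` attributes seen above, so the example runs without a database.

```python
from types import SimpleNamespace


def initiating_chain(run):
    """Collect transform descriptions from a run up through its initiating runs."""
    descriptions = []
    current = run
    while current is not None:
        descriptions.append(current.transform.description)
        # step to the run that initiated the current one, if any
        current = current.initiated_by_run
    return descriptions


# Stand-ins mirroring the two runs in this guide (not real lamindb Run records)
notebook_run = SimpleNamespace(
    transform=SimpleNamespace(description="Transfer data"),
    initiated_by_run=None,
)
transfer_run = SimpleNamespace(
    transform=SimpleNamespace(description="Transfer from `laminlabs/lamindata`"),
    initiated_by_run=notebook_run,
)

print(initiating_chain(transfer_run))
```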
# test the last 3 cells here
# TODO restore the following test
# assert artifact.transform.description == "Transfer from `laminlabs/lamindata`"
# assert artifact.transform.key == "transfers/4XIuR0tvaiXM"
# assert artifact.transform.uid == "4XIuR0tvaiXM0000"
# assert artifact.run.initiated_by_run.transform.description == "Transfer data"
# clean up test instance
!lamin delete --force test-transfer
! calling anonymously, will miss private instances
• deleting instance anonymous/test-transfer