Transfer data

This guide shows how to transfer data from a source database instance into the current default database instance.

# !pip install 'lamindb[jupyter,aws,bionty]'
!lamin init --storage ./test-transfer --modules bionty
! using anonymous user (to identify, call: lamin login)
 initialized lamindb: anonymous/test-transfer
import lamindb as ln

ln.track("ITeOtm7bhtdq0000")
 connected lamindb: anonymous/test-transfer
 created Transform('ITeOtm7bhtdq0000'), started new Run('8pzmfVcs...') at 2025-01-17 14:20:21 UTC
 notebook imports: lamindb==1.0rc1

Query all artifacts in the laminlabs/lamindata instance and filter them to their latest versions.

# query all latest artifact versions
artifacts = ln.Artifact.using("laminlabs/lamindata").filter(is_latest=True)

# convert the QuerySet to a DataFrame and show the first 5 rows
artifacts.df().head()
! source modules has additional modules: {'wetlab'}
consider mounting these registry modules to transfer all metadata
uid key description suffix kind otype size hash n_files n_observations _hash_type _key_is_virtual _overwrite_versions space_id storage_id schema_id version is_latest run_id created_at created_by_id _aux _branch_code
id
1282 WQtsc0CQZKB9GEst0000 None Example R cars dataset .parquet dataset DataFrame 2402 eIk8NXNiwMoGmhhjrMILbg NaN NaN md5 True False 1 2 None None True 460.0 2025-01-15 14:22:51.192955+00:00 30 None 1
607 sRapK07mMtToihzFeTaf None View Papalexi21 in Vitessce .vitessce.json None None 1527 jfAtjNNzdvetUaEo5zhf0Q NaN NaN md5 True False 1 2 None None True 141.0 2024-04-30 12:51:16.348884+00:00 2 None 1
726 HXJ4DDAw8012jVKwoxgd None View Kuppe2022 in Vitessce .vitessce.json None None 5258 JsVK8X8EGRsyTEMnD3Z-6g NaN NaN md5 True False 1 2 None None True 198.0 2024-06-26 10:35:31.697669+00:00 2 None 1
895 nbX7Pk0SAPHNlsQD0000 devdata/params_2024-09-30_11-44-22.json None .json None None 38084 s6viX7LZ6KsjWcXigAn0eg NaN NaN md5 True False 1 2 None None True NaN 2024-10-02 15:25:49.609268+00:00 9 None 1
815 XmeH4JgiJFha7Nl90000 schmidt22_perturbseq/schmidt22_perturbseq.h5ad schmidt22 perturbseq counts .h5ad None AnnData 20659936 MwfMo7FUjrdk5mzTHx9RMw NaN NaN md5-n False False 1 2 None None True 377.0 2024-06-18 09:26:45.885472+00:00 2 None 1

You can now further subset or search the QuerySet. Here we filter for artifacts whose description contains “Tabula Sapiens” and take the first match.

artifact = artifacts.filter(description__contains="Tabula Sapiens").first()
artifact.describe()
! source modules has additional modules: {'wetlab'}
consider mounting these registry modules to transfer all metadata
Artifact .h5ad
├── General
│   ├── .uid = 'dPraor9rU1EofcFb6Wph'
│   ├── .key = 'tabula_sapiens_lung.h5ad'
│   ├── .size = 3899435772
│   ├── .hash = '8mB1KK2wd51F6HQdvqipcQ'
│   ├── .path = s3://lamindata/tabula_sapiens_lung.h5ad
│   ├── .created_by = Koncopd (Sergei Rybakov)
│   ├── .created_at = 2023-07-14 19:00:30
│   └── .transform = 'Ingest Tabula Sapiens Lung'
└── Labels
    └── .tissues                    bionty.Tissue              lung                                     
        .cell_types                 bionty.CellType            myofibroblast cell, B cell, capillary ae…
        .experimental_factors       bionty.ExperimentalFactor  anoxya, stroke                           
        .ulabels                    ULabel                     TSP1, TSP2, TSP14                        

By saving the artifact record that’s currently attached to the source database instance, you transfer it to the default database instance.

artifact.save()
/home/runner/work/lamindb/lamindb/lamindb/_record.py:642: FutureWarning: `name` will be removed soon, please pass 'Transfer from `laminlabs/lamindata`' to `description` instead
  transform = Transform(
 mapped records: Tissue(uid='7Tt4iEKc'), CellType(uid='5tiBvp96'), CellType(uid='7Crr32HI'), CellType(uid='6dzoXJ3Y'), CellType(uid='01NqvhnI'), CellType(uid='5NceZTYm'), CellType(uid='4PSMdO3I'), CellType(uid='3JO0EdVd'), CellType(uid='6rfrjhvo'), CellType(uid='37mWPv6o'), CellType(uid='5Z76sCep'), CellType(uid='2OWUH6Z1'), CellType(uid='5TU8SFt5'), CellType(uid='ryEtgi1y'), CellType(uid='1lMgAPE8'), CellType(uid='7m6Ruz32'), CellType(uid='42qbvc90'), CellType(uid='puGNwNrs'), CellType(uid='1T8bGe2I'), CellType(uid='6IC9NGJE'), CellType(uid='6ujMwy7s'), CellType(uid='3eecYgWR'), CellType(uid='zQ4dyjEs'), CellType(uid='7mNqzyFE'), CellType(uid='5A9EFjNB'), CellType(uid='3lsrLTv6'), CellType(uid='1HYtHpIc'), CellType(uid='6UmKFrzn'), CellType(uid='7eZArDpo'), CellType(uid='2KCFdGIk'), CellType(uid='1V5wVqK5'), CellType(uid='5i19XYug'), CellType(uid='2nPA0h4F'), CellType(uid='5Xi2OLvZ'), CellType(uid='3kaL3W1c'), ExperimentalFactor(uid='5YDCOg0V'), ExperimentalFactor(uid='7R1OhRJ7')
 transferred records: Artifact(uid='dPraor9rU1EofcFb6Wph'), Storage(uid='D9BilDV2'), CellType(uid='4mZaXZQg'), CellType(uid='5rVn0X39'), CellType(uid='EWy46Sey'), CellType(uid='4yqLzwwm'), ULabel(uid='vfLXaHgD'), ULabel(uid='gk6w8qC5'), ULabel(uid='tZCTk48f')
Artifact(uid='dPraor9rU1EofcFb6Wph', is_latest=True, key='tabula_sapiens_lung.h5ad', description='Part of Tabula Sapiens, a benchmark, first-draft human cell atlas.', suffix='.h5ad', size=3899435772, hash='8mB1KK2wd51F6HQdvqipcQ', space_id=1, storage_id=2, run_id=2, created_by_id=1, created_at=2023-07-14 19:00:30 UTC)
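
The log above distinguishes mapped records from transferred records. One plausible reading, sketched here in plain Python, is that records are matched by uid: a record whose uid already exists in the target is mapped to the existing row, and any other record is copied. This is an assumption for illustration, not lamindb's actual implementation; `transfer_records` and the dictionaries are hypothetical stand-ins.

```python
def transfer_records(source_records, target_db):
    """Copy records into target_db, reusing rows that share a uid.

    Hypothetical illustration: `target_db` is a dict keyed by uid,
    standing in for a registry in the default instance.
    """
    mapped, transferred = [], []
    for record in source_records:
        uid = record["uid"]
        if uid in target_db:
            # already present in the target: reuse the existing row
            mapped.append(uid)
        else:
            # not present yet: copy the metadata record over
            target_db[uid] = dict(record)
            transferred.append(uid)
    return mapped, transferred


# uids are taken from the log above; the initial target content is assumed
target = {"7Tt4iEKc": {"uid": "7Tt4iEKc", "registry": "Tissue"}}
source = [
    {"uid": "7Tt4iEKc", "registry": "Tissue"},
    {"uid": "vfLXaHgD", "registry": "ULabel"},
]
mapped, transferred = transfer_records(source, target)
print("mapped:", mapped)
print("transferred:", transferred)
```

Under this reading, ontology records that the default instance already holds are reused, while instance-specific records such as the ULabels are copied.
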
How do I know if a record is saved in the default database instance or not?

Every record has an attribute ._state.db which can take the following values:

  • None: the record has not yet been saved to any database

  • "default": the record is saved on the default database instance

  • "account/name": the record is saved on a non-default database instance referenced by account/name (e.g., laminlabs/lamindata)
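
The three cases above can be turned into a small check. The sketch below reads only the `._state.db` attribute described in the list; `describe_location` and `fake_record` are hypothetical helpers, and `fake_record` merely mimics that attribute so the example runs without a database.

```python
from types import SimpleNamespace


def describe_location(record):
    """Return a human-readable description of where `record` is saved."""
    db = record._state.db
    if db is None:
        return "not saved to any database"
    if db == "default":
        return "saved in the default instance"
    # any other value is an "account/name" slug of a non-default instance
    return f"saved in the non-default instance {db}"


def fake_record(db):
    # hypothetical stand-in that only mimics the `._state.db` attribute
    return SimpleNamespace(_state=SimpleNamespace(db=db))


for db in (None, "default", "laminlabs/lamindata"):
    print(repr(db), "->", describe_location(fake_record(db)))
```
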

The artifact record and its linked feature and label records have been transferred to the current database instance.

artifact.describe()
Artifact .h5ad
├── General
│   ├── .uid = 'dPraor9rU1EofcFb6Wph'
│   ├── .key = 'tabula_sapiens_lung.h5ad'
│   ├── .size = 3899435772
│   ├── .hash = '8mB1KK2wd51F6HQdvqipcQ'
│   ├── .path = s3://lamindata/tabula_sapiens_lung.h5ad
│   ├── .created_by = anonymous
│   ├── .created_at = 2023-07-14 19:00:30
│   └── .transform = 'Transfer from `laminlabs/lamindata`'
└── Labels
    └── .tissues                    bionty.Tissue              lung                                     
        .cell_types                 bionty.CellType            pulmonary alveolar type 1 cell, adventit…
        .experimental_factors       bionty.ExperimentalFactor  anoxya, stroke                           
        .ulabels                    ULabel                     TSP1, TSP2, TSP14                        

The data itself remains in the original storage location, which has been registered in the current instance as a read-only storage location.

ln.Storage.df()
uid root description type region instance_uid space_id run_id created_at created_by_id _aux _branch_code
id
1 JnmScCp9QIoG /home/runner/work/lamindb/lamindb/docs/test-tr... None local None 1FHu5eE0uxm4 1 NaN 2025-01-17 14:20:13.937000+00:00 1 None 1
2 D9BilDV2 s3://lamindata None s3 us-east-1 4XIuR0tvaiXM 1 2.0 2023-04-22 05:50:06.537267+00:00 1 None 1
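
Because only metadata is transferred, the artifact's path still points into the source storage. A minimal sketch, assuming that for a non-virtual key the path is simply the storage root joined with the key (which matches the `.path` shown by `describe()` above); `resolve_path` is a hypothetical helper, not a lamindb function.

```python
def resolve_path(storage_root: str, key: str) -> str:
    """Join a storage root and an artifact key into a full path."""
    return f"{storage_root.rstrip('/')}/{key.lstrip('/')}"


# root of storage record 2 and key of the transferred artifact, both from the output above
path = resolve_path("s3://lamindata", "tabula_sapiens_lung.h5ad")
print(path)
```
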

Inspect the state of the database:

ln.view()
****************
* module: core *
****************
Artifact
uid key description suffix kind otype size hash n_files n_observations _hash_type _key_is_virtual _overwrite_versions space_id storage_id schema_id version is_latest run_id created_at created_by_id _aux _branch_code
id
1 dPraor9rU1EofcFb6Wph tabula_sapiens_lung.h5ad Part of Tabula Sapiens, a benchmark, first-dra... .h5ad None None 3899435772 8mB1KK2wd51F6HQdvqipcQ None None sha1-fl False False 1 2 None None True 2 2023-07-14 19:00:30.621330+00:00 1 None 1
Run
uid name started_at finished_at reference reference_type _is_consecutive _status_code space_id transform_id report_id _logfile_id environment_id initiated_by_run_id created_at created_by_id _aux _branch_code
id
1 8pzmfVcsFRu5pDg8g5zd None 2025-01-17 14:20:21.124334+00:00 None None None None 0 1 1 None None None NaN 2025-01-17 14:20:21.124386+00:00 1 None 1
2 0NmOCMX8i2P1JpVPvVPT None 2025-01-17 14:20:25.505404+00:00 None None None None 0 1 2 None None None 1.0 2025-01-17 14:20:25.505453+00:00 1 None 1
Storage
uid root description type region instance_uid space_id run_id created_at created_by_id _aux _branch_code
id
1 JnmScCp9QIoG /home/runner/work/lamindb/lamindb/docs/test-tr... None local None 1FHu5eE0uxm4 1 NaN 2025-01-17 14:20:13.937000+00:00 1 None 1
2 D9BilDV2 s3://lamindata None s3 us-east-1 4XIuR0tvaiXM 1 2.0 2023-04-22 05:50:06.537267+00:00 1 None 1
Transform
uid key description type source_code hash reference reference_type space_id _template_id version is_latest created_at created_by_id _aux _branch_code
id
2 4XIuR0tvaiXM0000 transfers/4XIuR0tvaiXM Transfer from `laminlabs/lamindata` function None None None None 1 None None True 2025-01-17 14:20:25.495824+00:00 1 None 1
1 ITeOtm7bhtdq0000 transfer.ipynb Transfer data notebook None None None None 1 None None True 2025-01-17 14:20:21.116237+00:00 1 None 1
ULabel
uid name is_type description reference reference_type space_id type_id run_id created_at created_by_id _aux _branch_code
id
3 tZCTk48f TSP14 None None None None 1 None 2 2023-07-14 21:27:44.312320+00:00 1 None 1
2 gk6w8qC5 TSP2 None None None None 1 None 2 2023-07-14 21:27:44.312301+00:00 1 None 1
1 vfLXaHgD TSP1 None None None None 1 None 2 2023-07-14 21:27:44.312230+00:00 1 None 1
******************
* module: bionty *
******************
CellType
uid name ontology_id abbr synonyms description space_id source_id run_id created_at created_by_id _aux _branch_code
id
106 4qrbhCCl respiratory ciliated cell CL:4030034 None ciliated cell of the respiratory tract A Ciliated Cell Of The Respiratory System. Cil... 1 32 1 2025-01-17 14:20:27.392000+00:00 1 None 1
107 3hXuCKYH perivascular cell CL:4033054 None None A Cell That Is Adjacent To A Vessel. A Perivas... 1 32 1 2025-01-17 14:20:27.392000+00:00 1 None 1
35 4bKGljt0 cell CL:0000000 None None A Material Entity Of Anatomical Origin (Part O... 1 32 1 2025-01-17 14:20:27.391000+00:00 1 None 1
36 jxDD8ajD stem cell CL:0000034 None animal stem cell A Relatively Undifferentiated Cell That Retain... 1 32 1 2025-01-17 14:20:27.391000+00:00 1 None 1
37 58RHatFb single fate stem cell CL:0000035 None unipotent stem cell|unipotential stem cell A Stem Cell That Self-Renews As Well As Give R... 1 32 1 2025-01-17 14:20:27.391000+00:00 1 None 1
38 5xUfhXf0 epithelial fate stem cell CL:0000036 None epithelial stem cell None 1 32 1 2025-01-17 14:20:27.391000+00:00 1 None 1
39 5vpYcwx1 ciliated cell CL:0000064 None None A Cell That Has A Filiform Extrusion Of The Ce... 1 32 1 2025-01-17 14:20:27.391000+00:00 1 None 1
ExperimentalFactor
uid name ontology_id abbr synonyms description molecule instrument measurement space_id source_id run_id created_at created_by_id _aux _branch_code
id
3 20Nq3k7b disease EFO:0000408 None disease or disorder|diseases|medical condition... A Disease Is A Disposition To Undergo Patholog... None None None 1 67 1 2025-01-17 14:20:28.415000+00:00 1 None 1
4 6ISbvepx nervous system disease EFO:0000618 None nervous system disorder|neurologic disease|neu... A Non-Neoplastic Or Neoplastic Disorder That A... None None None 1 67 1 2025-01-17 14:20:28.415000+00:00 1 None 1
5 2xDSpjH7 cerebrovascular disorder EFO:0003763 None Vascular Disorder, Intracranial|Cerebrovascula... A Disorder Resulting From Inadequate Blood Flo... None None None 1 67 1 2025-01-17 14:20:28.415000+00:00 1 None 1
6 68LLeA7O brain disease EFO:0005774 None disorder of brain|disease or disorder of brain... A Disease Affecting The Brain Or Part Of The B... None None None 1 67 1 2025-01-17 14:20:28.415000+00:00 1 None 1
7 2lctIHmn central nervous system disease EFO:0009386 None central nervous system disorder|central nervou... A Disease Involving The Central Nervous System. None None None 1 67 1 2025-01-17 14:20:28.415000+00:00 1 None 1
8 1was9kRO hypoxia EFO:0009444 None None A Decrease In The Amount Of Oxygen In The Body... None None None 1 67 1 2025-01-17 14:20:28.415000+00:00 1 None 1
1 5YDCOg0V anoxya EFO:0009445 None None Absence Or Reduction Of Oxygen In Body Tissue.... None None None 1 67 1 2025-01-17 14:20:27.971000+00:00 1 None 1
Source
uid entity organism name in_db currently_used description url md5 source_website space_id dataframe_artifact_id version run_id created_at created_by_id _aux _branch_code
id
67 2a1H bionty.ExperimentalFactor all efo False True The Experimental Factor Ontology http://www.ebi.ac.uk/efo/releases/v3.70.0/efo.owl https://bioportal.bioontology.org/ontologies/EFO 1 None 3.70.0 None 2025-01-17 14:20:14.175000+00:00 1 None 1
53 5Xov bionty.Disease all mondo False False Mondo Disease Ontology http://purl.obolibrary.org/obo/mondo/releases/... 78914fa236773c5ea6605f7570df6245 https://mondo.monarchinitiative.org 1 None 2024-02-06 None 2025-01-17 14:20:14.175000+00:00 1 None 1
54 69ln bionty.Disease all mondo False False Mondo Disease Ontology http://purl.obolibrary.org/obo/mondo/releases/... 73787d81b885cfa1a255ee293e38303d https://mondo.monarchinitiative.org 1 None 2024-01-03 None 2025-01-17 14:20:14.175000+00:00 1 None 1
55 4ss2 bionty.Disease all mondo False False Mondo Disease Ontology http://purl.obolibrary.org/obo/mondo/releases/... 7f33767422042eec29f08b501fc851db https://mondo.monarchinitiative.org 1 None 2023-08-02 None 2025-01-17 14:20:14.175000+00:00 1 None 1
56 Hgw0 bionty.Disease all mondo False False Mondo Disease Ontology http://purl.obolibrary.org/obo/mondo/releases/... 700c43dd9ba51aecc7a8edfc3bc2dab1 https://mondo.monarchinitiative.org 1 None 2023-04-04 None 2025-01-17 14:20:14.175000+00:00 1 None 1
57 UUZU bionty.Disease all mondo False False Mondo Disease Ontology http://purl.obolibrary.org/obo/mondo/releases/... 2b7d479d4bd02a94eab47d1c9e64c5db https://mondo.monarchinitiative.org 1 None 2023-02-06 None 2025-01-17 14:20:14.175000+00:00 1 None 1
58 7DH1 bionty.Disease all mondo False False Mondo Disease Ontology http://purl.obolibrary.org/obo/mondo/releases/... 04b808d05c2c2e81430b20a0e87552bb https://mondo.monarchinitiative.org 1 None 2022-10-11 None 2025-01-17 14:20:14.175000+00:00 1 None 1
Tissue
uid name ontology_id abbr synonyms description space_id source_id run_id created_at created_by_id _aux _branch_code
id
2 5SGM2iq3 anatomical structure UBERON:0000061 None biological structure|connected biological stru... Material Anatomical Entity That Is A Single Co... 1 41 1 2025-01-17 14:20:26.660000+00:00 1 None 1
3 7HJIkVT2 organ UBERON:0000062 None None Anatomical Structure That Performs A Specific ... 1 41 1 2025-01-17 14:20:26.660000+00:00 1 None 1
4 IfUZr3lP respiration organ UBERON:0000171 None apparatus respiratorius organ|respiratory syst... Organ That Functions In Gaseous Exchange Betwe... 1 41 1 2025-01-17 14:20:26.660000+00:00 1 None 1
5 N039zety material anatomical entity UBERON:0000465 None None Anatomical Entity That Has Mass. 1 41 1 2025-01-17 14:20:26.660000+00:00 1 None 1
6 6Kfy2VpY anatomical system UBERON:0000467 None connected anatomical system|organ system|body ... Multicellular, Connected Anatomical Structure ... 1 41 1 2025-01-17 14:20:26.660000+00:00 1 None 1
7 1boGOeV9 multicellular organism UBERON:0000468 None multi-cellular organism Anatomical Structure That Is An Individual Mem... 1 41 1 2025-01-17 14:20:26.660000+00:00 1 None 1
8 2Rmc7VPG organism subdivision UBERON:0000475 None anatomic region Anatomical Structure Which Is A Subdivision Of... 1 41 1 2025-01-17 14:20:26.660000+00:00 1 None 1

View lineage:

artifact.view_lineage()

The transferred dataset is linked to a special type of transform that stores the slug and uid of the source instance:

artifact.transform.description
'Transfer from `laminlabs/lamindata`'

The transform key has the form f"transfers/{source_instance.uid}":

artifact.transform.key
'transfers/4XIuR0tvaiXM'
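
Given that key format, the source instance uid can be recovered from a transform key. The helpers below are a hypothetical sketch built only on the f-string shown above; they are not part of lamindb.

```python
TRANSFER_PREFIX = "transfers/"


def transfer_key(source_instance_uid: str) -> str:
    """Build the transform key for a transfer from the given instance."""
    return f"{TRANSFER_PREFIX}{source_instance_uid}"


def source_uid_from_key(key: str):
    """Return the source instance uid, or None if `key` is not a transfer key."""
    if not key.startswith(TRANSFER_PREFIX):
        return None
    return key[len(TRANSFER_PREFIX):]


key = transfer_key("4XIuR0tvaiXM")
print(key)
print(source_uid_from_key(key))
print(source_uid_from_key("transfer.ipynb"))
```

The uid recovered here is the same value that appears in the instance_uid column of the s3://lamindata storage record.
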

The current notebook run is linked as the initiated_by_run of the “transfer run”:

artifact.run.initiated_by_run.transform
Transform(uid='ITeOtm7bhtdq0000', is_latest=True, key='transfer.ipynb', description='Transfer data', type='notebook', space_id=1, created_by_id=1, created_at=2025-01-17 14:20:21 UTC)
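
The relationship between the two runs can be pictured as a linked chain. The sketch below models it with plain dataclasses; `Transform`, `Run`, and `initiating_chain` are simplified, hypothetical stand-ins that carry only the fields used in this guide, not the lamindb registries themselves.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transform:
    key: str
    description: str


@dataclass
class Run:
    transform: Transform
    initiated_by_run: Optional["Run"] = None


def initiating_chain(run):
    """Return transform descriptions from `run` up to the run that started it all."""
    chain = []
    current = run
    while current is not None:
        chain.append(current.transform.description)
        current = current.initiated_by_run
    return chain


# values mirror the outputs shown above
notebook_run = Run(Transform("transfer.ipynb", "Transfer data"))
transfer_run = Run(
    Transform("transfers/4XIuR0tvaiXM", "Transfer from `laminlabs/lamindata`"),
    initiated_by_run=notebook_run,
)
print(initiating_chain(transfer_run))
```
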
# test the last 3 cells here
# TODO restore the following test
# assert artifact.transform.description == "Transfer from `laminlabs/lamindata`"
# assert artifact.transform.key == "transfers/4XIuR0tvaiXM"
# assert artifact.transform.uid == "4XIuR0tvaiXM0000"
# assert artifact.run.initiated_by_run.transform.description == "Transfer data"

# clean up test instance
!lamin delete --force test-transfer
! calling anonymously, will miss private instances
 deleting instance anonymous/test-transfer