Data Models

These model classes are used to represent objects on the Blackfynn platform. Briefly, there are three major classes of entities: “collection” classes, “data” classes, and “detail” or “helper” classes.

Base

BaseDataNode

Base class to serve all “data” node-types on platform, e.g. Packages and Collections.

Data Catalog Basics

Dataset

Collection

DataPackage

DataPackage is the core data object representation on the platform.

Time Series

TimeSeries

Represents a timeseries package on the platform.

TimeSeriesChannel

TimeSeriesChannel represents a single source of time series data.

TimeSeriesAnnotationLayer

Annotation layer containing one or more annotations.

TimeSeriesAnnotation

Annotation is an event on one or more channels in a dataset

Tabular

Tabular

Represents a Tabular package on the platform.

Base Class

The BaseDataNode class provides the basic methods available on all models listed below.

class blackfynn.models.BaseDataNode(name, type, parent=None, owner_id=None, dataset_id=None, id=None, provenance_id=None, **kwargs)[source]

Base class to serve all “data” node-types on platform, e.g. Packages and Collections.

delete()[source]

Delete object from platform.

get_property(key, category='Blackfynn')[source]

Returns a single property for the provided key, if available

Parameters
  • key (str) – key of the desired property

  • category (str, optional) – category of property

Returns

object of type Property

Example:

pkg.set_property('quality', 85.0)
pkg.get_property('quality')

remove_property(key, category='Blackfynn')[source]

Removes property of key key and category category from the object.

Parameters
  • key (str) – key of property to remove

  • category (str, optional) – category of property to remove

set_error()[source]

Sets the package’s state to ERROR

set_property(key, value, fixed=False, hidden=False, category='Blackfynn', data_type=None)[source]

Add property to object using simplified interface.

Parameters
  • key (str) – the key of the property

  • value (str,number) – the value of the property

  • fixed (bool) – if true, the value cannot be changed after the property is created

  • hidden (bool) – if true, the value is hidden on the platform

  • category (str) – the category of the property, default: “Blackfynn”

  • data_type (str) – one of ‘string’, ‘integer’, ‘double’, ‘date’, ‘user’

set_ready(**kwargs)[source]

Sets the package’s state to READY

set_unavailable()[source]

Sets the package’s state to UNAVAILABLE

update(**kwargs)[source]

Updates object on the platform (with any local changes) and syncs local instance with API response object.

Example:

pkg = bf.get('N:package:1234-1234-1234-1234')
pkg.name = "New name"
pkg.update()

exists

Whether or not the instance of this object exists on the platform.

properties

Returns a list of properties attached to object.

Data Catalog Basics

Note:

A useful special method for the following classes is __contains__, which enables you to do:

if my_pkg in my_collection:
    print("the package", my_pkg, "is in the collection")
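
As a rough illustration of what makes the `in` check (and plain iteration) work, here is a minimal stand-in container. MiniPackage and MiniCollection are hypothetical names used only for this sketch; they are not part of the blackfynn client, and the real classes may implement membership differently (for example, accepting a bare ID string is an assumption here).

```python
class MiniPackage:
    """Hypothetical stand-in for a package; not a blackfynn class."""

    def __init__(self, id, name):
        self.id = id
        self.name = name


class MiniCollection:
    """Hypothetical stand-in for a Collection/Dataset."""

    def __init__(self, *items):
        self._items = list(items)

    def __contains__(self, item):
        # Accept either an object with an .id or a bare ID string (assumption).
        item_id = getattr(item, "id", item)
        return any(child.id == item_id for child in self._items)

    def __iter__(self):
        # Enables `for item in collection`.
        return iter(self._items)


my_pkg = MiniPackage("N:package:1234", "scan")
my_collection = MiniCollection(my_pkg)

if my_pkg in my_collection:
    print("the package", my_pkg.name, "is in the collection")
```

Python calls `__contains__` for the `in` operator and `__iter__` for a `for` loop, which is why both idioms work on a container that defines them.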

Dataset

Datasets are core entities on the Blackfynn platform. All data must be placed in a Dataset, whether directly or nested. Datasets can be thought of as similar to “repositories” in GitHub; they exist directly underneath a user/organization, and all sharing is controlled from their level.

class blackfynn.models.Dataset(name, description=None, status=None, automatically_process_packages=False, **kwargs)[source]

add(*items)

Add items to the Collection/Dataset.

create_collection(name)

Create a new collection within the current object. Collections can be created within datasets and within other collections.

Parameters

name (str) – The name of the to-be-created collection

Returns

The created Collection object.

Example:

from blackfynn import Blackfynn

bf = Blackfynn()
ds = bf.get_dataset('my_dataset')

# create collection in dataset
col1 = ds.create_collection('my_collection')

# create collection in collection
col2 = col1.create_collection('another_collection')

create_model(name, display_name=None, description=None, schema=None, **kwargs)[source]

Defines a Model on the platform.

Parameters
  • name (str) – Name of the model

  • description (str, optional) – Description of the model

  • schema (list, optional) – Definition of the model’s schema as list of ModelProperty objects.

Returns

The newly created Model

Note

It is required that a model includes at least _one_ property that serves as the “title”.
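
A schema given as a list of dictionaries can be checked for this requirement before it is sent to the platform. The helper below, has_title_property, is a hypothetical name for this sketch and is not part of the blackfynn client; it only handles the dictionary form and assumes the flag is stored under the 'title' key.

```python
def has_title_property(schema):
    """Return True if at least one property dict is flagged as the title."""
    return any(prop.get("title", False) for prop in schema)


schema = [
    {"name": "full_name", "data_type": str, "title": True},
    {"name": "age", "data_type": int},
]

if not has_title_property(schema):
    raise ValueError("schema needs at least one property with title=True")
```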

Example

Create a participant model, including schema:

from blackfynn import ModelProperty

ds.create_model('participant',
    description = 'a human participant in a research study',
    schema = [
        ModelProperty('name', data_type=str, title=True),
        ModelProperty('age',  data_type=int)
    ]
)

Or define schema using dictionary:

ds.create_model('participant',
    schema = [
        {
            'name': 'full_name',
            'data_type': str,
            'title': True
        },
        {
            'name': 'age',
            'data_type': int,
        }
])

You can also create a model and define schema later:

# create model
pt = ds.create_model('participant')

# define schema
pt.add_property('name', str, title=True)
pt.add_property('age', int)

create_relationship_type(name, description, schema=None, **kwargs)[source]

Defines a RelationshipType on the platform.

Parameters
  • name (str) – name of the relationship

  • description (str) – description of the relationship

  • schema (dict, optional) – definition of the relationship’s schema

Returns

The newly created RelationshipType

Example:

ds.create_relationship_type('belongs-to', 'this belongs to that')

delete()

Delete object from platform.

get_connected_models(name_or_id)[source]

Retrieve all models connected to the given model

A model is connected if it can be reached by following outgoing relationships starting at the current model.

Parameters

name_or_id – Name or id of the model

Returns

List of Model objects

Example:

connected_models = ds.get_connected_models('patient')

get_graph_summary()[source]

Returns summary metrics about the knowledge graph

get_items_by_name(name)

Get an item inside of object by name (if match is found).

Parameters

name (str) – the name of the item

Returns

list of matches

Note

This only works for first-level items, meaning it must exist directly inside the current object; nested items will not be returned.

get_model(name_or_id)[source]

Retrieve a Model by name or id

Parameters

name_or_id (str or int) – name or id of the model

Returns

The requested Model in Dataset

Example:

mouse = ds.get_model('mouse')

get_property(key, category='Blackfynn')

Returns a single property for the provided key, if available

Parameters
  • key (str) – key of the desired property

  • category (str, optional) – category of property

Returns

object of type Property

Example:

pkg.set_property('quality', 85.0)
pkg.get_property('quality')

get_relationship(name_or_id)[source]

Retrieve a RelationshipType by name or id

Parameters

name_or_id (str or int) – name or id of the relationship

Returns

The requested RelationshipType

Example:

belongsTo = ds.get_relationship('belongs-to')

get_topology()[source]

Returns the set of Models and Relationships defined for the dataset

Returns

Keys are either models or relationships. Values are the list of objects of that type

Return type

dict

import_model(template)[source]

Imports a model based on the given template into the dataset

Parameters

template (ModelTemplate) – the ModelTemplate to import

Returns

A list of ModelProperty objects that have been imported into the dataset

models()[source]

Returns

List of models defined in Dataset

print_tree(indent=0)

Prints a tree of all items inside object.

relationships()[source]

Returns

List of relationships defined in Dataset

remove(*items)

Removes items, where items can be an object or the object’s ID (string).

remove_collaborators(*collaborator_ids)[source]

Remove collaborator(s) from Dataset.

Parameters

collaborator_ids – List of collaborator IDs to remove (Users)

remove_property(key, category='Blackfynn')

Removes property of key key and category category from the object.

Parameters
  • key (str) – key of property to remove

  • category (str, optional) – category of property to remove

set_property(key, value, fixed=False, hidden=False, category='Blackfynn', data_type=None)

Add property to object using simplified interface.

Parameters
  • key (str) – the key of the property

  • value (str,number) – the value of the property

  • fixed (bool) – if true, the value cannot be changed after the property is created

  • hidden (bool) – if true, the value is hidden on the platform

  • category (str) – the category of the property, default: “Blackfynn”

  • data_type (str) – one of ‘string’, ‘integer’, ‘double’, ‘date’, ‘user’

update(**kwargs)

Updates object on the platform (with any local changes) and syncs local instance with API response object.

Example:

pkg = bf.get('N:package:1234-1234-1234-1234')
pkg.name = "New name"
pkg.update()

upload(*files, **kwargs)

Upload files into current object.

Parameters

files – list of local files to upload. If the Blackfynn CLI Agent is installed you can also upload a directory. See Using the Blackfynn CLI Agent for more information.

Keyword Arguments
  • display_progress (boolean) – If True, a progress bar will be shown to track upload progress. Defaults to False.

  • use_agent (boolean) – If True, and a compatible version of the Agent is installed, uploads will be performed by the Blackfynn CLI Agent. This allows large file upload in excess of 1 hour. Defaults to False.

  • recursive (boolean) – If True, the nested folder structure of the uploaded directory will be preserved. This can only be used with the Blackfynn CLI Agent. Defaults to False.

Example:

my_collection.upload('/path/to/file1.nii.gz', '/path/to/file2.pdf')

collaborators

List of collaborators on Dataset.

exists

Whether or not the instance of this object exists on the platform.

items

Get all items inside Dataset/Collection (i.e. non-nested items).

Note

You can also iterate over items inside a Dataset/Collection without using .items:

for item in my_dataset:
    print("item name = ", item.name)

properties

Returns a list of properties attached to object.

Collection

Collections are collections of data that exist inside of a Dataset. These can be thought of as similar to a folder or directory.

class blackfynn.models.Collection(name, **kwargs)[source]

add(*items)

Add items to the Collection/Dataset.

create_collection(name)

Create a new collection within the current object. Collections can be created within datasets and within other collections.

Parameters

name (str) – The name of the to-be-created collection

Returns

The created Collection object.

Example:

from blackfynn import Blackfynn

bf = Blackfynn()
ds = bf.get_dataset('my_dataset')

# create collection in dataset
col1 = ds.create_collection('my_collection')

# create collection in collection
col2 = col1.create_collection('another_collection')

delete()

Delete object from platform.

get_items_by_name(name)

Get an item inside of object by name (if match is found).

Parameters

name (str) – the name of the item

Returns

list of matches

Note

This only works for first-level items, meaning it must exist directly inside the current object; nested items will not be returned.

get_property(key, category='Blackfynn')

Returns a single property for the provided key, if available

Parameters
  • key (str) – key of the desired property

  • category (str, optional) – category of property

Returns

object of type Property

Example:

pkg.set_property('quality', 85.0)
pkg.get_property('quality')

print_tree(indent=0)

Prints a tree of all items inside object.

remove(*items)

Removes items, where items can be an object or the object’s ID (string).

remove_property(key, category='Blackfynn')

Removes property of key key and category category from the object.

Parameters
  • key (str) – key of property to remove

  • category (str, optional) – category of property to remove

set_property(key, value, fixed=False, hidden=False, category='Blackfynn', data_type=None)

Add property to object using simplified interface.

Parameters
  • key (str) – the key of the property

  • value (str,number) – the value of the property

  • fixed (bool) – if true, the value cannot be changed after the property is created

  • hidden (bool) – if true, the value is hidden on the platform

  • category (str) – the category of the property, default: “Blackfynn”

  • data_type (str) – one of ‘string’, ‘integer’, ‘double’, ‘date’, ‘user’

update(**kwargs)

Updates object on the platform (with any local changes) and syncs local instance with API response object.

Example:

pkg = bf.get('N:package:1234-1234-1234-1234')
pkg.name = "New name"
pkg.update()

upload(*files, **kwargs)

Upload files into current object.

Parameters

files – list of local files to upload. If the Blackfynn CLI Agent is installed you can also upload a directory. See Using the Blackfynn CLI Agent for more information.

Keyword Arguments
  • display_progress (boolean) – If True, a progress bar will be shown to track upload progress. Defaults to False.

  • use_agent (boolean) – If True, and a compatible version of the Agent is installed, uploads will be performed by the Blackfynn CLI Agent. This allows large file upload in excess of 1 hour. Defaults to False.

  • recursive (boolean) – If True, the nested folder structure of the uploaded directory will be preserved. This can only be used with the Blackfynn CLI Agent. Defaults to False.

Example:

my_collection.upload('/path/to/file1.nii.gz', '/path/to/file2.pdf')

exists

Whether or not the instance of this object exists on the platform.

items

Get all items inside Dataset/Collection (i.e. non-nested items).

Note

You can also iterate over items inside a Dataset/Collection without using .items:

for item in my_dataset:
    print("item name = ", item.name)

properties

Returns a list of properties attached to object.

Data Package

The DataPackage class is used for all non-specific data classes (i.e. classes that do not need specialized methods).

class blackfynn.models.DataPackage(name, package_type, **kwargs)[source]

DataPackage is the core data object representation on the platform.

Parameters
  • name (str) – The name of the data package

  • package_type (str) – The package type, e.g. ‘TimeSeries’, ‘MRI’, etc.

Note

package_type must be a supported package type. See our data type registry for supported values.

process()[source]

Process a data package that has successfully uploaded its source files but has not yet been processed by the Blackfynn ETL.

relate_to(*records)[source]

Relate current DataPackage to one or more Record objects.

Parameters

records (list of Records) – Records to relate to data package

Returns

Relationship that defines the link

Example

Relate package to a single record:

eeg.relate_to(participant_123)

Relate package to multiple records:

# relate to explicit list of records
eeg.relate_to(
    participant_001,
    participant_002,
    participant_003,
)

# relate to all participants
eeg.relate_to(participants.get_all())

Note

The created relationship will be of the form DataPackage –(belongs_to)–> Record.

files

Returns the files of a DataPackage. Files are the possibly modified source files (e.g. converted to a different format), but they could also be the source files themselves.

sources

Returns the sources of a DataPackage. Sources are the raw, unmodified files (if they exist) that contain the package’s data.

view

Returns the object(s) used to view the package. This is typically a set of file objects, that may be the DataPackage’s sources or files, but could also be a unique object specific for the viewer.

Data-Specific Classes

Timeseries

class blackfynn.models.TimeSeries(name, **kwargs)[source]

Bases: blackfynn.models.DataPackage

Represents a timeseries package on the platform. TimeSeries packages contain channels, which contain time-dependent data sampled at some frequency.

Parameters

name – The name of the timeseries package

add_annotations(layer, annotations)[source]

Parameters
  • layer – either TimeSeriesAnnotationLayer object or name of annotation layer. Note that non existing layers will be created.

  • annotations – TimeSeriesAnnotation object(s)

Returns

list of TimeSeriesAnnotation objects

add_channels(*channels)[source]

Add channels to TimeSeries package.

Parameters

channels – list of Channel objects.

add_layer(layer, description=None)[source]

Parameters
  • layer – TimeSeriesAnnotationLayer object or name of annotation layer

  • description (str, optional) – description of layer

annotation_counts(start, end, layers, period, channels=None)[source]

Get annotation counts between start and end.

Parameters
  • start (datetime or microseconds) – The starting time of the range to query

  • end (datetime or microseconds) – The ending time of the range to query

  • layers ([TimeSeriesLayer]) – List of layers for which to count annotations

  • period (string) – The length of time to group the counts. Formatted as a string - e.g. ‘1s’, ‘5m’, ‘3h’

  • channels ([TimeSeriesChannel]) – List of channels (if omitted, all channels will be used)
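
The return format of annotation_counts is not described here, so the following is only a sketch of the grouping idea behind the period argument: events are assigned to fixed-width bins that start at the beginning of the queried range. count_by_period is a hypothetical helper, the period is given directly in microseconds rather than as a string, and treating the range as half-open is an assumption.

```python
from collections import Counter


def count_by_period(start_times, range_start, range_end, period_usecs):
    """Count events per period-sized bin within [range_start, range_end)."""
    counts = Counter()
    for t in start_times:
        if range_start <= t < range_end:
            # Index of the bin, then the bin's own start time.
            bin_index = (t - range_start) // period_usecs
            counts[range_start + bin_index * period_usecs] += 1
    return dict(sorted(counts.items()))


# Annotation start times in microseconds (made-up values).
starts = [0, 400_000, 1_200_000, 2_900_000, 5_000_000]
counts = count_by_period(starts, 0, 5_000_000, 1_000_000)  # 1 s bins
print(counts)
```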

append_annotation_file(file)[source]

Processes .bfannot file and adds to timeseries package.

Parameters

file – path to .bfannot file

append_files(*files, **kwargs)[source]

Append files to this timeseries package.

Parameters

files – list of local files to upload.

Keyword Arguments
  • display_progress (boolean) – If True, a progress bar will be shown to track upload progress. Defaults to False.

  • use_agent (boolean) – If True, and a compatible version of the Agent is installed, uploads will be performed by the Blackfynn CLI Agent. This allows large file upload in excess of 1 hour. Defaults to False.

delete_layer(layer)[source]

Delete annotation layer.

Parameters

layer – annotation layer object

get_channel(channel)[source]

Get channel by ID.

Parameters

channel (str) – ID of channel

get_data(start=None, end=None, length=None, channels=None, use_cache=True)[source]

Get timeseries data between start and end or start and start + length on specified channels (default all channels).

Parameters
  • start (optional) – start time of data (usecs or datetime object)

  • end (optional) – end time of data (usecs or datetime object)

  • length (optional) – length of data to retrieve, e.g. ‘1s’, ‘5s’, ‘10m’, ‘1h’

  • channels (optional) – list of channel objects or IDs, default all channels.

Note

Data requests will be automatically chunked and combined into a single Pandas DataFrame. However, you must be sure you request only a span of data that will properly fit in memory.

See get_data_iter for an iterator approach to timeseries data retrieval.
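
Before requesting a long span, a back-of-the-envelope size estimate can help decide between get_data and get_data_iter. The sketch below assumes 8 bytes per sample (64-bit floats) and ignores index and container overhead; estimate_bytes is a hypothetical helper, not a client function.

```python
def estimate_bytes(rate_hz, seconds, n_channels, bytes_per_sample=8):
    """Rough size of the raw sample values for a request."""
    return int(rate_hz * seconds * n_channels * bytes_per_sample)


# Ten minutes of 64 channels sampled at 1 kHz.
size = estimate_bytes(rate_hz=1000, seconds=600, n_channels=64)
print(round(size / 1e6, 1), "MB")
```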

Example

Get 5 seconds of data from start over all channels:

data = ts.get_data(length='5s')

Get data between 12345 and 56789 (representing usecs since Epoch):

data = ts.get_data(start=12345, end=56789)

Get first 10 seconds for the first two channels:

data = ts.get_data(length='10s', channels=ts.channels[:2])

get_data_iter(channels=None, start=None, end=None, length=None, chunk_size=None, use_cache=True)[source]

Returns an iterator over the data. Specify either end or length, not both.

Parameters
  • channels (optional) – channels to retrieve data for (default: all)

  • start – start time of data (default: earliest time available).

  • end – end time of data (default: latest time available).

  • length – some time length, e.g. ‘1s’, ‘5m’, ‘1h’ or number of usecs

  • chunk_size – some time length, e.g. ‘1s’, ‘5m’, ‘1h’ or number of usecs

Returns

iterator of Pandas Series, each the size of chunk_size.
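
To picture how a time range breaks into chunks, the sketch below converts the documented length strings to microseconds and yields consecutive windows. to_usecs and chunk_windows are hypothetical helpers written for this illustration; the client's own parsing may accept more formats, and this is not its implementation.

```python
import re

_UNIT_USECS = {"s": 1_000_000, "m": 60_000_000, "h": 3_600_000_000}


def to_usecs(value):
    """Convert '1s', '5m', '1h' or a plain number of usecs to an int."""
    if isinstance(value, (int, float)):
        return int(value)
    match = re.fullmatch(r"(\d+)([smh])", value.strip())
    if match is None:
        raise ValueError("unsupported time length: %r" % value)
    return int(match.group(1)) * _UNIT_USECS[match.group(2)]


def chunk_windows(start, end, chunk_size):
    """Yield (chunk_start, chunk_end) pairs covering [start, end)."""
    step = to_usecs(chunk_size)
    if step <= 0:
        raise ValueError("chunk_size must be positive")
    cursor = start
    while cursor < end:
        # The final window is clipped so it never runs past `end`.
        yield cursor, min(cursor + step, end)
        cursor += step


windows = list(chunk_windows(0, to_usecs("5s"), "2s"))
print(windows)
```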

get_layer(id_or_name)[source]

Get annotation layer by ID or name.

Parameters

id_or_name – layer ID or name

insert_annotation(layer, annotation, start=None, end=None, channel_ids=None, annotation_description=None)[source]

Insert annotations using a more direct interface, without the need for layer/annotation objects.

Parameters
  • layer – str of new/existing layer or annotation layer object

  • annotation – str of annotation event

  • start (optional) – start of annotation

  • end (optional) – end of annotation

  • channel_ids (optional) – list of channel IDs to apply annotation

  • annotation_description (optional) – description of annotation

Example

To add annotation on layer “my-events” across all channels:

ts.insert_annotation('my-events', 'my annotation event')

To add annotation to first channel:

ts.insert_annotation('my-events', 'first channel event', channel_ids=ts.channels[0])

limits()[source]

Returns time limit tuple (start, end) of package.

remove_channels(*channels)[source]

Remove channels from TimeSeries package.

Parameters

channels – list of Channel objects or IDs

segments(start=None, stop=None, gap_factor=2)[source]

Returns list of contiguous data segments available for package. Segments are assessed for all channels, and the union of segments is returned.

Parameters
  • start (int, datetime, optional) – Return segments starting after this time (default earliest start of any channel)

  • stop (int, datetime, optional) – Return segments starting before this time (default latest end time of any channel)

  • gap_factor (int, optional) – Gaps are computed by sampling_rate * gap_factor (default 2)

Returns

List of tuples, where each tuple represents the (start, stop) of contiguous data.

write_annotation_file(file, layer_names=None)[source]

Writes all layers to a csv .bfannot file

Parameters
  • file – path to .bfannot output file. Appends extension if necessary

  • layer_names (optional) – List of layer names to write

channels

Returns list of Channel objects associated with package.

Note

This is a dynamically generated property, so every call will make an API request.

Suggested usage:

channels = ts.channels
for ch in channels:
    print(ch)

This will be much slower, as the API request is being made each time:

for ch in ts.channels:
    print(ch)

end

The end time (in usecs) of time series data (over all channels)

layers

List of annotation layers attached to TimeSeries package.

start

The start time of time series data (over all channels)

class blackfynn.models.TimeSeriesChannel(name, rate, start=0, end=0, unit='V', channel_type='continuous', source_type='unspecified', group='default', last_annot=0, spike_duration=None, **kwargs)[source]

Bases: blackfynn.models.BaseDataNode

TimeSeriesChannel represents a single source of time series data. (e.g. electrode)

Parameters
  • name (str) – Name of channel

  • rate (float) – Rate of the channel (Hz)

  • start (optional) – Absolute start time of all data (datetime obj)

  • end (optional) – Absolute end time of all data (datetime obj)

  • unit (str, optional) – Unit of measurement

  • channel_type (str, optional) – One of ‘continuous’ or ‘event’

  • source_type (str, optional) – The source of data, e.g. “EEG”

  • group (str, optional) – The channel group, default: “default”

get_data(start=None, end=None, length=None, use_cache=True)[source]

Get channel data between start and end or start and start + length

Parameters
  • start (optional) – start time of data (usecs or datetime object)

  • end (optional) – end time of data (usecs or datetime object)

  • length (optional) – length of data to retrieve, e.g. ‘1s’, ‘5s’, ‘10m’, ‘1h’

  • use_cache (optional) – whether to use locally cached data

Returns

Pandas Series containing requested data for channel.

Note

Data requests will be automatically chunked and combined into a single Pandas Series. However, you must be sure you request only a span of data that will properly fit in memory.

See get_data_iter for an iterator approach to timeseries data retrieval.

Example

Get 5 seconds of data from the start of the channel:

data = channel.get_data(length='5s')

Get data between 12345 and 56789 (representing usecs since Epoch):

data = channel.get_data(start=12345, end=56789)

get_data_iter(start=None, end=None, length=None, chunk_size=None, use_cache=True)[source]

Returns an iterator over the data. Specify either end or length, not both.

Parameters
  • start (optional) – start time of data (default: earliest time available).

  • end (optional) – end time of data (default: latest time available).

  • length (optional) – some time length, e.g. ‘1s’, ‘5m’, ‘1h’ or number of usecs

  • chunk_size (optional) – some time length, e.g. ‘1s’, ‘5m’, ‘1h’ or number of usecs

  • use_cache (optional) – whether to use locally cached data

Returns

Iterator of Pandas Series, each the size of chunk_size.

segments(start=None, stop=None, gap_factor=2)[source]

Return list of contiguous segments of valid data for channel.

Parameters
  • start (long, datetime, optional) – Return segments starting after this time (default start of channel)

  • stop (long, datetime, optional) – Return segments starting before this time (default end of channel)

  • gap_factor (int, optional) – Gaps are computed by sampling_period * gap_factor (default 2)

Returns

List of tuples, where each tuple represents the (start, stop) of contiguous data.
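
The gap rule can be illustrated on a plain list of sample times. This sketch assumes the threshold is the sampling period (1 / rate) multiplied by gap_factor, as the parameter description above phrases it; find_segments is a hypothetical helper and not the platform's algorithm.

```python
def find_segments(timestamps, rate_hz, gap_factor=2):
    """Group sorted sample times (usecs) into contiguous (start, stop) spans."""
    if not timestamps:
        return []
    period = 1_000_000 / rate_hz      # usecs between consecutive samples
    max_gap = period * gap_factor     # anything larger starts a new segment
    segments = []
    seg_start = prev = timestamps[0]
    for t in timestamps[1:]:
        if t - prev > max_gap:
            segments.append((seg_start, prev))
            seg_start = t
        prev = t
    segments.append((seg_start, prev))
    return segments


# 100 Hz samples (10,000 usecs apart) with one 50 ms hole.
times = [0, 10_000, 20_000, 70_000, 80_000]
segments = find_segments(times, rate_hz=100)
print(segments)
```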

update()[source]

Updates object on the platform (with any local changes) and syncs local instance with API response object.

Example:

pkg = bf.get('N:package:1234-1234-1234-1234')
pkg.name = "New name"
pkg.update()

end

The end time of channel data (microseconds since Epoch)

start

The start time of channel data (microseconds since Epoch)

class blackfynn.models.TimeSeriesAnnotation(label, channel_ids, start, end, name='', layer_id=None, time_series_id=None, description=None, **kwargs)[source]

Bases: blackfynn.models.BaseNode

Annotation is an event on one or more channels in a dataset

Parameters
  • label (str) – The label for the annotation

  • channel_ids – List of channel IDs to which the annotation applies

  • start – Start time

  • end – End time

  • name – Name of annotation

  • layer_id – Layer ID for annotation (all annotations exist on a layer)

  • time_series_id – TimeSeries package ID

  • description – Description of annotation

class blackfynn.models.TimeSeriesAnnotationLayer(name, time_series_id, description=None, **kwargs)[source]

Bases: blackfynn.models.BaseNode

Annotation layer containing one or more annotations. Layers are used to separate annotations into logically distinct groups when applied to the same data package.

Parameters
  • name – Name of the layer

  • time_series_id – ID of the TimeSeries package to which the layer applies

  • description – Description of the layer

add_annotations(annotations)[source]

Add annotations to layer.

Parameters

annotations (str) – List of annotation objects to add.

annotation_counts(start, end, period, channels=None)[source]

The number of annotations between start and end over selected channels (all by default).

Parameters
  • start (datetime or microseconds) – The starting time of the range to query

  • end (datetime or microseconds) – The ending time of the range to query

  • period (string) – The length of time to group the counts. Formatted as a string - e.g. ‘1s’, ‘5m’, ‘3h’

  • channels ([TimeSeriesChannel]) – List of channels (if omitted, all channels will be used)

annotations(start=None, end=None, channels=None)[source]

Get annotations between start and end over channels (all channels by default).

Parameters
  • start – Start time

  • end – End time

  • channels – List of channel objects or IDs

delete()[source]

Delete annotation layer.

insert_annotation(annotation, start=None, end=None, channel_ids=None, description=None)[source]

Add annotations; proxy for add_annotations.

Parameters
  • annotation (str) – Annotation string

  • start – Start time (usecs or datetime)

  • end – End time (usecs or datetime)

  • channel_ids – list of channel IDs

Returns

The created annotation object.

iter_annotations(window_size=10, channels=None)[source]

Iterate over annotations according to some window size (seconds).

Parameters
  • window_size (float) – Number of seconds in window

  • channels – List of channel objects or IDs

Yields

List of annotations found in current window.

Tabular

class blackfynn.models.Tabular(name, **kwargs)[source]

Bases: blackfynn.models.DataPackage

Represents a Tabular package on the platform.

Parameters

name – The name of the package

get_data(limit=1000, offset=0, order_by=None, order_direction='ASC')[source]

Get data from tabular package as DataFrame

Parameters
  • limit – Max number of rows to return (1000 default)

  • offset – Offset when retrieving rows

  • order_by – Column to order data

  • order_direction – Ascending (‘ASC’) or descending (‘DESC’)

Returns

Pandas DataFrame

get_data_iter(chunk_size=10000, offset=0, order_by=None, order_direction='ASC')[source]

Iterate over tabular data, each data chunk will be of size chunk_size.
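
The limit/offset parameters of get_data suggest how chunked iteration can be built on top of paged requests. The sketch below shows that pattern against an in-memory list; iter_pages and fetch_page are hypothetical stand-ins, and stopping on a short page is an assumption rather than documented client behaviour.

```python
def iter_pages(fetch_page, chunk_size=10000, offset=0):
    """Yield successive pages until the source returns fewer rows than asked."""
    while True:
        rows = fetch_page(limit=chunk_size, offset=offset)
        if not rows:
            return
        yield rows
        if len(rows) < chunk_size:
            return
        offset += len(rows)


# Stand-in data source: 25 rows held in memory.
table = [{"row": i} for i in range(25)]


def fetch_page(limit, offset):
    return table[offset:offset + limit]


pages = list(iter_pages(fetch_page, chunk_size=10))
print([len(page) for page in pages])
```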

Models and Relationships

Models

class blackfynn.models.Model(dataset_id, name, display_name=None, description=None, locked=False, *args, **kwargs)[source]

Representation of a Model in the knowledge graph.

add_properties(properties)

Appends multiple properties to the object’s schema and updates the object on the platform.

Parameters

properties (list) – List of properties to add

Note

At least one property on a model needs to serve as the model’s title. See title argument in example(s) below.

Example

Add properties using ModelProperty objects:

model.add_properties([
    ModelProperty('name', data_type=str, title=True),
    ModelProperty('age',  data_type=int)
])

Add properties defined as list of dictionaries:

model.add_properties([
        {
            'name': 'full_name',
            'type': str,
            'title': True
        },
        {
            'name': 'age',
            'type': int,
        }
])

add_property(name, data_type=<class 'str'>, display_name=None, title=False, description='')

Appends a property to the object’s schema and updates the object on the platform.

Parameters
  • name (str) – Name of the property

  • data_type (type, optional) – Python type of the property. Defaults to str.

  • display_name (str, optional) – Display name for the property.

  • title (bool, optional) – If True, the property will be used in the title on the platform

  • description (str, optional) – Description of the property

Example

Adding a new property with the default data_type:

mouse.add_property('name')

Adding a new property with the float data_type:

mouse.add_property('weight', float)

create_record(values={})[source]

Creates a record of the model on the platform.

Parameters

values (dict, optional) – values for properties defined in the Model schema

Returns

The newly created Record

Example:

mouse_002 = mouse.create_record({"id": 2, "weight": 2.2})

create_records(values_list)[source]

Creates multiple records of the model on the platform.

Parameters

values_list (list) – array of dictionaries corresponding to record values.

Returns

List of newly created Record objects.

Example:

mouse.create_records([
    { 'id': 311, 'weight': 1.9 },
    { 'id': 312, 'weight': 2.1 },
    { 'id': 313, 'weight': 1.8 },
    { 'id': 314, 'weight': 2.3 },
    { 'id': 315, 'weight': 2.0 }
])

delete()[source]

Deletes a model from the platform. Must not have any instances.

delete_records(*records)[source]

Deletes one or more records of a model from the platform.

Parameters

*records – instances and/or ids of records to delete

Returns

None

Logs the list of records that failed to delete.

Example:

mouse.delete_records(mouse_002, 123456789, mouse_003.id)

get(id)[source]

Retrieves a record of the model by id from the platform.

Parameters

id – the Blackfynn id of the record

Returns

A single Record

Example:

mouse_001 = mouse.get(123456789)
get_all(limit=100, offset=0)[source]

Retrieves all records of the model from the platform.

Returns

List of Record

Example:

mice = mouse.get_all()
get_connected()[source]

Retrieves all connected models

A connected model is one that can be reached by following outgoing relationships, starting at the current model.

Parameters

id – The Blackfynn id of the “root” model

Returns

A list of models connected to the given model

Example:

connected_models = mouse.get_connected()
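
The definition of "connected" is a reachability rule over outgoing relationships. The sketch below illustrates it as a breadth-first traversal of a small local graph. It is not the client's implementation, the schema dictionary is invented for illustration, and whether the real result includes the starting model is not stated above (this sketch omits it).

```python
from collections import deque

def connected_models(start, outgoing):
    """Return the models reachable from `start` by following outgoing edges.

    `outgoing` maps a model name to the list of model names it points at.
    """
    seen = {start}
    queue = deque([start])
    reachable = []
    while queue:
        current = queue.popleft()
        for target in outgoing.get(current, []):
            if target not in seen:
                seen.add(target)
                reachable.append(target)
                queue.append(target)
    return reachable

# Hypothetical schema graph.
schema = {
    "mouse": ["lab", "disease"],
    "lab": ["institution"],
    "cage": ["mouse"],  # points at mouse, so it is not reachable from mouse
}
result = connected_models("mouse", schema)
```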
get_property(name)

Gets the property object by name.

Example

>>> mouse.get_property('weight').type
float

get_related()

Returns a list of related model types and counts of those relationships.

“Related” indicates that the model could be connected to the current model via some relationship, i.e. B is “related to” A if there exist A -[relationship]-> B. Note that the directionality matters. If B is the queried model, A would not appear in the list of “related” models.

Returns

List of Model objects related via a defined relationship

Example:

related_models = mouse.get_related()
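
The directionality rule can be illustrated with a small local edge list: only relationships that start at the queried model count. The related_counts helper and the edge data are hypothetical and only model the rule described above.

```python
from collections import Counter

def related_counts(model, relationships):
    """Count, per target model, the relationships that start at `model`.

    `relationships` is a list of (source, relationship_name, target) tuples.
    """
    return Counter(
        target
        for source, _name, target in relationships
        if source == model
    )

edges = [
    ("mouse", "has", "disease"),
    ("mouse", "located_at", "lab"),
    ("mouse", "treated_at", "lab"),
    ("cage", "contains", "mouse"),
]
mouse_related = related_counts("mouse", edges)      # lab and disease
disease_related = related_counts("disease", edges)  # empty: no outgoing edges
```

Here disease is related to mouse, but mouse is not related to disease, because the edge only runs from mouse to disease.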
query()[source]

Run a query with this model as the join target.

remove_property(property)

Remove property from model schema.

Parameters

property (string, ModelProperty) – Property to remove. Can be property name, id, or object.

update()[source]

Updates the details of the Model on the platform.

Example:

mouse.update()

Note

Currently, you can only append new properties to a Model.

exists

Whether or not the instance of this object exists on the platform.

class blackfynn.models.ModelProperty(name, display_name=None, data_type=<class 'str'>, id=None, locked=False, default=True, title=False, description='', required=False)[source]
class blackfynn.models.ModelPropertyType(data_type, format=None, unit=None)[source]
class blackfynn.models.ModelPropertyEnumType(data_type, format=None, unit=None, enum=None, multi_select=False)[source]
class blackfynn.models.Record(dataset_id, type, *args, **kwargs)[source]

Represents a record of a Model.

Includes its neighbors, relationships, and links.

delete()[source]

Deletes the instance from the platform.

Example:

mouse_001.delete()
get(name)

Returns

The value of the property if it exists. None otherwise.

get_files()[source]

All files related to the current record.

Returns

List of data objects i.e. DataPackage

Example:

mouse_001.get_files()

get_related(model=None, group=False)

Returns all related records.

Parameters
  • model (str, Model, optional) – Return only related records of this type

  • group (bool, optional) – If true, group results by model type (dict)

Returns

List of Record objects. If group is True, then the result is a dictionary of RecordSet objects keyed by model names.

Example

Get all related records of type disease:

mouse_001.get_related('disease')

Get all connected records:

mouse_001.get_related()
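
The difference between the two return shapes can be sketched locally. The example uses (model name, record id) pairs as stand-ins for Record objects; group_by_model is a hypothetical helper, and the real grouped result holds RecordSet objects rather than plain lists.

```python
from collections import defaultdict

def group_by_model(records):
    """Group (model_name, record_id) pairs into a dict keyed by model name."""
    grouped = defaultdict(list)
    for model_name, record_id in records:
        grouped[model_name].append(record_id)
    return dict(grouped)

related = [("disease", 11), ("lab", 20), ("disease", 12)]

flat = [record_id for _model, record_id in related]  # shape when group is False
grouped = group_by_model(related)                     # shape when group is True
```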
relate_to(destinations, relationship_type='related_to', values=None, direction='to')[source]

Relate record to one or more Record or DataPackage objects.

Parameters
  • destinations (list of Record or DataPackage) – A list containing the Record or DataPackage objects to relate to current record

  • relationship_type (RelationshipType, str, optional) – Type of relationship to create

  • values (list of dictionaries, optional) – A list of dictionaries corresponding to relationship values

  • direction (str, optional) – Relationship direction. Valid values are 'to' and 'from'

Returns

List of created Relationship objects.

Note

Destinations must all be of type DataPackage or Record; you cannot mix destination types.

Example

Relate to a single Record, define relationship type:

mouse_001.relate_to(lab_009, 'located_at')

Relate to multiple DataPackage objects:

mouse_001.relate_to([eeg, mri1, mri2])
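
The rule against mixing destination types and the direction argument can be sketched with stand-in classes. Everything here is hypothetical: the Record and DataPackage classes below are local stubs, plan_relations is not a client function, and the reading of direction ('to' means the current record is the source, 'from' means it is the destination) is an assumption.

```python
class Record:
    """Local stub, not blackfynn.models.Record."""
    def __init__(self, name):
        self.name = name

class DataPackage:
    """Local stub, not blackfynn.models.DataPackage."""
    def __init__(self, name):
        self.name = name

def plan_relations(record, destinations, direction="to"):
    """Return (source_name, destination_name) pairs that would be created."""
    if not isinstance(destinations, (list, tuple)):
        destinations = [destinations]  # a single destination is allowed
    if len({type(d) for d in destinations}) > 1:
        raise ValueError("destinations must all be Record or all be DataPackage")
    if direction not in ("to", "from"):
        raise ValueError("direction must be 'to' or 'from'")
    if direction == "to":
        return [(record.name, d.name) for d in destinations]
    return [(d.name, record.name) for d in destinations]

mouse = Record("mouse_001")
pairs_to = plan_relations(mouse, [DataPackage("eeg"), DataPackage("mri1")])
pairs_from = plan_relations(mouse, Record("lab_009"), direction="from")
```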
set(name, value)

Updates the value of an existing property or creates a new property if one with the given name does not exist.

Note

Updates the object on the platform.

update()[source]

Updates the values of the record on the platform (after modification).

Example:

mouse_001.set('name', 'Mickey')
mouse_001.update()
exists

Whether or not the instance of this object exists on the platform.

model

The Model of the current record.

Returns

A single Model.

class blackfynn.models.RecordSet(type, *args, **kwargs)[source]
append(object) → None -- append object to end
as_dataframe(record_id_column_name=None)[source]

Convert the list of Record objects to a pandas DataFrame

Parameters

record_id_column_name (string) – If set, a column with the desired name will be prepended to this dataframe that contains record ids.

Returns

pd.DataFrame

clear() → None -- remove all items from L
copy() → list -- a shallow copy of L
count(value) → integer -- return number of occurrences of value
extend(iterable) → None -- extend list by appending elements from the iterable
index(value[, start[, stop]]) → integer -- return first index of value.

Raises ValueError if the value is not present.

insert()

L.insert(index, object) – insert object before index

pop([index]) → item -- remove and return item at index (default last).

Raises IndexError if list is empty or index is out of range.

remove(value) → None -- remove first occurrence of value.

Raises ValueError if the value is not present.

reverse()

L.reverse() – reverse IN PLACE

sort(key=None, reverse=False) → None -- stable sort *IN PLACE*

Relationships

class blackfynn.models.RelationshipType(dataset_id, name, display_name=None, description=None, locked=False, *args, **kwargs)[source]

Bases: blackfynn.models.BaseModelNode

Model for defining a relationship.

add_properties(properties)[source]

Appends multiple properties to the object’s schema and updates the object on the platform.

Parameters

properties (list) – List of properties to add

Note

At least one property on a model needs to serve as the model’s title. See title argument in example(s) below.

Example

Add properties using ModelProperty objects:

model.add_properties([
    ModelProperty('name', data_type=str, title=True),
    ModelProperty('age',  data_type=int)
])

Add properties defined as list of dictionaries:

model.add_properties([
        {
            'name': 'full_name',
            'type': str,
            'title': True
        },
        {
            'name': 'age',
            'type': int,
        }
])
add_property(name, display_name=None, data_type=<class 'str'>)[source]

Appends a property to the object’s schema and updates the object on the platform.

Parameters
  • name (str) – Name of the property

  • data_type (type, optional) – Python type of the property. Defaults to str.

  • display_name (str, optional) – Display name for the property.

  • title (bool, optional) – If True, the property will be used in the title on the platform

  • description (str, optional) – Description of the property

Example

Adding a new property with the default data_type:

mouse.add_property('name')

Adding a new property with the float data_type:

mouse.add_property('weight', float)

create(items)[source]

Create multiple relationships between records using current relationship type.

Parameters

items (list) –

List of relationships to be created. Each relationship should be either a dictionary or tuple.

If relationships are dictionaries, they are required to have from/to or source/destination keys. There is an optional values key which can be used to attach metadata to the relationship; values should be a dictionary with key/value pairs.

If relationships are tuples, they must be in the form (source, dest).

Returns

List of newly created Relationship objects

Example

Create multiple relationships (dictionary format):

diagnosed_with.create([
    { 'from': participant_001, 'to': parkinsons},
    { 'from': participant_321, 'to': als}
])

Create multiple relationships (tuple format):

diagnosed_with.create([
    (participant_001, parkinsons),
    (participant_321, als)
])
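
The accepted item formats can be summarized as a normalization step that turns each item into a (source, destination, values) triple. The sketch below is a hypothetical helper that follows the rules listed under Parameters. It uses strings in place of Record objects and is not the client's own parsing code.

```python
def normalize_item(item):
    """Return (source, destination, values) for one relationship item."""
    if isinstance(item, dict):
        if "from" in item and "to" in item:
            source, destination = item["from"], item["to"]
        elif "source" in item and "destination" in item:
            source, destination = item["source"], item["destination"]
        else:
            raise ValueError(
                "dictionary items need from/to or source/destination keys"
            )
        # The values key is optional metadata for the relationship.
        return source, destination, item.get("values", {})
    if isinstance(item, tuple) and len(item) == 2:
        source, destination = item
        return source, destination, {}
    raise ValueError("each item must be a dictionary or a (source, dest) tuple")

items = [
    {"from": "participant_001", "to": "parkinsons", "values": {"year": 2019}},
    {"source": "participant_321", "destination": "als"},
    ("participant_007", "als"),
]
normalized = [normalize_item(item) for item in items]
```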
get(id)[source]

Retrieves a relationship by id from the platform.

Parameters

id (int) – the id of the instance

Returns

A single Relationship

Example:

relationship = belongs_to.get(123456789)
get_all()[source]

Retrieves all relationships of this type from the platform.

Returns

List of Relationship

Example:

belongs_to_relationships = belongs_to.get_all()
relate(source, destination, values={})[source]

Relates a Record to another Record or DataPackage using current relationship.

Parameters
  • source (Record, DataPackage) – record or data package the relationship originates from

  • destination (Record, DataPackage) – record or data package the relationship points to

  • values (dict, optional) – values for properties defined in the relationship’s schema

Returns

The newly created Relationship

Example

Create a relationship between a Record and a DataPackage:

from_relationship.relate(mouse_001, eeg)

Create a relationship (with values) between a Record and a DataPackage:

from_relationship.relate(mouse_001, eeg, {"date": datetime.datetime(1991, 2, 26, 7, 0)})
class blackfynn.models.Relationship(dataset_id, type, source, destination, *args, **kwargs)[source]

A single instance of a RelationshipType.

delete()[source]

Deletes the instance from the platform.

Example:

mouse_001_eeg_link.delete()
get(name)

Returns

The value of the property if it exists. None otherwise.

relationship()[source]

Retrieves the relationship definition of this instance from the platform

Returns

A single RelationshipType.

set(name, value)[source]

Updates the value of an existing property or creates a new property if one with the given name does not exist.

Note

Updates the object on the platform.

exists

Whether or not the instance of this object exists on the platform.

class blackfynn.models.RelationshipSet(type, *args, **kwargs)[source]

Bases: blackfynn.models.BaseInstanceList

as_dataframe()[source]

Converts the list of Relationship objects to a pandas DataFrame

Returns

pd.DataFrame

Note

In addition to the values in each relationship instance, the DataFrame contains three columns that describe each instance:

  • __source__: ID of the instance’s source

  • __destination__: ID of the instance’s destination

  • __type__: Type of relationship that the instance is
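
The resulting table layout can be sketched without the platform by flattening relationship-like dictionaries into rows. The input key names (source, destination, type, values) and the relationship_rows helper are assumptions made for illustration. Only the three double-underscore column names come from the note above.

```python
def relationship_rows(relationships):
    """Flatten relationship-like dicts into one row dict per relationship."""
    rows = []
    for rel in relationships:
        row = dict(rel.get("values", {}))  # the relationship's own values
        row["__source__"] = rel["source"]
        row["__destination__"] = rel["destination"]
        row["__type__"] = rel["type"]
        rows.append(row)
    return rows

# Hypothetical relationship data.
links = [
    {"source": "rec-1", "destination": "pkg-9", "type": "from",
     "values": {"date": "1991-02-26"}},
    {"source": "rec-2", "destination": "pkg-9", "type": "from", "values": {}},
]
rows = relationship_rows(links)
```

Passing a list of dictionaries like rows to pandas.DataFrame would give one column per key, with missing values left empty for rows that lack a key.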