Getting VEnCodes

This page documents the Vencodes object, the main object in this module.

Pay special attention to the type of data this object requires, its arguments and its main methods.

Objects for generating and ranking VEnCodes

internals.py: Classes module for the VEnCode project

class VEnCode.internals.Vencodes(data_object, algorithm='heuristic', number_of_re=4, n_samples=10000, stop=5, second_data_object=None, using=None, target=None)

An object representing the VEnCodes found for a specific celltype. VEnCodes are combinations of regulatory elements that are active specifically in one celltype and inactive in all others. This class contains methods to search, retrieve, classify and visualize VEnCodes from a matrix of expression levels, with regulatory elements as rows and celltypes as columns.
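The definition above can be made concrete with a small, library-independent sketch. It assumes one common reading of the definition: every regulatory element in the combination is active in the target celltype, and in each other celltype at least one element of the combination is inactive. The function name is_vencode, the threshold parameter and the toy data are hypothetical and not part of the VEnCode package.

```python
def is_vencode(expression, combination, target, threshold=0.0):
    """Check whether `combination` behaves as a VEnCode for `target`.

    `expression` maps each regulatory element (RE) name to a dict of
    {celltype: expression level}. This data layout is an assumption made
    for the sketch; the package itself works with DataTpm or DataFrame.
    """
    rows = [expression[re_name] for re_name in combination]
    if not rows:
        return False
    # Every RE in the combination must be active in the target celltype.
    if not all(row[target] > threshold for row in rows):
        return False
    others = [celltype for celltype in rows[0] if celltype != target]
    # In every other celltype, at least one RE must be inactive.
    return all(
        any(row[celltype] <= threshold for row in rows) for celltype in others
    )


# Toy expression matrix: three REs, one target and two other celltypes.
expression = {
    "re_1": {"target": 5.2, "celltype_a": 0.0, "celltype_b": 3.1},
    "re_2": {"target": 2.4, "celltype_a": 1.7, "celltype_b": 0.0},
    "re_3": {"target": 8.0, "celltype_a": 4.0, "celltype_b": 2.2},
}

print(is_vencode(expression, ["re_1", "re_2"], "target"))  # True
print(is_vencode(expression, ["re_1", "re_3"], "target"))  # False
```

In the toy data, re_1 and re_3 are both active in celltype_b, so that pair does not single out the target.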

vencodes

List of coordinates for the regulatory elements that constitute the VEnCodes found. There are other ways of retrieving VEnCode information, see Methods. To generate a list of VEnCodes, use the next() method.

Type

list of str

e_values

E-value scores for the VEnCodes found. They must first be computed with the determine_e_values() method.

Type

list of int

data

The original data set used to find VEnCodes.

Type

DataTpm

algorithm

Algorithm used to find VEnCodes.

Type

str

k

The VEnCode size. In other words, the number of regulatory elements that form each VEnCode.

Type

int

target_replicates

Names of the target samples to retrieve VEnCodes from.

Type

list of str, str

target_replicates_data

A shortcut to the subset of the data corresponding to the target samples.

Type

pd.DataFrame

Parameters
  • data_object (DataTpm, pd.DataFrame) – Data to use in finding VEnCodes. This should be a matrix of expression levels with regulatory elements as rows and celltypes as columns. The matrix can be supplied as a DataTpm object, which has methods to quickly prepare the data, or as a pandas DataFrame object.

  • algorithm ({'heuristic', 'sampling'}, optional) – Algorithm to find VEnCodes.

  • number_of_re (int) – VEnCode size. In other words, the number of regulatory elements that should form each VEnCode.

  • n_samples (int) – Number of random samples to take when trying to find a VEnCode. Used only if algorithm="sampling".

  • stop (int) – Number of promoters to test per node level. Used only if algorithm="heuristic".

  • second_data_object (DataTpm, None) – If the current VEnCode object was built from promoter expression data, supplying an enhancer DataTpm object here allows retrieval of hybrid enhancer-promoter VEnCodes.

  • using (str, list, None) – Allows the user to force some REs to be in the VEnCode, if possible.

  • target (str, None) – When supplying the VEnCode object with a DataFrame, the target celltype must be specified here.
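To illustrate what number_of_re and n_samples control, here is a minimal sketch of a sampling search in plain Python. It is not the package's implementation: the function sample_vencode, its seed parameter, the dict-based data layout and the "active means greater than zero" rule are all assumptions made for illustration.

```python
import random


def sample_vencode(expression, target, k, n_samples=10000, seed=None):
    """Draw up to `n_samples` random combinations of `k` REs.

    Returns the first combination that is a VEnCode for `target`
    (as a sorted list of RE names), or None if none is found.
    """
    rng = random.Random(seed)
    # Only REs that are active in the target celltype are candidates.
    candidates = sorted(
        name for name, row in expression.items() if row[target] > 0
    )
    if k < 1 or len(candidates) < k:
        return None
    others = [c for c in expression[candidates[0]] if c != target]
    for _ in range(n_samples):
        combo = rng.sample(candidates, k)
        # A VEnCode needs at least one inactive RE in every other celltype.
        if all(any(expression[name][c] == 0 for name in combo) for c in others):
            return sorted(combo)
    return None


toy_expression = {
    "re_1": {"target": 5.2, "celltype_a": 0.0, "celltype_b": 3.1},
    "re_2": {"target": 2.4, "celltype_a": 1.7, "celltype_b": 0.0},
    "re_3": {"target": 8.0, "celltype_a": 4.0, "celltype_b": 2.2},
}

print(sample_vencode(toy_expression, "target", k=2, n_samples=1000, seed=0))
```

A larger n_samples raises the chance of finding a VEnCode when few combinations qualify, at the cost of more iterations.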

determine_e_values(repetitions=100)

Call this function to generate e values for the current VEnCodes. E values will be stored in the variable called e_values. The method applied to calculate e values is a Monte-Carlo simulation.

Parameters

repetitions (int) – Number of times each vencode is evaluated to get the average value.

export(*args, path=None, verbose=True)

Call this method to export VEnCode-related values to CSV files. Put “vencodes” in the arguments to export each VEnCode to a CSV file, “e-values” to export the e-values, and “TPP” to export the tags per million expression of the REs that comprise the VEnCodes for the target celltype. Any number of these arguments can be passed in the same call, as long as they are supported. Use path to set the directory where the file is stored; it must be a complete path.

Parameters
  • args – “e-values”, “vencodes”, “TPP” or even all at once.

  • path (str, None) – Path to write a file to store the VEnCode data.

  • verbose (bool) – Whether to allow the function to print messages to the console (True) or not (False).

get_vencode_data(method='return')

Call this function to get the VEnCode data as a variable (method="return"), or printed to the terminal (method="print").

Parameters

method (str) – How to retrieve the data.

next(amount=1)

Call this function to generate the next VEnCode. The VEnCode is appended to the variable vencodes and can also be returned as a variable.

Parameters

amount (int) – Number of vencodes to retrieve.

Returns

A list containing the desired amount of vencodes.

Return type

list
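A typical workflow combines the constructor, next() and determine_e_values(). The sketch below uses only the signatures documented on this page and requires the VEnCode package to be installed; the wrapper function find_and_score_vencodes and its argument names are hypothetical. Whether a plain DataFrame needs further preparation before the search is not stated here, so treat this as an outline rather than a tested recipe.

```python
def find_and_score_vencodes(data_frame, target, k=4, amount=3):
    """Find `amount` VEnCodes of size `k` for `target` and score them.

    `data_frame` is expected to be a pandas DataFrame with regulatory
    elements as rows and celltypes as columns, as described for the
    data_object parameter.
    """
    # Imported lazily so this sketch can be loaded without VEnCode installed.
    from VEnCode.internals import Vencodes

    finder = Vencodes(
        data_frame,
        algorithm="heuristic",
        number_of_re=k,
        stop=5,
        target=target,  # required when the data is a DataFrame
    )
    found = finder.next(amount=amount)  # also appended to finder.vencodes
    finder.determine_e_values(repetitions=100)
    return found, finder.e_values
```

The e-values returned alongside the VEnCodes are read from the e_values attribute, which is only populated after determine_e_values() has run.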

next_heuristic2_vencode(second_data_object, amount=1)

Call this function to generate the next VEnCode, possibly a hybrid enhancer-promoter VEnCode. The VEnCode is appended to the variable self.vencodes.

Parameters
  • second_data_object (DataTpm) – If the current VEnCode object was built from promoter expression data, supplying an enhancer DataTpm object here allows retrieval of hybrid enhancer-promoter VEnCodes.

  • amount (int) – Number of vencodes to retrieve.

static vencode_mc_simulation(data, reps=100)

Simulates turning 0s to 1s over a data set, asking each turn if the data still represents a VEnCode. Tests the VEnCode robustness to false negatives in the data.

Parameters
  • data (pd.DataFrame) – Data frame of promoter expression per celltype without the celltype of interest.

  • reps (int) – Number of simulations to run.

Returns

The e value, that is, the average number of random changes made to the data before the VEnCode breaks.

Return type

int
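The simulation described above can be sketched in plain Python. This is an illustration under stated assumptions, not the package's code: the data is taken to be a binarized matrix (0 = inactive, 1 = active) of the VEnCode's REs across non-target celltypes, each turn flips one randomly chosen 0 to 1, and the VEnCode is considered broken once any celltype has every RE active. The function name mc_e_value and the seed parameter are hypothetical, and the mean is returned as a float.

```python
import random


def mc_e_value(matrix, reps=100, seed=None):
    """Average number of random 0 -> 1 flips needed to break a VEnCode.

    `matrix` is a list of rows (one per RE); each row lists 0/1 activity
    per non-target celltype.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(matrix), len(matrix[0])
    # If some celltype already has all REs active, this is not a VEnCode.
    if any(all(matrix[r][c] == 1 for r in range(n_rows)) for c in range(n_cols)):
        return 0.0
    total = 0
    for _ in range(reps):
        grid = [row[:] for row in matrix]
        zeros = [(r, c) for r in range(n_rows) for c in range(n_cols)
                 if grid[r][c] == 0]
        # Shuffling and walking the list equals picking a random 0 each turn.
        rng.shuffle(zeros)
        changes = 0
        for r, c in zeros:
            grid[r][c] = 1
            changes += 1
            # Broken as soon as the flipped celltype has every RE active.
            if all(grid[i][c] == 1 for i in range(n_rows)):
                break
        total += changes
    return total / reps


# Two REs, one non-target celltype: both zeros must flip, so the result is 2.0.
print(mc_e_value([[0], [0]], reps=10, seed=1))
```

Under these assumptions a higher value means more false negatives are needed before the combination stops being specific to the target.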

view_vencodes(method='print', interpolation='nearest', path=None, snapshot=None, verbose=True)

Call this function to get a heat map visualization of the VEnCodes.

Parameters
  • method (str) – Method to view VEnCodes. “print” to get visualization on terminal. “write” to write to a file. “both” for both.

  • interpolation (str) – Method for heat map interpolation.

  • path (str, None) – Optional path for the file.

  • snapshot (int, None) – Number of celltypes to show in heat map. False gets all but may hinder visualization.

  • verbose (bool) – Whether to allow the function to print messages to the console (True) or not (False).