bspysmg.data package#
Module contents#
Package containing the files for sampling, postprocessing and subsequent loading of data.
Submodules#
bspysmg.data.dataset module#
File containing a class for loading sampling data as a dataset, as well as a function for loading the dataset into a PyTorch dataloader.
- class bspysmg.data.dataset.ModelDataset(filename: str, steps: int = 1)[source]#
Bases: Dataset
- load_data_from_npz(filename: str, steps: int) Tuple[array, array, dict][source]#
Loads the inputs, targets and sampling configurations from a given postprocessed_data.npz file.
- Parameters
filename (str) – Folder and filename where the postprocessed_data.npz is.
steps (int) – Stride used to skip parts of the data when loading it into memory: only every steps-th item is kept. By default, steps is one (no values are skipped). E.g., if steps = 2 and the inputs are [0, 1, 2, 3, 4, 5, 6], the only inputs taken into account would be [0, 2, 4, 6].
- Returns
inputs (np.array) – Input waves sent to the activation electrodes of the device during sampling.
outputs (np.array) – Raw output data from the readout electrodes of the device during sampling, corresponding to the input.
sampling_configs (dict) – Dictionary containing the sampling configurations with which the data was acquired.
Notes
The postprocessed data is a .npz file called postprocessed_data.npz with keys: inputs, outputs and info (dict). The inputs and outputs are arrays of shape NxD, where N is the number of samples and D is the dimension.
1. inputs: np.array The inputs gathered for all activation electrodes. The units are Volts.
2. outputs: np.array The outputs gathered from all the readout electrodes. The units are nA. The output data is raw. Additional amplification correction might be needed; this is left for the user to decide.
3. info: dict The configs dictionary, which contains a copy of the configurations used for sampling the data. In addition, the configs dictionary has a key named electrode_info, which is created during the postprocessing step. The electrode_info key contains the following keys:
3.1 electrode_no: int Total number of electrodes in the device
3.2 activation_electrodes: dict
3.2.1 electrode_no: int Number of activation electrodes used for gathering the data
3.2.2 voltage_ranges: list Voltage ranges used for gathering the data. It contains the ranges per electrode, with shape (electrode_no, 2), where the second dimension holds the minimum and maximum of the range, respectively.
3.3 output_electrodes: dict
3.3.1 electrode_no : int Number of output electrodes used for gathering the data
3.3.2 clipping_value: list[float,float] Value used to apply a clipping to the sampling data within the specified values.
3.3.3 amplification: float Amplification correction factor used in the device to correct the amplification applied to the output current in order to convert it into voltage before its readout.
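The effect of steps can be pictured as a strided slice. The following is a minimal sketch, assuming the loader keeps every steps-th sample. Plain Python lists stand in for the NumPy arrays, and subsample is a hypothetical helper, not part of bspysmg.

```python
def subsample(inputs, outputs, steps=1):
    """Keep every `steps`-th sample of paired inputs and outputs (illustrative helper)."""
    if steps < 1:
        raise ValueError("steps must be a positive integer")
    # A strided slice keeps items 0, steps, 2 * steps, ...
    return inputs[::steps], outputs[::steps]


inputs = [0, 1, 2, 3, 4, 5, 6]
outputs = [10, 11, 12, 13, 14, 15, 16]
kept_inputs, kept_outputs = subsample(inputs, outputs, steps=2)
print(kept_inputs)  # [0, 2, 4, 6]
```

Because the same stride is applied to both arrays, each kept input stays paired with its output.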
- bspysmg.data.dataset.get_dataloaders(configs: dict) Tuple[List[DataLoader], float, dict][source]#
Loads all the datasets specified in the dataset_paths list key of the configurations dictionary and creates a dataloader.
- Parameters
configs (dict) –
Surrogate model generation configurations.
1. results_base_dir: str Directory where the trained model and corresponding performance plots will be stored.
2. seed: int Seed used for generating random numbers.
3. hyperparameters: epochs: int learning_rate: float
4. model_structure: dict The definition of the internal structure of the surrogate model, which is typically five fully-connected layers of 90 nodes each.
4.1 hidden_sizes : list A list containing the number of nodes of each layer of the surrogate model. E.g., [90,90,90,90,90]
4.2 D_in: int Number of input features of the surrogate model structure. It should correspond to the activation electrode number.
4.3 D_out: int Number of output features of the surrogate model structure. It should correspond to the readout electrode number.
5. data: 5.1 dataset_paths: list[str] A list of paths to the Training, Validation and Test datasets, stored as postprocessed_data.npz. It also supports adding a single training dataset, and splitting it using the configuration split_percentages.
5.2 split_percentages: list[float] (Optional) When provided together with a single dataset path in the dataset_paths list, this variable allows splitting it into training, validation and test datasets by providing the split percentage values. E.g., [0.8, 0.2] will split the dataset into 80% of the data for training and 20% for validation. Similarly, [0.8, 0.1, 0.1] will split the dataset into 80% for training, 10% for validation and 10% for testing. Note that all split values in the list should add up to 1.
5.3 steps : int Stride used to skip parts of the data when loading it into memory: only every steps-th item is kept. By default, steps is one (no values are skipped). E.g., if steps = 2 and the inputs are [0, 1, 2, 3, 4, 5, 6], the only inputs taken into account would be [0, 2, 4, 6].
5.4 batch_size: int Number of samples contained in each forward pass.
5.5 worker_no: int How many subprocesses to use for data loading. 0 means that the data will be loaded in the main process. (default: 0)
5.6 pin_memory: boolean If True, the data loader will copy Tensors into CUDA pinned memory before returning them. If your data elements are a custom type, or your collate_fn returns a batch that is a custom type, see the memory pinning section of the PyTorch DataLoader documentation.
- Returns
dataloaders – A list containing the corresponding training, validation and test dataloaders.
- Return type
list[torch.utils.data.DataLoader]
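As an illustration of the configuration layout described above, the sketch below builds a configs dictionary and computes how many samples each split would receive. All values and paths are placeholders, split_lengths is a hypothetical helper, and the rounding rule (leftover samples go to the training split) is an assumption, since the documentation does not state how bspysmg rounds.

```python
import math

# Illustrative values only: paths and sizes are placeholders.
configs = {
    "results_base_dir": "tmp/output",
    "seed": 1,
    "hyperparameters": {"epochs": 100, "learning_rate": 1e-3},
    "model_structure": {"hidden_sizes": [90, 90, 90, 90, 90], "D_in": 7, "D_out": 1},
    "data": {
        "dataset_paths": ["tmp/data/postprocessed_data.npz"],
        "split_percentages": [0.8, 0.1, 0.1],
        "steps": 1,
        "batch_size": 128,
        "worker_no": 0,
        "pin_memory": False,
    },
}


def split_lengths(sample_no, percentages):
    """Turn split percentages into per-split sample counts (hypothetical helper)."""
    if not math.isclose(sum(percentages), 1.0):
        raise ValueError("split percentages should add up to 1")
    lengths = [int(sample_no * p) for p in percentages]
    # Assumption: samples lost to rounding down are given to the training split.
    lengths[0] += sample_no - sum(lengths)
    return lengths


lengths = split_lengths(1000, configs["data"]["split_percentages"])
print(lengths)  # [800, 100, 100]
```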
- bspysmg.data.dataset.get_info_dict(training_configs: dict, sampling_configs: dict) dict[source]#
Retrieve the info dictionary given the training configs and the sampling configs. Note that the electrode_info key should be present in the sampling configs. This key is automatically generated when postprocessing the data.
- Parameters
training_configs (dict) – A copy of the configurations used for training the surrogate model.
sampling_configs (dict) – A copy of the configurations used for sampling the training data.
- Returns
info_dict – Dictionary required in order to initialise a surrogate model. It contains the following keys:
1. model_structure: dict The definition of the internal structure of the surrogate model, which is typically five fully-connected layers of 90 nodes each.
1.1 hidden_sizes : list A list containing the number of nodes of each layer of the surrogate model. E.g., [90,90,90,90,90]
1.2 D_in: int Number of input features of the surrogate model structure. It should correspond to the activation electrode number.
1.3 D_out: int Number of output features of the surrogate model structure. It should correspond to the readout electrode number.
2. electrode_info: dict It contains all the information required for the surrogate model about the electrodes.
2.1 electrode_no: int Total number of electrodes in the device
2.2 activation_electrodes: dict
2.2.1 electrode_no: int Number of activation electrodes used for gathering the data
2.2.2 voltage_ranges: list Voltage ranges used for gathering the data. It contains the ranges per electrode, with shape (electrode_no, 2), where the second dimension holds the minimum and maximum of the range, respectively.
2.3 output_electrodes: dict
2.3.1 electrode_no : int Number of output electrodes used for gathering the data
2.3.2 clipping_value: list[float,float] Value used to apply a clipping to the sampling data within the specified values.
2.3.3 amplification: float Amplification correction factor used in the device to correct the amplification applied to the output current in order to convert it into voltage before its readout.
3. training_configs: dict A copy of the configurations used for training the surrogate model.
4. sampling_configs : dict A copy of the configurations used for gathering the training data.
- Return type
dict
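The structure of the returned dictionary can be sketched as follows. This is only an illustration of the four keys listed above: build_info_dict is a hypothetical stand-in, and taking model_structure from the training configs is an assumption based on the configuration layout described for get_dataloaders.

```python
import copy


def build_info_dict(training_configs, sampling_configs):
    """Assemble an info dictionary with the four documented keys (illustrative)."""
    if "electrode_info" not in sampling_configs:
        # This key is generated when postprocessing the data.
        raise KeyError("sampling_configs has no 'electrode_info' key")
    return {
        "model_structure": copy.deepcopy(training_configs["model_structure"]),
        "electrode_info": copy.deepcopy(sampling_configs["electrode_info"]),
        "training_configs": copy.deepcopy(training_configs),
        "sampling_configs": copy.deepcopy(sampling_configs),
    }


# Placeholder values for demonstration.
training_configs = {"model_structure": {"hidden_sizes": [90] * 5, "D_in": 7, "D_out": 1}}
sampling_configs = {"electrode_info": {"electrode_no": 8}}
info_dict = build_info_dict(training_configs, sampling_configs)
print(sorted(info_dict))
```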
bspysmg.data.postprocess module#
File containing functions for postprocessing raw data gathered from the sampler and information for the model’s info dictionary.
- bspysmg.data.postprocess.clip_data(inputs: array, outputs: array, clipping_value_range: list) Tuple[array, array][source]#
Removes all the outputs and corresponding inputs where the output is outside a given maximum and minimum range.
- Parameters
inputs (np.array) – Array containing all the inputs that were sent to the device during sampling.
outputs (np.array) – Array containing all the outputs of the device obtained during sampling, which correspond to the inputs to the device.
clipping_value_range (list[float,float]) – A list of length two. The first element is the lower clipping bound, and the second element is the upper clipping bound.
- Returns
inputs (np.array) – Array containing all the inputs that were sent to the device during sampling, except for those whose corresponding output is outside the specified clipping range.
outputs (np.array) – Array containing all the outputs of the device obtained during sampling, except for those that are outside the specified clipping range.
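The filtering described above can be sketched in a few lines. This is a dependency-free illustration using lists of rows in place of NumPy arrays. clip_rows is a hypothetical helper, and treating the bounds as inclusive is an assumption.

```python
def clip_rows(inputs, outputs, clipping_value_range):
    """Drop every sample whose output lies outside [low, high] (illustrative)."""
    low, high = clipping_value_range
    kept = [
        (x, y)
        for x, y in zip(inputs, outputs)
        # Assumption: a sample is kept only if all of its readouts are in range.
        if all(low <= value <= high for value in y)
    ]
    kept_inputs = [x for x, _ in kept]
    kept_outputs = [y for _, y in kept]
    return kept_inputs, kept_outputs


inputs = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
outputs = [[50.0], [120.0], [-30.0]]
clipped_inputs, clipped_outputs = clip_rows(inputs, outputs, [-110, 110])
print(clipped_outputs)  # [[50.0], [-30.0]]
```

Inputs and outputs are filtered together so that the remaining rows still correspond to each other.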
- bspysmg.data.postprocess.get_electrode_info(configs: dict, clipping_value) dict[source]#
Retrieve electrode information from the data sampling configurations.
- Parameters
configs (dict) –
Sampling configurations with the following keys:
1. driver: dict Dictionary containing the driver configurations. For more information about this configuration file, check the documentation of brainspy.processors.hardware.drivers.ni.setup.NationalInstrumentsSetup
2. input_data : dict Dictionary containing the information necessary to create the input sampling data.
2.1 activation_electrode_no: int Number of activation electrodes in the device to be sampled.
2.2 readout_electrode_no : int Number of readout electrodes in the device to be sampled.
2.3 amplitude : [list[float]] Amplitude of the generated input wave signal. It is calculated according to the minimum and maximum ranges of each electrode, and should correspond to (max_range_value - min_range_value) / 2. If no amplitude is given, it will be automatically calculated from the activation electrode ranges in the driver configurations. If it is set manually, the offset variable should also be included in the dictionary.
2.4 offset: [list[float]] Vertical offset of the generated input wave signal. It is calculated according to the minimum and maximum ranges of each electrode, and should correspond to (max_range_value + min_range_value) / 2. If no offset is given, it will be automatically calculated from the activation electrode ranges in the driver configurations. If it is set manually, the amplitude variable should also be included in the dictionary.
clipping_value (str or list) – The value that will be used to clip the sampling data within a specific range. If ‘default’ is passed, a default clipping value will be used.
- Returns
electrode_info – Configuration dictionary containing all the keys related to the electrode information:
1. electrode_no: int Total number of electrodes in the device
2. activation_electrodes: dict
2.1 electrode_no: int Number of activation electrodes used for gathering the data
2.2 voltage_ranges: list Voltage ranges used for gathering the data. It contains the ranges per electrode, with shape (electrode_no, 2), where the second dimension holds the minimum and maximum of the range, respectively.
3. output_electrodes: dict
3.1 electrode_no : int Number of output electrodes used for gathering the data
3.2 clipping_value: list[float,float] Value used to apply a clipping to the sampling data within the specified values.
3.3 amplification: float Amplification correction factor used in the device to correct the amplification applied to the output current in order to convert it into voltage before its readout.
- Return type
dict
- bspysmg.data.postprocess.get_sampling_data(filename: str, activation_electrode_no: int, readout_electrode_no: int) Tuple[array, array][source]#
Reads the sampling data from a text file (IO.dat) and returns the values loaded in numpy arrays.
- Parameters
filename (str) – Path to the file containing comma separated values read during the data gathering process. Typically, named IO.dat.
activation_electrode_no (int) – Number of activation electrodes used for the device during the data gathering process.
readout_electrode_no (int) – Number of current readout/output electrodes used for the device during the data gathering process.
- Returns
inputs (np.array) – Array containing all the inputs that were sent to the device during sampling.
outputs (np.array) – Array containing all the outputs of the device obtained during sampling, which correspond to the inputs to the device.
- bspysmg.data.postprocess.get_voltage_ranges(offset: list, amplitude: list) array[source]#
Calculate the voltage ranges of the device out of the information about the amplitude and the vertical offset that was used to compute the input waves during the data gathering process.
- Parameters
offset (list) – A list of all the offset values to vertically displace the input signal in such a way that it fits the activation electrode ranges. The list would contain one value per activation electrode.
amplitude (list) – A list of all the amplitude values to amplify the input signal in such a way that it fits the activation electrode ranges.
- Returns
Array containing the ranges per electrode, with shape (electrode_no, 2), where the second dimension holds the minimum and maximum of the range, respectively.
- Return type
np.array
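Since the amplitude is defined as (max_range_value - min_range_value) / 2 and the offset as (max_range_value + min_range_value) / 2, the ranges follow as min = offset - amplitude and max = offset + amplitude. A minimal sketch of that relation, using plain lists and a hypothetical voltage_ranges helper rather than the library function itself:

```python
def voltage_ranges(offset, amplitude):
    """Recover [min, max] per electrode from offset and amplitude (illustrative)."""
    if len(offset) != len(amplitude):
        raise ValueError("offset and amplitude need one value per electrode")
    # min = offset - amplitude, max = offset + amplitude
    return [[o - a, o + a] for o, a in zip(offset, amplitude)]


ranges = voltage_ranges(offset=[-0.25, 0.0], amplitude=[0.75, 1.0])
print(ranges)  # [[-1.0, 0.5], [-1.0, 1.0]]
```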
- bspysmg.data.postprocess.post_process(data_dir: str, clipping_value='default', charging_signal_batch_no: int = 40, reference_signal_batch_no: int = 15, filename: str = 'postprocessed_data', **kwargs) Tuple[array, array, dict][source]#
Postprocesses the data, cleans any clipping (optional), and merges data sets if needed. The data arrays are merged into a single array and cropped given the clipping_values. The function also plots and saves the histogram of the data.
- Parameters
data_dir (str) – A string with the path to the directory with the data: it is assumed that at least two files exist, named sampler_configs.json and IO.dat respectively.
clipping_value ([float,float]) –
Applies a clipping to the input and output sampling data within the specified values. The setups have a limit in the range they can read; they typically clip at approximately +-4 V. Note that in order to calculate the clipping range, this voltage needs to be multiplied by the amplification value of the setup (e.g., in the Brains setup the amplification is 28.5 and the clipping voltage is +-4 V; therefore, the clipping value should be +-4 * 28.5 = +-114, i.e. roughly [-110,110] (nA)). This variable represents a lower and upper clipping value to crop data. It can be either None, ‘default’ or [float,float]. The ‘default’ str input will automatically take the clipping value by multiplying the amplification of the data by -4 and 4. The None input will not apply any clipping.
NOTE: When the clipping value is set to None, the model will accurately represent the hardware setup (feedback resistance of the operational amplifier). When the clipping value is set to the values at which the setup clips, the model will extrapolate the results outside of the clipping range caused by the hardware setup.
charging_signal_batch_no ([int]) – Number of batches that will be used for extracting the charging signal.
reference_signal_batch_no ([int]) – Number of batches that will be used for extracting the reference signal.
filename ([str]) – The name of the file that will be produced after postprocessing. By default: postprocessed_data.npz
kwargs (Optional kwargs are as follows:) – 1. list_data: A list of strings indicating directories with postprocessed_data.npz containing input and output data relationships from the device, as well as the configuration with which the data was acquired.
Examples
>>> inputs, outputs, configs = post_process('tmp/data/training/TEST/17-02-2021/')
Notes
The postprocessed data is a .npz file called postprocessed_data.npz with keys: inputs, outputs and info (dict). The inputs and outputs are arrays of shape NxD, where N is the number of samples and D is the dimension.
1. inputs: np.array The inputs gathered for all activation electrodes. The units are Volts.
2. outputs: np.array The outputs gathered from all the readout electrodes. The units are nA. The output data is raw. Additional amplification correction might be needed; this is left for the user to decide.
3. info: dict The configs dictionary, which contains a copy of the configurations used for sampling the data. In addition, the configs dictionary has a key named electrode_info, which is created during the postprocessing step. The electrode_info key contains the following keys:
3.1 electrode_no: int Total number of electrodes in the device
3.2 activation_electrodes: dict
3.2.1 electrode_no: int Number of activation electrodes used for gathering the data
3.2.2 voltage_ranges: list Voltage ranges used for gathering the data. It contains the ranges per electrode, with shape (electrode_no, 2), where the second dimension holds the minimum and maximum of the range, respectively.
3.3 output_electrodes: dict
3.3.1 electrode_no : int Number of output electrodes used for gathering the data
3.3.2 clipping_value: list[float,float] Value used to apply a clipping to the sampling data within the specified values.
3.3.3 amplification: float Amplification correction factor used in the device to correct the amplification applied to the output current in order to convert it into voltage before its readout.
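The ‘default’ clipping value described in the parameters amounts to scaling the +-4 V readout limit by the amplification. A small sketch of that arithmetic follows. default_clipping_value and its limit_volts argument are hypothetical names used only for illustration.

```python
def default_clipping_value(amplification, limit_volts=4.0):
    """Scale the readout limit (in V) by the amplification to get bounds in nA."""
    return [-limit_volts * amplification, limit_volts * amplification]


clipping_value = default_clipping_value(28.5)
print(clipping_value)  # [-114.0, 114.0]
```

With an amplification of 28.5 this gives +-114 nA, which matches the rounded [-110,110] range quoted for the Brains setup.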
- bspysmg.data.postprocess.print_electrode_info(configs: dict) None[source]#
Prints on screen the information about the electrodes that was gathered from the configuration file used for gathering the data from the device.
- Parameters
configs (dict) –
Configuration dictionary containing all the keys related to the electrode information:
1. electrode_no: int Total number of electrodes in the device
2. activation_electrodes: dict
2.1 electrode_no: int Number of activation electrodes used for gathering the data
2.2 voltage_ranges: list Voltage ranges used for gathering the data. It contains the ranges per electrode, with shape (electrode_no, 2), where the second dimension holds the minimum and maximum of the range, respectively.
3. output_electrodes: dict
3.1 electrode_no : int Number of output electrodes used for gathering the data
3.2 clipping_value: list[float,float] Value used to apply a clipping to the sampling data within the specified values.
3.3 amplification: float Amplification correction factor used in the device to correct the amplification applied to the output current in order to convert it into voltage before its readout.
- bspysmg.data.postprocess.save_npz(data_dir: str, file_name: str, inputs: array, outputs: array, configs: dict) None[source]#
Stores the inputs, outputs and sampling configurations in an .npz file. The saved file needs to be opened with the option allow_pickle=True (e.g., with numpy.load), since it contains a dictionary.
- Parameters
data_dir (str) – Folder where the data is going to be stored.
file_name (str) – The name of the file in which the data will be stored.
inputs (np.array) – Array containing all the inputs that were sent to the device during sampling.
outputs (np.array) – Array containing all the outputs of the device obtained during sampling, which correspond to the inputs to the device.
configs (dict) –
Sampling configurations with the following keys:
1. save_directory: str Directory where all the sampling data will be stored.
2. data_name: str Inside the path specified in the variable save_directory, a folder will be created with the format: <data_name>+<current_timestamp>. This variable specifies the prefix of that folder before the timestamp.
3. driver: dict Dictionary containing the driver configurations. For more information about this configuration file, check the documentation of brainspy.processors.hardware.drivers.ni.setup.NationalInstrumentsSetup
4. input_data : dict Dictionary containing the information necessary to create the input sampling data.
4.1 input_distribution: str It determines the wave shape of the input. Two main options are available: ‘sawtooth’ and ‘sine’. The first option will create saw-like signals, and the second sine-wave signals. Sawtooth signals have more coverage on the edges of the input range.
4.2 activation_electrode_no: int Number of activation electrodes in the device to be sampled.
4.3 readout_electrode_no : int Number of readout electrodes in the device to be sampled.
4.4 input_frequency: list Base frequencies of the input waves that will be created. In order to optimise coverage, irrational numbers are recommended. The list should have the same length as the activation electrode number. E.g., for 7 activation electrodes: input_frequency = [2, 3, 5, 7, 13, 17, 19]
4.5 phase : float Horizontal shift of the input signals. It is recommended to have random numbers which are different for the training, validation and test datasets. These numbers will be square rooted and multiplied by a given factor.
4.6 factor : float Given factor by which the input frequencies will be multiplied after square rooting them.
4.7 amplitude : Optional[list[float]] Amplitude of the generated input wave signal. It is calculated according to the minimum and maximum ranges of each electrode, and should correspond to (max_range_value - min_range_value) / 2. If no amplitude is given, it will be automatically calculated from the activation electrode ranges in the driver configurations. If it is set manually, the offset variable should also be included in the dictionary.
4.8 offset: Optional[list[float]] Vertical offset of the generated input wave signal. It is calculated according to the minimum and maximum ranges of each electrode, and should correspond to (max_range_value + min_range_value) / 2. If no offset is given, it will be automatically calculated from the activation electrode ranges in the driver configurations. If it is set manually, the amplitude variable should also be included in the dictionary.
4.9 ramp_time: float Time that will be taken before sending each batch to go from zero to the first point of the batch and to zero from the last point of the batch.
4.10 batch_time: Time that the sampling of each batch will take.
4.11 number_batches: int Number of batches that will be sampled. A default value of 3880 is recommended.
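The input_frequency and factor keys work together: the base frequencies are square rooted and then multiplied by the factor, which is how integer entries such as [2, 3, 5, 7, 13, 17, 19] end up as irrational frequencies. A rough sketch of that step, where effective_frequencies is a hypothetical helper and the factor value is a placeholder:

```python
import math


def effective_frequencies(input_frequency, factor):
    """Square root each base frequency and scale it by `factor` (illustrative)."""
    return [factor * math.sqrt(f) for f in input_frequency]


base = [2, 3, 5, 7, 13, 17, 19]  # one entry per activation electrode
frequencies = effective_frequencies(base, factor=0.05)  # factor is a placeholder
for f_base, f_eff in zip(base, frequencies):
    print(f_base, round(f_eff, 4))
```

Square roots of distinct primes are not rational multiples of each other, so the resulting waves do not repeat in lockstep, which helps coverage of the input space.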
bspysmg.data.sampling module#
File containing a class for sampling a device.
- class bspysmg.data.sampling.Sampler(configs: dict)[source]#
Bases: object
- config_offset_and_amplitude() None[source]#
It extracts the offset and amplitude values that the input waveforms will have, according to the voltage ranges specified in the driver. It stores them into the configuration dictionary (input_data/amplitude and input_data/offset).
- get_batch_indices(sample_no: int, batch_size: int) Generator[int, None, None][source]#
Splits the sample indices into fixed-length chunks (blocks) and yields them one at a time.
- Parameters
sample_no (int) – Total number of samples to be sent to the device.
batch_size (int) – Desired block size into which the total number of samples will be divided.
- Yields
indices (list[int]) – List of indices corresponding to the generated input signal.
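The chunking behaviour can be sketched with a small generator. This is an illustration rather than the library code: batch_indices is a hypothetical name, and letting the final chunk be shorter when sample_no is not a multiple of batch_size is an assumption.

```python
from typing import Generator, List


def batch_indices(sample_no: int, batch_size: int) -> Generator[List[int], None, None]:
    """Yield consecutive index blocks of at most `batch_size` items (illustrative)."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, sample_no, batch_size):
        # Assumption: the last block is truncated instead of padded.
        yield list(range(start, min(start + batch_size, sample_no)))


chunks = list(batch_indices(sample_no=7, batch_size=3))
print(chunks)  # [[0, 1, 2], [3, 4, 5], [6]]
```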
- get_header(input_no: int, output_no: int) str[source]#
Gets the header of the txt data file as a string.
- Parameters
input_no (int) – The input electrode number.
output_no (int) – The output electrode number.
- Returns
The headers of each input and output batch in a string format.
- Return type
str
- init_configs() Tuple[int, int, dict][source]#
Initializes the configurations for performing sampling operation. It initializes and returns the number of samples, the batch size and an input dictionary for sampling.
- Returns
- total_number_samples: int
The total number of samples to be generated in the input dataset.
- batch_size: int
The batch size to be used for processing the input dataset in batches.
- input_dict: dict
- The configurations for the sampling operation, with the following keys:
- activation_electrode_no: int
Number of activation electrodes in the device to be sampled.
- readout_electrode_no: int
Number of readout electrodes in the device to be sampled.
- input_frequency: list
Base frequencies of the input waves that will be created. In order to optimise coverage, irrational numbers are recommended. The list should have the same length as the activation electrode number. E.g., for 7 activation electrodes: input_frequency = [2, 3, 5, 7, 13, 17, 19]
- phase: float
Horizontal shift of the input signals. It is recommended to have random numbers which are different for the training, validation and test datasets. These numbers will be square rooted and multiplied by a given factor.
- amplitude: Optional[list[float]]
Amplitude of the generated input wave signal. It is calculated according to the minimum and maximum ranges of each electrode, and should correspond to (max_range_value - min_range_value) / 2. If no amplitude is given, it will be automatically calculated from the activation electrode ranges in the driver configurations. If it is set manually, the offset variable should also be included in the dictionary.
- offset: Optional[list[float]]
Vertical offset of the generated input wave signal. It is calculated according to the minimum and maximum ranges of each electrode, and should correspond to (max_range_value + min_range_value) / 2. If no offset is given, it will be automatically calculated from the activation electrode ranges in the driver configurations. If it is set manually, the amplitude variable should also be included in the dictionary.
- number_batches: int
Number of batches that will be sampled. A default value of 3880 is recommended.
- Return type
tuple
- ramp_input(x: array)[source]#
The input batch is prepared for sampling on the device by adding a ramp from zero to the first point of the batch, and a ramp from the last point of the batch back to zero.
- Parameters
x (np.array) – Input batch that will be sent to the device in order to sample its corresponding output. The dimension of the sample should be (activation_electrode_no, batch_size).
- Returns
ramped_input – Input batch that will be sent to the device in order to sample its corresponding output, with an additional ramp from zero to the first point of the batch, and a ramp from the last point back to zero.
- Return type
np.array
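The ramping idea can be illustrated with linear ramps on plain lists. This is a sketch under assumptions: add_ramps and its ramp_points argument are hypothetical (in practice the number of ramp points would follow from ramp_time and the sampling rate), and the ramps are assumed to be linear.

```python
def add_ramps(x, ramp_points):
    """Prepend a 0 -> first ramp and append a last -> 0 ramp per electrode."""
    ramped = []
    for channel in x:  # x has shape (activation_electrode_no, batch_size)
        first, last = channel[0], channel[-1]
        # Linear ramp that starts at zero and stops one step short of `first`.
        up = [first * i / ramp_points for i in range(ramp_points)]
        # Linear ramp that starts one step below `last` and ends exactly at zero.
        down = [last * (ramp_points - 1 - i) / ramp_points for i in range(ramp_points)]
        ramped.append(up + list(channel) + down)
    return ramped


ramped_input = add_ramps([[1.0, 2.0, 4.0]], ramp_points=2)
print(ramped_input)  # [[0.0, 0.5, 1.0, 2.0, 4.0, 2.0, 0.0]]
```

Each channel grows by 2 * ramp_points samples, which is why the readout for the ramp sections has to be filtered out again after measuring.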
- sample(plot_interval: int = 1) str[source]#
Performs a full sampling operation, divided into several batches, according to the configurations given to the class. It stores each batch on a txt file. Additionally it can also show the inputs and corresponding outputs of a batch in a saved plot.
- Parameters
plot_interval (Optional[int]) – It sets after how many batches it will save a plot of the current batch, that shows the inputs and outputs. By default 1 (on every batch).
- Returns
save_directory – Folder where all the sampling data is stored.
- Return type
str
- sample_batch(x: array) array[source]#
Performs a sampling operation over one input batch. This method includes a ramp from zero to the first point of the batch, and a ramp from the last point back to zero. The readout corresponding to the ramps is filtered out before the output is returned.
- Parameters
x (np.array) – Input batch that will be sent to the device in order to sample its corresponding output. The dimension of the sample should be (activation_electrode_no, batch_size).
- Returns
output – Readout of the device when applying the input batch. The size of the output will be (batch_size, readout_electrode_no).
- Return type
np.array
- bspysmg.data.sampling.convert(num_batches, total_batches, time_taken)[source]#
Converts the number of batches and the last time taken for measuring them into a string containing an estimation of the time left in HH:MM:SS format.
- Parameters
num_batches (int) – Number of batches that have been already sampled.
total_batches (int) – Total batches that will be sampled.
time_taken (float) – Time that the last measurement has taken.
- Returns
A string containing the time left in HH:MM:SS
- Return type
str
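A time estimate of this kind can be sketched as follows, assuming every remaining batch takes as long as the last measured one. time_left is a hypothetical stand-in for the function documented above, not its actual implementation.

```python
def time_left(num_batches, total_batches, time_taken):
    """Estimate the remaining time as an HH:MM:SS string (illustrative)."""
    remaining_batches = max(total_batches - num_batches, 0)
    # Assumption: each remaining batch takes `time_taken` seconds.
    total_seconds = int(round(remaining_batches * time_taken))
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


estimate = time_left(num_batches=10, total_batches=3880, time_taken=1.5)
print(estimate)  # 01:36:45
```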