TEST module

Genotyping module

constrain.test.genotyping.concatenating_list_of_dfs(list_of_dfs: list): Concatenates a list of DataFrames into one pd.DataFrame by rows.
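As a minimal sketch of row-wise concatenation, assuming the function behaves like pandas.concat (an assumption, since the implementation is not shown here), two small frames with hypothetical column names can be stacked as follows:

```python
import pandas as pd

# Two small frames with identical (hypothetical) column names.
plate_1 = pd.DataFrame({"Sample-Name": ["s1", "s2"], "AvgQual": [55, 48]})
plate_2 = pd.DataFrame({"Sample-Name": ["s3"], "AvgQual": [61]})

# Stack the frames by rows; ignore_index renumbers the rows 0..n-1.
# Assumption: this mirrors what concatenating_list_of_dfs does internally.
combined = pd.concat([plate_1, plate_2], axis=0, ignore_index=True)
print(combined)
```

Whether the real function resets the index is not stated in this reference, so ignore_index=True is an illustrative choice only.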

constrain.test.genotyping.pairwise_alignment_of_templates(reads: list, templates: list, primers: list) → dict

Infers relationship of templates to reads based on highest score from a pairwise alignment.

Parameters:

reads (list of Bio.SeqRecord.SeqRecord) – sequencing reads from .ab1 files, converted into Bio.SeqRecord.SeqRecord objects
templates (list of Bio.SeqRecord.SeqRecord) – templates to which the reads are related, e.g. plasmids
primers (list of Bio.SeqRecord.SeqRecord) – primers used to find where each read should start

Return type:

pd.DataFrame, structured as in the following example

Example

>>> df_alignment = pairwise_alignment_of_templates(reads, templates, primers_for_seq)
>>> df_alignment
                       Sample-Name  inf_promoter_name  align_score  inf_promoter
132  yp53re_cpr_A10_A10-pad_cpr_fw             pCCW12        634.0             5
188  yp53re_cpr_A11_A11-pad_cpr_fw              pTPI1        904.0             6
247  yp53re_cpr_A12_A12-pad_cpr_fw              pTPI1        851.0             6
 93   yp53re_cpr_A1_A01-pad_cpr_fw             pCCW12        543.0             5
 41   yp53re_cpr_A2_A02-pad_cpr_fw             pCCW12        636.0             5

Notes

If you want an inf_part_number column, change the description of the Bio.SeqRecord.SeqRecord as follows:

pCCW12.description = '1'
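The idea of assigning each read to its highest-scoring template can be sketched in plain Python. This is only an illustration: it uses plain strings instead of Bio.SeqRecord.SeqRecord objects, a simple Smith-Waterman local alignment score instead of whatever aligner the library uses, and it skips the primer-based trimming. The helper names local_alignment_score and infer_template, as well as the sequences, are hypothetical.

```python
def local_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the best Smith-Waterman local alignment score of a against b."""
    prev = [0] * (len(b) + 1)
    best = 0
    for char_a in a:
        curr = [0]
        for j, char_b in enumerate(b, start=1):
            diag = prev[j - 1] + (match if char_a == char_b else mismatch)
            # Local alignment: a cell never drops below zero.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def infer_template(read, templates):
    """Return (name, score) of the template that aligns best to the read."""
    scores = {name: local_alignment_score(read, seq) for name, seq in templates.items()}
    best_name = max(scores, key=scores.get)
    return best_name, scores[best_name]


# Hypothetical toy sequences; real reads and promoters are far longer.
templates = {"pCCW12": "ACGTACGTTTGACC", "pTPI1": "GGGCCCAAATTTGGC"}
read = "CGTACGTTTGA"  # an exact substring of the toy pCCW12 sequence

name, score = infer_template(read, templates)
print(name, score)
```

With these toy inputs the read matches the pCCW12 sequence end to end, so that template receives the highest score.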

constrain.test.genotyping.plat_seq_data_wrangler(sequencing_plates: list) → list

Converts the values in a list of Plate2Seq pd.DataFrames to numeric types and removes NaN values.

Parameters:

sequencing_plates (list of pd.DataFrames) – sliced Plate2Seq pd.DataFrames

Return type:

Plate2Seq pd.DataFrames with numeric values
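A rough sketch of this kind of clean-up with pandas is shown below. It assumes the conversion is done with pd.to_numeric and that rows with missing values are dropped. The helper name to_numeric_and_drop_nan and the column names AvgQual and UsedBases are hypothetical, not taken from the library.

```python
import pandas as pd


def to_numeric_and_drop_nan(plates, numeric_columns):
    """Coerce the given columns to numbers and drop rows that fail to convert."""
    cleaned = []
    for plate in plates:
        plate = plate.copy()  # leave the caller's frame untouched
        for column in numeric_columns:
            # Unparseable entries such as "n.a." become NaN.
            plate[column] = pd.to_numeric(plate[column], errors="coerce")
        cleaned.append(plate.dropna(subset=numeric_columns).reset_index(drop=True))
    return cleaned


# Hypothetical sliced plate with string-typed quality columns.
raw_plate = pd.DataFrame(
    {
        "Sample-Name": ["s1", "s2", "s3"],
        "AvgQual": ["55", "n.a.", "40"],
        "UsedBases": ["30", "20", None],
    }
)

numeric_plates = to_numeric_and_drop_nan([raw_plate], ["AvgQual", "UsedBases"])
print(numeric_plates[0])
```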

constrain.test.genotyping.plate_AvgQual(list_of_dfs_numeric: list, Avg_qual=50, used_bases=25) → list

Filters out rows that do not meet the Avg_qual and used_bases criteria.

Parameters:

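list_of_dfs_numeric (list of pd.DataFrames) – sliced, numeric Plate2Seq pd.DataFrames
Avg_qual (int) – average quality threshold used for filtering (default 50)
used_bases (int) – used-bases threshold used for filtering (default 25)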

Return type:

Plate2Seq pd.DataFrames containing only the rows that meet the Avg_qual and used_bases criteria
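The filtering step can be sketched with a boolean mask. This sketch assumes both criteria are inclusive lower bounds, which this reference does not state. The helper name filter_by_quality and the column names AvgQual and UsedBases are hypothetical.

```python
import pandas as pd


def filter_by_quality(plates, avg_qual=50, used_bases=25,
                      qual_col="AvgQual", used_col="UsedBases"):
    """Keep rows whose quality and used-bases values reach both thresholds."""
    filtered = []
    for plate in plates:
        # Assumption: a row passes when both values are >= their threshold.
        mask = (plate[qual_col] >= avg_qual) & (plate[used_col] >= used_bases)
        filtered.append(plate[mask].reset_index(drop=True))
    return filtered


# Hypothetical numeric plate.
numeric_plate = pd.DataFrame(
    {
        "Sample-Name": ["s1", "s2", "s3"],
        "AvgQual": [55, 45, 60],
        "UsedBases": [30, 40, 10],
    }
)

passing_plates = filter_by_quality([numeric_plate])
print(passing_plates[0])
```

Here s2 fails the quality threshold and s3 fails the used-bases threshold, so only s1 remains.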

constrain.test.genotyping.slicing_and_naming_seq_plates(sequencing_plates, where_to_slice=7) → list

Slices the rows of each DataFrame in a list and renames the columns. Used to ease pre-processing of Plate2Seq Excel files.

Parameters:

sequencing_plates (list of pd.DataFrames) – Plate2Seq pd.DataFrames
where_to_slice (int) – row position at which to slice each DataFrame

Return type:

list of sliced plate pd.DataFrames
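One way to picture the slicing step is sketched below. It assumes that the row at position where_to_slice holds the real column headers and that everything above it is report metadata. This reference does not confirm that layout, and the helper name slice_and_rename and the toy data are hypothetical.

```python
import pandas as pd


def slice_and_rename(plate, where_to_slice=7):
    """Drop the leading rows and promote the row at where_to_slice to header."""
    header = plate.iloc[where_to_slice]
    body = plate.iloc[where_to_slice + 1:].reset_index(drop=True)
    # Assumption: the header row contains the intended column names.
    body.columns = [str(name) for name in header]
    return body


# Hypothetical raw sheet: 7 metadata rows, a header row, then the data rows.
raw_sheet = pd.DataFrame(
    [["report info", None]] * 7
    + [["Sample-Name", "AvgQual"], ["s1", 55], ["s2", 40]]
)

sliced_plate = slice_and_rename(raw_sheet, where_to_slice=7)
print(sliced_plate)
```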

constrain.test.genotyping.split_df_names(df_names_column, which_column_to_split1=0, which_column_to_split2=2) → list: Splits sample names from Plate2Seq pd.DataFrames into plate and well columns.
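The default indices 0 and 2 are consistent with splitting sample names such as yp53re_cpr_A10_A10-pad_cpr_fw on underscores, where field 0 would be the plate and field 2 the well. The sketch below works under that assumption. The underscore delimiter, the helper name split_sample_names and the DataFrame return value are all illustrative, not the library's documented behaviour.

```python
import pandas as pd


def split_sample_names(names, plate_index=0, well_index=2, sep="_"):
    """Split each sample name on sep and pick out the plate and well fields."""
    parts = names.str.split(sep)
    return pd.DataFrame(
        {
            "plate": parts.str[plate_index],
            "well": parts.str[well_index],
        }
    )


# Sample names taken from the alignment example in this reference.
sample_names = pd.Series(
    ["yp53re_cpr_A10_A10-pad_cpr_fw", "yp53re_cpr_A1_A01-pad_cpr_fw"]
)

plate_and_well = split_sample_names(sample_names)
print(plate_and_well)
```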