
How To Select Only Certain Substrings

From a string, say dna = 'ATAGGGATAGGGAGAGAGCGATCGAGCTAG', I extracted substrings, say dna.format = 'ATAGGGATAG', 'GGGAGAGAG'. I only want to print the substrings whose length is divisible by 3. How can I do that?

Solution 1:

To include overlapping substrings, I have the following lengthy version. The idea is to find every start marker (ATA) and every end marker (AGA), compute the distance between each pair, and keep the pairs whose distance is divisible by 3.

import re

mydna = 'ATAGGGATAGGGAGAGAGCAGATCGAGCTAG'
[mydna[start.start():end.start()+3] for start in re.finditer('(?=ATA)',mydna) for end in re.finditer('(?=AGA)',mydna) if end.start()>start.start() and (end.start()-start.start())%3 == 0]
['ATAGGGATAGGGAGA', 'ATAGGGAGA']

Show all substrings, including overlapping ones:

[mydna[start.start():end.start()+3] for start in re.finditer('(?=ATA)',mydna) for end in re.finditer('(?=AGA)',mydna) if end.start()>start.start()]
['ATAGGGATAGGGAGA', 'ATAGGGATAGGGAGAGA', 'ATAGGGATAGGGAGAGAGCAGA', 'ATAGGGAGA', 'ATAGGGAGAGA', 'ATAGGGAGAGAGCAGA']
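
The same idea may be easier to follow when unrolled into explicit loops. The following is only a sketch: the function name marked_substrings and its parameters are made up for illustration and are not part of the original answer. It collects every start and end marker position with a lookahead, so overlapping markers are found, and optionally keeps only the pieces whose length is a multiple of a given number.

```python
import re

def marked_substrings(dna, start_marker="ATA", end_marker="AGA", multiple_of=None):
    """Return every substring that begins at a start marker and ends with an end marker.

    Hypothetical helper: if multiple_of is given, only substrings whose
    length is divisible by it are kept.
    """
    # A zero-width lookahead finds every marker position, including overlapping ones.
    starts = [m.start() for m in re.finditer(f"(?={re.escape(start_marker)})", dna)]
    ends = [m.start() for m in re.finditer(f"(?={re.escape(end_marker)})", dna)]

    results = []
    for s in starts:
        for e in ends:
            if e <= s:
                continue  # the end marker must begin after the start marker
            piece = dna[s:e + len(end_marker)]  # include the end marker itself
            if multiple_of is None or len(piece) % multiple_of == 0:
                results.append(piece)
    return results

mydna = 'ATAGGGATAGGGAGAGAGCAGATCGAGCTAG'
divisible_by_three = marked_substrings(mydna, multiple_of=3)
# → ['ATAGGGATAGGGAGA', 'ATAGGGAGA']
all_pieces = marked_substrings(mydna)  # six substrings, no length filter
```

Testing len(piece) directly, rather than the distance between marker positions, keeps the check correct even if the end marker is not three letters long.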

Solution 2:

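You can also use a regular expression for that:

re.findall('ATA(?:...)*?AGA', mydna)

The inner group (?:...) matches three letters at a time, so every match has a length divisible by 3. It is written as a non-capturing group because re.findall returns the group contents, not the whole match, when the pattern contains capturing groups. Note that re.findall only reports non-overlapping matches.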
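
As a rough illustration of how grouping changes what re.findall returns, the sketch below runs three variants of the pattern on the mydna string from Solution 1. The third variant, which wraps the pattern in a lookahead to keep overlapping results, is an addition of mine and not part of the original answer; because the repetition is lazy, it yields only the shortest match per start position.

```python
import re

mydna = 'ATAGGGATAGGGAGAGAGCAGATCGAGCTAG'

# Capturing groups: findall returns a tuple of the group contents per match,
# i.e. the interior between the markers and the last three-letter chunk.
captured = re.findall(r'ATA((...)*?)AGA', mydna)
# → [('GGGATAGGG', 'GGG')]

# Non-capturing group: findall returns the whole matched substring.
whole = re.findall(r'ATA(?:...)*?AGA', mydna)
# → ['ATAGGGATAGGGAGA']

# Lookahead wrapper (assumption, not from the answer): the match itself is
# zero-width, so the scan can report one result per start position.
overlapping = re.findall(r'(?=(ATA(?:...)*?AGA))', mydna)
# → ['ATAGGGATAGGGAGA', 'ATAGGGAGA']
```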

Solution 3:

Using modulo is the correct procedure. If it's not working, you're doing it wrong. Please provide an example of your code in order to debug it.
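
For reference, a minimal sketch of the modulo check, assuming the substrings are already in a list (the variable names are illustrative):

```python
# The two substrings quoted in the question.
substrings = ['ATAGGGATAG', 'GGGAGAGAG']

# Keep only the substrings whose length leaves no remainder when divided by 3.
divisible = [s for s in substrings if len(s) % 3 == 0]

for s in divisible:
    print(s)  # prints GGGAGAGAG (length 9); 'ATAGGGATAG' has length 10 and is skipped
```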

Solution 4:

re.findall() returns a list of the matching strings. Iterate over that list and apply a modulo test to the length of each string to keep only the ones you want.
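
A sketch of that two-step approach follows. The sequence and the pattern are made up for illustration (the question does not show the asker's own pattern); the pattern places no constraint on length, so the filtering happens entirely in the second step.

```python
import re

# Hypothetical sequence chosen so that one match fails the length test.
sequence = 'ATACAGATTATAGGGAGA'

# Step 1: collect candidates; the lazy .*? stops at the first AGA after each ATA.
candidates = re.findall(r'ATA.*?AGA', sequence)
# → ['ATACAGA', 'ATAGGGAGA']

# Step 2: keep the candidates whose length is divisible by 3.
kept = [c for c in candidates if len(c) % 3 == 0]
# → ['ATAGGGAGA']
```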
