DNA Sorting

Bertha 。 2022-05-25 13:50 232阅读 0赞

Description

One measure of ``unsortedness’’ in a sequence is the number of pairs of entries that are out of order with respect to each other. For instance, in the letter sequence ``DAABEC’’, this measure is 5, since D is greater than four letters to its right and E is greater than one letter to its right. This measure is called the number of inversions in the sequence. The sequence ``AACEDGG’’ has only one inversion (E and D)—-it is nearly sorted—-while the sequence ``ZWQM’’ has 6 inversions (it is as unsorted as can be—-exactly the reverse of sorted).

You are responsible for cataloguing a sequence of DNA strings (sequences containing only the four letters A, C, G, and T). However, you want to catalog them, not in alphabetical order, but rather in order of ``sortedness’’, from ``most sorted’’ to ``least sorted’’. All the strings are of the same length.

Input

The first line contains two integers: a positive integer n (0 < n <= 50) giving the length of the strings; and a positive integer m (0 < m <= 100) giving the number of strings. These are followed by m lines, each containing a string of length n.

Output

Output the list of input strings, arranged from ``most sorted’’ to ``least sorted’’. Since two strings can be equally sorted, then output them according to the orginal order.

Sample Input

  1. 10 6
  2. AACATGAAGG
  3. TTTTGGCCAA
  4. TTTGGCCAAA
  5. GATCAGATTT
  6. CCCGGGGGGA
  7. ATCGATGCAT

Sample Output

  1. CCCGGGGGGA
  2. AACATGAAGG
  3. GATCAGATTT
  4. ATCGATGCAT
  5. TTTTGGCCAA
  6. TTTGGCCAAA

代码:

  1. #include<stdio.h>
  2. #include<vector>
  3. #include<algorithm>
  4. #include<string.h>
  5. using namespace std;
  6. struct node
  7. {
  8. char DNA[60];
  9. int value;
  10. int order;
  11. };
  12. bool cmp(struct node a,struct node b)
  13. {
  14. if(a.value!=b.value)
  15. {
  16. return a.value<b.value;
  17. }
  18. else
  19. {
  20. return a.order>b.order;
  21. }
  22. }
  23. vector<struct node> v;
  24. int main()
  25. {
  26. int i,j,n,m,k,t;
  27. char str[60];
  28. scanf("%d %d",&n,&m);
  29. for(i=0;i<m;i++)
  30. {
  31. struct node tmp;
  32. int A=0,C=0,G=0,T=0;
  33. int value=0;
  34. scanf("%s",str);
  35. for(j=strlen(str)-1;j>=0;j--)
  36. {
  37. if(str[j]=='A')
  38. {
  39. A++;
  40. }
  41. else if(str[j]=='C')
  42. {
  43. C++;
  44. }
  45. else if(str[j]=='G')
  46. {
  47. G++;
  48. }
  49. else if(str[j]=='T')
  50. {
  51. T++;
  52. }
  53. if(str[j]>'A')
  54. {
  55. value+=A;
  56. }
  57. if(str[j]>'C')
  58. {
  59. value+=C;
  60. }
  61. if(str[j]>'G')
  62. {
  63. value+=G;
  64. }
  65. if(str[j]>'T')
  66. {
  67. value+=T;
  68. }
  69. }
  70. strcpy(tmp.DNA,str);
  71. tmp.value=value;
  72. tmp.order=i;
  73. v.push_back(tmp);
  74. }
  75. sort(v.begin(),v.begin()+m,cmp);
  76. for(i=0;i<m;i++)
  77. {
  78. printf("%s\n",v[i].DNA);
  79. }
  80. return 0;
  81. }

发表评论

表情:
评论列表 (有 0 条评论,232人围观)

还没有评论,来说两句吧...

相关阅读

    相关 DNA序列

    > 一个DNA序列由A/C/G/T四个字母的排列组合组成。G和C的比例(定义为GC-Ratio)是序列中G和C两个字母的总的出现次数除以总的字母数目(也就是序列长度)。在基因工

    相关 变色DNA

    有一只特别的狼,它在每个夜晚会进行变色,研究发现它可以变成N种颜色之一,将这些颜色标号为0,1,2...N-1。研究发现这只狼的基因中存在一个变色矩阵,记为colormap,如