Longest Common Subsequence (LCS) Algorithm

Love The Way You Lie 2022-06-05 01:10

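I. Longest common substring vs. longest common subsequence

Longest Common Substring

A substring is a contiguous part of a string: the characters of a substring must occupy consecutive positions in the original string.

Example: for the two strings ABCBDAB and BDCABA, a longest common substring is AB (length 2).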
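To make the contrast with LCS concrete, here is a minimal sketch of a longest-common-substring length computation. It is not part of the original post; the function name `longest_common_substring_len` and the bound `SUBSTR_MAX` are assumptions made for illustration. The key difference from the LCS recurrence is that a mismatch resets the count to 0, because a substring must be contiguous.

```c
#include <string.h>

#define SUBSTR_MAX 256   /* assumed bound on strlen(t) + 1 for this sketch */

/* Length of the longest common substring of s and t.
 * cur[j] is the length of the longest common suffix of the first i
 * characters of s and the first j characters of t. */
int longest_common_substring_len(const char *s, const char *t) {
    int prev[SUBSTR_MAX] = {0}, cur[SUBSTR_MAX] = {0};
    int n = (int)strlen(s), m = (int)strlen(t), best = 0;

    if (m >= SUBSTR_MAX)
        return -1;                       /* input too long for this sketch */

    for (int i = 1; i <= n; i++) {
        for (int j = 1; j <= m; j++) {
            /* a mismatch breaks contiguity, so the run restarts at 0 */
            cur[j] = (s[i-1] == t[j-1]) ? prev[j-1] + 1 : 0;
            if (cur[j] > best)
                best = cur[j];
        }
        memcpy(prev, cur, sizeof prev);  /* current row becomes previous row */
    }
    return best;
}
```

Calling `longest_common_substring_len("ABCBDAB", "BDCABA")` yields 2, matching the example above.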

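Longest Common Subsequence (LCS)

A subsequence is a new sequence obtained by deleting any number of elements from a string; the characters of a subsequence keep their relative order but need not be contiguous.

Example: for the two strings ABCBDAB and BDCABA, one longest common subsequence is BCAB (length 4); BCBA and BDAB are equally long.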
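The definition of a subsequence can be checked with a simple two-pointer scan. The following is a small sketch, with the helper name `is_subsequence` chosen here for illustration (it does not appear in the original post):

```c
#include <stdbool.h>

/* Returns true if sub can be obtained from s by deleting zero or more
 * characters while keeping the remaining characters in order. */
bool is_subsequence(const char *sub, const char *s) {
    while (*sub != '\0' && *s != '\0') {
        if (*sub == *s)
            sub++;          /* matched one character of sub */
        s++;                /* always advance in s */
    }
    return *sub == '\0';    /* true only if all of sub was matched */
}
```

For instance, `is_subsequence("BCAB", "ABCBDAB")` and `is_subsequence("BCAB", "BDCABA")` are both true, which is why BCAB is a common subsequence of the two example strings.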


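II. The LCS algorithm

step1: Build the matrix

Create a matrix L of size (str1_len + 1) × (str2_len + 1), where str1_len and str2_len are the lengths of str1 and str2, and initialize every entry to 0. Row 0 and column 0 stay 0 and stand for the empty prefix.

Fill the matrix according to the following rules:

Loop i from 1 to str1_len and j from 1 to str2_len. The strings are 0-indexed, so L[i,j] describes the first i characters of str1 and the first j characters of str2:

  • If str1[i-1] == str2[j-1], then L[i,j] = L[i-1,j-1] + 1
  • If str1[i-1] != str2[j-1], then L[i,j] = max{L[i,j-1], L[i-1,j]}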

    /* Fill the global table a; row 0 and column 0 are already 0. */
    void init_array(char *str1, char *str2) {
        int i, j;
        for (i = 1; i <= str1_len; i++)
            for (j = 1; j <= str2_len; j++) {
                if (str1[i-1] == str2[j-1])
                    a[i][j] = a[i-1][j-1] + 1;   /* match: extend the diagonal */
                else if (a[i][j-1] >= a[i-1][j])
                    a[i][j] = a[i][j-1];         /* take the larger neighbour */
                else
                    a[i][j] = a[i-1][j];
            }
    }
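As a quick check of the recurrence on the example strings from section I, the table-filling step can be wrapped in a self-contained function that returns only the bottom-right entry. This is a sketch rather than the post's code: the name `lcs_length` and the bound `LCS_MAX` are assumptions, and it uses a local table instead of the global array.

```c
#include <string.h>

#define LCS_MAX 64   /* assumed maximum string length for this sketch */

/* Returns the LCS length of x and y, or -1 if either is longer than LCS_MAX. */
int lcs_length(const char *x, const char *y) {
    int L[LCS_MAX + 1][LCS_MAX + 1] = {{0}};   /* row 0 / column 0 stay 0 */
    size_t n = strlen(x), m = strlen(y);

    if (n > LCS_MAX || m > LCS_MAX)
        return -1;

    for (size_t i = 1; i <= n; i++)
        for (size_t j = 1; j <= m; j++) {
            if (x[i-1] == y[j-1])
                L[i][j] = L[i-1][j-1] + 1;
            else
                L[i][j] = (L[i][j-1] >= L[i-1][j]) ? L[i][j-1] : L[i-1][j];
        }
    return L[n][m];   /* LCS length of the full strings */
}
```

For ABCBDAB and BDCABA this returns 4, the length of BCAB.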

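step2: Recover the common subsequence

Recover the common subsequence according to the following rules:

Start with i = str1_len and j = str2_len and walk backwards until i = 0 or j = 0:

  • If str1[i-1] == str2[j-1], append the character str1[i-1] to the subsequence, then i--, j--
  • If str1[i-1] != str2[j-1], compare L[i,j-1] and L[i-1,j]: if L[i,j-1] is larger, j--; otherwise i-- (if they are equal, either choice is valid)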


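    /* Trace back through a and collect one LCS in res, last character first.
       res must have room for a[str1_len][str2_len] + 1 characters. */
    void parser(char *str1, char *str2, char *res) {
        int i, j, k = 0;
        for (i = str1_len, j = str2_len; i >= 1 && j >= 1;) {
            if (str1[i-1] == str2[j-1]) {
                res[k++] = str1[i-1];
                i--;
                j--;
            } else if (a[i][j-1] > a[i-1][j])
                j--;
            else
                i--;
        }
        res[k] = '\0';   /* terminate so that strlen and printf work on res */
    }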

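step3: Reverse the common subsequence

The subsequence produced by step2 is collected from back to front, so it has to be reversed before it is stored or printed.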

    /* Reverse a NUL-terminated string in place and return it. */
    char* reverse(char *str) {
        size_t len = strlen(str);   /* computed once instead of on every iteration */
        size_t i;
        char tmp;
        for (i = 0; i < len / 2; i++) {
            tmp = str[i];
            str[i] = str[len-i-1];
            str[len-i-1] = tmp;
        }
        return str;
    }
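The reversal can also be avoided. Because the final length is already known from the bottom-right cell of the table, the traceback can write characters from the end of the output buffer toward the front. The sketch below is an alternative to steps 2 and 3, not the post's own method; the name `lcs_no_reverse`, the bound `TRACE_MAX`, and the explicit buffer-capacity parameter are assumptions.

```c
#include <string.h>

#define TRACE_MAX 64   /* assumed maximum input length for this sketch */

/* Writes one LCS of x and y into out (capacity cap, including '\0').
 * Returns its length, or -1 if an input is too long or out is too small. */
int lcs_no_reverse(const char *x, const char *y, char *out, size_t cap) {
    int L[TRACE_MAX + 1][TRACE_MAX + 1] = {{0}};
    size_t n = strlen(x), m = strlen(y);

    if (n > TRACE_MAX || m > TRACE_MAX)
        return -1;

    /* step1: fill the table */
    for (size_t i = 1; i <= n; i++)
        for (size_t j = 1; j <= m; j++) {
            if (x[i-1] == y[j-1])
                L[i][j] = L[i-1][j-1] + 1;
            else
                L[i][j] = (L[i][j-1] >= L[i-1][j]) ? L[i][j-1] : L[i-1][j];
        }

    int len = L[n][m];               /* length is known before the traceback */
    if ((size_t)len + 1 > cap)
        return -1;
    out[len] = '\0';

    /* step2: trace back, filling out from its last position to its first */
    size_t i = n, j = m;
    int pos = len;
    while (i >= 1 && j >= 1) {
        if (x[i-1] == y[j-1]) {
            out[--pos] = x[i-1];
            i--;
            j--;
        } else if (L[i][j-1] > L[i-1][j]) {
            j--;
        } else {
            i--;
        }
    }
    return len;
}
```

With the same tie-breaking rule as `parser`, the inputs ABCBDAB and BDCABA produce BCBA, another length-4 common subsequence alongside BCAB.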

III. Complete code

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define MAX_LEN 256

    int str1_len, str2_len;
    int a[MAX_LEN][MAX_LEN];   /* DP table; globals start out as all zeros */

    /* Read both strings; %255s leaves room for the terminating '\0'.
       Returns 1 on success, 0 if input ended early. */
    int init_str(char *str1, char *str2) {
        printf("please input str1: ");
        if (scanf("%255s", str1) != 1)
            return 0;
        printf("please input str2: ");
        if (scanf("%255s", str2) != 1)
            return 0;
        return 1;
    }

    /* step1: fill the table */
    void init_array(char *str1, char *str2) {
        int i, j;
        for (i = 1; i <= str1_len; i++)
            for (j = 1; j <= str2_len; j++) {
                if (str1[i-1] == str2[j-1])
                    a[i][j] = a[i-1][j-1] + 1;
                else if (a[i][j-1] >= a[i-1][j])
                    a[i][j] = a[i][j-1];
                else
                    a[i][j] = a[i-1][j];
            }
    }

    /* step2: trace back and collect one LCS in res, last character first */
    void parser(char *str1, char *str2, char *res) {
        int i, j, k = 0;
        for (i = str1_len, j = str2_len; i >= 1 && j >= 1;) {
            if (str1[i-1] == str2[j-1]) {
                res[k++] = str1[i-1];
                i--;
                j--;
            } else if (a[i][j-1] > a[i-1][j])
                j--;
            else
                i--;
        }
        res[k] = '\0';   /* terminate the string before it is reversed */
    }

    /* step3: reverse a NUL-terminated string in place */
    char* reverse(char *str) {
        size_t len = strlen(str);
        size_t i;
        char tmp;
        for (i = 0; i < len / 2; i++) {
            tmp = str[i];
            str[i] = str[len-i-1];
            str[len-i-1] = tmp;
        }
        return str;
    }

    int main(void) {
        char str1[MAX_LEN], str2[MAX_LEN], *res;
        if (!init_str(str1, str2))
            return 1;
        str1_len = (int)strlen(str1);
        str2_len = (int)strlen(str2);
        init_array(str1, str2);
        /* the LCS has exactly a[str1_len][str2_len] characters, plus '\0' */
        res = malloc((size_t)a[str1_len][str2_len] + 1);
        if (res == NULL)
            return 1;
        parser(str1, str2, res);
        printf("Result : %s\n", reverse(res));
        free(res);
        return 0;
    }

IV. Practice problem

POJ 1458 Common Subsequence

Description
A subsequence of a given sequence is the given sequence with some elements (possible none) left out. Given a sequence X = < x_1, x_2, …, x_m > another sequence Z = < z_1, z_2, …, z_k > is a subsequence of X if there exists a strictly increasing sequence < i_1, i_2, …, i_k > of indices of X such that for all j = 1,2,…,k, x_{i_j} = z_j. For example, Z = < a, b, f, c > is a subsequence of X = < a, b, c, f, b, c > with index sequence < 1, 2, 4, 6 >. Given two sequences X and Y the problem is to find the length of the maximum-length common subsequence of X and Y.

Input
The program input is from the std input. Each data set in the input contains two strings representing the given sequences. The sequences are separated by any number of white spaces. The input data are correct.

Output
For each set of data the program prints on the standard output the length of the maximum-length common subsequence from the beginning of a separate line.

Sample Input

abcfbc abfcab
programming contest
abcd mnp

Sample Output

4
2
0

Code

    import java.util.Scanner;

    public class Main {
        public static void main(String[] args) {
            Scanner sc = new Scanner(System.in);
            // Read tokens in pairs: the two sequences may be separated by
            // any amount of whitespace, including line breaks.
            while (sc.hasNext()) {
                String str1 = sc.next();
                if (!sc.hasNext()) {
                    break;
                }
                String str2 = sc.next();
                // data[i][j] = LCS length of the first i chars of str1 and first j chars of str2
                int[][] data = new int[str1.length() + 1][str2.length() + 1];
                for (int i = 1; i < data.length; i++) {
                    for (int j = 1; j < data[i].length; j++) {
                        if (str1.charAt(i - 1) == str2.charAt(j - 1)) {
                            data[i][j] = data[i - 1][j - 1] + 1;
                        } else {
                            data[i][j] = Math.max(data[i][j - 1], data[i - 1][j]);
                        }
                    }
                }
                System.out.println(data[str1.length()][str2.length()]);
            }
        }
    }
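Since this problem asks only for the length, the full matrix is not strictly needed: each row of the table depends only on the row above it. The following C sketch keeps two rows instead of the whole table. It is an optional space optimization, not something the original solution does; the name `lcs_length_two_rows` and the bound `ROW_MAX` are assumptions, and the real input limits of the judge should be checked before relying on that bound.

```c
#include <string.h>

#define ROW_MAX 1024   /* assumed upper bound on the length of y */

/* Returns the LCS length of x and y using O(strlen(y)) memory,
 * or -1 if y is longer than ROW_MAX. */
int lcs_length_two_rows(const char *x, const char *y) {
    static int prev[ROW_MAX + 1], cur[ROW_MAX + 1];
    size_t n = strlen(x), m = strlen(y);

    if (m > ROW_MAX)
        return -1;

    memset(prev, 0, sizeof prev);   /* row 0 of the table */
    memset(cur, 0, sizeof cur);     /* cur[0] stays 0 (column 0) */

    for (size_t i = 1; i <= n; i++) {
        for (size_t j = 1; j <= m; j++) {
            if (x[i-1] == y[j-1])
                cur[j] = prev[j-1] + 1;
            else
                cur[j] = (cur[j-1] > prev[j]) ? cur[j-1] : prev[j];
        }
        memcpy(prev, cur, sizeof prev);   /* row i becomes the previous row */
    }
    return prev[m];
}
```

On the sample input, the pairs (abcfbc, abfcab), (programming, contest) and (abcd, mnp) give 4, 2 and 0. The trade-off is that the subsequence itself can no longer be recovered by a traceback, because the earlier rows are discarded.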
