Leetcode: Repeated DNA Sequences

男娘i 2022-08-07 08:38 122阅读 0赞

题目:

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: “ACGAATTCCG”. When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

  1. Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",
  2. Return:
  3. ["AAAAACCCCC", "CCCCCAAAAA"].

刚开始就想到用map进行子串的存储,结果Memory Limit Exceeded

我刚开始这样写的:

  1. class Solution
  2. {
  3. public:
  4. vector<string> findRepeatedDnaSequences(string s)
  5. {
  6. map<string, int> dnamap;
  7. vector<string> result;
  8. string::size_type length = s.length();
  9. string sequence;
  10. for (size_t i = 0; i + 10 < length; i++)
  11. {
  12. sequence = s.substr(i, 10);
  13. dnamap[sequence] += 1;
  14. }
  15. for (std::map<string, int>::iterator it = dnamap.begin(); it != dnamap.end(); ++it)
  16. {
  17. if (it->second > 1)
  18. {
  19. result.push_back(it->first);
  20. }
  21. }
  22. return result;
  23. }
  24. };

后来看到https://leetcode.com/discuss/24478/i-did-it-in-10-lines-of-c的讨论,原来可以将子字符串转成数字,存储在map中,这样就不会超出限制的内存空间了。

原文是这样说的:The main idea is to store the substring as int in map to bypass the memory limits.

下面是我写的C++代码:

  1. class Solution
  2. {
  3. public:
  4. int str2int(string s) {
  5. int num = 0;
  6. for (int i = 0; i < s.size(); ++i)
  7. //s[i]&7取出s[i]的后三个字节,因为用三个字节可以完全表示A,C,G,T
  8. num = (num << 3) + (s[i] & 7);
  9. return num;
  10. }
  11. vector<string> findRepeatedDnaSequences(string s) {
  12. vector<string> result;
  13. unordered_map<int, int> dictionary;
  14. string sequence;
  15. for (int i = 0; i + 10 <= s.size(); ++i)
  16. {
  17. sequence = s.substr(i, 10);
  18. if (dictionary[str2int(sequence)]++ == 1)
  19. result.push_back(sequence);
  20. }
  21. return result;
  22. }
  23. };

看到很多朋友这道题都使用的是unordered_map,而不是map,然后我查阅了资料,下面是unordered_map和map的区别:

unordered_map原来是boost库中的容器,在C++11标准中被引入到STL中。unordered_map和map的区别就是,map是按照operator<比较判断元素是否相同,以及比较元素的大小,然后选择合适的位置插入到树中。所以,如果对map进行遍历(中序遍历)的话,输出的结果是有序的。顺序就是按照operator< 定义的大小排序。而unordered_map是计算元素的Hash值,根据Hash值判断元素是否相同。所以,对unordered_map进行遍历,结果是无序的。

用法的区别就是,map的key需要定义operator<。 而boost::unordered_map需要定义hash_value函数并且重载operator==。对于内置类型,如string,这些都不用操心。对于自定义的类型做key,就需要自己重载operator<或者hash_value()了。 当不需要结果排好序时,最好unordered_map

其实,C++中map对于与Java中的TreeMap,而unordered_map对应于Java中的HashMap。

C#代码:

貌似C#中Dictionary不支持对不存在的key的直接索引,所以要先判断key存不存在。C++中如果不存在会自动添加要索引的key。

  1. public class Solution
  2. {
  3. private int StrToInt(string s)
  4. {
  5. int num = 0;
  6. for (int i = 0; i < s.Length; ++i)
  7. {
  8. num = (num << 3) + (s[i] & 7);
  9. }
  10. return num;
  11. }
  12. public IList<string> FindRepeatedDnaSequences(string s)
  13. {
  14. IList<string> result = new List<string>();
  15. IDictionary<int, int> map = new Dictionary<int, int>();
  16. string sequence = string.Empty;
  17. int key = 0;
  18. for (int i = 0; i + 10 <= s.Length; ++i)
  19. {
  20. sequence = s.Substring(i, 10);
  21. key = StrToInt(sequence);
  22. if (map.ContainsKey(key))
  23. {
  24. if (map[key]++ == 1)
  25. {
  26. result.Add(sequence);
  27. }
  28. }
  29. else
  30. {
  31. map.Add(key, 1);
  32. }
  33. }
  34. return result;
  35. }
  36. }

Python代码:

注意内置函数ord将单个字符串即字符转成对于的ASCII(Return the integer ordinal of a one-character string)

  1. class Solution:
  2. # @param s, a string
  3. # @return a list of strings
  4. def str2int(self, s):
  5. num = 0
  6. for i in range(len(s)):
  7. num = (num << 3) + (ord(s[i]) & 7)
  8. return num
  9. def findRepeatedDnaSequences(self, s):
  10. length = len(s)
  11. result = []
  12. if length <= 10:
  13. return result
  14. map = {}
  15. i = 0
  16. while (i + 10) <= length:
  17. sequence = s[i : i + 10]
  18. num = self.str2int(sequence)
  19. if map.has_key(num):
  20. map[num] = map[num] + 1
  21. if map[num] == 1:
  22. result.append(sequence)
  23. else:
  24. map[num] = 0
  25. i = i + 1
  26. return result

发表评论

表情:
评论列表 (有 0 条评论,122人围观)

还没有评论,来说两句吧...

相关阅读