|
| 1 | + |
| 2 | +# Find the Duplicate Number |
| 3 | +You are given an array nums of n + 1 integers, where each integer is in the range [1, n] inclusive. Exactly one number in nums is duplicated; return that number. |
| 4 | + |
| 5 | +You must solve the problem without modifying the array and use only constant extra space. |
| 6 | + |
| 7 | +### Constraints: |
| 8 | +- 1 <= n <= 10^5 |
| 9 | +- nums.length == n + 1 |
| 10 | +- 1 <= nums[i] <= n |
| 11 | +- All the integers in nums appear only once except for exactly one integer which appears twice. |
| 12 | + |
| 13 | +### Follow-up: |
| 14 | +- How can we prove that at least one duplicate number must exist? |
| 15 | +- Can you solve the problem in O(n) time and without modifying the array? |
| 16 | + |
| 17 | +### Examples |
| 18 | +```text |
| 19 | +Input: nums = [1,3,4,2,2] |
| 20 | +Output: 2 |
| 21 | + |
| 22 | +Input: nums = [3,1,3,4,2] |
| 23 | +Output: 3 |
| 24 | +``` |
| 25 | + |
| 26 | +## Approaches to Solve the Problem |
| 27 | +### Approach 1: Sort the Array (Not Optimal) |
| 28 | +##### Intuition: |
| 29 | +One of the simplest approaches is to sort the array. Once the array is sorted, the duplicate number will appear next to itself. |
| 30 | + |
| 31 | +1. Sort the array. |
| 32 | +2. Iterate through the sorted array, checking if any two consecutive elements are the same. |
| 33 | +##### Time Complexity: |
| 34 | +O(n log n), because sorting the array takes O(n log n). |
| 35 | +##### Space Complexity: |
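| 36 | +O(1) extra space if the sort is done in place, or O(n) if the sort uses auxiliary memory or works on a copy. Note that sorting in place modifies the array, which the problem statement forbids. |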
| 37 | +##### Python Code: |
| 38 | +```python |
| 39 | +def findDuplicate(nums): |
| 40 | +    nums.sort()  # Sort the array in place (this modifies the input) |
| 41 | + |
| 42 | +    # Find the first pair of consecutive duplicate elements |
| 43 | +    for i in range(1, len(nums)): |
| 44 | +        if nums[i] == nums[i - 1]: |
| 45 | +            return nums[i] |
| 46 | +``` |
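Because `nums.sort()` reorders the caller's list, a variant that sorts a copy may be closer to the problem's constraints, at the cost of O(n) extra space. The following is a minimal sketch; the name `find_duplicate_sorted_copy` is made up for illustration and is not part of the original solution.

```python
def find_duplicate_sorted_copy(nums):
    """Return the duplicate by scanning a sorted copy; nums is left untouched."""
    ordered = sorted(nums)  # new list, so this costs O(n) extra space
    # Compare each element with its right-hand neighbour in the sorted copy.
    for previous, current in zip(ordered, ordered[1:]):
        if previous == current:
            return current
    return None  # only reached if the input contains no duplicate


nums = [1, 3, 4, 2, 2]
print(find_duplicate_sorted_copy(nums))  # 2
print(nums)  # [1, 3, 4, 2, 2] (unchanged)
```

This keeps the O(n log n) running time but gives up the O(1) space bound, so it still does not meet both constraints at once.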
| 47 | +### Approach 2: Hash Set to Track Seen Numbers |
| 48 | +##### Intuition: |
| 49 | +We can use a hash set to keep track of numbers we have already encountered. As we iterate through the array, if a number is found in the set, it is the duplicate. |
| 50 | + |
| 51 | +1. Initialize an empty set. |
| 52 | +2. Traverse the array and check if the number is already in the set. |
| 53 | +3. If it is, return the duplicate number. |
| 54 | +##### Time Complexity: |
| 55 | +O(n), as we traverse the array once. |
| 56 | +##### Space Complexity: |
| 57 | +O(n), because we use a set to store the numbers we have seen. |
| 58 | +##### Python Code: |
| 59 | +```python |
| 60 | +def findDuplicate(nums): |
| 61 | +    seen = set()  # Set to track seen numbers |
| 62 | + |
| 63 | +    for num in nums: |
| 64 | +        if num in seen: |
| 65 | +            return num  # Duplicate found |
| 66 | +        seen.add(num) |
| 67 | +``` |
| 68 | +### Approach 3: Binary Search on the Value Range |
| 69 | +##### Intuition: |
| 70 | +The key insight is that every value lies in the range [1, n], so we can binary search over the range of values rather than over the array positions. For each midpoint mid of the current range, count how many numbers in the array are less than or equal to mid. Without a duplicate there would be at most mid such numbers, so if the count exceeds mid, the duplicate must lie in the lower half [left, mid] (pigeonhole principle). Otherwise, it must lie in the upper half [mid + 1, right]. |
| 71 | + |
| 72 | +1. Perform binary search on the range of numbers [1, n]. |
| 73 | +2. For each mid value, count how many numbers in the array are less than or equal to mid. |
| 74 | +3. Based on the count, adjust the search range. |
| 75 | +##### Visualization: |
| 76 | +```text |
| 77 | +For example, with nums = [1, 3, 4, 2, 2], we perform the following: |
| 78 | + |
| 79 | +- Initial range: 1 to 4 (n=4) |
| 80 | +- Mid = 2. Count of numbers <= 2 is 3 (1, 2, 2). |
| 81 | + Since 3 > 2, the duplicate is in the lower half. |
| 82 | +- Narrow down the range to 1 to 2. |
| 83 | +- Mid = 1. Count of numbers <= 1 is 1. |
| 84 | + The duplicate must be in the upper half. |
| 85 | +- Range becomes 2 to 2. Found duplicate = 2. |
| 86 | +``` |
| 87 | +##### Time Complexity: |
| 88 | +O(n log n), because we perform binary search on the range and for each mid-point, we do a linear scan of the array. |
| 89 | +##### Space Complexity: |
| 90 | +O(1), since no extra space is used except for a few variables. |
| 91 | +##### Python Code: |
| 92 | +```python |
| 93 | +def findDuplicate(nums): |
| 94 | +    left, right = 1, len(nums) - 1 |
| 95 | + |
| 96 | +    while left < right: |
| 97 | +        mid = (left + right) // 2 |
| 98 | +        count = sum(num <= mid for num in nums) |
| 99 | + |
| 100 | +        if count > mid: |
| 101 | +            right = mid  # Duplicate is in the lower half |
| 102 | +        else: |
| 103 | +            left = mid + 1  # Duplicate is in the upper half |
| 104 | + |
| 105 | +    return left  # The duplicate number |
| 106 | +``` |
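To see how the search range shrinks, it can help to log each iteration. The sketch below is the same binary search with a hypothetical helper name, `trace_binary_search`, that also records `(left, right, mid, count)` for every step; the logging is an addition for illustration only.

```python
def trace_binary_search(nums):
    """Return the duplicate plus a log of (left, right, mid, count) per step."""
    left, right = 1, len(nums) - 1
    steps = []
    while left < right:
        mid = (left + right) // 2
        # Count how many values fall in the lower half [1, mid].
        count = sum(1 for num in nums if num <= mid)
        steps.append((left, right, mid, count))
        if count > mid:
            right = mid  # too many small values: duplicate is in [left, mid]
        else:
            left = mid + 1  # otherwise it is in [mid + 1, right]
    return left, steps


duplicate, steps = trace_binary_search([1, 3, 4, 2, 2])
for left, right, mid, count in steps:
    print(f"range [{left}, {right}], mid = {mid}, count = {count}")
print("duplicate =", duplicate)
# range [1, 4], mid = 2, count = 3
# range [1, 2], mid = 1, count = 1
# duplicate = 2
```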
| 107 | +### Approach 4: Floyd’s Tortoise and Hare (Cycle Detection) |
| 108 | +##### Intuition: |
| 109 | +This approach treats the problem as detecting a cycle in a linked list. Imagine the array as a linked list in which index i points to index nums[i]. Because the duplicate value is stored at two different indices, two nodes point to the same next node, which creates a cycle whose entry point is the duplicate. Floyd's Tortoise and Hare algorithm finds that entry point. |
| 110 | + |
| 111 | +1. Initialize two pointers, slow and fast, at nums[0]. |
| 112 | +2. Move slow one step at a time and fast two steps at a time. |
| 113 | +3. Because a cycle is guaranteed to exist, slow and fast will eventually meet inside it. |
| 114 | +4. Reset slow to nums[0] and move both pointers one step at a time until they meet again; that meeting point is the duplicate number. |
| 115 | +##### Why This Works: |
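| 116 | +- Every value is in [1, n], so no element points back to index 0; the walk that starts at index 0 therefore begins outside the cycle. The duplicate value is the one index reached from two different positions, which makes it the entry point of the cycle, and the second phase of Floyd's algorithm locates exactly that entry point. |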
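The cycle structure can be made visible by walking the chain directly. The sketch below uses a hypothetical helper, `follow_chain`, that follows index -> nums[index] from index 0 and stops at the first index it reaches a second time. It uses O(n) extra space, so it is only an illustration of why the cycle entry equals the duplicate, not a solution that meets the constraints.

```python
def follow_chain(nums):
    """Walk index -> nums[index] from index 0 until an index repeats."""
    visited = []   # indices in the order they were reached
    seen = set()   # same indices, for O(1) membership checks
    index = 0
    while index not in seen:
        seen.add(index)
        visited.append(index)
        index = nums[index]
    # index is now the first position reached twice: the cycle entry.
    return visited, index


print(follow_chain([1, 3, 4, 2, 2]))  # ([0, 1, 3, 2, 4], 2)
print(follow_chain([3, 1, 3, 4, 2]))  # ([0, 3, 4, 2], 3)
```

In both examples the returned cycle entry matches the duplicate value, since it is the only index that two different positions point to.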
| 117 | +##### Visualization: |
| 118 | +```text |
| 119 | +Array: [1, 3, 4, 2, 2] |
| 120 | + |
| 121 | +0 -> 1 -> 3 -> 2 -> 4 -> 2 (index -> nums[index]; cycle starts at 2) |
| 122 | + |
| 123 | +Start (both pointers at nums[0]): |
| 124 | +- slow = 1, fast = 1 |
| 125 | + |
| 126 | +Step 1: |
| 127 | +- slow = 3, fast = 2 |
| 128 | + |
| 129 | +Step 2: |
| 130 | +- slow = 2, fast = 2 (they meet) |
| 131 | + |
| 132 | +Reset slow to nums[0], then move both pointers one step at a time: |
| 133 | +- slow = 1, fast = 2 |
| 134 | +- slow = 3, fast = 4 |
| 135 | +- slow = 2, fast = 2 (found duplicate = 2) |
| 136 | +``` |
| 137 | +##### Time Complexity: |
| 138 | +O(n), because both phases together take a number of pointer moves proportional to n. |
| 139 | +##### Space Complexity: |
| 140 | +O(1), since we only use constant extra space. |
| 141 | +##### Python Code: |
| 142 | +```python |
| 143 | +def findDuplicate(nums): |
| 144 | +    slow = nums[0] |
| 145 | +    fast = nums[0] |
| 146 | + |
| 147 | +    # First phase: detect the cycle |
| 148 | +    while True: |
| 149 | +        slow = nums[slow] |
| 150 | +        fast = nums[nums[fast]] |
| 151 | +        if slow == fast: |
| 152 | +            break |
| 153 | + |
| 154 | +    # Second phase: find the entrance to the cycle |
| 155 | +    slow = nums[0] |
| 156 | +    while slow != fast: |
| 157 | +        slow = nums[slow] |
| 158 | +        fast = nums[fast] |
| 159 | + |
| 160 | +    return slow  # The duplicate number |
| 161 | +``` |
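One way to gain confidence in the cycle-detection solution is to compare it with the simpler hash-set approach on random inputs. The sketch below restates both functions under illustrative names (`find_duplicate_floyd`, `find_duplicate_set`) and adds a hypothetical generator, `random_case`, that builds inputs matching the stated constraints; none of these names come from the original write-up.

```python
import random


def find_duplicate_floyd(nums):
    """Floyd's cycle detection: O(n) time, O(1) extra space."""
    slow = fast = nums[0]
    while True:  # phase 1: find a meeting point inside the cycle
        slow = nums[slow]
        fast = nums[nums[fast]]
        if slow == fast:
            break
    slow = nums[0]  # phase 2: walk both at the same speed to the cycle entry
    while slow != fast:
        slow = nums[slow]
        fast = nums[fast]
    return slow


def find_duplicate_set(nums):
    """Reference answer using a hash set: O(n) time, O(n) extra space."""
    seen = set()
    for num in nums:
        if num in seen:
            return num
        seen.add(num)
    return None


def random_case(n, rng):
    """Build a shuffled list holding 1..n once each plus one repeated value."""
    nums = list(range(1, n + 1))
    nums.append(rng.randint(1, n))
    rng.shuffle(nums)
    return nums


rng = random.Random(0)  # fixed seed so the check is repeatable
for _ in range(1000):
    case = random_case(rng.randint(1, 50), rng)
    snapshot = list(case)
    assert find_duplicate_floyd(case) == find_duplicate_set(case)
    assert case == snapshot  # the input list was not modified
print("all cases agree")
```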
| 162 | +## Summary |
| 163 | + |
| 164 | +| Approach | Time Complexity | Space Complexity | |
| 165 | +|-----------------------------------|-----------------|------------------| |
| 166 | +| Sort the Array | O(n log n) | O(1) in place, O(n) with a copy |
| 167 | +| Hash Set | O(n) | O(n) | |
| 168 | +| Binary Search on Value Range | O(n log n) | O(1) | |
| 169 | +| Floyd’s Tortoise and Hare | O(n) | O(1) | |
| 170 | + |
| 171 | +Floyd’s Tortoise and Hare is the best of the four approaches: it runs in O(n) time with O(1) extra space and does not modify the array. |