Accessing Iterator In 'for In' Loop

March 26, 2023

From my understanding, when code like the following is run:

for i in MyObject:
    print(i)

MyObject's __iter__ function is run, and the for loop uses the iterator it returns to r
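Here is a rough sketch of what such a loop does, written as an explicit iter()/next() loop. MyObject and for_loop_equivalent are hypothetical stand-ins, and the sketch ignores break and else. The real loop keeps its iterator on the interpreter's internal stack, not in a named variable like _it.

```python
class MyObject:
    """Hypothetical iterable standing in for the question's MyObject."""
    def __iter__(self):
        return iter(['a', 'b', 'c'])

def for_loop_equivalent(iterable, body):
    """Hypothetical helper: roughly what `for i in iterable: body(i)` does."""
    _it = iter(iterable)      # looks up __iter__ on the object's type
    while True:
        try:
            i = next(_it)     # looks up __next__ on the iterator's type
        except StopIteration:
            break             # iterator exhausted: the loop ends
        body(i)

seen = []
for_loop_equivalent(MyObject(), seen.append)
print(seen)  # ['a', 'b', 'c']
```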

Solution 1:

It is possible to do what you want to do, as long as you're willing to rely on multiple undocumented internals of your Python interpreter (in my case, CPython 3.7), but it isn't going to do you any good.

The iterator is not exposed to locals, or anywhere else (not even to a debugger). But as pointed out by Patrick Haugh, you can get at it indirectly, via gc.get_referrers. For example:

import collections.abc
import gc

# Look for an iterator among the objects that refer to seq
for ref in gc.get_referrers(seq):
    if isinstance(ref, collections.abc.Iterator):
        break
else:
    raise RuntimeError('Oops')
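A self-contained version of the same trick, run inside a loop so that there is a live iterator to find. The helper find_loop_iterator is hypothetical (not from the answer), and this relies on CPython internals: the loop's list_iterator is tracked by the garbage collector and holds a reference to the list.

```python
import collections.abc
import gc

def find_loop_iterator(seq):
    """Hypothetical helper: return the first iterator that refers to seq (CPython-specific)."""
    for ref in gc.get_referrers(seq):
        if isinstance(ref, collections.abc.Iterator):
            return ref
    raise RuntimeError('no iterator references seq')

seq = list(range(10))
found_types = []
for x in seq:
    if x == 3:
        # The unnamed list_iterator driving this loop refers to seq,
        # so gc.get_referrers can still reach it.
        it = find_loop_iterator(seq)
        found_types.append(type(it).__name__)

print(found_types)  # ['list_iterator'] on CPython
```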

Of course, if you have two different iterators over the same list, I don't know of any way to tell them apart, but let's ignore that problem.

Now, what do you do with this? You've got an iterator over seq, and… now what? You can't replace it with something useful, like an itertools.chain(seq, [1, 2, 3]). There's no public API for mutating list, set, etc. iterators, much less arbitrary iterators.
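What the public API does let you do is decide, before the loop starts, what the loop should see. A minimal sketch, with illustrative values for seq and the extra items: build the chained iterator up front instead of trying to swap it in mid-loop.

```python
import itertools

seq = [10, 20, 30]
extra = [1, 2, 3]

# Build the combined iterator first; the loop consumes seq, then extra.
seen = []
for elem in itertools.chain(seq, extra):
    seen.append(elem)

print(seen)  # [10, 20, 30, 1, 2, 3]
```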

If you happen to know it's a list iterator… well, the CPython 3.x listiterator does happen to be mutable. It is pickled as a call to iter on the underlying list, followed by a call to __setstate__ with the saved index:

>>> print(ref.__reduce__())
(<function iter>, ([0, 1, 2, 3, 4, 5, 6, 7, 8, 9],), 7)
>>> ref.__setstate__(3) # resets the iterator to index 3 instead of 7
>>> ref.__reduce__()[1][0].append(10) # adds another value
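The same CPython-specific behaviour can be reproduced without gc, using an iterator created directly with iter(). Keep in mind that __reduce__ and __setstate__ are pickling hooks, not a supported way to steer iteration; the list contents below are illustrative.

```python
data = [10, 20, 30, 40]
it = iter(data)
next(it)
next(it)                     # consumed 10 and 20, so the saved index is 2

func, args, index = it.__reduce__()
print(index)                 # 2
print(args[0] is data)       # True: the iterator refers to the original list

it.__setstate__(0)           # rewind the iterator to the start
rewound = list(it)
print(rewound)               # [10, 20, 30, 40]
```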

But this is all kind of silly, because you could get the same effect by just mutating the original list. In fact:

>>> ref.__reduce__()[1][0] is seq
True

So:

lst = list(range(10))
for elem in lst:
    print(elem, end=' ')
    if elem % 2:
        lst.append(elem * 2)
print()

… will print out:

0 1 2 3 4 5 6 7 8 9 2 6 10 14 18

… without having to monkey with the iterator at all.

You can't do the same thing with a set.

Mutating a set while you're in the middle of iterating over it will affect the iterator, just as mutating a list will, but what it does is indeterminate. After all, sets have arbitrary order, which is only guaranteed to be consistent as long as you don't add or delete. What happens if you add or delete in the middle? You may get a whole different order, meaning you may end up repeating elements you already iterated over and missing ones you never saw. Python implies that this should be illegal in any implementation, and CPython does actually check for it (specifically, it checks whether the set's size has changed since the iterator was created):

s = set(range(10))
for elem in s:
    print(elem, end=' ')
    if elem % 2:
        s.add(elem * 2)
print()

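This prints a few elements, and then, as soon as an add actually grows the set (adding 2 or 6 doesn't, since they're already present), the next step of the loop raises: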

RuntimeError: Set changed size during iteration
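If you only need to add to a set based on its current contents, the usual workaround is to iterate over a snapshot such as list(s). Unlike the list example above, the items added during the loop are not visited, because the loop only walks the snapshot.

```python
s = set(range(10))
for elem in list(s):      # iterate over a copy, so changing s is safe
    if elem % 2:
        s.add(elem * 2)   # new items go into s, not into the snapshot

print(sorted(s))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 14, 18]
```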

So, what happens if we use the same trick to go behind Python's back, find the set_iterator, and try to change it?

import collections.abc
import gc

s = {1, 2, 3}
for elem in s:
    print(elem)
    # Find the set_iterator driving this loop via its reference to s
    for ref in gc.get_referrers(s):
        if isinstance(ref, collections.abc.Iterator):
            break
    else:
        raise RuntimeError('Oops')
    print(ref.__reduce__())

What you'll see in this case will be something like:

2
(<function iter>, ([1, 3],))
1
(<function iter>, ([3],))
3
(<function iter>, ([],))

In other words, when you pickle a set_iterator, it creates a list of the remaining elements, and gives you back instructions to build a new listiterator out of that list. Mutating that temporary list obviously has no useful effect.
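A small CPython-specific check of that claim, using an iterator made directly with iter(): the list returned by __reduce__ is a fresh copy of the unseen items, so appending to it leaves the real iterator untouched. The value 99 is just an illustrative marker.

```python
s = {1, 2, 3}
it = iter(s)
first = next(it)

func, (remaining,) = it.__reduce__()  # remaining: a new list of the unseen items
remaining.append(99)                  # mutating that list doesn't touch the iterator

rest = list(it)
print(99 in rest)                     # False
print(sorted([first] + rest))         # [1, 2, 3]
```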

What about a tuple? Obviously you can't just mutate the tuple itself, because tuples are immutable. But what about the iterator?

Under the covers, in CPython, tuple_iterator works the same way as listiterator, holding a reference to the sequence plus an index (as does the iterator type that you get from calling iter on an "old-style sequence" type that defines __len__ and __getitem__ but not __iter__). So you can do the exact same trick to get at the iterator and call __reduce__ on it.

But once you do, ref.__reduce__()[1][0] is seq is going to be true again. In other words, it's a tuple, the same tuple you already had, and still immutable.
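A direct CPython check, without gc: reducing a tuple iterator hands back the very same tuple plus the current position, so there is nothing mutable to work with.

```python
t = (1, 2, 3)
it = iter(t)
next(it)                       # consume 1

func, args, index = it.__reduce__()
print(type(it).__name__)       # tuple_iterator
print(args[0] is t, index)     # True 1: the same immutable tuple, plus a position
```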

Solution 2:

No, it is not possible to access this iterator (unless maybe with the Python C API, but that is just a guess). If you need it, assign it to a variable before the loop.

it = iter(MyObject)
for i in it:
    print(i)
    # do something with it
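Holding the iterator in a variable lets the loop body pull extra items from it. A minimal sketch with a hypothetical helper, parse_pairs, that turns a flat key/value list into a dict by calling next(it) once per iteration:

```python
def parse_pairs(flat):
    """Hypothetical helper: turn [k1, v1, k2, v2, ...] into {k1: v1, k2: v2, ...}."""
    it = iter(flat)
    result = {}
    for key in it:
        # The for loop consumed the key; take the matching value from the same iterator.
        # With an odd-length input, this next() raises StopIteration.
        result[key] = next(it)
    return result

print(parse_pairs(['a', 1, 'b', 2]))  # {'a': 1, 'b': 2}
```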

Keep in mind that manually advancing the iterator raises StopIteration if the iterator is already exhausted.

for i in it:
    if check_skip_next_element(i):  # placeholder for your own condition
        try:
            next(it)  # skip the following element
        except StopIteration:
            break

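The use of break is debatable. Here it has the same effect as continue, since the exhausted iterator ends the loop either way (except that continue would still run the loop's else clause, if it has one). You could also just use pass if you want the rest of the loop body to run.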
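An alternative to the try/except is next() with a default, which returns the default instead of raising once the iterator is exhausted. This matters inside a generator, where since Python 3.7 (PEP 479) a StopIteration escaping from next() is turned into a RuntimeError. A sketch with a hypothetical generator, skip_after, that drops the element following each occurrence of a marker:

```python
def skip_after(items, marker):
    """Hypothetical helper: yield items, dropping the element right after each marker."""
    it = iter(items)
    for item in it:
        yield item
        if item == marker:
            # Skip one element; the default None avoids a StopIteration,
            # which PEP 479 would turn into RuntimeError inside this generator.
            next(it, None)

print(list(skip_after(['a', 'SKIP', 'b', 'c', 'SKIP'], 'SKIP')))  # ['a', 'SKIP', 'c', 'SKIP']
```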

Solution 3:

If you want to insert an additional object into a loop mid-iteration in a debugger, you don't need to do it by modifying the iterator. Instead, when execution is back at the for line after finishing an iteration, jump to the first line of the loop body, then set the loop variable to the object you want. Here's a PDB example, using the following file:

import pdb

def f():
    pdb.set_trace()
    for i in range(5):
        print(i)
f()

I've recorded a debugging session that inserts a 15 into the loop:

> /tmp/asdf.py(5)f()
-> for i in range(5):
(Pdb) n
> /tmp/asdf.py(6)f()
-> print(i)
(Pdb) n
0
> /tmp/asdf.py(5)f()
-> for i in range(5):
(Pdb) j 6
> /tmp/asdf.py(6)f()
-> print(i)
(Pdb) i = 15
(Pdb) n
15
> /tmp/asdf.py(5)f()
-> for i in range(5):
(Pdb) n
> /tmp/asdf.py(6)f()
-> print(i)
(Pdb) n
1
> /tmp/asdf.py(5)f()
-> for i in range(5):
(Pdb) c
2
3
4

(Due to a PDB bug, you have to jump, then set the loop variable. PDB will lose the change to the loop variable if you jump immediately after setting it.)

Solution 4:

If you're not familiar with Python's pdb debugger, please give it a try. It's the most interactive debugger I have ever come across.

I'm sure you can control the loop iterations manually with pdb, but I'm not sure about altering the list midway. Give it a try.

Solution 5:

To access the iterator of a given object, you can use the iter() built-in function.

>>> it = iter(MyObject)
>>> next(it)  # Python 3; it.next() was the Python 2 spelling
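A slightly fuller sketch, with hypothetical classes standing in for MyObject: iter() works both on objects that define __iter__ and on "old-style sequences" that only define __getitem__ (signalling the end with IndexError), for which CPython returns a generic iterator object.

```python
class Countdown:
    """Hypothetical iterable that defines __iter__ (here as a generator method)."""
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        n = self.start
        while n > 0:
            yield n
            n -= 1

class Squares:
    """Hypothetical old-style sequence: only __getitem__, ending with IndexError."""
    def __getitem__(self, index):
        if index >= 5:
            raise IndexError(index)
        return index * index

it = iter(Countdown(3))
print(next(it), next(it))        # 3 2

seq_it = iter(Squares())
print(type(seq_it).__name__)     # iterator
print(list(seq_it))              # [0, 1, 4, 9, 16]
```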
