What about creating one track per take? Then create a track group called "voice" and focus on that group only (create a single bus that all the tracks' outputs are connected to).
Then split each take into relevant regions on its own track, and mute/unmute the ones you want for quick comparison.
When you find the right combo of regions, create a new empty track, route the voice bus output to it, and record.
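
If it helps to see the idea outside the DAW, here is a rough Python sketch of what the steps above amount to. It is only a toy model, not any DAW's scripting API: `Region`, `bounce`, and the sample values are all made up for illustration. Each take is a list of regions, everything starts muted, and "recording the bus" is just summing whatever is unmuted.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Hypothetical stand-in for one region on a take's track."""
    start: int          # index of the first sample the region covers
    samples: list       # audio samples as floats
    muted: bool = True  # everything starts muted; unmute what you want to keep


def bounce(tracks, length):
    """Sum every unmuted region of every track into one buffer (the bus)."""
    bus = [0.0] * length
    for track in tracks:
        for region in track:
            if region.muted:
                continue
            for i, sample in enumerate(region.samples):
                bus[region.start + i] += sample
    return bus


# Two takes, each split at the same point into two regions.
take1 = [Region(0, [1.0, 1.0]), Region(2, [1.0, 1.0])]
take2 = [Region(0, [2.0, 2.0]), Region(2, [2.0, 2.0])]

# Keep the first half of take 1 and the second half of take 2.
take1[0].muted = False
take2[1].muted = False

# "Recording" the bus onto a new track gives the comped take.
comp = bounce([take1, take2], length=4)
print(comp)
```

The point of the model is that the comp is nothing more than the mute states: as long as only one take is unmuted at any given moment, the bus carries exactly the combination you picked.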