Well, using render textures in Unity, we could render the views from multiple "cameras" onto separate surfaces and then composite those onto the overall game surface (which has to be a single window, or fullscreen on a single monitor device).
But I don't know whether this technique would work with the orthographic-perfect 2D display we have to use for other reasons.
There are also significant performance costs, since this basically means multiple full render passes per frame.
And it could also mean a ton of refactoring, since the game is very much built around there being only one "local" planet, so to speak.
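
To make the layout side of the multi-camera idea concrete, here is a rough sketch of how several views might be tiled onto one surface. It is plain Python rather than Unity code, and `Viewport` and `tile_viewports` are made-up names for illustration only. The rectangles are normalized to the 0..1 range with the origin at the bottom-left, which is similar in spirit to how a camera's viewport rectangle is expressed in Unity, but nothing here is the game's actual implementation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Viewport:
    """Hypothetical normalized rectangle (0..1), origin at bottom-left."""
    x: float
    y: float
    width: float
    height: float


def tile_viewports(view_count: int) -> list[Viewport]:
    """Split one surface into a near-square grid, one cell per view."""
    if view_count < 1:
        raise ValueError("view_count must be at least 1")
    cols = math.ceil(math.sqrt(view_count))
    rows = math.ceil(view_count / cols)
    cell_w, cell_h = 1.0 / cols, 1.0 / rows
    # Fill the grid row by row, starting from the bottom-left cell.
    return [
        Viewport(
            x=(i % cols) * cell_w,
            y=(i // cols) * cell_h,
            width=cell_w,
            height=cell_h,
        )
        for i in range(view_count)
    ]


if __name__ == "__main__":
    for index, view in enumerate(tile_viewports(4)):
        print(index, view)
```

Working out the rectangles is the cheap part. Each tile would still need its own render pass every frame, which is where the performance concern comes from.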