You can use a virtual grid for the ground in which you define walkable and non-walkable squares. When the player tries to go from one point to another, it's not enough to find a correct path that avoids obstacles on the way. If you use only colliders (invisible geometry) to handle collisions, you'll only be able to handle movement in straight lines. With a perspective camera, you'll get a parallax effect when you pan the camera, whether you want it or not. You will also have to deal with texture distortion at the borders of the screen, depending on the field of view you have chosen for the camera. Character depth will be handled automatically, but compositing your background will be much harder, since you'll have to place elements at the correct z-depth with appropriate sizes. With an ortho camera, a minor issue is that you won't get any parallax effect on foreground elements when you pan the camera; you'll have to create these kinds of effects manually.
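To make the walkable-grid idea concrete, here is a minimal sketch in plain Python (not Unity code). It assumes the ground is stored as a 2D list of booleans and uses a breadth-first search, which is just one simple way to find a path around blocked squares. The name `find_path` and the grid layout are hypothetical, chosen only for illustration.

```python
from collections import deque

def find_path(grid, start, goal):
    """Breadth-first search over a walkability grid (illustrative only).

    grid[row][col] is True for walkable squares. start and goal are
    (row, col) tuples and are assumed to be walkable. Returns the list
    of cells from start to goal, or None if the goal is unreachable.
    """
    rows, cols = len(grid), len(grid[0])
    came_from = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk back through came_from to rebuild the path.
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] and nxt not in came_from:
                came_from[nxt] = cell
                queue.append(nxt)
    return None
```

The character can then walk from cell to cell along the returned list instead of moving in a single straight line toward the click.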
![unity 3d characters unity 3d characters](https://miro.medium.com/max/502/1*2a_fiuHScvNDhIJZ2_A6dA.png)
![unity 3d characters unity 3d characters](https://docs.unity3d.com/2019.1/Documentation/uploads/Main/wrong_rotation.png)
Compositing with an ortho camera is easy because you don't have to take into account the distance at which elements are placed when choosing the quad size (with an ortho camera, objects have the same size no matter the distance). You just have to keep the objects that characters can walk behind in separate layers, apply all your texture planes to quads facing the camera, and order them with the correct Z position. It's easy, but there's a major problem: if your characters are rendered with an ortho camera, they will not get smaller as they move into the distance. You can create a grid on the ground to assign scale factors and speed factors to characters, and set them manually (that's more or less what was done in old point-and-click games).
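As a rough illustration of manual scale and speed factors, the sketch below (plain Python, not Unity code) interpolates a factor from the character's position on the ground. Linear interpolation is used here as a simple stand-in for a hand-authored grid of factors. The function name `depth_factor` and the default values are assumptions made for the example.

```python
def depth_factor(y, near_y, far_y, near_value=1.0, far_value=0.4):
    """Interpolate a scale (or speed) factor from ground depth.

    y is the character's position along the depth axis; near_y and
    far_y mark the front and back of the walkable area (they must
    differ). The result is clamped so it never leaves the range
    between near_value and far_value.
    """
    t = (y - near_y) / (far_y - near_y)
    t = max(0.0, min(1.0, t))
    return near_value + (far_value - near_value) * t

# Hypothetical usage: apply the same factor to sprite scale and walk speed.
base_speed = 3.0
factor = depth_factor(y=5.0, near_y=0.0, far_y=10.0)
speed = base_speed * factor
```

Using the same factor for speed keeps a distant character from appearing to slide across the screen faster than its apparent size suggests.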
![unity 3d characters unity 3d characters](https://aiaustin.files.wordpress.com/2020/06/af3ca-unitychan-demo.jpg)
With an ortho camera, you'll be able to composite your environments very much like layers in a Photoshop document, with pixel precision. I think your main problem is choosing between an ortho camera and a perspective camera.
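If you go with an ortho camera and still want some parallax when panning, one common approach is to shift each layer by a fraction of the camera movement. The sketch below is plain Python rather than Unity code; the name `parallax_position` and the meaning given to `factor` are assumptions for illustration.

```python
def parallax_position(base_x, camera_x, factor):
    """Return a layer's world x position for a given camera x position.

    factor = 0 leaves the layer fixed in the world (the play plane),
    factor = 1 locks it to the camera (a very distant background), and
    a negative factor makes a foreground layer sweep past faster.
    """
    return base_x + camera_x * factor

# Hypothetical layers: far background, play plane, foreground.
camera_x = 4.0
background_x = parallax_position(0.0, camera_x, 0.8)
ground_x = parallax_position(0.0, camera_x, 0.0)
foreground_x = parallax_position(0.0, camera_x, -0.5)
```

Each layer keeps its authored position when the camera is at the origin, so the composition stays as designed and only the panning behavior changes.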