I've always understood mise-en-scene to be the opposite of montage.
Montage uses the shot as a building block, and the editing together of these shots is what creates meaning and conveys the story to the viewer.
Mise-en-scene lets the shot itself tell the whole story without editing, so that the foreground, middleground, and background all combine to show the viewer the relationships between objects and people. It also lets meaning build up by moving the camera and changing the shot through a dolly or pan, instead of cutting to the next shot.
The Russians believed shots on their own are meaningless, and it is only through editing that the full scope of the images is understood. This is what separates film from photography.
The French believed that meaning is built up in layers, much like how we view reality: not through quick edits that intellectualize, but through slow movement through space, with time to take in what is in front of you.
At least, that's my understanding from what I've read and the examples I've seen.