I think part of the weirdness is that machine speech can't take into account the natural flexibility that occurs when a syllable is pronounced in context with other syllables; this is why sentences constructed from a selection of prerecorded words/syllables often sound a bit stilted. So because their 是 has to fit into many different sentences, it doesn't sound the way it would if pronounced in isolation. Specifically, it sounds a bit "cut off" at the end to me, which makes it hard to definitively identify the vowel. Have a listen to Google Translate speaking some English, and you may see what I mean.