Stanford CoreNLP and Stanford Parser produce different parse trees!

The Problem

The Stanford Parser is included in the Stanford CoreNLP distribution, but calling the parser from CoreNLP does not always produce the same results as calling the parser directly.

This can be a bit aggravating if you do not know why.

Take the following sentence from a newswire article in the MUC-7 dataset:

HOUSTON  --  The sudden break in a 12-mile-long tether linking the space shuttle Columbia and an Italian satellite as part of a physics experiment was caused by an electrical short and brief fire, a NASA investigative panel reported Tuesday. 

The standalone parser (also available as an online demo) produces the following tree:

(ROOT
  (NP
    (NP (NNP HOUSTON))
    (: --)
    (NP
      (NP (DT The) (JJ sudden) (NN break))
      (PP (IN in)
        (NP
          (NP (DT a) (JJ 12-mile-long) (NN tether))
          (VP (VBG linking)
            (S
              (NP (DT the) (NN space) (NN shuttle))
              (NP
                (NP (NNP Columbia))
                (CC and)
                (NP (DT an) (JJ Italian) (NN satellite))))
            (SBAR (IN as)
              (S
                (NP
                  (NP (NN part))
                  (PP (IN of)
                    (NP (DT a) (NN physics) (NN experiment))))
                (VP (VBD was)
                  (VP (VBN caused)
                    (PP (IN by)
                      (NP
                        (NP (DT an) (JJ electrical)
                          (ADJP (JJ short)
                            (CC and)
                            (JJ brief))
                          (NN fire))
                        (, ,)
                        (NP
                          (NP (DT a) (NNP NASA) (JJ investigative) (NN panel))
                          (VP (VBN reported)
                            (NP-TMP (NNP Tuesday))))))))))))))
    (. .)))

CoreNLP, by contrast, produces:

(ROOT
  (NP
    (NP (NNP HOUSTON))
    (: --)
    (S
      (S
        (NP
          (NP (DT The) (JJ sudden) (NN break))
          (PP (IN in)
            (NP (DT a) (JJ 12-mile-long))))
        (VP (VB tether)
          (S
            (VP (VBG linking)
              (S
                (NP (DT the) (NN space) (NN shuttle))
                (NP
                  (NP (NNP Columbia))
                  (CC and)
                  (NP (DT an) (JJ Italian) (NN satellite))))))
          (SBAR (IN as)
            (S
              (NP
                (NP (NN part))
                (PP (IN of)
                  (NP (DT a) (NN physics) (NN experiment))))
              (VP (VBD was)
                (VP (VBN caused)
                  (PP (IN by)
                    (NP (DT an) (JJ electrical)
                      (ADJP (JJ short)
                        (CC and)
                        (JJ brief))
                      (NN fire)))))))))
      (, ,)
      (NP (DT a) (NNP NASA) (JJ investigative) (NN panel))
      (VP (VBD reported)
        (NP-TMP (NNP Tuesday))))
    (. .)))

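In this case, the standalone parser's output is preferred: it tags "tether" as a noun (NN), whereas CoreNLP tags it as a verb (VB) and builds a verb phrase around it.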

The Question

Why does this happen? And can CoreNLP be modified to produce the same output as the standalone parser?

The Answer

According to a post on one of the Stanford NLP mailing lists (java-nlp-user), the difference arises because CoreNLP runs a separate POS tagger before the parser, whereas the standalone parser assigns POS tags itself. Regarding the CoreNLP pipeline's POS tagger, the post says:

The tagger is generally better at tagging, which may or may not
produce better parse trees.  Also, the parser was trained with more
data than the tagger, so there are a few situations that the parser
has better training data than the tagger, although in general the
tagger is better at POS tags than the parser is.

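CoreNLP can be configured to produce the same output as the standalone parser. To do this, exclude the POS tagger from your set of annotators, for example by setting the "annotators" property to "tokenize, ssplit, parse" rather than "tokenize, ssplit, pos, parse". With no tagger in the pipeline, the parser assigns the POS tags itself.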
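
A minimal Scala sketch of such a configuration, assuming the standard StanfordCoreNLP pipeline API and that the CoreNLP jar and its models are on the classpath. The object name ParserOnlyPipeline, the helper buildProps, and the fallback sample sentence are illustrative and not part of CoreNLP:

```scala
import java.util.Properties

import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation
import edu.stanford.nlp.pipeline.{Annotation, StanfordCoreNLP}
import edu.stanford.nlp.trees.TreeCoreAnnotations.TreeAnnotation

// Hypothetical wrapper object; only the CoreNLP classes imported above are real API.
object ParserOnlyPipeline {

  // Annotator list without "pos", so the parser assigns POS tags itself.
  def buildProps(): Properties = {
    val props = new Properties()
    props.setProperty("annotators", "tokenize, ssplit, parse")
    props
  }

  def main(args: Array[String]): Unit = {
    val pipeline = new StanfordCoreNLP(buildProps())

    // Placeholder input; pass your own sentence on the command line.
    val text =
      if (args.nonEmpty) args.mkString(" ")
      else "The tether linking the shuttle and the satellite broke."

    val doc = new Annotation(text)
    pipeline.annotate(doc)

    // Print the constituency tree of each sentence in Penn Treebank format.
    val sentences = doc.get(classOf[SentencesAnnotation])
    for (i <- 0 until sentences.size) {
      val tree = sentences.get(i).get(classOf[TreeAnnotation])
      tree.pennPrint()
    }
  }
}
```

Any annotator added after "parse" that needs POS tags should be able to use the ones the parser produces, though that is worth checking against the CoreNLP version you run.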

Running the CoreNLP pipeline with this configuration will produce the same parses as the standalone parser.