[ROOT-6942] ROOT 6, built with Cocoa (MacOS X) support, doesn't start in batch mode over ssh Created: 09/Dec/14  Updated: 12/Jun/15

Status: Open
Project: ROOT
Component/s: Graphics
Affects Version/s: 6.02/02
Fix Version/s: None

Type: Bug
Priority: High
Reporter: Attila Krasznahorkay
Assignee: Olivier Couet
Resolution: Unresolved
Votes: 0
Labels: None
Environment:

MacOS X 10.9, clang 5.1


 Description   

We (in ATLAS) use a central MacOS X 10.9 build machine to build nightly versions of our analysis software release against ROOT 6. Currently we use v6-02-02 for this.

The ROOT binary is taken straight from the root.cern.ch webpage as far as I know. But in any case, it gives:

[bash][macatc52]:~ > root-config --config
COMERR_LIBRARY=/usr/lib/libcom_err.dylib JPEG_INCLUDE_DIR=/opt/local/include JPEG_LIBRARY=/opt/local/lib/libjpeg.dylib KRB5_INCLUDE_DIR=/usr/include/krb5 KRB5_LIBRARY=/usr/lib/libkrb5.dylib KRB5_MIT_LIBRARY=/usr/lib/libk5crypto.dylib LBER_LIBRARY=/usr/lib/liblber.dylib LDAP_INCLUDE_DIR=/usr/include LDAP_LIBRARY=/usr/lib/libldap.dylib LIBXML2_INCLUDE_DIR=/opt/local/include/libxml2 LIBXML2_LIBRARIES=/opt/local/lib/libxml2.dylib ODBC_LIBRARY=/usr/lib/libiodbc.dylib OPENGL_INCLUDE_DIR=/System/Library/Frameworks/OpenGL.framework OPENGL_gl_LIBRARY=/System/Library/Frameworks/OpenGL.framework OPENGL_glu_LIBRARY=/System/Library/Frameworks/AGL.framework OPENSSL_CRYPTO_LIBRARY=/usr/lib/libcrypto.dylib OPENSSL_INCLUDE_DIR=/usr/include OPENSSL_LIBRARIES=/usr/lib/libssl.dylib;/usr/lib/libcrypto.dylib OPENSSL_SSL_LIBRARY=/usr/lib/libssl.dylib PC_LIBXML_INCLUDEDIR=/opt/local/include PC_LIBXML_INCLUDE_DIRS=/opt/local/include/libxml2 PC_LIBXML_LIBRARIES=xml2 PC_LIBXML_LIBRARY_DIRS=/opt/local/lib PC_LIBXML_STATIC_INCLUDE_DIRS=/opt/local/include/libxml2 PC_LIBXML_STATIC_LIBRARIES=xml2;pthread;z;iconv;m PC_LIBXML_STATIC_LIBRARY_DIRS=/opt/local/lib PC_SQLITE_INCLUDEDIR=/opt/local/include PC_SQLITE_INCLUDE_DIRS=/opt/local/include PC_SQLITE_LIBRARIES=sqlite3 PC_SQLITE_LIBRARY_DIRS=/opt/local/lib PC_SQLITE_STATIC_INCLUDE_DIRS=/opt/local/include PC_SQLITE_STATIC_LIBRARIES=sqlite3 PC_SQLITE_STATIC_LIBRARY_DIRS=/opt/local/lib PNG_LIBRARY=/usr/X11R6/lib/libpng.dylib PNG_LIBRARY_RELEASE=/usr/X11R6/lib/libpng.dylib PNG_PNG_INCLUDE_DIR=/usr/X11R6/include POSTGRESQL_LIBRARIES=/usr/lib/libpq.dylib PYTHON_INCLUDE_DIR=/System/Library/Frameworks/Python.framework/Headers PYTHON_LIBRARY=/System/Library/Frameworks/Python.framework/Versions/2.7/lib/libpython2.7.dylib PYTHON_LIBRARY_RELEASE=/System/Library/Frameworks/Python.framework/Versions/2.7/lib/libpython2.7.dylib SQLITE_INCLUDE_DIR=/usr/include SQLITE_LIBRARIES=/usr/lib/libsqlite3.dylib TIFF_INCLUDE_DIR=/opt/local/include 
TIFF_LIBRARY=/opt/local/lib/libtiff.dylib ZLIB_INCLUDE_DIR=/usr/include ZLIB_LIBRARY=/usr/lib/libz.dylib

The problem is that it's impossible to start an interactive root shell over ssh using this ROOT version, because it crashes like this:

[bash][macatc52]:~ > root -b
_RegisterApplication(), FAILED TO establish the default connection to the WindowServer, _CGSDefaultConnection() is NULL.
fatal error: file
      '/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/c++/v1/__config'
      modified since it was first processed
Warning in cling::IncrementalParser::CheckABICompatibility():
  Possible C++ standard library mismatch, compiled with _LIBCPP_VERSION v1101 but extraction of runtime standard library version failed.
 
 *** Break *** segmentation violation
 Generating stack trace...
 0x00000001029baa37 in cling::Interpreter::declare(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, cling::Transaction**) (in libCling.so) + 103
 0x00000001028da3e7 in TCling::TCling(char const*, char const*) (in libCling.so) + 3543
 0x00000001028d746f in CreateInterpreter (in libCling.so) + 47
 0x0000000102379e9b in TROOT::InitInterpreter() (in libCore.so) + 171
 0x0000000102379dd1 in ROOT::GetROOT2() (in libCore.so) + 33
 0x00000001023eab69 in TApplication::TApplication(char const*, int*, char**, void*, int) (in libCore.so) + 297
 0x000000010283635a in TRint::TRint(char const*, int*, char**, void*, int, bool) (in libRint.so) + 42
 0x0000000102836ee4 in TRint::TRint(char const*, int*, char**, void*, int, bool) (in libRint.so) + 20
 0x000000010236de81 in main (in root.exe) + 65
 0x00007fff891a55fd in start (in libdyld.dylib) + 1

It's interesting to note that while v5-34-24 in the same setup complains a bit as well, it does at least manage to start up correctly in the end.

[bash][macatc52]:~ > root -b
_RegisterApplication(), FAILED TO establish the default connection to the WindowServer, _CGSDefaultConnection() is NULL.
  *******************************************
  *                                         *
  *        W E L C O M E  to  R O O T       *
  *                                         *
  *   Version   5.34/24   2 December 2014   *
  *                                         *
  *  You are welcome to visit our Web site  *
  *          http://root.cern.ch            *
  *                                         *
  *******************************************
 
ROOT 5.34/24 (v5-34-24@v5-34-24, Dec 02 2014, 18:18:32 on macosx64)
 
CINT/ROOT C/C++ Interpreter version 5.18.00, July 2, 2010
Type ? for help. Commands must be C++ statements.
Enclose multiple statements between { }.
root [0]

You can reproduce the issue by logging into a MacOS X 10.9 machine over ssh that has CVMFS access, and doing:

export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase
source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh
localSetupROOT 6.02.02-x86_64-mac109-clang51-opt

I believe that it should behave the same with the mac1010 version as well. That's available by doing:

localSetupROOT 6.02.02-x86_64-mac1010-clang60-opt

as the last command in the previous example.
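
After the setup commands above, the startup check itself can be scripted. The following is only a minimal sketch: `check_root_starts` is a hypothetical helper name, and it assumes that a failed startup (such as the crash shown earlier) makes the launcher exit with a non-zero status.

```shell
# Hypothetical helper: try to start a ROOT launcher in batch mode and
# report whether it came up. Assumes a startup crash yields a non-zero exit status.
# Usage: check_root_starts [launcher]   (launcher defaults to "root")
check_root_starts() {
    launcher="${1:-root}"
    # -b: batch mode (no graphics), -q: quit once startup has finished
    if "$launcher" -b -q >/dev/null 2>&1; then
        echo "started"
    else
        echo "failed"
        return 1
    fi
}
```

Running `check_root_starts root` and `check_root_starts root.exe` in the same ssh session would show whether the two launchers behave differently.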

Of course the binaries work just fine when using them on a machine (like my laptop) directly.

Cheers,
Attila



 Comments   
Comment by Attila Krasznahorkay [ 09/Dec/14 ]

Also, v6-00-02 seems to behave correctly as well:

[bash][macatc52]:~ > localSetupROOT 6.00.02-x86_64-mac109-clang51-opt
Setting up ROOT version 6.00.02-x86_64-mac109-clang51-opt
[bash][macatc52]:~ > root -b
_RegisterApplication(), FAILED TO establish the default connection to the WindowServer, _CGSDefaultConnection() is NULL.
   ------------------------------------------------------------
  | Welcome to ROOT 6.00/02                http://root.cern.ch |
  |                               (c) 1995-2014, The ROOT Team |
  | Built for macosx64                                         |
  | From tag v6-00-02, 2 July 2014                             |
  | Try '.help', '.demo', '.license', '.credits', '.quit'/'.q' |
   ------------------------------------------------------------
 
root [0]

It's not guaranteed to be a graphics issue; I just didn't know where to put it exactly. :-/

Cheers,
Attila

Comment by Olivier Couet [ 09/Dec/14 ]

Hi Attila,

As the traceback you sent does not show anything related to Cocoa specifically, it would be interesting to know whether you also get a crash using the X11 version. If the X11 version also crashes, it would mean the problem is more Cling related...

Cheers,
Olivier

Comment by Attila Krasznahorkay [ 09/Dec/14 ]

Hi Olivier,

Something to try tomorrow...

Cheers,
Attila

Comment by Pere Mato Vila [ 10/Dec/14 ]

I think there are two independent problems:

  • The first one: _RegisterApplication(), FAILED TO establish the default connection to the WindowServer, _CGSDefaultConnection() is NULL.
    is due to the fact that there is no graphics session on this node. It is a warning that probably needs to be removed.
  • The second could be a C++ library runtime incompatibility detected by cling/llvm. It is certainly an issue related to ROOT 6 and the generated PCH. It would be nice to investigate the exact version of the C++ library on the system where the binaries were produced and on your target system. We need to understand why we are so sensitive to these variations.
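
One way to compare the C++ library versions on the two systems is to read the `_LIBCPP_VERSION` macro from the libc++ `__config` header named in the error message. This is a minimal sketch only: `libcpp_version` is a hypothetical helper name, and it assumes the header defines the macro with a plain numeric value on a single `#define` line.

```shell
# Hypothetical helper: print the numeric _LIBCPP_VERSION defined in a
# libc++ __config header. Assumes a single-line "#define _LIBCPP_VERSION <number>".
# Usage: libcpp_version /path/to/__config
libcpp_version() {
    # Match "#define _LIBCPP_VERSION <digits>" (optional space after '#')
    # and print only the digits of the first matching line.
    sed -n 's/^#[[:space:]]*define[[:space:]][[:space:]]*_LIBCPP_VERSION[[:space:]][[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}
```

Running this on both the build machine and the target machine, with the `__config` path taken from the error message, and comparing the results with the v1101 reported in the cling warning would show whether the two libraries differ.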

Comment by Attila Krasznahorkay [ 10/Dec/14 ]

To stay with the Cocoa issue a bit longer: I just tested what happens when I log into my 10.10.1 laptop over ssh and start ROOT in batch mode. This:

[bash][pb-d-128-141-37-96]:~ > root -b
Dec 10 09:21:53 pb-d-128-141-37-96.cern.ch root[6613] <Error>: Set a breakpoint at CGSLogError to catch errors as they are logged.
Dec 10 09:21:53 pb-d-128-141-37-96.cern.ch root[6613] <Error>: This user is not allowed access to the window system right now.
_RegisterApplication(), FAILED TO establish the default connection to the WindowServer, _CGSDefaultConnection() is NULL.
Dec 10 09:21:53 pb-d-128-141-37-96.cern.ch root[6613] <Warning>: CGSConnectionByID: 0 is not a valid connection ID.
Dec 10 09:21:53 pb-d-128-141-37-96.cern.ch root[6613] <Warning>: CGSConnectionByID: 0 is not a valid connection ID.
Dec 10 09:21:53 pb-d-128-141-37-96.cern.ch root[6613] <Warning>: CGSConnectionByID: 0 is not a valid connection ID.
Dec 10 09:21:53 pb-d-128-141-37-96.cern.ch root[6613] <Warning>: CGSConnectionByID: 0 is not a valid connection ID.
Dec 10 09:21:53 pb-d-128-141-37-96.cern.ch root[6613] <Warning>: CGSConnectionByID: 0 is not a valid connection ID.
Dec 10 09:21:53 pb-d-128-141-37-96.cern.ch root[6613] <Warning>: Invalid Connection ID 0
Dec 10 09:21:53 pb-d-128-141-37-96.cern.ch root[6613] <Warning>: CGSConnectionByID: 0 is not a valid connection ID.
Dec 10 09:21:53 pb-d-128-141-37-96.cern.ch root[6613] <Warning>: CGSConnectionByID: 0 is not a valid connection ID.
Dec 10 09:21:53 pb-d-128-141-37-96.cern.ch root[6613] <Warning>: CGSConnectionByID: 0 is not a valid connection ID.
Dec 10 09:21:53 pb-d-128-141-37-96.cern.ch root[6613] <Warning>: CGSConnectionByID: 0 is not a valid connection ID.
Dec 10 09:21:53 pb-d-128-141-37-96.cern.ch root[6613] <Warning>: CGSConnectionByID: 0 is not a valid connection ID.
   ------------------------------------------------------------
  | Welcome to ROOT 6.02/02                http://root.cern.ch |
  |                               (c) 1995-2014, The ROOT Team |
  | Built for macosx64                                         |
  | From tag v6-02-02, 26 November 2014                        |
  | Try '.help', '.demo', '.license', '.credits', '.quit'/'.q' |
   ------------------------------------------------------------
 
root [0]

It does manage to start, but throws out even more warnings and errors than 10.9 does. :-/ But this makes it pretty clear that it's not the graphics system that causes the most serious issue here.

I'm happy to give further information about our nightly build machine. (I suspect that it doesn't have the very latest version of Xcode installed.) But I'm not sure exactly what information you guys are interested in.

I have to very much agree with Pere however that it's really not good if the ROOT 6 installation is so fragile. Should we expect that in the future local ROOT installations could break after an Xcode update? :-/

Cheers,
Attila

Comment by Pere Mato Vila [ 10/Dec/14 ]

Hi Attila. Yes, this is a problem. I can reproduce it very easily. We didn't see it because all our tests, which are run without a graphics session, run "root.exe" and not "root". We will fix it ASAP; meanwhile you can use root.exe instead.
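
Until a fix is available, the workaround could be wrapped in a small shell function. This is a sketch only: `root_launcher` is a hypothetical name, and it assumes that a remote login can be recognised by the `SSH_CONNECTION` variable that OpenSSH sets for ssh sessions.

```shell
# Hypothetical helper: pick the ROOT launcher to use.
# Assumes SSH_CONNECTION is set (by OpenSSH's sshd) only in remote sessions.
root_launcher() {
    if [ -n "${SSH_CONNECTION:-}" ]; then
        # Remote session without graphics: use root.exe as suggested in this ticket
        echo "root.exe"
    else
        echo "root"
    fi
}
```

It would then be used as `"$(root_launcher)" -b`.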

Comment by Axel Naumann [ 15/Dec/14 ]

Hi,

regarding

fatal error: file
      '/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/c++/v1/__config'
      modified since it was first processed

I'm checking this; it is independent of the CGS windowing issues.

Cheers, Axel.

Comment by Axel Naumann [ 15/Dec/14 ]

Hi,

The interpreter part should now be fixed in v6-02-00-patches and the master. Could you confirm, please?

It leaves the windowing issue; assigning to Olivier for that.

Cheers, Axel.

Comment by Attila Krasznahorkay [ 15/Dec/14 ]

Hi Axel,

Unfortunately, I'm not sure exactly how I should check this. The problem was that the build machine and the one on which I was trying to use the binaries were not completely compatible. Does your fix resolve this issue? If so, I would need to ask you for some binaries, as local ROOT compilations never showed any of these issues for me. (In that case we use the exact same environment for building and running.)

Or, if the fix is super great, could I try running binaries compiled on 10.9, on 10.10? (This I could test myself as well, it will just take a bit.)

Cheers,
Attila

Comment by Axel Naumann [ 15/Dec/14 ]

Hi Attila,

I do not expect the architecture mismatch to be solved by this, but only the error you mentioned in your first post ("file modified since it was first processed") which is different.

I will look at the MacOS 10.9 versus 10.10 build compatibility next - in a separate ticket, keeping this one focused on the windowing issue.

Axel.

Generated at Wed Sep 18 15:38:50 CEST 2019 using Jira 7.13.1#713001-sha1:5e06076c2d215a6f699b7e5c90ab2fae7ba5a1ce.