News
LDAP auth now also on the nodes
LDAP auth is now also functional on the nodes. Next:
Either mount home using mount/fstab
OR
Figure out autofs + ldap....
Naming paradox
Here is the current approach:
From the point of view of the nodes, the Master node is always called "rootserver". This hostname appears throughout the configuration files (ldap.conf and others I'm not thinking of at the moment). What is also important to know about this rootserver hostname is that it's automatically generated from DHCP information at boot time and inserted into /etc/hosts. This provides the following advantages:
- It's configurable
- It permits having a multihomed configuration (although some parallel processing tools don't support that, namely OpenMPI...last time I tried)
- It unifies the configuration file's reference to the master node as 'rootserver'
I found no other clean way of doing this. Even the rootserver has an entry in /etc/hosts that points to itself.
The problem: this is making my ldap-auth over-adapted to clustering, which means that anyone wanting to use it for a regular auth setup will also have to know about putting rootserver in their /etc/hosts.
I don't see this solution as ideal but one has to move forward...
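For anyone who wants the same alias outside the cluster scripts, here is a minimal sketch of pinning rootserver in a hosts file. The function name set_rootserver and its arguments are my own illustration, not code from the repository, and the IP is assumed to come from whatever the DHCP client hands over at boot.

```shell
#!/bin/sh
# Sketch only: set_rootserver is a hypothetical helper, not the project's code.
# Usage: set_rootserver IP [HOSTS_FILE]
set_rootserver() {
    ip="$1"                    # master node IP (assumed to come from DHCP)
    hosts="${2:-/etc/hosts}"   # file to update, /etc/hosts by default
    tmp="$(mktemp)" || return 1
    # Drop any non-comment line that already names rootserver; keep the rest.
    awk '$1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == "rootserver") next }
         { print }' "$hosts" > "$tmp"
    # Append the fresh entry so exactly one rootserver line remains.
    printf '%s\trootserver\n' "$ip" >> "$tmp"
    cat "$tmp" > "$hosts" && rm -f "$tmp"
}
```

On the master itself the same call would be made with the master's own address, which matches the self-pointing entry described above.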
Nodes finally booting
Lots of little details have been killing my time in the past week: things like mount -o nfsvers=3, and putting NFSv4 aside for the moment since it required that the kernel be booted using an initrd, and I don't want to spend time in that area right now. So, what's new:
- Nodes now boot completely. This means I can start debugging all the other stuff I forgot to auto-configure on the master and nfsroot so they'll know each other exists (look at Issues for details)
- The node's kernel is now gentoo-sources because we switched to the AuFS module (from the sunrise overlay)
- ...and more stuff, check out the Repository for details
fsk.nfs.filename
While I was browsing around SoC projects I came across a name I knew: OSCAR, which is meant to ease the cluster building process for RPM based distributions. It so happens that Paul Greidanus's project is named NFS Mountpoints in OSCAR. With a smirk on my face, I contacted Paul stating we could probably exchange notes and pointed him to the current News link as well as the project page. He was quick to point out that either Gentoo is using a different /etc/exports file or I am using the wrong one, which I had stated to be /etc/exportfs.
...After blushing for a few minutes I corrected my config-generating code ;)
No wonder exportfs wouldn't complain or even say anything :/
livecd ~ # cat /etc/exports
# /etc/exports: NFS file systems being exported. See exports(5).
livecd ~ # cat /etc/exportfs
/tftproot/nfsroot/x86_64 10.0.0.0/255.255.255.0(ro,no_root_squash,async,no_subtree_check)
/home 10.0.0.0/255.255.255.0(rw,no_root_squash,async,no_subtree_check)
livecd ~ # cat /etc/exportfs >> /etc/exports
livecd ~ # exportfs -rav
exporting 10.0.0.0/255.255.255.0:/tftproot/nfsroot/x86_64
exportfs: Warning: /tftproot/nfsroot/x86_64 does not support NFS export.
exporting 10.0.0.0/255.255.255.0:/home
exportfs: Warning: /home requires fsid= for NFS export
Those two errors are expected:
- /tftproot/nfsroot/x86_64 is AuFS mounted but I noticed AuFS wasn't built with the NFSexport USE flag...changed
- /home is the LiveCD's tmpfs mount, we have to add fsid=0 to the export args for that to work
...but this approach (fsid for tmpfs) might become useless since we're planning on AuFS mounting the entire CD.
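As a rough sketch of the fsid fix for the /home export, the helper below appends fsid=0 to the option list of one mount point in an exports-style file. The name add_fsid is hypothetical, and it assumes one client spec per line, as in the file shown above.

```shell
#!/bin/sh
# Sketch only: add_fsid is a hypothetical helper, not the project's code.
# Usage: add_fsid EXPORTS_FILE MOUNTPOINT
add_fsid() {
    file="$1"
    mp="$2"
    tmp="$(mktemp)" || return 1
    # For the matching mount point, turn the trailing ")" into ",fsid=0)"
    # unless an fsid option is already present. Other lines pass through.
    awk -v mp="$mp" '
        $1 == mp && $0 !~ /fsid=/ { sub(/\)$/, ",fsid=0)") }
        { print }
    ' "$file" > "$tmp"
    cat "$tmp" > "$file" && rm -f "$tmp"
}
```

After something like add_fsid /etc/exports /home, re-running exportfs -rav should show whether the /home warning is gone.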
Back!
Well, it's been a while since I posted and a lot has been done, in the past 2 weeks especially. Everything snowballed since my first commit to the git repository... I won't repeat the details, so read the gentoo-soc mailing list and the commit logs if you're really bored.
Current state of affairs:
NFSroot (node reference image)
- Boots up without any problems (thanks to Roy Marples for patching up OpenRC blazingly fast)
- LDAP hasn't been tested/auto-configured for the nodes just yet, this requires the cluster-setup script (on the LiveDVD) to be completed (see below)
- Shutdown is still messy as per bug 98
LiveCD/DVD
I am currently unable to exportfs NFS shares. The following has been checked:
- /etc/exportfs syntax
- exportfs -vra returns nothing
- there is no tcpwrapper blockage (/etc/hosts.{allow,deny} don't even exist)
- the only error I ever got about NFS is: /var/lib/nfs/state: bad file size, setting state = 1 ...and I checked that the file is rw with touch /var/lib/nfs/state
- Here is the file if anyone has a clue:
livecd ~ # cat /etc/exportfs
/tftproot/nfsroot/x86_64 10.0.0.0/255.255.255.0(ro,no_root_squash,async,no_subtree_check)
/home 10.0.0.0/255.255.255.0(rw,no_root_squash,async,no_subtree_check)
livecd ~ # exportfs -rav
...Ok, let's call it a DVD. I'm basically piggybacking on releng's hard work with the following exceptions:
- GCC 4.3.1, this also implies ~sys-libs/glibc-2.7 ...so that's a nice build from stage[1-3] then livecd-stage[12]
- aufs is used on the CD to make /tftproot RW, it would otherwise be impossible to make a dynamically configurable boot CD (YES, I am probably going to replace the tmpfs-only liveCD approach with a global AUFS mounted CD root...something a few people have been asking for on the media)
- Obviously, the liveCD contains more sci-specific apps than the regular CD (and I might have removed space hogs from the official CD...do a diff on the spec files if you really want to know). Useful additions taken from the spec file (apart from the NFS root):
## Kyron:
# explicitly adding net-nds/openldap so it
# gets rebuilt with -minimal (can't do that in
# stage3 at the moment).
net-nds/openldap
app-portage/portage-utils
net-nds/ldap-auth
sys-cluster/beowulf-head
sys-cluster/openmpi
app-admin/eselect-cblas
app-admin/eselect-blas
app-admin/eselect-lapack
# Added gromacs as per Alexey Shvetsov's request ;)
# from: Bug 193532
sci-chemistry/gromacs
# man needs this:
app-arch/lzma-utils
Weekly progress report [June 9-15/16]
LDAP:
I spent many hours (way over the 30 hours per week I had promised myself to spend on SoC) creating an LDAP-as-auth-backend auto-install script. It's not simple because Gentoo's philosophy is that ebuilds do as little as possible and the admin does the work. I have no problems with this approach but it, by definition, runs counter to my efforts to provide a "turn-key" Clustering LiveCD solution. Although most of the work being done by the script should be done by an ebuild, I had to choose a standalone script because:
1- I absolutely have to modify/create some files in /etc
2- Once some of the files are created, I have to initialize the ldap database
3- Then successfully start the slapd daemon
4- and only then shall I finish the /etc file modifications (ie: changing /etc/nsswitch.conf to also use ldap as a backend)
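The ordering constraint in those four steps can be sketched as a chain where each step only runs if the previous one succeeded. This is an illustration, not the actual script: every function name is hypothetical and each body just prints what the real step would do.

```shell
#!/bin/sh
# Sketch only: all function names are hypothetical placeholders.
# Each body prints its step instead of touching the system.
write_etc_files()  { echo "1: create/modify the LDAP-related files in /etc"; }
init_database()    { echo "2: initialize the ldap database"; }
start_slapd()      { echo "3: start the slapd daemon"; }
finish_etc_files() { echo "4: make /etc/nsswitch.conf also use ldap"; }

# The && chain stops at the first failing step, so nsswitch.conf is
# never switched over unless slapd actually came up.
setup_ldap_auth() {
    write_etc_files &&
    init_database &&
    start_slapd &&
    finish_etc_files
}
```

Calling setup_ldap_auth prints the four steps in order. If start_slapd is swapped for something that fails, step 4 never runs, which is the whole point of doing this outside a plain ebuild.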
Obviously, since this script is supposed to be called from within the catalyst process, Joe user should not have to use it. My intention, though, is that the script could also be used later on by people wishing to implement LDAP without having to learn all that is required to get that going on their system (obviously with a BFW: "This is a one shot deal, don't expect it to work, you should read the docs, it's poison, it will reformat your car's carburetor, etc..."). I'm also leaving in the possibility that the same script + config file approach could be used to add LDAP databases in the future (such as a shared Addressbook).
Well, even though all of this seems far from clustering and HPC, the whole central auth and management is an issue when it comes to a cluster. One has to remember that a cluster is like a department isolated on its own network, where everyone is supposed to be able to log onto any machine and expect them all to behave the exact same way.
Stuff that would be nice to also have in LDAP which isn't presently part of my script/template:
- Automount defined within the LDAP dir
- TLS secure backend
- implies auto-generating self-certs...and LDAP is very evil with that
- Find a nice user friendly GUI (lat is quite unstable and luma simply fails for some obscure reason)
- Due to ^^ add the automation behind configuring diradm
Catalyst:
I updated the spec files to use a new snapshot since I will want to be using net-nds/openldap-2.4.10 and it's quite recent in the tree. In the process I noticed I could get to Stage3 with no problems but that liveCD-stage1.spec now completely barfs with a huge list of loop dependency errors. I backtracked to the original snapshot and the errors are also there. I'll have to investigate by removing my profile overlay; it's probably due to some change I made in there without rebuilding the liveCD since. It's not critical for the moment so I'll set that aside for the time being (adding a bug to soc.gexp.o)
Clustering:
Jsbronder's on fire, I'll definitely have to look into his empi and eselect mpi work, since it's more than just relevant to clustering ;)
Special thanks:
robbat2: for all his help and patience with my obvious n00bism concerning LDAP ACLs and some config directives ;)
Damm (#ldap): Has helped me with a few questions and made me waste much time on nssov...for which I have now goaded him into trying to create an ebuild :P
VM Dev environment up and running!
Well, I now have a dev environment under VMWare, so I'll be able to start development without screwing up my system (especially since there will be much ebuild creation involved).
I hit a snag that will bite back when I want to test the cluster node images: for some reason, my VM won't netboot (PXE), DHCPACK is never sent...the same machine did boot off the CD and did get an IP address via the same DHCP mechanism. This will have to be investigated of course...
Alea iacta est
Well, I'm officially starting work on this project, even though I've been working on-and-off on all of this for the past 3 years ;)
What I've done in the past 3 weeks:
- Upped server to: Intel(R) Core(TM)2 Quad CPU Q6700 @ 2.66GHz
- Now have 8 Gigs of 800MHz RAM (running at 887MHz although lshw tells me it's at 667...)
- Played around with catalyst enough to have built my own LiveCD, which is entirely built using GCC-4.3, which I find quite COOL!
I've decided I would keep some notes here since I can't ever seem to decide which blogging spot is the best (that and I hate blogging).