
I have bitten off way more than I can chew.

I recently purchased an 80-PC diskless Internet/gaming cafe. The running system was a cracked CCBoot, which at first seemed to be causing random hangs. Apparently that is not the cracked CCBoot's fault, even though they state that cracked licenses cause random blue screens.

I need guidance and general tips on setting up a Windows Server 2012 R2 iSCSI Target Server to boot and run 80 diskless PCs.

After 3 sleepless nights plus 7 days of general monitoring and reconfiguring, a Linux server, messing around with gPXE/iPXE, learning its scripting language, numerous Google searches, DHCP server configs, bits and pieces from all over the net, and god knows how many fresh installs of Ubuntu, Windows Server 2008, Windows Server 2012 and Windows Server 2012 R2, the current specs are as follows:

Specs

Software

Server:

  • Windows Server 2012 R2
  • iSCSI Target Server
  • 1 parent VHDx
  • 80 child differencing VHDx's
  • 80 iSCSI targets, one for each child VHDx
  • MPIO feature /checked iSCSI checkbox/
  • DHCP Server Role
  • Hyper-V Role /without virtual ethernet adapter set up/
  • SolarWinds TFTP Server
  • iPXE custom scripted image
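The "iPXE custom scripted image" piece can be sketched as a small generator that writes one boot script with a branch per client, where each branch `sanboot`s that client's own target. This is a minimal sketch, not the script used here: the server IP, the MAC-to-name table, the client initiator IQNs and the target IQN pattern (modelled on the Microsoft iSCSI Target Server's usual `iqn.1991-05.com.microsoft:<server>-<target>-target` form) are all hypothetical, so check the real IQNs in Server Manager before relying on anything like it.

```python
# Sketch: generate one iPXE script that sanboots each diskless client from
# its own iSCSI target, chosen by the client's MAC address.
# HYPOTHETICAL values: SERVER_IP, IQN_PREFIX, INITIATOR_PREFIX, client table.

SERVER_IP = "192.168.1.10"                        # hypothetical target server
IQN_PREFIX = "iqn.1991-05.com.microsoft:server"   # assumed default-style IQN
INITIATOR_PREFIX = "iqn.2014-10.cafe.local"       # hypothetical initiator IQNs


def target_iqn(client: str) -> str:
    return f"{IQN_PREFIX}-{client}-target"


def sanboot_uri(server_ip: str, iqn: str) -> str:
    # iPXE format: iscsi:<server>:<protocol>:<port>:<lun>:<target>
    # Empty protocol/port/lun fall back to TCP, port 3260 and LUN 0.
    return f"iscsi:{server_ip}::::{iqn}"


def build_script(server_ip: str, clients: dict[str, str]) -> str:
    """clients maps a lowercase MAC (aa:bb:cc:dd:ee:ff) to a client name."""
    lines = ["#!ipxe", "dhcp"]
    for mac, name in clients.items():
        # iseq compares strings; on a match, jump to that client's label.
        lines.append(f"iseq ${{net0/mac}} {mac} && goto {name} ||")
    lines += ["goto unknown", ""]
    for name in clients.values():
        lines += [
            f":{name}",
            f"set initiator-iqn {INITIATOR_PREFIX}:{name}",
            f"sanboot {sanboot_uri(server_ip, target_iqn(name))}",
            "",
        ]
    # Unknown machines drop to the iPXE shell instead of booting a random disk.
    lines += [":unknown", "echo No target configured for ${net0/mac}", "shell"]
    return "\n".join(lines)


if __name__ == "__main__":
    demo = {"00:11:22:33:44:01": "pc01", "00:11:22:33:44:02": "pc02"}
    print(build_script(SERVER_IP, demo))
```

Keying on the MAC keeps a single script for all 80 seats; the per-client `initiator-iqn` only matters if the targets restrict access by initiator IQN.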

Client:

  • Windows 7 32-bit (x86) /not sysprep-ed/
  • 1gbps established LAN
  • a few games

iSCSI setup:

  • 1 master VHDx
  • 4 parent differencing VHDx's spawned from the master, each booted on one of 4 different hardware configurations so Windows could auto-install its drivers
  • 80 child differencing VHDx's spawned from the 4 parents
  • master and 4 parents located on SSD
  • all children located on soft RAID-ed /storage pooled/ volume
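The three-level chain in the list above (master, then one parent per hardware type, then one child per seat) can be pictured as a small planning step: each client gets one child, and its parent is picked by hardware profile. This is a sketch with hypothetical drive letters, folders and profile names (`hwA` to `hwD` are only illustrations), not the actual layout on this server.

```python
# Sketch: plan the master -> per-hardware parent -> per-client child chain.
# HYPOTHETICAL: all drive letters, folder names and the hardware profiles.
from dataclasses import dataclass

MASTER = r"S:\vhd\master.vhdx"   # hypothetical path on the SSD
PARENT_DIR = r"S:\vhd"           # parents also live on the SSD
CHILD_DIR = r"P:\children"       # hypothetical storage-pool volume


@dataclass(frozen=True)
class ChildDisk:
    client: str
    parent: str
    path: str

    def chain(self) -> list[str]:
        # Writes land in the child only; reads of untouched blocks fall
        # through to the parent and then to the master.
        return [self.path, self.parent, MASTER]


def plan(client_hw: dict[str, str]) -> list[ChildDisk]:
    """client_hw maps a client name to its hardware profile (one parent each)."""
    return [
        ChildDisk(
            client=client,
            parent=f"{PARENT_DIR}\\parent-{hw}.vhdx",
            path=f"{CHILD_DIR}\\{client}.vhdx",
        )
        for client, hw in sorted(client_hw.items())
    ]


if __name__ == "__main__":
    # Hypothetical even split of 80 clients over 4 hardware profiles.
    clients = {f"pc{i:02d}": "hw" + "ABCD"[(i - 1) // 20] for i in range(1, 81)}
    layout = plan(clients)
    print(len(layout), "children over", len({d.parent for d in layout}), "parents")
    print(layout[0].chain())
```

Because every child depends on its parent and the master, those two levels must never be booted or modified directly once children exist.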

Tech

Server:

  • 1x120GB SSD
  • 1x500GB SSD
  • 3x500GB HDD /Storage Pooled together to 1.36 TB soft RAID0/
  • 1gbps LAN card
  • i7 4770
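The 1.36 TB figure for the pool is a unit conversion, not lost space: the drives are sold in decimal gigabytes, while Windows reports binary units but labels them "TB". The sketch below assumes a simple (striped) storage space and ignores the pool's small metadata overhead. A simple space has no redundancy, so one dead HDD loses all 80 children; they can be regenerated from the parents on the SSD.

```python
# Why 3 x 500 GB shows up as roughly "1.36 TB" in Windows.
GB = 10**9     # drive vendors sell decimal gigabytes
TIB = 2**40    # Windows reports binary units but labels them "TB"


def striped_capacity_tib(disk_count: int, disk_gb: int) -> float:
    # A simple (striped, RAID0-like) space has no redundancy, so usable
    # capacity is about the sum of the disks; pool metadata is ignored here.
    return disk_count * disk_gb * GB / TIB


print(f"{striped_capacity_tib(3, 500):.2f} TB as shown by Windows")
```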

Clients:

  • 1gbps LAN card /not sure if capable of 10gbps/
  • Video cards from 2 or 3 different vendors, all the same GPU model: nVidia GTS 450
  • AsRock P55M Pro
  • 2x 2048MB Samsung RAM

Network:

  • 1gbps working connection
  • 1xCisco router /100mbps LAN ports, used only for WAN access/
  • a DVR system
  • 3x DGS-1024D cascading switches

Embarrassingly, even with some software engineering background, setting up the software side of this took me 3 days.

Current problems:

  • PCs freeze completely on startup, though rarely.
  • Running PCs randomly freeze and require a hard reset. I'm assuming a hardware problem on the client side but have not checked or isolated it; the reason for the assumption is that the issue was also present with the previous CCBoot setup.
  • The whole system has only been tested with 10 people on 10 client PCs simultaneously, while more than half of the PCs were turned off completely; all the problems showed up across the running clients.

Questions:

  • Would the DVR system affect the network?
  • Does Windows 7 make more disk I/O requests than Windows XP? If so, I'm more than willing to switch the master VHDx to Windows XP.
  • Is iSCSI Target Server tweaking necessary?
  • Is there a particular keyword or spec to look for when choosing a switch for this network? I believe the DGS-1024D is losing a lot of packets.

So far I've only set up test child VHDx's: I wrote a script to remove, generate, and connect the child VHDx's to their targets, and I finally ran them on the client PCs in the last day. I was hoping the system would just work, because all the rest of my time went into setting up the server, configuring DHCP and iPXE, and of course reformatting. Every step required a different OS on the server PC. Keep in mind that through all this I had only 1 SATA DVD-ROM drive and 1 4GB flash drive.
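A remove/generate/connect cycle like the one described above can be expressed as a generator of PowerShell commands for one client. This is a sketch, not the script from this post: the cmdlet names come from the IscsiTarget module that ships with the iSCSI Target Server role, `-ParentPath` on `New-IscsiVirtualDisk` (differencing VHDX, added in 2012 R2) is assumed to be available, and all paths and target names are hypothetical. Confirm the parameters with `Get-Help` on the server first, and only reset a child while its client is powered off.

```python
# Sketch: emit the PowerShell that throws away one client's child disk and
# recreates it empty on top of its parent. Paths/target names are HYPOTHETICAL.


def ps_quote(value: str) -> str:
    # PowerShell single-quoted strings escape a quote by doubling it.
    return "'" + value.replace("'", "''") + "'"


def reset_child_commands(target: str, child: str, parent: str) -> list[str]:
    t, c, p = ps_quote(target), ps_quote(child), ps_quote(parent)
    return [
        # Detach the old child from the target, drop it from the
        # iSCSI Target Server configuration, then delete the file itself.
        f"Remove-IscsiVirtualDiskTargetMapping -TargetName {t} -Path {c}",
        f"Remove-IscsiVirtualDisk -Path {c}",
        f"Remove-Item -Path {c}",
        # Create a fresh differencing VHDX on top of the parent,
        f"New-IscsiVirtualDisk -Path {c} -ParentPath {p}",
        # and map it back to the same target.
        f"Add-IscsiVirtualDiskTargetMapping -TargetName {t} -Path {c}",
    ]


if __name__ == "__main__":
    for line in reset_child_commands(
        "pc01", r"P:\children\pc01.vhdx", r"S:\vhd\parent-hwA.vhdx"
    ):
        print(line)
```

Generating the commands as text keeps the dry run reviewable: print them, read them, then paste them into an elevated PowerShell session.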

And in general, please slap me and chew me out, as long as you give me the right hints. Very desperate.

UPDATE: After preparing a different image for each hardware configuration, boot-ups have become normal. Recently the person who set up the network (and who is also in charge of the WAN) came by to reset the router, and he also reconnected the server to the router. When I asked why a LAN cable connected to the server's NIC was "disappearing" somewhere rather than being connected to a nearby switch, or at least to the router itself /the closest thing to the server/, he didn't know. That was 2 or 3 days ago. While monitoring in Task Manager, I discovered that the link to the router was not 1gbps: the router's LAN ports are only 100mbps! Reconnecting the server using the old 1gbps connection, which apparently runs to the other end of the cascading switches, fixed the blue screens. That was only a few hours ago, so I haven't really stressed the whole system yet, only 10 PCs at the same time. Out of those 10 PCs, 1 experienced a freeze or a blue screen. My assumption now is that the motherboard, the RAM, or overheating is causing the freezes. Tomorrow, without starting any other PCs /without network load/, I'm going to stress test the single PC that last froze to see whether it freezes again. If it does, then it is definitely not a network or iSCSI issue. All in all, the lesson when inheriting a problematic system is: don't believe anyone, assume anything and everything could be wrong, and start with the basics. And certainly, "if it ain't broke, don't fix it."
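A back-of-the-envelope sketch shows why the 100mbps router port hurt so much. It assumes every active client reads from the server at the same time and that the server link is split evenly; TCP/iSCSI overhead and server-side caching are ignored, so these are upper bounds, not measurements.

```python
# Rough per-client share of the server's single network link, assuming all
# active clients read at the same time and the link is split evenly.
# Protocol overhead and server-side caching are ignored.


def per_client_mb_per_s(link_mbps: float, active_clients: int) -> float:
    # megabits/s -> megabytes/s, then split evenly between clients
    return link_mbps / 8 / active_clients


for link in (100, 1000):
    for clients in (10, 80):
        share = per_client_mb_per_s(link, clients)
        print(f"{link:>4} Mbps link, {clients:>2} clients: {share:6.2f} MB/s each")
```

On the 100mbps port, 10 busy clients get at most 1.25 MB/s each; on 1gbps, all 80 booting together would still get only about 1.56 MB/s each, which suggests the single server link is the next bottleneck once more seats are switched on.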

P.S. At one point I thought some of the PCs had RAM issues, believing only a few PCs were affected. So I took the most problematic PC and ran memtest on it from USB; by the time I came back to check on it, it showed 19 thousand errors and had frozen. I immediately assumed all of the PCs had RAM issues and that all the RAM had to be checked. The next day I took 1 PC /believed to have no HW issues/ and ran memtest on it for 1 pass: no errors. Then I moved the "problematic" PC's RAM sticks 1 by 1 into this "test" PC and ran memtest; to my surprise, no errors! I was furious and at the same time completely lost.
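The swap test in this P.S. is a small elimination argument: a part that is present in every failing run and absent from every passing run is the prime suspect. The minimal sketch below encodes those runs with hypothetical part names, and it points at the problematic PC's board (or its slots/settings) rather than its sticks. It assumes faults reproduce on every run; UPDATE 2 describes a memtest that failed and then passed on the same hardware, so a single clean pass should not be treated as clearing a part.

```python
# Sketch: elimination over swap tests. Each run is (parts used, passed?).
# A part seen in every failing run and in no passing run is a suspect.
# Assumes faults are reproducible; intermittent faults break this logic.


def suspects(runs: list[tuple[set[str], bool]]) -> set[str]:
    failing = [parts for parts, passed in runs if not passed]
    passing = [parts for parts, passed in runs if passed]
    if not failing:
        return set()
    in_every_failure = set.intersection(*failing)
    seen_passing = set().union(*passing)
    return in_every_failure - seen_passing


# The runs from the P.S., with HYPOTHETICAL part names.
runs = [
    ({"board_A", "stick_A1", "stick_A2"}, False),  # "problematic" PC, errors
    ({"board_B", "stick_B1", "stick_B2"}, True),   # "test" PC, 1 clean pass
    ({"board_B", "stick_A1"}, True),               # problem stick 1 in test PC
    ({"board_B", "stick_A2"}, True),               # problem stick 2 in test PC
]
print(suspects(runs))
```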

Right now I believe one or more of the following are causing the issues: the client motherboards (AsRock P55M Pro), overheating due to the occasional dirty GPU heatsink, and/or the DGS-1024D switches /highly doubtful/.

UPDATE 2, if anyone is reading: I did the stress test. It crashed during a GPU stress test with FurMark, which almost bricked the board. But that is highly unlikely to be the cause of all the crashes, because most of the time they were random and did not happen under high GPU usage. Because I almost bricked the board /LAN boot no longer worked/, I removed the PC and plugged it in somewhere else to run some tests, check how much dust had accumulated, and run memtest. LAN boot magically fixed itself /I guess it needed a complete power drain/. Memtest failed at the end with 100k errors, so I ran memtest on each stick individually and both passed; then, trying to reproduce the failure, I ran memtest on both sticks together and it passed again. There seems to be an issue with this board running these Samsung RAM sticks. I believe I've narrowed the issue down to RAM: once I find the right settings for these sticks /i.e. MHz and voltage/, I can move on and test the iSCSI system, which, since the 100mbps -> 1gbps discovery, runs fine with ~10 PCs.

PS: stay away from AsRock and Samsung RAM, but especially AsRock.

In the end, if I ever stabilize the system, I will definitely write a guide on setting it up.

  • `very desperate.` Then you should probably hire a consultant and pay for some expertise, and have this set up properly from the get-go, which it definitely is not now. – HopelessN00b Oct 31 '14 at 18:37
  • Completely agree, but I really want this to be a learning experience. – ochitos Oct 31 '14 at 18:43
  • @fuximusfoe Then start reading the documentation. We here do not provide free training. – TomTom Nov 06 '14 at 13:53
  • I wasn't really hoping for free training or a step-by-step walkthrough, just some pointers in the right direction. – ochitos Nov 06 '14 at 14:04
  • Most problems have disappeared now, though there is still a random freeze or hang even when the network is not fully utilized. I'm assuming the problem is the 3x DGS-1024D switches, which are not up to the task. – ochitos Nov 06 '14 at 14:13
