来源:客趣旅游网
五道口生活网 www.wdklife.com 五道论坛(五道口人自己的论坛) www.wdklife.com/bbs 通洲生活网 www.85118.com

Change Is Hard: Adapting Dependency Graph Models For Unified Diagnosis in Wired/Wireless Networks

Lenin Ravindranath†, Paramvir Bahl‡, Ranveer Chandra‡, David A. Maltz‡, Jitendra Padhye‡, Parveen Patel‡

†MIT, ‡Microsoft Research

ABSTRACT

Organizations world-wide are adopting wireless networks at an impressive rate, and a new industry has sprung up to provide tools to manage these networks. Unfortunately, these tools do not integrate cleanly with traditional wired network management tools, leading to unsolved problems and frustration among the IT staff. We explore the problem of unifying wireless and wired network management and show that simple merging of tools and strategies, and/or their trivial extension from one domain to another does not work. Building on previous research on network service dependency extraction, fault diagnosis, and wireless network management, we introduce MnM, an end-to-end network management system that unifies wired and wireless network management. MnM treats the physical location of end devices as a core component of its management strategy. It also dynamically adapts to the frequent topology changes brought about by end-node mobility. We have a prototype deployment in a large organization that shows that MnM's root-cause analysis engine out-performs systems that do not take user mobility into account when localizing faults or attributing blame.

Categories and Subject Descriptors: C.4 [Performance of systems]

General Terms: Management, performance, reliability, wireless

Keywords: Wireless, corporate networks, performance

1. INTRODUCTION

Data from IT departments of large corporations and dominant PC manufacturers show that employees prefer to use just one device (e.g., a laptop computer) for all their computing needs [17]. Consequently, many large IT departments are moving towards a future that includes a significantly reduced role for the traditional wired desktop computer [11]. They envision a future where enterprises deploy wireless networks in all corporate campus buildings, and swarms of nomadic users access corporate resources through wireless Access Points (APs). They expect users to frequently change their point of attachment to the corporate network. In this new world, the corporate IT departments need tools to manage and diagnose both wired and wireless parts of their network.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

WREN'09, August 21, 2009, Barcelona, Spain.

Copyright 2009 ACM 978-1-60558-443-0/09/08 ...$5.00.

Current enterprise network management and diagnosis systems use separate tools to diagnose wired and wireless networks. In an environment where a large number of users are nomadic, debugging application performance problems using separate tools is both difficult and frustrating [13].

For example, consider Figure 1 that shows the time required to fetch a URL, measured simultaneously from a wired desktop host and a wireless laptop as the laptop was moved between rooms every 5 minutes. Unsurprisingly, both the wired and wireless hosts see significant variation in the response time. Interestingly, however, the variation is sometimes seen by the wireless host only, potentially indicating problems in the wireless connectivity, and sometimes the variation is seen only in the wired host, potentially indicating congestion in the wired network. Sometimes the variation is seen in both, potentially indicating congestion in a server involved in providing the requested URL.

A natural question to ask is: why not diagnose performance problems by using the existing wireless and wired network diagnosis systems separately?

The answer is that a diagnosis system that looks at only the wired network or the wireless network is likely to misinterpret some of the spikes in the response time and blame the wrong network component. In this paper, we show that quality of diagnosis is better when both wired and wireless aspects of the enterprise networks are analyzed jointly.

Three main features distinguish our approach from the recent research on enterprise network diagnosis systems:

Changing network topology: Many recently proposed network fault diagnosis systems such as Sherlock [4] and SMARTS [22] implicitly assume that the fundamental structure of the network is either static or changes slowly. This assumption allows these systems to build Inference Graphs [4] and codebooks [22] to pinpoint the cause of performance problems seen by the users. However, these approaches cannot be used without substantial modifications in an environment where clients frequently change their point of attachment to the corporate network.

Joint Consideration of Wired and Wireless Networks: To diagnose end-to-end performance of networked applications across wired and wireless networks requires re-thinking core aspects of fault diagnosis. For example, geographic location must become a first class object in the analysis for determining if a problem is in the backhaul network, the wireless link, or the data center servers.

Absence of Fixed Observers: Since many problems in wireless networks are location specific, existing wireless network monitoring systems rely on fixed desktops [8] or specialized monitoring hardware [3, 10]. However, in a network consisting primarily of nomadic users, systems like DAIR [8] are impractical, while systems like Jigsaw [10] and Wit [18] are expensive to deploy.


[Figure 1 annotations: large server variations; large wireless variations; background wireless variability; spikes of variability in server.]

Figure 1: Time to fetch a URL as measured simultaneously from a wired desktop host and a wireless laptop. The laptop was moved between rooms every 5 minutes.

We have developed an end-to-end network diagnosis system, called MnM, that successfully diagnoses performance of networked services and applications running on nomadic hosts. MnM builds on recent research on network service dependency extraction [4], fault diagnosis, and wireless network monitoring. It treats the physical location of end devices as a core component of its diagnosis strategy. It also dynamically adapts to the frequent topology changes brought about by end-node movement. Our system is implemented entirely in user-level software, and it does not require any specialized monitoring hardware.

We have deployed the MnM system on a segment of our organization's network. Over a period of two weeks, we monitored 27 users and 10 servers. We detected and correctly diagnosed a variety of performance issues, including poor Wi-Fi coverage, congestion in wired networks, and misconfigured DNS entries. As we shall show later in the paper, at least 140 performance problems would have been mis-diagnosed had we not taken an integrated, holistic view of wired and wireless networks.

MnM extends the state of the art in enterprise network management by making two important contributions:

1. We identify issues that an enterprise network management system must consider when the end-hosts are nomadic. We show that recently developed systems are not able to cope with these issues. We quantify mistaken diagnoses that occur in systems that do not compensate for user nomadicity, and we argue that location must be treated as a core component in future enterprise network management systems.

2. We present an enterprise network management system that unifies wired and wireless network management, and handles nomadic users. It is easy to deploy, as it requires no special fixed infrastructure for wireless monitoring and automatically initializes its location system. We evaluate its accuracy through both controlled experiments and a 2-week field study.

clients and wireless APs. Unlike our system, their techniques miss out on problems that a mobile client may have because of a performance issue in the wired part of the network.

The DAIR system [8] also detects performance problems faced by users of Wi-Fi networks. DAIR uses corporate desktop computers to monitor the airwaves and, like MnM, location-awareness is a core component of its management strategy. Fundamentally, DAIR relies on the existence of fixed desktop devices to monitor performance of wireless link. In contrast, MnM assumes a world where every client is mobile. In such an environment, monitoring must be done by mobile clients themselves. This presents several unique challenges, such as bootstrapping, which systems like DAIR cannot handle. Furthermore, DAIR requires the monitoring devices to sniff packets in promiscuous mode, which may not always be possible on battery constrained mobile clients.

Jigsaw [10] and WIT [18] are Wi-Fi monitoring systems that combine the data from multiple monitors to generate a comprehensive view of network events. Jigsaw uses dedicated, custom-built, multi-radio monitoring nodes and provides a detailed view of low-level network effects such as interference. WIT is able to analyze and detect MAC-level mis-behavior. While useful in investigating why individual locations have poor performance, these tools are not designed for diagnosing end-to-end networked services in a corporate environment.

Commercial systems [2, 3] are available for managing wireless networks, but they do not detect performance issues due to problems in the wired part of the network. Furthermore, systems like DAIR, Jigsaw, WIT, Airtight, etc. do not have visibility into application-level performance problems, whereas, as we will show, MnM does.

Wired Network Management: The Sherlock system [4] manages networked services in enterprise networks by extracting inference graphs and then using these to diagnose performance problems. Software agents running on desktop machines determine the set of services the host depends on and a centralized inference engine captures the dependencies between the components of the IT infrastructure by merging the views of each client. Sherlock then diagnoses faults by running an inference algorithm on the inference graphs. Sherlock makes a fundamental assumption that dependencies are static or, at most, change slowly. This is not true for applications running on devices used by nomadic users. As we show in Section 3, systems like Sherlock perform poorly when dependencies are dynamic and fast changing. Furthermore, such systems cannot be trivially extended to handle nomadic clients.

Other network management systems, such as Shrink [14] and SCORE [15], have made seminal contributions in diagnosing faults in wide-area networks, but they cannot be easily used for managing nomadic users. Similarly, sophisticated commercial products such as SMARTS [22], OpenView [19], and Tivoli [23] provide powerful tools for managing enterprise wired networks, but fall short when extended to manage mobile clients and Wi-Fi users.

Finally, we note that a longer version of this paper is available as a technical report [5].

2. RELATED WORK

There is substantial prior work in enterprise network management. However, it has focused on managing either wired networks or wireless networks, not both simultaneously. The closest thing to unified management tools are systems that let network managers view the wired and wireless networks simultaneously [13].

Wireless Network Management: Adya et al. [1] built one of the first enterprise wireless network management systems. Their system is similar to ours in that they focus on performance problems faced by Wi-Fi enabled mobile clients. They detect problems by analyzing link data collected by monitoring agents residing on

3. FORMULATING THE PROBLEM

Figure 2 illustrates an enterprise network of the future. Users located on the corporate campus access the enterprise data center servers via APs deployed in campus buildings, and these users move around frequently. Some users may work remotely, and connect to the corporate network via VPN. In this paper we focus primarily on nomadic users who change location but conduct most of their work when stationary. Some other papers refer to these as mo-


Figure 2: Example of the typical enterprise network of the future. Most users access corporate resources from laptop computers connected to wireless networks or from remote locations via VPNs over the Internet.

C using a web server. In this figure, the response time the client C observes when fetching a web page will be affected by the health of the DNS service, the Kerberos service, and the web server itself, since to successfully fetch the web page, C must first use DNS to convert the name of the website to an IP address, then fetch certificates to access the website, and finally retrieve the content from the website. The health of these services, in turn, is affected by the health of the servers that implement the service and the ability of the client C to successfully reach the servers over the network. The health of each network path is affected by the routers on the path.

Nodes in the Inference Graph are conceptually in one of two states: up or down. Root causes that are operating normally and observations indicating normal performance are up. Nodes causing or indicating poor performance are down, even if they have not failed completely but are merely slow in returning answers.

While our example Inference Graph has only a single client and a single observation of a single application, a system-wide Inference Graph is built by combining the graphs for each client application and service. These graphs share the same root cause nodes, but have different observation and service nodes for the combination of each client and application.

The Inference Algorithm: Given the inference graph and the state of the observation nodes, an inference algorithm can infer which root causes are most likely to have failed. This is especially useful in the cases where root causes cannot be directly observed [4, 15]. Many inference algorithms have been developed, but the goal of each is the same: given a set of observations of system performance, good and bad, determine a set of root causes whose failure would best explain that pattern of observations. To cope with the uncertainty in the real world, MnM uses probabilistic inference. Specifically, every root cause has a prior probability, that is, the fraction of time the root cause is typically down. The inference algorithm takes these priors into account when computing which root causes are most likely to be down. The algorithm used in this paper is the same as that used by Sherlock [4].
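As a rough illustration of how priors and observations can be combined, the following Python sketch ranks single root-cause hypotheses by log-likelihood. It is a simplified stand-in and not Sherlock's actual algorithm [4]: the function name `rank_root_causes`, the `noise` parameter, the single-failure assumption, and the node names in the example are all assumptions made for illustration.

```python
from math import log


def rank_root_causes(priors, depends_on, observations, noise=0.05):
    """Rank single-root-cause hypotheses, most likely first.

    priors:       {root_cause: prior probability of being down}, each in (0, 1)
    depends_on:   {observation: set of root causes it depends on}
    observations: {observation: "up" or "down"}
    noise:        assumed chance that an observation disagrees with its causes
    """
    scores = {}
    for suspect, prior in priors.items():
        # Start from the prior, then reward or penalize per observation.
        score = log(prior)
        for obs, state in observations.items():
            # Hypothesis: only `suspect` is down, so exactly the observations
            # that depend on it should be down.
            predicted = "down" if suspect in depends_on[obs] else "up"
            score += log(1 - noise) if state == predicted else log(noise)
        scores[suspect] = score
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical example: client C1 is wireless (depends on ap1), C2 is wired.
example_priors = {"dns": 0.01, "web": 0.02, "ap1": 0.05}
example_deps = {"C1:http": {"dns", "web", "ap1"}, "C2:http": {"dns", "web"}}
example_obs = {"C1:http": "down", "C2:http": "up"}
ranking = rank_root_causes(example_priors, example_deps, example_obs)
```

In this example only the wireless client sees a problem, so the hypothesis that blames the access point explains both observations and ranks first, while the shared servers are penalized for the healthy wired observation.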

Figure 3: Example Inference Graph. The response time measured for fetching http://foo (dashed outline) is affected by the root causes (shown with dotted outlines).

bile users, and we use the terms interchangeably. We believe MnM is applicable to users in constant motion, but it is out of the scope of this paper.

3.2 Impact of Nomadic Users

3.1 Fault Diagnosis using Inference Graphs

Prior work in fields as diverse as network management [15, 4, 24] and medical diagnosis has shown the advantages of using an Inference Graph to diagnose faults in the presence of noisy observations. However, we have found that nomadic users violate some of the important assumptions on which these systems are based, and, consequently, these systems perform poorly when used to diagnose the problems experienced by nomadic devices.

MnM attempts to leverage the expressiveness of inference graphs while fixing the problems that prevent them from use with nomadic systems. We begin with a brief overview of inference graphs. For more details, see [4, 15]. Then, in Section 3.2, we describe the problems caused by nomadic users. Section 4 describes our techniques for applying Inference Graphs to nomadic hosts.

The Inference Graph:

We use the model proposed in Sherlock [4]. An Inference Graph consists of directed edges and three types of nodes: root causes, meta-nodes, and observations. The graph encodes how root causes, which represent components or services that can be faulty, affect the observation nodes, which represent aspects of the system that can be measured. Meta-nodes are the glue that ties together the root causes involved in particular services or network paths.

Figure 3 illustrates an example Inference Graph for a single client

One could ask the question, would a trivial combination of wireless monitoring methods [8, 10, 18] and wired monitoring methods [4] be able to diagnose the problems experienced by nomadic users? We answer this question by making the following four observations:

3.2.1 Dynamic Inference Graph

A defining characteristic of nomadic users is that they move, changing their location and their point- and method-of-attachment to the network up to several times during a day. As a result, Inference Graphs for nomadic users change frequently and significantly. For example, when a nomadic user connects to the enterprise network via a wireless network, the AP changes as she moves from one location to another. Worse yet, the servers in other parts of the Inference Graph change as well, as the DNS and Kerberos servers that a host uses may change whenever the subnet changes and a new IP address is issued from the DHCP server. Figure 4 illustrates how the Inference Graph for a particular application changed compared to the inference graph of Figure 3 as client C's point of attachment changed from a wired network to a wireless network at a different location.

Such dynamism inside the network is a problem for current inference systems. Prior work has proposed techniques for learning the Inference Graph via monitoring the packets that hosts send and receive [20, 4]. However, these learning algorithms assume that


3.2.4 Difficulties Identifying Root Causes

Figure 4: Example Inference Graph when a nomadic user connects to the corporate network using an 802.11 wireless network. To ease comparison with Figure 3, nodes affected by mobility are shown with dark backgrounds.

One might argue that running existing wireless and wired diagnostic tools separately can diagnose application-level performance problems for nomadic users. However, low level wireless performance metrics such as signal strength and packet loss rates have a complex relationship to the performance of higher layers [9]. One cannot simply assign thresholds to translate link-layer measurements into application-level throughputs. For example, using the data collected from our 2-week study presented in Section 6.2, we see that there is no significant correlation between the AP signal strength seen by a client and the end-to-end performance it achieves. Further, there are some dependencies in the wired network that are specific to wireless machines, e.g. APs, the wireless gateway and the wireless authentication servers. It is hard to measure their impact on application performance without unifying wired and wireless performance diagnosis.

4.

the Inference Graph remains unchanged long enough to be learned. For example, Sherlock reports that it takes several hours for the learned Inference Graph to stabilize. Other researchers have shown that users change location frequently [7, 16], so for most cases the Sherlock algorithm would not be able to learn the Inference Graph before it changed.

MnM's approach is to separate the Inference Graph into the portions which are relatively static and can be learned (e.g., dependencies among servers in the wired data center) and the portions that change frequently. We use the Domain Experts described in Section 4.1.4 to compute these portions as needed.

ARCHITECTURE

3.2.2 Importance of Location

Researchers have previously shown that the physical location of a mobile device has a direct impact on the performance of the applications it is running [8, 12]. For example, two users running the same application, connected to the network via the same AP, may experience different performance: one might see short response times from a web server while the other sees long response times, all due to variations in the RF environment around their physical location. If location is not incorporated into the Inference Graph, then the inference algorithm will blame the wrong root cause as it tries to explain the performance problems seen by the host experiencing longer delays. Consequently, MnM treats physical location as a core component of its end-to-end network diagnosis system.

A system that jointly manages wired and wireless networks needs three unique capabilities: an ability to determine the locations of mobile clients without relying on fixed monitoring resources, an ability to frequently update the inference graph and an ability to determine the performance of different components of the network. In addition to end-to-end observations, MnM also measures the performance of some individual network components, such as the capacity of the wireless link, and includes these into its inference algorithm when diagnosing application-level performance problems. In this section we describe the architecture of MnM and show how these capabilities are incorporated within it.

Figure 5 illustrates MnM's architecture. MnM consists of two main components: the MnM Agent that runs on each mobile device in the network, and the MnM Inference Engine that accepts data from these agents. The Inference Engine analyzes data from agents to determine the root cause of performance problems, and raises alerts to the network operator. In addition, we have domain experts, whose functionality is split between the agent and the inference engine. The role of domain experts is to modify the inference graph in some special cases. Below we provide more details on each of these components.

Comment about Privacy: This paper focuses on enterprise networks. In such networks the IT department has the authority to require every user to run monitoring software. Therefore, the issues of user consent and privacy are out of scope.

3.2.3 Dynamics of Monitoring and its Limitations

4.1 The MnM Agent

State of the art Wi-Fi network management and diagnosis systems such as Jigsaw [10], WIT [18], and DAIR [8] rely on the existence of fixed infrastructure, either in the form of specialized hardware or always-available desktop computers, to monitor the RF environment. Specialized hardware is expensive to deploy and maintain. Furthermore, the general trend in large IT departments is to replace desktop computers with laptops. Without the support of 'static' infrastructure, determining the physical location of a client becomes difficult. Further, the laptops of ordinary users cannot be used to take detailed measurements of their wireless environment because that would require running their Wi-Fi interface cards in promiscuous mode. Promiscuous mode prevents the cards from entering their power save states and thus places an unacceptable strain on the laptops' batteries and increases the barrier to deployment. Consequently, end-to-end network diagnosis systems must use light-weight self-configuring location determination techniques that do not depend on support from existing infrastructure.

The MnM agent is a light-weight application that runs on users' laptops. It includes Monitors that gather information about the system, user activity and network connectivity. This data is processed by Domain Experts that encapsulate the special logic required to deal with different problem domains. The Domain Experts generate data for the inference graph and performance observations. The agent sends all this data to the MnM Inference Engine over a transport called the Trickle Integrator that is designed to cope with intermittent and variable connectivity. The MnM Agent does not require any driver modifications in the clients and hence is easy to deploy.

4.1.1 (Agent) Controller

The Controller is the agent's lightweight workflow engine. It provides a publisher-subscriber service to moderate the interactions between Monitors, Domain Experts, and the Trickle Integrator. All messages between the components in MnM take the form of tuples: a list of fields and their values. The experts and monitors register


[Figure 5 labels: Agent; Domain Experts (WiFi, RAS, HTTP, ...); Trickle Integrator; Inference Engine; Domain Experts (WiFi, RAS, HTTP, ...); Measurements; Inference Graph; Controller; Monitors (System, Calendar, Network, Trace Route); Trickle Integrator; Fault Suspects; Fault Inference; Local Store; Historical Data; Location Inference; Controller.]

Figure 5: Architecture

triggers with the Controller. Whenever the Controller processes a message matching a trigger, it invokes the associated callback with the message as an argument. The Controller itself generates messages to mark important events, such as agent startup and expiration of a periodic timer. Monitors and Domain Experts are the “pluggable” components. They can be developed independently of one another; only the format of fields and values must be agreed on to ensure proper intra-agent communication.

The agent generates a START message on startup. Then it generates a PERIODICTIMER message every polling interval, which triggers the monitors to generate messages encapsulating their measurements. In addition to the messages generated by the agent, the monitors also register for system-wide events such as network address change and wireless hand-off event.
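The trigger mechanism described above can be sketched as a small publish-subscribe dispatcher. This is a minimal sketch under stated assumptions, not MnM's implementation: messages are modeled as Python dicts of fields and values, a trigger is a dict of field values that must all match, and the class and field names (`Controller`, `register`, `publish`, `"type"`, `"rssi"`) are hypothetical.

```python
class Controller:
    """Toy publisher-subscriber hub: components register triggers and callbacks."""

    def __init__(self):
        self._triggers = []  # list of (pattern, callback) pairs

    def register(self, pattern, callback):
        """Invoke `callback(message)` for every message matching `pattern`."""
        self._triggers.append((pattern, callback))

    def publish(self, message):
        """Deliver `message` to every callback whose pattern it matches."""
        # Iterate over a copy so callbacks may register new triggers safely.
        for pattern, callback in list(self._triggers):
            if all(message.get(field) == value for field, value in pattern.items()):
                callback(message)


# Hypothetical wiring: a monitor reacts to the periodic timer by publishing a
# measurement, and an "expert" collects the measurements it subscribed to.
controller = Controller()
collected = []

controller.register(
    {"type": "PERIODICTIMER"},
    lambda msg: controller.publish({"type": "MEASUREMENT", "rssi": -52}),
)
controller.register({"type": "MEASUREMENT"}, collected.append)

controller.publish({"type": "START"})          # matches no trigger
controller.publish({"type": "PERIODICTIMER"})  # monitor emits a measurement
```

Because components only share the message format, a new monitor or expert can be added by registering another trigger, without changes to existing components.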

way servers, and ping times to the first hop router. If the Network Monitor detects that the user is connected to the network via a wireless interface, it periodically collects additional information such as the AP the interface is associated with, other APs it can detect and the signal strengths of their beacons. The monitor also generates messages that are specific to the wireless interface. For example, if the wireless client is handed off from one AP to another, it generates a HANDOFF message.

Trace Route Monitor: This monitor uses traceroute to discover the network path between the client and the other machines to which it is sending packets.

The total amount of data pushed to the Inference Engine for each observation is less than 1 Kbyte, and hence pushing data to the server takes a negligible amount of the user's network bandwidth. This issue is examined in detail in Section 5.

4.1.2 Trickle Integrator

We designed MnM to handle situations when mobile hosts are unable to reach the Inference Engine. Specifically, MnM includes a module inspired by Coda [21] for dealing with measured data during weakly connected and disconnected operation. Every tuple of data created by a Domain Expert or a Monitor is passed to the Controller, and from there it is placed in a local store. Data from the local store is then pushed to the Inference Engine whenever the client has connectivity. The Trickle Integrator also rate-limits the messages sent by the client to the server, and, if a backlog develops, new messages are given priority over old ones.
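The store-and-forward behavior described above can be sketched as follows. This is an illustrative sketch only: the class name `TrickleIntegrator`, the `max_per_flush` rate limit, and the `send` callback are assumptions, and the real module may rate-limit by bytes or time rather than by message count.

```python
from collections import deque


class TrickleIntegrator:
    """Buffer tuples locally and push them when connectivity allows."""

    def __init__(self, max_per_flush=2):
        self._store = deque()               # local store, oldest on the left
        self.max_per_flush = max_per_flush  # assumed per-flush rate limit

    def enqueue(self, tup):
        """Place a tuple from a Monitor or Domain Expert in the local store."""
        self._store.append(tup)

    def backlog(self):
        """Number of tuples still waiting to be sent."""
        return len(self._store)

    def flush(self, connected, send):
        """Send up to `max_per_flush` tuples, newest first; return count sent."""
        if not connected:
            return 0
        sent = 0
        while self._store and sent < self.max_per_flush:
            # Popping from the right gives new messages priority over old ones.
            send(self._store.pop())
            sent += 1
        return sent
```

A caller would invoke `flush` periodically with the current connectivity state; tuples that are not sent stay in the store until a later flush.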

4.1.4 Domain Experts

4.1.3 Monitors

As mentioned in Section 4.1.1, monitors can be developed independently and dynamically added to the MnM Agent on an as-needed basis. In our current implementation, the MnM Agent contains four monitors.

System Monitor: This monitor reports various system properties from the current polling interval. It reports information such as whether the system's battery is being charged and whether the system is connected to a wired network (e.g. Ethernet). It also reports whether a user is currently active on the system (the system is considered idle if there is no user input for n minutes, where n is currently set to 2).

Calendar Monitor: The Calendar Monitor tracks the time and location of accepted meetings from the user's enterprise calendar (e.g., Exchange or Lotus server). This information is used to bootstrap the location engine, as we shall describe in Section 4.2.1.

Network Monitor: The Network Monitor reports information about network connectivity. The monitor is triggered by the network change related events from the system, such as network address change. It reports information about active network interfaces including: IP and MAC addresses, gateways, DNS and default gate-

In Sherlock, the authors assume that the Inference Graph is stable, and hence it is learnable via black-box techniques. However, mobility causes changes to the Inference Graph, and even though the changes may be regular and sometimes predictable, they are generally too rapid for black-box techniques to learn the graph. To handle this, we define the concept of a Domain Expert: a module that is responsible for making the appropriate changes to the Inference Graph when triggered by a host changing its connection point or other dependencies.

A typical Domain Expert has code both on the host, as part of the Agent, and on the Inference Engine. Domain Experts respond to triggers such as change in IP address, or AP handoff event. Upon such changes, the Domain Expert on the client notifies the Domain Expert on the Inference Engine of the triggering event. The Domain Expert on the Inference Engine then updates the Inference Graph appropriately. For example, when an AP Handoff event occurs, the WiFi Domain Expert on the agent notifies its counterpart on the Inference Engine. The Inference Engine then updates the Inference Graph to account for the change in topology.
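To make the handoff example concrete, the sketch below shows one way an engine-side expert might rewrite the dependencies of a single client when it moves between APs. The data layout is an assumption for illustration: the graph is reduced to a map from (client, application) observations to the set of root causes they depend on, and the function name `apply_handoff` is hypothetical.

```python
def apply_handoff(depends_on, client, old_ap, new_ap):
    """Return a new dependency map with `client` moved from old_ap to new_ap.

    depends_on: {(client, application): set of root-cause names}
    Observations of other clients are copied unchanged.
    """
    updated = {}
    for (obs_client, app), causes in depends_on.items():
        causes = set(causes)  # copy so the caller's graph is not mutated
        if obs_client == client and old_ap in causes:
            causes.discard(old_ap)
            causes.add(new_ap)
        updated[(obs_client, app)] = causes
    return updated


# Hypothetical graph: clients C and D both fetch a web page through ap1.
graph = {
    ("C", "http"): {"dns", "web", "ap1"},
    ("D", "http"): {"dns", "web", "ap1"},
}
# A HANDOFF notification from C's agent triggers the update for C only.
graph_after = apply_handoff(graph, "C", "ap1", "ap2")
```

Only the portion of the graph tied to the roaming client changes; the relatively static server dependencies are left as they were.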

WiFi Expert: The WiFi Expert is responsible for managing the details of how wireless connectivity affects the performance of applications running on a mobile node. It does this by adding new root cause and observation nodes to the Inference Graph in a particular pattern, which we call a graph gadget. Based on reports from the monitors on the client, the expert fills in the correct AP and location information. Figure 6 illustrates the new Inference Graph generated with the help of a WiFi Expert.

Most importantly, for every client whose location can be determined, the WiFi Expert adds a new root cause node that represents the location. There is one location root cause node for each location known to MnM; all the clients predicted to be in that location


engine stores and analyzes the data sent to it from each of the MnM Agents. Using this information and the service-level dependency graph, it generates and updates an Inference Graph that reflects where the mobile clients are located and how they are connected to the network. It uses the Inference Graph to generate a list of probable causes whenever it identifies performance problems, and subsequently raises alerts.

4.2.1

Figure 6: “Gadget” added to the Inference Graph of mobile hosts by the WiFi Expert. New elements shown in grey or with darker lines.

share that node. Associated with location is the a priori probability that the location causes performance problems. MnM determines locations and computes priors as described in the subsections that follow. The expert connects the location root cause to an observation node whose value is tied to measurements of the RTT of pings between the client and the current AP. The RTT provides a degree of direct estimation of current wireless channel quality, while location priors provide historical information about the wireless channel quality at this location.

HTTP Expert: The HTTP Expert monitors the response time of web servers when URLs are fetched, and reports these to the Inference Engine. The Inference Graph uses these as observations about the application's health. For testing purposes, our HTTP Expert also includes a URL polling robot that can be ordered to fetch particular URLs during experiments.

Network Expert: The Network Expert computes the network topology-related dynamic part of the inference graph whenever a network change event occurs on the client. It is responsible for filling in two types of information. First, it computes network path to network services by using topology discovery techniques, such as traceroute. Second, it detects changes in location-dependent network services, such as the DNS and Kerberos servers. The Network Expert counterpart on the Inference Engine updates this information in the inference graph.

Service Expert: The Service Expert is a special expert that runs only on the Inference Engine, and has no client counterpart. The Service Expert is responsible for building a static, service-level dependency graph for all networked applications. A service is identified by the service name and the server that is providing that service. For example, a website is identified by its URL and the web server hosting it. The Service Expert gets the data needed to construct the dependency graph from a variety of sources. For example, systems like [10, 4] use temporal correlation in packet traces to infer dependencies. Some information, such as topology of the data center, can be extracted from network configuration files. The static dependency graph is combined with dynamic information from other domain experts, such as the Network Expert and the WiFi Expert, to build an inference graph.

Comment: We note that the Domain Expert architecture is a general technique that will be useful for handling other types of domains where the Inference Graph changes faster than it can be learned. An example of this is peer-to-peer systems where the servers being invoked change depending on the query being made.

Location Inference

4.2 The MnM Inference Engine

The MnM Inference Engine is responsible for monitoring the health of the mobile device and the applications running on it. The

The physical location of a wireless client may have a strong impact on its network performance [8]. Thus, management tools designed for wireless networks must include an integrated location estimation system.

A number of techniques [6, 8, 25] have been proposed for estimating the location of clients in a Wi-Fi network. These techniques offer a wide range of tradeoffs between accuracy, measurement overhead, required infrastructure support and the need for detailed profiling of the physical environment. For the purpose of network management, it is generally sufficient to determine the client location at the granularity of one office. However, unlike the scenario described previously [8], we cannot rely on the presence of densely deployed, fixed desktops to serve as monitors. Hence, we have built a location system using the technique described in [25].

Location Profiles: Our system stores a profile for each location of interest. To allow for easy interpretation, we define location in terms of office numbers, rather than (x, y, z) coordinates. The profile for each office consists of a list of APs (i.e. their BSSIDs) that are visible from that location along with the distribution of observed signal strength of each AP. We assume a Gaussian distribution and characterize it with its mean and variance. These profiles are generated automatically, as we will explain later in this section.

Determining Client Location: As part of its observations (e.g., measuring URL response times), the Wi-Fi Monitor running on each client sends the inference engine the list of APs seen by the client, along with their signal strengths. Using the stored profiles, and the Bayesian inference technique described in [25], the location inference module determines the most likely location of the client and persists it with the observation data in a history database. The median error for computed location is about 5 meters (one or two offices). We will present a detailed evaluation of the accuracy of our location system in Section 6.1.
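The following Python sketch illustrates how per-office Gaussian profiles could be used to pick the most likely location for a scan. It is a generic naive-Bayes sketch and not necessarily the technique of [25]: it assumes a uniform prior over offices, independent APs, and a fixed log-penalty (`missing_penalty`) for APs that are seen in the scan but absent from a profile. All names and numbers are hypothetical.

```python
from math import log, pi


def log_gaussian(x, mean, var):
    """Log density of a Gaussian with the given mean and variance at x."""
    return -0.5 * (log(2 * pi * var) + (x - mean) ** 2 / var)


def most_likely_location(profiles, scan, missing_penalty=-10.0):
    """Return the office whose profile best explains the scan.

    profiles: {office: {bssid: (mean_dbm, variance)}}, variance > 0
    scan:     {bssid: observed signal strength in dBm}
    """
    best_office, best_score = None, float("-inf")
    for office, aps in profiles.items():
        score = 0.0  # uniform prior over offices adds the same constant to all
        for bssid, rssi in scan.items():
            if bssid in aps:
                mean, var = aps[bssid]
                score += log_gaussian(rssi, mean, var)
            else:
                score += missing_penalty  # AP not expected at this office
        if score > best_score:
            best_office, best_score = office, score
    return best_office


# Hypothetical profiles for two offices and one scan from a client.
example_profiles = {
    "3101": {"ap-a": (-45.0, 16.0), "ap-b": (-70.0, 25.0)},
    "3205": {"ap-a": (-75.0, 16.0), "ap-b": (-50.0, 25.0)},
}
example_scan = {"ap-a": -48.0, "ap-b": -68.0}
estimated_office = most_likely_location(example_profiles, example_scan)
```

Here the scan is close to the means stored for office 3101 and far from those of office 3205, so the first office scores higher.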

Automatic Generation of Profiles: To reduce the effort required to roll out MnM, we automatically generate location profiles using the information provided by the Calendar Monitor running on each client.

Most corporate environments provide a calendar service that employees use to schedule meetings with each other. For each meeting, the calendar records the identities of invited attendees and the location of the meeting (e.g., a conference room or another employee's office). MnM generates profiles for rooms that appear as meeting locations using the Wi-Fi observations reported by the employees' laptops during the meeting time. To reduce the amount of erroneous information included in the location profile, MnM verifies both that there is activity on the user's laptop during the meeting (i.e., the user has the laptop with them at the meeting) and that the Wi-Fi observations are roughly consistent with those of other attendees (i.e., the user has actually gone to the meeting, rather than remaining in their office).

To generate a profile for a user's office, MnM looks for Wi-Fi observations made during times when the user has no meeting scheduled. Many people plug their laptops into wired Ethernet and/or wall power when they are in their offices, and MnM looks for these clues when selecting observations to construct the office profile.
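The meeting-room filter can be sketched roughly as below. The text does not say how "roughly consistent" is measured, so this sketch assumes a simple stand-in: a scan is kept only if its strongest AP matches the most common strongest AP among the active attendees. The record layout, field names, and sample values are hypothetical.

```python
from collections import Counter

def strongest_ap(scan):
    """BSSID with the highest signal strength in a {bssid: rssi_dbm} scan."""
    return max(scan, key=scan.get)

def label_meeting_observations(meeting, observations):
    """Return (room, scan) pairs that look safe to use for profiling the meeting room."""
    candidates = [
        o for o in observations
        if o["user"] in meeting["attendees"]
        and meeting["start"] <= o["time"] < meeting["end"]
        and o["active"]  # laptop in use, so the user likely carried it along
    ]
    if not candidates:
        return []
    # Assumed consistency check: agree with the majority on the strongest AP.
    majority_ap, _ = Counter(strongest_ap(o["scan"]) for o in candidates).most_common(1)[0]
    return [
        (meeting["room"], o["scan"])
        for o in candidates
        if strongest_ap(o["scan"]) == majority_ap
    ]

# Times are minutes since midnight; all names and values are made up.
meeting = {"room": "conf-3", "start": 600, "end": 660,
           "attendees": {"alice", "bob", "carol", "dave"}}
observations = [
    {"user": "alice", "time": 610, "active": True,  "scan": {"ap1": -50, "ap2": -80}},
    {"user": "bob",   "time": 615, "active": True,  "scan": {"ap1": -52, "ap2": -78}},
    {"user": "carol", "time": 620, "active": True,  "scan": {"ap1": -85, "ap2": -48}},  # stayed in office
    {"user": "dave",  "time": 625, "active": False, "scan": {"ap1": -51, "ap2": -79}},  # laptop idle
    {"user": "alice", "time": 700, "active": True,  "scan": {"ap1": -49, "ap2": -81}},  # after meeting
]
labeled = label_meeting_observations(meeting, observations)
```

The office-profile case would use the complementary filter (no meeting scheduled, plus the wired-Ethernet or wall-power clue), which is omitted here.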


We also note that in an environment where APs are deployed densely, it may be sufficient to characterize the location of the client simply by the AP that the client is associated with. This method requires no profiling, but is subject to inaccuracies, since clients sometimes associate with APs that are far away. We evaluate the usage of APs as a stand-in for location in Section 6.2.

4.2.2 Fault Inference

The fault inference module of MnM is responsible for taking the data produced by the agents in the system and determining which root causes are responsible for any problems. The resulting list of fault suspects is given to the network managers for reporting and resolution.

The module consists of two components: the computation of location priors, which is invoked once a day, and the inference module, which is invoked every 3 minutes or whenever there is a significant change in the observations being reported by clients.

Once invoked, the inference module updates the Inference Graph, computes the state of the observation nodes, and then runs the inference algorithm to determine a list of fault suspects.

Computing Priors for Locations: Instead of detailed current measurements, MnM relies on analysis of past experience to compute a prior probability of failure for each location known to the system. These priors are then used by the inference algorithm when determining the root causes responsible for bad observations. Priors can be cheaply computed from information already available in the historical database present on the Inference Engine, and, as shown in our evaluation, they largely eliminate the need for detailed current measurements when diagnosing faults.

Once a day, the Inference Engine computes a prior for each location l by retrieving from its history database all response time observations from locations within 6.7 meters of l. (6.7 meters is the median error of our location inference system, so observations labeled as being from those locations could have come from l.) MnM then computes the fraction of those response times that are down and uses this fraction as the prior probability that l is faulty. This simplistic approach implicitly assumes that all down observations are due to the location alone, discounting the effect of the servers and other components that might affect the observations. However, since our approach averages over the response times of many servers contacted from location l over long periods of time, any systematic bias is most likely due to the location. More complicated Bayesian estimation techniques could be used, but our evaluation shows they are unnecessary in our environment.
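The daily prior computation amounts to a windowed fraction, sketched below. The 6.7-meter radius comes from the text; the function name, the use of 2-D room centers, the choice to return None when no observations exist, and the sample data are assumptions made for illustration.

```python
import math

def location_priors(centers, history, radius_m=6.7):
    """Prior probability that each location is faulty.

    centers: {location: (x, y)} room centers in meters.
    history: list of (location, is_down) response-time observations.
    """
    priors = {}
    for loc, (x, y) in centers.items():
        # Locations close enough that their observations could have come from loc.
        nearby = {
            other for other, (ox, oy) in centers.items()
            if math.hypot(ox - x, oy - y) <= radius_m
        }
        observations = [is_down for where, is_down in history if where in nearby]
        # None marks "no history yet" (an assumption; the text does not cover this case).
        priors[loc] = sum(observations) / len(observations) if observations else None
    return priors

# Hypothetical floor plan: A and B are neighbors, C is far away.
centers = {"A": (0.0, 0.0), "B": (3.0, 0.0), "C": (30.0, 0.0)}
history = [
    ("A", True), ("A", False),
    ("B", True), ("B", True),
    ("C", False), ("C", False), ("C", False), ("C", True),
]
priors = location_priors(centers, history)
```

In this example A and B share the same pool of four observations, three of them down, so both receive a prior of 0.75, while C receives 0.25.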

Computing the Inference Graph: The Inference Engine controller orchestrates the construction of the Inference Graph by the various Domain Experts through a publish-subscribe system. The basic inference graph is generated by the service expert. Each Domain Expert subscribes to be notified whenever nodes or edges with specified properties are added to or deleted from the graph. Upon receiving such a notification, the Domain Expert makes its own alterations to the graph. This process repeats until no further changes are made to the graph, at which point the graph is ready to use for inference.

The process of altering the Inference Graph is triggered whenever a monitor or expert on a client detects a change. For example, when the HTTP Expert on client C observes the client accessing a web page http://foo.com with response time rt, the HTTP Expert on the Inference Engine will create a new observation node for C accessing foo.com if it does not already exist in the Inference Graph. The addition of this observation node causes the Service Dependency Expert to add nodes and edges reflecting the servers involved in accessing foo.com (e.g., DNS, Kerberos, and foo.com itself). The addition of these nodes causes the Network Expert to fill in additional root causes and edges for the network paths from C to those servers, the DNS servers currently being used by C, etc.

Computing Observations: Before invoking the inference algorithm, the inference module scans all observation nodes in the Inference Graph and, for each node, invokes the Domain Expert that created it. The Domain Expert is expected to determine whether the observation node is up or down, and typically does so by retrieving recent measurements for that node and determining whether they are normal or abnormal. For example, the observation node for an HTTP response time returns down if the response time is greater than a threshold based on the normal distribution of response times for that web server, and up otherwise.
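The up/down decision for an HTTP observation node can be sketched as below. The text says only that the threshold is based on the normal distribution of response times for the web server; the mean-plus-k-standard-deviations rule, the value k = 3, and the function name are assumptions chosen for illustration.

```python
import statistics

def observation_state(history_ms, recent_ms, k=3.0):
    """Classify a recent response time against a server's historical distribution.

    history_ms: past response times for this web server, in milliseconds.
    recent_ms:  the measurement being classified.
    k:          number of standard deviations above the mean treated as abnormal (assumed).
    """
    mean = statistics.fmean(history_ms)
    std = statistics.pstdev(history_ms)
    threshold = mean + k * std
    return "down" if recent_ms > threshold else "up"

# Hypothetical history: mean 100 ms, standard deviation about 7.1 ms.
history_ms = [100, 110, 90, 105, 95]
normal = observation_state(history_ms, 115)
slow = observation_state(history_ms, 400)
```

A larger k makes the expert more tolerant of slow responses, trading missed faults for fewer false alarms.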

Diagnosing Faults: Given an Inference Graph, prior probabilities for locations, and the up and down status of the observations, MnM uses the Ferret inference algorithm described in [4] to compute the root causes that are most likely responsible for the down observations. These root causes are returned as the fault suspect list.
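The Inference Graph consumed by this step is built incrementally by the Domain Experts, each reacting to changes made by the others until the graph stops changing. The sketch below illustrates that notify-until-quiescent loop on the foo.com example. The class, the event tuples, the node-naming scheme, and the fixed dependency list are hypothetical and are not MnM's actual interfaces; Ferret itself is not reproduced here.

```python
from collections import deque

class InferenceGraph:
    def __init__(self):
        self.nodes = {}            # node name -> kind
        self.edges = set()         # (cause, effect) pairs
        self._subscribers = []     # (predicate, callback) pairs
        self._pending = deque()    # change events not yet delivered

    def subscribe(self, predicate, callback):
        self._subscribers.append((predicate, callback))

    def add_node(self, name, kind):
        if name not in self.nodes:  # publish only real changes
            self.nodes[name] = kind
            self._pending.append(("node", name))

    def add_edge(self, cause, effect):
        if (cause, effect) not in self.edges:
            self.edges.add((cause, effect))
            self._pending.append(("edge", (cause, effect)))

    def run_to_fixpoint(self):
        """Deliver events until no expert makes a further change."""
        while self._pending:
            event = self._pending.popleft()
            for predicate, callback in self._subscribers:
                if predicate(self, event):
                    callback(self, event)

def is_new_observation(graph, event):
    return event[0] == "node" and graph.nodes[event[1]] == "observation"

def service_dependency_expert(graph, event):
    obs = event[1]                       # e.g. "C->foo.com"
    site = obs.split("->")[1]
    for server in ("DNS", "Kerberos", site):   # assumed dependency list
        graph.add_node(server, "server")
        graph.add_edge(server, obs)

def is_server_dependency(graph, event):
    return event[0] == "edge" and graph.nodes.get(event[1][0]) == "server"

def network_expert(graph, event):
    server, obs = event[1]
    client = obs.split("->")[0]
    path = f"path:{client}->{server}"
    graph.add_node(path, "network_path")
    graph.add_edge(path, obs)

graph = InferenceGraph()
graph.subscribe(is_new_observation, service_dependency_expert)
graph.subscribe(is_server_dependency, network_expert)
graph.add_node("C->foo.com", "observation")   # what the HTTP Expert would add
graph.run_to_fixpoint()
```

Because add_node and add_edge publish only genuine changes, the loop terminates once every expert has nothing new to contribute.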

5. IMPLEMENTATION

We have implemented the MnM system shown in Figure 5. The Agent Controller is implemented as a daemon (service) process. The Domain Experts and Monitors are implemented as loadable modules that are loaded and invoked by the Controller. The Inference Engine is implemented as a centralized service. The Inference Engine uses a database to store historical data but keeps the Inference Graph and the current observations in memory for fast access. The Inference Engine can run inferences on live incoming data or on the historical data. Our Inference Engine integrates with the enterprise network management system deployed in our organization and generates alerts through its console whenever it diagnoses a performance problem.

Scalability is a frequent concern with centralized systems. We evaluated two aspects of the scalability of our design: the CPU and network overhead on the client machines, and the performance of the Inference Engine as the number of nodes increases.

The CPU overhead of running the MnM agent on client machines is negligible. Each client machine, on average, generates less than 1000 bytes per minute (0.13 Kbps), which is also negligible.

The traffic from all clients aggregates at the central Inference Engine. Even with 10,000 active clients, the Inference Engine receives less than 1.5 Mbps of traffic. The CPU overhead of our Inference Engine is also small. The authors of Sherlock [4] show that the overhead of inference scales linearly as the number of nodes increases. We observed similar behavior with our system. On a machine with 3 GB of RAM and four 3.2 GHz CPUs, our inference algorithm processes an Inference Graph containing more than 100,000 nodes in less than 5 seconds.
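The traffic figures above follow from simple unit conversion, reproduced here as a quick check. The constants are the upper bounds quoted in the text; the variable names are illustrative.

```python
BYTES_PER_MINUTE_PER_CLIENT = 1000   # upper bound quoted in the text
ACTIVE_CLIENTS = 10_000

# bytes/minute -> bits/second
per_client_bps = BYTES_PER_MINUTE_PER_CLIENT * 8 / 60
per_client_kbps = per_client_bps / 1000

# Aggregate load arriving at the central Inference Engine.
aggregate_mbps = per_client_bps * ACTIVE_CLIENTS / 1_000_000

print(f"{per_client_kbps:.2f} Kbps per client")
print(f"{aggregate_mbps:.2f} Mbps aggregate")
```

At roughly 0.13 Kbps per client, 10,000 clients produce about 1.33 Mbps, consistent with the stated bound of less than 1.5 Mbps.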

6. EVALUATION

We evaluated MnM in a large enterprise network, performing two types of experiments. We first conducted controlled experiments with intentionally injected faults to evaluate the accuracy of our system when diagnosing the faults that might occur in an enterprise network with all nomadic users. Once we had confidence the system was performing correctly, we then ran the system for two weeks on the machines of 27 volunteers, creating a data set that we use to analyze the sensitivity of the system and the types of problems found in the network. All the experiments presented in this section were conducted on a live production enterprise network with thousands of computers, so the background traffic is entirely realistic.


Figure 7: CDF of error in predicted location, measured in meters, over 22,000 observations among 96 locations over a period of two weeks. [Plot: x-axis, distance error (m); y-axis, CDF; curves for calendar-based and survey-based profiles.]

We installed MnM on 42 computers: 27 user laptops, 5 test laptops, and 10 servers. These computers were used normally by their owners in their daily activities. The users represent a variety of corporate users, including programmers, managers, and researchers. Because we are not part of the corporate IT department and had to recruit volunteers, we did not monitor the actual websites that users visited, out of privacy concerns. Instead, we added an agent to their machines that fetched content from a set of five internal production websites every three minutes.

Figure 8: Location priors in our building.

6.1 Location Inference Evaluation

As described in Section 4.2.1, the location estimation module infers a location for every record submitted to the Inference Engine, as long as the submitted record contains a wireless fingerprint. Most offices on our floor are approximately 9 square meters (3 x 3) in size. The conference rooms are much larger. The floor measures 101 meters by 86 meters and has approximately 200 offices.

During the two-week study, the location estimation module inferred locations for over 77,000 records. Of these 77,000 records, 22,000 were manually labeled by the volunteers with their true location (i.e., the office or the conference room the machine was actually in at that time).

Figure 7 shows the CDF of the distance error between the geometric center of each record's true location and its inferred location, using two different sets of profiles. When using profiles generated automatically by our calendar heuristics, as described in Section 4.2.1, the inferred location matches the true location exactly 37% of the time. The median difference is 6.7 meters, which translates to an error of about two offices. We believe that this accuracy is sufficient for our purposes.

The calendar-based profiles will contain some errors, as machines are not always located where the calendar heuristics guess they will be. To estimate the loss in accuracy caused by these mistakes, we conducted a survey of our building by manually placing a laptop in roughly every other office for a fixed period of time and gathering the signal strengths of beacons broadcast by the various APs. We computed profiles from these observations, and then computed the distance error of the records when locations were inferred using these survey-based profiles.

The error is lower when using survey-based profiles, as all observations used to generate the profile are labeled with the correct location. The difference between the two curves measures the loss of accuracy due to mistakes made in guessing the machine's location from the users' calendars. Interestingly, the median error with our automatically generated calendar-based profiles is roughly the same as the median error with survey-based profiles. This suggests that calendar-based profiling works well for a large number of locations and records, although more observations labeled with calendar data would be needed to match the accuracy of survey-based profiles across all locations.
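The metrics reported in this section (exact-match fraction, median distance error, and the empirical CDF plotted in Figure 7) can be computed from labeled records as sketched below. The data layout, function names, and toy coordinates are assumptions for illustration; they are not the study's data.

```python
import math
import statistics

def distance_errors(records, centers):
    """Distance in meters between the centers of the true and inferred locations.

    records: list of (true_location, inferred_location) pairs.
    centers: {location: (x, y)} geometric centers in meters.
    """
    return [math.dist(centers[true], centers[inferred]) for true, inferred in records]

def empirical_cdf(values):
    """Sorted (value, cumulative fraction) points, as plotted in a CDF."""
    ordered = sorted(values)
    return [(v, (i + 1) / len(ordered)) for i, v in enumerate(ordered)]

# Hypothetical rooms and labeled records.
centers = {"A": (0.0, 0.0), "B": (3.0, 0.0), "C": (3.0, 4.0)}
records = [("A", "A"), ("A", "B"), ("A", "C"), ("B", "B"), ("B", "C")]

errors = distance_errors(records, centers)
exact_fraction = sum(true == inferred for true, inferred in records) / len(records)
median_error = statistics.median(errors)
cdf = empirical_cdf(errors)
```

Running the same computation once with calendar-based profiles and once with survey-based profiles yields the two curves whose gap measures the cost of automatic labeling.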

6.2 Field Study

In this section we describe the results of our 2-week study of real users using MnM.

Location Priors: Figure 8 shows the prior probability that fetching a URL will take unacceptably long from an office, where the darker the circle, the greater the probability of that location being a problem. There is clear variation in the priors over the building, indicating that location does have a strong effect on the ability of nomadic users to access the company's servers. The middle-left of the building is particularly bad, the middle-top offices are slightly better, and the conference rooms in the middle and the offices to the right are, for the most part, the best. Priors vary from 0.01 in the best areas to almost 0.7 in the worst.

Fault Diagnosis: The Inference Engine was run every 10 minutes during the 2-week study, a total of 1530 times. It diagnosed a fault during 434 of these runs. Unsurprisingly, most faults were concentrated during working hours, when more laptops are present and network and server usage is highest. We have confidence in the accuracy of the faults diagnosed by the system based on its performance in the controlled experiments.

Figure 9 shows the number of faults of each type that were diagnosed during the study. The bar for "With location priors" represents the results of MnM as we intend it to be used, with location priors taken into account by the inference algorithm. As there can be more than one fault diagnosed during a single run of the inference algorithm, the number of faults discovered totals more than 434. The most common source of problems was the laptops themselves ("machines"), followed by a server in the data center. Of the 310 faults attributed to a server in the data center, 114 were attributed to a server well known to have problems with intermittent overloads. MnM also correctly identified a DNS misconfiguration on one of the servers. The server's primary DNS was configured to 127.0.0.1 while it was not running a DNS server. This was causing delay in DNS lookup, which ultimately impacted total URL fetch times.

Figure 9: Number of faults diagnosed during 2-week study, broken out by type of fault and location information used. [Bar chart: x-axis, number of occurrences (0 to 450); fault types: Network Element, Internet Path, Network Path, Access Point HandOff, Location, Wireless Access Point, Machine, Server; series: With location priors, Location = AP, No location priors.]

Importance of location: Location was to blame for 144 problems (10% of the total), indicating that it is a significant source of errors. During 31 of the 10-minute intervals, all problems seen by users were due solely to the users' location. Based on this data, we expect that MnM would be at least 10% more accurate in its fault diagnoses than a system that does not consider location.

To predict the performance of a system that does not include location but does model wireless components like access points, we configured MnM to use the AP with which each laptop was associated as the "location" of that laptop. As expected, the number of problems attributed to the access points increases. Interestingly, the number of problems attributed to the servers goes down: without the ability to blame specific locations, the system blames too many problems on wireless issues.

Importance of location priors: To evaluate the effect of location priors on fault diagnosis, we ran MnM with locations, but assigned all locations the same prior (labeled "No location priors" in the figure). The system correctly diagnoses location faults as often as MnM does when using accurate priors, but it also blames the machines and servers more than it should. Many locations have only a single machine reporting observations, as they are private offices, and without the historical perspective provided by the prior the system does not have enough independent observations to confidently distinguish between a problem with the location, the user's laptop, or the remote server.

6.3 Controlled Experiments

To evaluate the accuracy of our system in diagnosing problems that arise in client mobility scenarios, we conducted controlled experiments in which we deliberately impaired parts of the network to create faults. These experiments were conducted on our production corporate network, so there was normal corporate background traffic and some naturally occurring failures during the experiments. The results reported here therefore give a lower bound on the accuracy of MnM.

Methodology: For the following experiments, all 42 machines polled four enterprise websites once every 60 seconds. The MnM Agents ran the application experts and monitors described in Section 4.1.4.

Each experiment ran for at least 60 minutes, with the specified fault injected at the beginning of the experiment. The Inference Engine ran once every minute, producing at least 60 sets of fault suspects for each experiment. For these experiments, we required that the Inference Engine return the root cause representing the injected fault with rank one or two before counting it as a successful diagnosis. This is because network managers are unwilling to look beyond the top few root causes. Table 1 presents a summary of the results.
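The success criterion above can be scored as sketched below: a run counts as a successful diagnosis when the injected fault appears at rank one or two in the suspect list. The function names, root-cause labels, and sample runs are hypothetical.

```python
def diagnosis_accuracy(suspect_lists, injected_fault, max_rank=2):
    """Fraction of inference runs that rank the injected fault within max_rank."""
    hits = sum(injected_fault in suspects[:max_rank] for suspects in suspect_lists)
    return hits / len(suspect_lists)

def first_rank_fraction(suspect_lists, injected_fault):
    """Fraction of runs in which the injected fault is the top suspect."""
    return diagnosis_accuracy(suspect_lists, injected_fault, max_rank=1)

# One ranked suspect list per Inference Engine run (made-up labels).
runs = [
    ["location:2101", "ap:7"],
    ["ap:7", "location:2101"],
    ["ap:7", "machine:x", "location:2101"],
    ["location:2101"],
]
top_two = diagnosis_accuracy(runs, "location:2101")
top_one = first_rank_fraction(runs, "location:2101")
```

The second function corresponds to a "ranked first" percentage, while the first implements the rank-one-or-two rule used to count a diagnosis as successful.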

Problems Due to Bad Location: To measure the accuracy of our Inference Engine in identifying bad locations, we created the following experimental setup. We placed two laptops in a location with poor performance characteristics due to its long distance from an AP, and forced the laptops to associate with that AP. Three other laptops, placed closer to the AP, were also associated with the AP. The experiment tests whether MnM can correctly determine that multiple performance faults observed for clients associated with the same AP do not necessarily imply that the AP is at fault. Instead, MnM must determine the impact of a client's location on its performance. The first row of Table 1 presents a summary of the results. We made two observations during this experiment.

First, when the location module accurately inferred the locations of the two laptops seeing poor performance, the Inference Engine correctly identified the location as the highest-ranked root cause. Second, when the location module did not report the two poorly performing laptops as being at the same location, the Inference Engine reported the location as the second-highest-ranked root cause. The wireless access point was reported as the highest-ranked root cause, as it was a shared dependency between the two laptops in the Inference Graph, whereas each laptop was (incorrectly) connected to a different location root cause.

Problems Due to Bad Access Point: To determine the accuracy of MnM in identifying a poorly performing AP (e.g., one suffering from interference near it), we created the following experimental setup. We connected four laptops from different locations to a specific AP. We reduced the capacity of the AP by introducing a 500 ms delay on all packets traversing it. The experiment tests whether MnM can correctly determine that multiple performance faults observed for clients associated with the same AP do, in some cases, imply that the AP is at fault. As shown in the second row of Table 1, MnM correctly identified the AP as the root cause for all of our observations.

Problems Due to Handoff: Wireless laptops sometimes experience bad performance because their device driver is too aggressive at changing APs in an attempt to achieve better performance.

We set up the following experiment to evaluate MnM's ability to correctly detect problems due to AP handoffs. We forced one laptop to switch between two APs every 30 seconds, causing the performance of the client to suffer. Other clients were associated with the two APs from different locations, and they continued to perform normally. As shown in the third row of Table 1, MnM identified the handoff as the correct root cause for 86% of the observations.

For the remaining 14% of the observations, the AP was identified as the topmost root cause and the handoff was ranked second. This is actually the correct result, as further investigation showed that one of the two APs began experiencing outside interference during the experiment, and hence all clients associated with that AP saw poor performance. This experiment highlights how the Inference Engine is able to quickly identify the right root cause even under rapidly changing conditions.

Simultaneous Diagnosis: To measure how well MnM deals with multiple simultaneous failures, we performed two experiments in which we injected multiple faults at the same time.


Target root cause | % of time the target root cause is first | Other root causes in top two | Reasons for other root causes
Location | 55 | Machine, Server, AP | Location error; real congestion at the server
AP | 100 | First-hop router | Few positive observations through the first-hop router
AP Handoff | 86 | Location, Machine, AP | Location error, AP failures
Server | 100 | Last-hop router | Few positive observations for the last-hop router
Simultaneous Faults | 100 | AP, First-hop router | Few positive observations for the first-hop router

Table 1: Root cause analysis

For the first experiment, we deliberately delayed the packets entering and leaving the server by 500 ms, and we simultaneously placed two clients at a location with known poor performance. The expected outcome for this experiment is for the server to be the highest-ranked root cause and the location to be the second highest. MnM correctly ranked these two root causes for all the observations.

In the second experiment, we placed two clients in a bad location, and we again delayed packets traversing the AP so that the performance of all clients associated with it suffered (not just the two at the bad location). The inference algorithm performed as expected and correctly ranked the AP as the highest-ranked root cause and the bad location as the second-highest-ranked root cause for all observations.

7. CONCLUSION

This paper highlights the issues that an enterprise network management and diagnosis system must handle when all its users are nomadic. These issues include rapidly changing dependencies, root cause analysis in unified wired and wireless networks, and the impact of physical location on application performance. We present MnM, an end-host based, integrated network monitoring and fault diagnosis system, and we show that taking an integrated approach to wired and wireless monitoring improves the accuracy of fault diagnosis.

8. REFERENCES

[1] A. Adya, P. Bahl, R. Chandra, and L. Qiu. Architecture and Techniques for Diagnosing Faults in IEEE 802.11 Infrastructure Networks. In MOBICOM, 2004.

[2] AirDefense: Wireless LAN Security. http://airdefense.net.

[3] AirTight Networks. http://airtightnetworks.net.

[4] P. Bahl, R. Chandra, A. Greenberg, S. Kandula, D. A. Maltz, and M. Zhang. Towards highly reliable enterprise network services via inference of multi-level dependencies. In SIGCOMM, 2007.

[5] P. Bahl, R. Chandra, D. A. Maltz, P. Patel, J. Padhye, and L. Ravindranath. Towards Unified Management of Networked Services in Wired and Wireless Enterprise Networks. Technical report, 2008. MSR-TR-2008-18.

[6] P. Bahl and V. N. Padmanabhan. RADAR: An in-building RF-based user location and tracking system. In INFOCOM, 2000.

[7] M. Balazinska and P. Castro. Characterizing mobility and network usage in a corporate wireless local-area network. In MOBISYS, 2003.

[8] R. Chandra, J. Padhye, A. Wolman, and B. Zill. A Location-based Management System for Enterprise Wireless LANs. In NSDI, 2007.

[9] Y.-C. Cheng, M. Afanasyev, P. Verkaik, P. Benko, J. Chiang, A. Snoeren, G. Voelker, and S. Savage. Automated cross-layer diagnosis of enterprise wireless networks. In SIGCOMM, 2007.

[10] Y.-C. Cheng, J. Bellardo, P. Benko, A. Snoeren, G. Voelker, and S. Savage. Jigsaw: Solving the puzzle of enterprise 802.11 analysis. In SIGCOMM, 2006.

[11] Private conversation with Dell lab members.

[12] F. Giroire, J. Chandrashekar, G. Iannaccone, K. Papagiannaki, E. M. Schooler, and N. Taft. The cubicle vs. the coffee shop: Behavioral modes in enterprise end-users. In Proc. of PAM, 2008.

[13] S. Gittlen. "Want to manage your wired/wireless LANs together? Too bad". ComputerWorld, March 2007.

[14] S. Kandula, D. Katabi, and J.-P. Vasseur. Shrink: A Tool for Failure Diagnosis in IP Networks. In Proc. MineNet Workshop at SIGCOMM, 2005.

[15] R. R. Kompella, J. Yates, A. Greenberg, and A. Snoeren. IP Fault Localization Via Risk Modeling. In Proc. of NSDI, May 2005.

[16] D. Kotz and K. Essien. Analysis of a campus-wide wireless network. In MOBICOM, 2002.

[17] M. Lopez. Forrester Research: The State of North American Enterprise Mobility in 2006. December 2006.

[18] R. Mahajan, M. Rodrig, D. Wetherall, and J. Zahorjan. Analyzing MAC-level behavior of wireless networks in the wild. In SIGCOMM, 2006.

[19] HP Openview. http://www.openview.hp.com/.

[20] P. Reynolds, J. L. Wiener, J. C. Mogul, M. K. Aguilera, and A. Vahdat. WAP5: Black-box Performance Debugging for Wide-area Systems. In WWW, May 2006.

[21] M. Satyanarayanan. Mobile information access. IEEE Personal Communications, Feb. 1996.

[22] EMC Smarts Family. http://www.emc.com/products/software/smarts/smartsfamily/.

[23] IBM Tivoli. http://www.ibm.com/software/tivoli/.

[24] S. Yemini, S. Kliger, E. Mozes, Y. Yemini, and D. Ohsie. High Speed and Robust Event Correlation. In IEEE Communications Magazine, 1996.

[25] M. A. Youssef, A. Agrawala, and A. U. Shankar. WLAN location determination via clustering and probability distributions. In IEEE Percom, 2003.
