Source: Shujuguan (數據觀) Time: 2018-12-05 13:00:21 Author: Keith D. Foote
Big Data Trends in 2019
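Shujuguan (數據觀) | Translated by Huang Yuye (黃玉葉)
[Editor's note] In 2019, new Big Data concepts and technologies will keep arriving on the market, while older technologies will gradually fade away or be put to new uses. The continued growth of the Internet of Things provides fresh resources for Big Data, and new technologies will change not only how Business Intelligence is gathered but also how business is done...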
Easy access to data has produced a new generation of technology and has shifted the business focus towards data-driven decision making. Big Data Analytics is now an established part of gathering Business Intelligence. Many businesses, particularly those online, consider Big Data a mainstream practice. These businesses are constantly researching new tools and models to improve their Big Data utilization.
In 2019, some tools and trends will be more popular than others. New Big Data concepts and technologies are constantly appearing on the market, and older technologies fade away, or get used in new ways. The continuous growth of the Internet of Things (IoT) has provided several new resources for Big Data. New technologies change not only how Business Intelligence is gathered, but how business is done.
Streaming the IoT for Machine Learning
There are currently efforts to use the Internet of Things (IoT) to combine Streaming Analytics and Machine Learning. In 2019, we can anticipate significant research on this theme, and possibly a startup or two marketing their services or software.
Typically, Machine Learning uses “stored” data for training, in a “controlled” learning environment. In this new model, streaming data provides useful information from the Internet of Things to offer Machine Learning in real time, in a less controlled environment. A primary goal in this process is to provide more flexible, more appropriate responses to a variety of situations, with a special focus on communicating with humans.
Changing from a training model that uses a controlled environment and limited training data to a much more open training system requires more complex algorithms. Machine Learning then trains the system to predict outcomes with reasonable accuracy. As the primary model adjusts and evolves, models at the edge or in the Cloud will coordinate to match the changes, as needed. Ted Dunning, the Chief Application Architect at MapR, said:
“We will see more and more businesses treat computation in terms of data flows rather than data that is just processed and landed in a database. These data flows capture key business events and mirror business structure. A unified data fabric will be the foundation for building these large-scale flow-based systems.”
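The contrast between training on "stored" data and learning from a stream can be sketched in a few lines. The example below is only an illustration under stated assumptions: `sensor_stream` is a hypothetical stand-in for an IoT feed, the linear relation it emits is invented for the demo, and `OnlineLinearModel` is a plain stochastic-gradient learner rather than any particular vendor's streaming system. The point it shows is that the model is updated one event at a time and never needs the full dataset in storage.

```python
import random
from typing import Iterator, Tuple


def sensor_stream(n: int, seed: int = 0) -> Iterator[Tuple[float, float]]:
    """Hypothetical IoT feed: yields (reading, target) pairs one at a time."""
    rng = random.Random(seed)
    for _ in range(n):
        x = rng.uniform(0.0, 1.0)
        # Assumed relation for the demo: target = 3 * reading + 1, plus noise.
        y = 3.0 * x + 1.0 + rng.gauss(0.0, 0.05)
        yield x, y


class OnlineLinearModel:
    """One-feature linear model updated by stochastic gradient descent."""

    def __init__(self, learning_rate: float = 0.1) -> None:
        self.w = 0.0
        self.b = 0.0
        self.lr = learning_rate

    def predict(self, x: float) -> float:
        return self.w * x + self.b

    def learn_one(self, x: float, y: float) -> float:
        """Update on a single event; return the prediction error before the update."""
        error = self.predict(x) - y
        self.w -= self.lr * error * x
        self.b -= self.lr * error
        return error


def train_on_stream(n_events: int = 5000) -> OnlineLinearModel:
    """Consume the stream event by event; nothing is stored between events."""
    model = OnlineLinearModel()
    for x, y in sensor_stream(n_events):
        model.learn_one(x, y)
    return model


if __name__ == "__main__":
    model = train_on_stream()
    print(f"w={model.w:.2f}, b={model.b:.2f}")
```

Because each update touches only the current event, the same loop could in principle run at the edge while a copy of the weights is synchronized with a model in the Cloud, which is the kind of coordination the paragraph above describes.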
AI Platforms
Big Data as a tool of discovery continues to evolve and mature, with some enterprises reaping significant rewards. A recent advancement is the use of AI (Artificial Intelligence) platforms. AI platforms will have significant impact over the next decade. Using AI platforms to process Big Data is a significant improvement in gathering Business Intelligence and improving efficiency. Anil Kaul, CEO and Co-Founder of Absolutdata, stated:
“We started an email campaign, which I think everybody uses Analytics for, but because we used AI, we created a 51 percent increase in sales. While Analytics can figure out who you should target, AI recommends and generates what campaigns should be run.”
AI platforms will gain in popularity in 2019. AI platforms are frameworks designed to work more efficiently and effectively than more traditional frameworks. When an AI platform is designed well, it will provide faster, more efficient communications with Data Scientists and other staff. This can help reduce costs in several ways—such as by preventing the duplication of efforts, automating basic tasks, and eliminating simple, but time-consuming activities (copying, data processing, and constructing ideal customer profiles).
AIs will also provide Data Governance, making best practices available to Data Scientists and staff. The AI becomes a trusted advisor, and can also help to ensure work is spread more evenly, and completed more quickly. Artificial Intelligence platforms are arranged into five layers of logic:
·The Data & Integration Layer gives access to the data. (Critical, as developers do not hand-code the rules. Instead, the rules are being “learned” by the AI.)
·The Experimentation Layer lets Data Scientists develop, test, and prove their hypotheses.
·The Operations & Deployment Layer supports model governance and deployment. This layer offers tools to manage the deployment of various “containerized” models and components.
·The Intelligence Layer organizes and delivers intelligent services and supports the AI.
·The Experience Layer is designed to interact with users through the use of technologies such as augmented reality, conversational UI, and gesture control.
The Data Curator
In 2019, many organizations will find the position of Data Curator (DC) has become a new necessity. The Data Curator’s role will combine responsibility for managing the organization’s metadata with responsibility for Data Protection, Data Governance, and Data Quality. Data Curators not only manage and maintain data, but may also be involved in determining best practices for working with that data. Data Curators are often responsible for presentations, with the data shown visually in the form of a dashboard, chart, or slideshow.
The Data Curator regularly interacts with researchers, and also schedules educational workshops. The DC communicates with other curators to collaborate and coordinate, when appropriate. (Good communication skills are a plus.) Tomer Shiran, co-founder and CEO of Dremio, said:
“The Data Curator is responsible for understanding the types of analysis that need to be performed by different groups across the organization, what datasets are well suited for this work, and the steps involved in taking the data from its raw state to the shape and form needed for the job a data consumer will perform. The data curator uses systems such as self-service data platforms to accelerate the end-to-end process of providing data consumers access to essential datasets without making endless copies of data.”
Politics and GDPR
The European Union’s General Data Protection Regulation (GDPR) went into effect on May 25, 2018. While GDPR is focused on Europe, some organizations, in an effort to simplify their business and promote good customer relations, have stated they will provide the same privacy protections for all their customers, regardless of where they live. This approach, however, is not the general position taken by businesses and organizations outside of Europe. Many corporations have chosen to revamp their consent procedures and data handling processes, and to hire new staff, all in an effort to maximize the private data they “can” gather.
Businesses relying on “assumed consent” for all processing operations can no longer make this assumption when doing business with Europeans. Businesses have had to implement new procedures for notices and receiving consent, and many are currently trying to plan for what’s next, while simultaneously struggling with problems in the present.
Several organizations have assigned GDPR responsibilities to their Chief Security Officers. (The CDC should be responsible for having these changes made.) Though GDPR fines can be quite large (as high as 20 million Euros or four percent of annual global turnover, whichever is higher), many businesses, especially in the United States, are still not prepared.
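The fine ceiling quoted above is simple arithmetic, and a short worked example shows where the percentage rule overtakes the flat amount. The function name `gdpr_max_fine_eur` and the sample turnover figures are illustrative assumptions; the function computes only the upper limit described in the text, not the fine a regulator would actually impose.

```python
FLAT_CAP_EUR = 20_000_000   # flat ceiling quoted in the text
TURNOVER_PERCENT = 4        # percentage of annual global turnover


def gdpr_max_fine_eur(annual_global_turnover_eur: float) -> float:
    """Return the fine ceiling: the higher of the flat cap and 4% of turnover."""
    if annual_global_turnover_eur < 0:
        raise ValueError("turnover cannot be negative")
    percent_based = annual_global_turnover_eur * TURNOVER_PERCENT / 100
    return max(FLAT_CAP_EUR, percent_based)


if __name__ == "__main__":
    # Hypothetical companies, chosen to sit on either side of the break-even point.
    for turnover in (100_000_000, 500_000_000, 2_000_000_000):
        print(turnover, gdpr_max_fine_eur(turnover))
```

The two rules meet at a turnover of 500 million Euros: below that the flat 20 million Euro ceiling applies, and above it the four percent rule sets the higher limit.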
In 2019 the U.S. government could make an effort to imitate the GDPR and hold businesses accountable for how they handle privacy and personal data. In the short term, it would make sense for online businesses to begin implementing new privacy policies or simply make the shift to a GDPR policy format. Making the shift now, and advertising it on the company’s website, has the potential to develop a good relationship with the customer base.
5G Not Likely in 2019
Switching to a 5G (fifth generation) system is expensive and comes with some potential issues. While the expense may not stop 5G implementation in 2019, other problems might.
Though the U.S. Federal Government completely supports the implementation of a 5G system, some communities have passed ordinances halting the installation of a 5G infrastructure. It seems likely this will become a standard practice for blocking 5G systems.
An additional factor blocking 5G is a decision by the United States FCC, which eliminated regulations supporting net neutrality. Net neutrality offered internet providers, and their users, a level playing field, and promoted competition. Net neutrality is the concept that internet providers should treat all data, and people, equally, without discrimination and without charging different users different rates based on such things as speed, content, websites, platforms, or applications.
Hybrid Clouds Will Gain in Popularity
Clouds and Hybrid Clouds have been steadily gaining in popularity and will continue to do so. While an organization may want to keep some data secure in its own data storage, the tools and benefits of a hybrid system make it worth the expense. Hybrid Clouds combine an organization’s private Cloud with the rental of a public Cloud, offering the advantages of both. Expect a significant increase in the use of Hybrid Clouds in 2019.
Generally speaking, the applications and data in a Hybrid Cloud can be transferred back and forth between on-premises (private) Clouds and IaaS (public) Clouds, providing more flexibility, deployment options, and tools. A public Cloud, for example, can be used for the high-volume, low-security projects, such as email advertisements, and the on-premises Cloud can be used for more sensitive projects, such as financial reports.
“Cloud Bursting” is a feature of Hybrid Cloud systems. An application runs within the on-premises Cloud until there is a spike in demand (think Christmas shopping online, or filing taxes), and then the application “bursts” through into the public Cloud and taps into additional resources.
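The placement and bursting behavior described above can be reduced to two small decisions. The sketch below is a toy model, not a real Cloud API: `choose_cloud`, `split_load`, and the abstract "capacity units" are hypothetical names introduced for illustration. It assumes sensitive projects stay on premises and that demand is served on premises first, with only the overflow going to the public Cloud.

```python
from typing import Dict


def choose_cloud(sensitive: bool) -> str:
    """Place sensitive projects on premises and low-security ones in the public cloud."""
    return "private" if sensitive else "public"


def split_load(demand: int, private_capacity: int) -> Dict[str, int]:
    """Serve demand on premises first; any overflow 'bursts' into the public cloud."""
    if demand < 0 or private_capacity < 0:
        raise ValueError("demand and capacity must be non-negative")
    on_premises = min(demand, private_capacity)
    return {"private": on_premises, "public": demand - on_premises}


if __name__ == "__main__":
    print(choose_cloud(sensitive=True))               # financial reports
    print(choose_cloud(sensitive=False))              # email advertisements
    print(split_load(demand=80, private_capacity=100))   # normal day, no burst
    print(split_load(demand=250, private_capacity=100))  # seasonal spike, burst
```

On a normal day the public share is zero; during a spike only the portion above the on-premises capacity is rented, which is why bursting can cost less than sizing the private Cloud for peak demand.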
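Note: "Foreign Media Predict the Big Data Trends of 2019" is sourced from Dataversity (click to view the original). Compiled by Shujuguan (數據觀), translated by Huang Yuye (黃玉葉). Please credit the translator and the source when reposting.
Editor in charge: Li Lansong (李蘭松)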