Videos 2,956
Views 16,105,448

LakeFlow Demo

Databricks LakeFlow is a new solution that contains everything you need to build and operate production data pipelines. It includes new native, highly scalable connectors for databases including MySQL, Postgres, SQL Server and Oracle and enterprise applications like Salesforce, Microsoft Dynamics, NetSuite, Workday, ServiceNow and Google Analytics. Users can transform data in batch and streaming using standard SQL and Python.
Learn about Data Engineering: www.databricks.com/solutions/data-engineering

Videos

1:57

Say goodbye to messy JSON headaches with VARIANT

2.2K views · 22 hours ago

Try it out today on Databricks: docs.databricks.com/en/semi-structured/variant.html
Read more about it on our blog: www.databricks.com/blog/introducing-open-variant-data-type-delta-lake-and-apache-spark
If you're curious about the implementation, check out the talk: ruclips.net/video/jtjOfggD4YY/видео.html
Or read about it on GitHub: github.com/apache/spark/blob/master/common/variant/README.md

2:31

Data Intelligence Day Seoul 2024

514 views · 1 day ago

Data Intelligence Day Seoul, Korea took place on 23 April 2024 and gathered over 1,200 industry leaders and data and AI experts. Watch Data Intelligence Day Seoul On Demand: events.databricks.com/KoreaDIDays2024

17:50

An Introduction to DBRX

3.7K views · 1 day ago

Learn from Naveen Rao, VP of Generative AI at Databricks, as he explains DBRX, a new, open source foundation model that sets the standard for production quality and price/performance. With up to 3x faster inference, DBRX outperforms all other open models in quality benchmarks, which allows enterprises to quickly build their own custom LLMs efficiently and with full control. Read more about ...

11:08

Demo: How Do I Use DBRX?

1.5K views · 1 day ago

Watch how DBRX uses Databricks to build and customize GenAI applications using your own enterprise data. Read more about DBRX here: www.databricks.com/blog/announcing-dbrx-new-standard-efficient-open-source-customizable-llms?

18:30

What's Next for Apache Spark™ Including the Upcoming Release of Apache Spark 4.0

6K views · 1 day ago

Reynold Xin, Co-founder and Chief Architect, Databricks, shares the latest innovation coming out of the Apache Spark™ open source project, including a preview of the anticipated release of Spark 4.0. Speakers: Reynold Xin (Co-founder and Chief Architect, Databricks) and Tareef Kawaf (President, Posit Software, PBC)

16:07

The Evolution of Delta Lake from Data + AI Summit 2024

1.8K views · 1 day ago

Shant Hovsepian, Chief Technology Officer of Data Warehousing at Databricks, explains why Delta Lake is the most adopted open lakehouse format. Includes:
- Delta Lake UniForm GA (support for and compatibility with Hudi, Apache Iceberg, Delta)
- Delta Lake Liquid Clustering
- Delta Lake production-ready catalog (Iceberg REST API)
- The growth and strength of the Delta ecosystem
- Delta Kernel
- D...

2:33

Setting up PAT and Secret Scope

340 views · 1 day ago

Quick video on how to set up a Personal Access Token, Secret Scope, and Secret with Azure Key Vault.

2:02

Increase your column sizes without rewriting the entire table

721 views · 1 day ago

Docs: docs.databricks.com/en/delta/type-widening.html

5:15

Announcing Delta Lake 4.0 with Liquid Clustering. Presented by Shant Hovsepian at Data + AI Summit

4.4K views · 1 day ago

Shant Hovsepian, CTO of Data Warehousing at Databricks, announced the biggest Delta Lake release to date, Delta 4.0, during the Data + AI Summit 2024 in San Francisco. Speaker: Shant Hovsepian, Chief Technology Officer of Data Warehousing, Databricks

0:56

Open Sourcing Unity Catalog Live Onstage with Matei Zaharia at Data + AI Summit 2024

887 views · 1 day ago

Speaker: Matei Zaharia, Original Creator of Apache Spark™ and MLflow; Chief Technologist, Databricks. Matei Zaharia, Original Creator of Apache Spark™ and MLflow and Chief Technologist at Databricks, open sourced Unity Catalog live onstage at the Data + AI Summit 2024 in San Francisco.

16:58

Databricks LakeFlow: A Unified, Intelligent Solution for Data Engineering. Presented by Bilal Aslam

7K views · 1 day ago

Speaker: Bilal Aslam, Sr. Director of Product Management, Databricks. Bilal explains that everything starts with good data and outlines the three steps to good data: ingesting, transforming and orchestrating your data. Then Bilal announces Databricks LakeFlow, a unified solution for data engineering. With LakeFlow you can ingest data from databases, enterprise apps and cloud sources, ...

0:38

Recap of Announcements at Data + AI Summit 2024 with Ali Ghodsi, Co-Founder and CEO, Databricks

826 views · 1 day ago

Ali Ghodsi, Co-founder and CEO of Databricks, closes the 2024 Data + AI Summit with a recap of Databricks and open source innovation announced during the 4-day conference in San Francisco. Speaker: Ali Ghodsi, Co-founder and CEO, Databricks @Databricks

7:47

Announcing Databricks Clean Rooms with Live Demo. Presented by Matei Zaharia and Darshana Sivakumar

993 views · 1 day ago

Speakers: Matei Zaharia (Original Creator of Apache Spark™ and MLflow; Chief Technologist, Databricks) and Darshana Sivakumar (Staff Product Manager, Databricks). Organizations are looking for ways to securely exchange their data and collaborate with external partners to foster data-driven innovations. In the past, organizations had limited data sharing solutions, relinquishing control over how their ...

6:13

Data Sharing and Cross-Organization Collaboration. Presented by Matei Zaharia at Data + AI Summit

327 views · 1 day ago

Speaker: Matei Zaharia, Original Creator of Apache Spark™ and MLflow; Chief Technologist, Databricks. Summary: Data sharing and collaboration are important aspects of the data space. Matei Zaharia explains the evolution of the Databricks data platform to facilitate data sharing and collaboration for customers and their partners. Delta Sharing allows you to share parts of your table with third pa...

5:18

Announcing Unity Catalog Metrics with Live Demo. Matei Zaharia and Zeashan Pappa at Data + AI Summit

863 views · 1 day ago

14:43

Evolving Data Governance With Unity Catalog Presented by Matei Zaharia at Data + AI Summit 2024

2.6K views · 1 day ago

5:22

Unity Catalog Demo of New Features with Zeashan Pappa at Data + AI Summit 2024

905 views · 1 day ago

10:46

How Data Intelligence is Delivering Big Wins at Texas Rangers. Alexander Booth at Data + AI Summit

324 views · 1 day ago

5:01

The Future of Lakehouse Format Interoperability with Ali Ghodsi and Ryan Blue at Data + AI Summit

531 views · 1 day ago

17:52

How to Make Small Language Models Work. Yejin Choi Presents at Data + AI Summit 2024

3.3K views · 1 day ago

2:52

Data + AI Summit Keynote 2024 - Day 2 Opening Remarks with Ali Ghodsi

238 views · 1 day ago

0:53

Building an Enterprise Data & AI Catalog with Databricks Unity Catalog

1.2K views · 1 day ago

36:31

Patrick Wendell, Co-founder and VP of Engineering on Building Production-Quality AI Systems

1.4K views · 1 day ago

22:52

The Best Data Warehouse is a Lakehouse

4.1K views · 1 day ago

7:36

Building and Deploying GenAI Apps at Block with Jackie Brosamer, Head of AI, Data & Analytics

616 views · 1 day ago

18:26

Fei Fei Li, Professor, Stanford University on the History and Future of AI at Data + AI Summit 2024

16K views · 1 day ago

7:18

Building an Insights Factory at General Motors - Data + AI Summit 2024

434 views · 1 day ago

25:02

Jensen Huang, Founder and CEO of NVIDIA with Ali Ghodsi, Co-founder and CEO of Databricks

37K views · 1 day ago

1:26

Ali Ghodsi, Databricks Co-founder and CEO Closes the Keynote with a Summary of Product Announcements

323 views · 1 day ago

@yao5261 18 hours ago
Got it: cyber pulse-taking!
@FullEvent5678 1 day ago
Very inspiring! My mind is going at 1000 miles an hour with ideas for our startup and clients from this!
@subedi04 1 day ago
Where can I access your code or workbook? It would be nice to run your code.
@AadidevSooknananNXS 1 day ago
Holden and team are incredibly engaging and very easy to understand!
@ia6906 1 day ago
Great feature, please also include low-code features to make it more beneficial, as Data Factory also has for ETL.
@Naraharisettiraviteja 1 day ago
awesome
@brento2890 2 days ago
Excellent presentation, beginning 3.5-4.0 Billion years ago and explaining all the way to now (AI, non-physical-spatial). Excellent. Thank you. 👏
@TheDataArchitect 2 days ago
Who's the speaker?
@Databricks 1 day ago
Holly Smith - FYI it's also me in the comments for my videos so fire away with any technical follow-on questions - Holly
@TheDataArchitect 1 day ago
@Databricks Awesome thanks
@muhammadibrahimabdullahi3840 2 days ago
AI can do everything you need to do in times of studying and understanding AI.
@benim1917 2 days ago
Awesome 👏🏾
@Thegameplay2 2 days ago
🎉
@gravenguan 2 days ago
How does parse_json handle schema evolution? To my knowledge, parsing the schema on the fly is not recommended for prod tables; it's safer to define the schema first.
@Databricks 2 days ago
I agree, but with a lot of JSON data you don't know the schema upfront and so can't define it. It's worth noting this is different from inferring the schema, which looks at the first 1000 rows and is brittle to upstream changes - Holly
@gravenguan 2 days ago
@Databricks We used parse_json for dev and exploration purposes as well, thanks for the clarification
@Databricks 2 days ago
@gravenguan No worries! Hope this clarifies for other users too
@nagendrasrinivas-cj7sr 2 days ago
this is clearly copied from snowflake
@Databricks 2 days ago
Variants in their various forms have been around for many decades. We're big fans of open source so anyone can use the implementation in other projects or products.
@TheDataArchitect 2 days ago
That's awesome.
@matthiasmueller9340 2 days ago
How can I specify the required runtime version when using a serverless SQL warehouse?
@Databricks 2 days ago
Variant types will be coming to serverless early/mid July, no need to select a runtime - Holly
@afrikaniz3d 2 days ago
Only note for these videos, since they're not Shorts, is that it would be more beneficial to use the full wide (1920 x 1080) format, so it's more readable at all resolutions.
@Databricks 1 day ago
I completely hear you, trying to figure out the best way to film for multiple platforms at once when some define 'short' as <10 mins and RUclips graces me with a mere 60 seconds - Holly
@EranM 3 days ago
Can't you get the score (ranking score | similarity score) while fetching items from the Vector DB?
@EranM 3 days ago
Can someone explain to me how come you calculate USER embeddings when training, but when searching for similar embeddings you actually get ITEM embeddings?
@LQDEN 4 days ago
Still didn't explain what it is exactly
@gybob100 4 days ago
The shovel company telling you how valuable the gold is
@user-he1hs5vx3d 5 days ago
She is creepy because she is not an honest person. She keeps stealing others' work and ideas to pretend she is an expert. To make herself greater, she belittles others, including her student (5:30).
@jianguo8233 5 days ago
Is 4.0 a release or a preview today?
@uchechukwumadu9625 5 days ago
Insightful!
@slavenlulic7736 5 days ago
Powerful
@SnatrWhamo 6 days ago
Great video and very very useful! While implementing, I got stuck uploading the pdf to a Volume in the Unity Catalog. I am the "Owner" of my Databricks Workspace and Azure account although I don't seem to have the option to add a Volume to a Catalog and thus don't have the option to add the pdf to a Volume. This seems to have to do with permissions and possibly setting up a metastore between Databricks and Azure Blob Storage? Might you have any insights, ideas, solutions or workarounds? Thanks again for a great video and all the resources to implement this super useful technology!
@jasondrew2087 6 days ago
A couple of things: you need USE SCHEMA and CREATE VOLUME permissions on the schema and USE CATALOG on the catalog. You also need CREATE EXTERNAL VOLUME permission on the external location you plan to use for your volume.
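(Editor's sketch, not part of the reply above: the privileges it lists could be granted with Unity Catalog GRANT statements along the following lines. The helper name, the principal, and the catalog, schema and external location names are all hypothetical placeholders; the helper only builds the SQL text, which you would then run in a SQL editor or notebook with sufficient rights.)

```python
def volume_grants(principal, catalog, schema, external_location):
    """Build GRANT statements for the privileges named in the reply above.

    All argument values are placeholders supplied by the caller; nothing
    here talks to Databricks, it only assembles SQL strings.
    """
    who = f"`{principal}`"  # backticks quote principals such as e-mail addresses
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {who}",
        f"GRANT USE SCHEMA, CREATE VOLUME ON SCHEMA {catalog}.{schema} TO {who}",
        f"GRANT CREATE EXTERNAL VOLUME ON EXTERNAL LOCATION {external_location} TO {who}",
    ]


# Hypothetical names, for illustration only.
for stmt in volume_grants("user@example.com", "main", "docs", "my_location"):
    print(stmt + ";")
```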
@BlizzardzRS 6 days ago
While I appreciate the contributions Databricks makes to the open source community, *this video is incredibly misleading*. DBRX is *not* the highest production quality open-source model nor the best in price per performance. The graph you showed is incredibly misleading, not least because you compared your models to LLaMa2-70B. No one in their right mind at the time of this video's recording is using LLaMa2-70B. Everyone has moved on to LLaMa3, with many providers even disabling LLaMa2 on their platforms because it is more expensive and less performant than LLaMa3. A fairer comparison would be between DBRX and LLaMa3-70B and LLaMa3-8B. You didn’t show that because DBRX gets roasted in these comparisons. (You talked about the cost associated with training your LLMs and how the cost has come down substantially. Really, this is an argument that the $10M Mosaic/Databricks have spent on DBRX is already redundant.) You guys are losing credibility by posting stuff like this. Databricks does some great work. Don’t tarnish your reputation with borderline fraudulent content like this.
@georges7298 7 days ago
Thanks - for the open sourcing, and for the summit.
@BeginnerAlchemist 7 days ago
I have a question: why do we research Small-LMs just to avoid using GPUs? If we want to save money on training, we can research how to make GPUs or models more efficient, rather than avoiding higher tech.
@DamaruM 6 days ago
GPU = power consumption
@tulikabose5120 3 days ago
It's not just for GPUs...Small-LM has its own market for on-device or on-edge processing, where there are concerns of privacy and customers would not want their data to go to clouds, and secondly in many industrial use-cases where internet and cloud access isn't accessible due to the remote nature of the use-case, and model inference needs to be done on device...The demand for SLMs is increasing in such use cases...Many big tech companies are not just working on LLMs but also on SLMs under the hood as both of them have to co-exist to cater to different user requirements.
@BeginnerAlchemist 3 days ago
@tulikabose5120 Thank you, I see. It is useful for small devices with limited compute hardware, and for privacy. That's true. So many LLMs need huge amounts of data to train and collect people's private info to become stronger. Most people hate that.
@mc.pretzel 7 days ago
Boomshakalaka!
@plartoo 7 days ago
:D Show us how to do more complex data transformations than just the simple join you demoed and what the actual limitations are (because that's where the reality meets the demo). While you are at it, tell us how to automate (schedule) this pipeline and set up notifications and data quality checks. Next, let us know how to QA that dashboard you let GenAI create (to make sure it's not hallucinating and spitting out bullshit while destroying our firm's reputation), and how to surface it to customers via URL in a secure way (without paying you through our noses). Finally, tell us how much it costs to process GBs of data per month. This is an unbearably condescending demo that assumes the attendees are stupid and don't know what is entailed in serious, real-world data wrangling. And I know a couple of my clients who are leaving Databricks because they are freaking expensive.
@ser1ification 3 days ago
Exactly. I’m tired of these hype machines. Everything is in beta. Customers are the beta testers. Only thing these guys did good is the Unity Catalog. Of course Spark and Delta as well.
@gopi4841 8 days ago
Nice one, Darshana.
@xiaoyu2270 8 days ago
jensen from china wenzhou
@chima6291 2 days ago
bullshit. He was born in Taiwan
@forrestbajbek3900 8 days ago
Wow, this is a huge improvement.
@AleksandarKrumov-pm4tk 8 days ago
wow
@cobrider2 8 days ago
2 reactions:
- By querying the table with duckdb, the authentication and permission is handled only by Unity Catalog, and not by the underlying storage solution (AWS S3, Azure ADLS, ...), right?
- Applying column masks will only work for hosted compute like the Databricks clusters, because querying with a local self-hosted compute like DuckDB requires downloading the parquet files (containing the PII data) locally and only then executing the query... meaning you actually have PII data downloaded on your local machine, right?
@cobrider2 8 days ago
had a laugh, thank you
@subhroitmecse 8 days ago
Examples are not clear about Delta Lake ACID properties.
@Clammer999 8 days ago
One of my favourite AI legends. Her passion for humanity and how AI can be leveraged to help improve people’s lives is admirable and astounding.
@WonkaTruck 8 days ago
I still can't read Iceberg in Databricks, stop hoping for adoption and just fix that...
@DCC72 8 days ago
And from nothing, a college professor just evolved from the bacteria. Rubbish.
@sunnychabbi3639 8 days ago
Please provide the notebook. It is not available in dbdemos.
@GerardInnes 9 days ago
As a new RUclipsr you doing very well. He teaching us option trading nicely. Just need to be consistent with this process of trading on binary options...
@henryebube3576 9 days ago
I followed your tutorial. I get stuck at 9.38. I type databricks-bge-large-en as the embedding model but the Create button is disabled, and I'm not sure why.
@jasondrew2087 8 days ago
You shouldn't have to type it in, rather it should be an option in the drop down. If you go to Serving do you see it listed as a Foundational model?

Databricks

Comments